Monday, April 26th, 2010

Scrollin’ Scrollin’ Scrollin’ to the NextPoint

Category: Examples, JavaScript

This is a guest post from the folks at Nextpoint. We’ve previously mentioned their pioneering use of Ajax in the legal industry and their open sourcing of Growl4Rails. Here they bring us some details on their scrollable-document interface. Read to the end of the post for a link to the sample code.

If your application involves reading large documents, books, or articles, there’s a good chance you’ve had to put serious effort into building a good reader. At Nextpoint, we build web-based litigation applications for discovery and evidence management, and our users often find themselves reading very long documents. We decided to build a scrollable reader because we believe it’s the most natural way to navigate large documents in a browser, and because we needed other metadata and functionality to remain in fixed locations surrounding the document. You may have seen something similar at Google Books or Bing image search. With potentially thousands of pages and images in each document, some interesting technical challenges arise, so we’ll walk through how we tackled them.

Challenges

Documents can be thousands of pages long and consist of imaged pages, which makes them quite large. The pages vary in resolution and range in quality from poorly scanned images to rich graphical diagrams. Because many of the documents contain poor-quality images, we also implemented a zoom feature, which leverages our Theater document presentation tool (previously covered here) to expand images on the fly. And finally, as always, the reader needs to be fast when scanning or following search hits throughout a document.

Our approach

Our basic reader structure is a container DIV styled with “overflow:auto”. The initial markup is very lightweight, with JavaScript doing all of the heavy lifting. The basic metadata for each page (page number, image size, and URL) is loaded in sets of 20, first on the initial request and then via Ajax as the user navigates the document. The IMG elements are created on the fly as pages get near the visible area. Those are the basics. Here’s a brief demo and then we’ll hash out some of the details.

Preparing the scroll area

There are two main methods for incremental loading: expand the scrollable container as you load more data (as Bing image search does), or pre-set the height of the container for the entire content (as Google Books does). We chose to pre-set the height by initially drawing a very simple placeholder for each page. Each placeholder consists of a “Page 123” label and a div with the actual or estimated page height (depending on whether we’ve loaded that page’s metadata yet), which will eventually act as a container for the page.
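As a rough illustration of the pre-set height approach, the sketch below computes where each placeholder would start given a mix of known and estimated page heights. It is not Nextpoint's code: the function name, its parameters, and the fixed label height are all assumptions made for the example, and it does the arithmetic directly instead of reading offsetTop from the DOM so it can run outside a browser.

```javascript
// Hypothetical helper: compute the top offset of every page placeholder
// before any image has loaded. All names here are illustrative.
function buildPlaceholderLayout(pageCount, knownHeights, estimatedHeight, labelHeight) {
  const tops = [];
  let y = 0;
  for (let i = 0; i < pageCount; i++) {
    tops.push(y);
    // Use the real height when this page's metadata has arrived,
    // otherwise fall back to the estimate.
    const pageHeight = knownHeights[i] !== undefined ? knownHeights[i] : estimatedHeight;
    y += labelHeight + pageHeight;
  }
  // totalHeight is what the scrollable content would be sized to up front.
  return { tops, totalHeight: y };
}

// Four pages, metadata known for the first two, 20px "Page N" labels.
const layout = buildPlaceholderLayout(4, { 0: 1000, 1: 1200 }, 1100, 20);
console.log(layout.tops);        // [ 0, 1020, 2240, 3360 ]
console.log(layout.totalHeight); // 4480
```

Because the total height is known from the start, the scrollbar thumb reflects the whole document even though no page image has loaded yet.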

This allows the user to scroll through the document immediately, without the confusion of content jumping around as the scrollable area gets “taller.” We keep the placeholders as simple as possible so the initial setup and rendering of the page is quick. After drawing the placeholders, we cache the offsetTop of each one in an array so we can quickly determine the “current visible page” by comparing the scrollTop of the container DIV to this array. Finally, we set up two handlers to watch the scroll position. We trap the onScroll event and use it to update the current page number in our navigation bar. Separately, a setInterval callback checks whether the user has “landed” on a particular page for long enough (~200ms) that we should load its images. (This allows for quick scrolling through the document without unnecessarily loading in-between images.)
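The two pieces described above, the offset lookup and the “landed” check, might look something like the following. This is a sketch under assumptions: the function names are invented, the binary search is one possible way to compare scrollTop against the cached array (the post does not say how the comparison is done), and time is passed in explicitly so the logic can be exercised without a browser.

```javascript
// Return the index of the last page whose cached top is at or above scrollTop.
// `tops` is the sorted array of cached offsetTop values.
function findCurrentPage(tops, scrollTop) {
  let lo = 0;
  let hi = tops.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (tops[mid] <= scrollTop) {
      lo = mid;       // page `mid` starts at or above the scroll position
    } else {
      hi = mid - 1;   // page `mid` starts below it
    }
  }
  return lo;
}

// Report a page as "landed" once it has stayed current for dwellMs.
function createLandingDetector(dwellMs, onLand) {
  let candidate = -1; // page currently at the top of the viewport
  let since = 0;      // time at which the candidate became current
  let landed = -1;    // last page reported through onLand
  return {
    // Call from the scroll handler with the page from findCurrentPage.
    pageChanged(page, now) {
      if (page !== candidate) {
        candidate = page;
        since = now;
      }
    },
    // Call from a setInterval tick.
    tick(now) {
      if (candidate !== -1 && candidate !== landed && now - since >= dwellMs) {
        landed = candidate;
        onLand(candidate);
      }
    },
  };
}

// In a browser the wiring could be roughly:
//   container.onscroll = () =>
//     detector.pageChanged(findCurrentPage(tops, container.scrollTop), Date.now());
//   setInterval(() => detector.tick(Date.now()), 50);
```

Keeping the scroll handler this cheap matters because it fires many times per second while the user drags the scrollbar; the image loading only happens from the slower interval tick.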

Loading content as you go

When the user lands on a particular page, we first try to load all the currently visible pages, then we pre-load a few of the surrounding pages so they will be ready when the user scrolls further. Loading a page starts with checking whether we’ve already loaded its metadata. If not, we make an Ajax call to load that plus 20 or so nearby pages’ metadata. Once we have metadata for the page, we update its dimensions and create DOM content for the labels, a checkered “loading…” type background, and the page image itself. To help make sure the images for pages in the visible area load first, we use a bit of a trick: we set up all the visible pages right away, and preload the surrounding pages in a separate method on a 1ms setTimeout callback.
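One way to express that ordering is sketched below. Everything in it is illustrative, not the original code: the function names, the margin of two neighbouring pages, and the choice to align metadata requests to fixed batches of 20 are assumptions (the post only says “20 or so nearby pages”). The `defer` parameter stands in for setTimeout so the ordering can be checked without timers.

```javascript
const BATCH_SIZE = 20; // assumed fixed batch size for metadata requests

// Pages whose placeholder overlaps [scrollTop, scrollTop + viewportHeight).
function visiblePages(tops, totalHeight, scrollTop, viewportHeight) {
  const result = [];
  for (let i = 0; i < tops.length; i++) {
    const bottom = i + 1 < tops.length ? tops[i + 1] : totalHeight;
    if (bottom > scrollTop && tops[i] < scrollTop + viewportHeight) result.push(i);
  }
  return result;
}

// A few neighbours on either side of the visible run.
function surroundingPages(visible, pageCount, margin) {
  if (visible.length === 0) return [];
  const first = visible[0];
  const last = visible[visible.length - 1];
  const result = [];
  for (let i = Math.max(0, first - margin); i < first; i++) result.push(i);
  for (let i = last + 1; i <= Math.min(pageCount - 1, last + margin); i++) result.push(i);
  return result;
}

// Decide what to draw now, what to preload, and which metadata batches are missing.
function planLoad(tops, totalHeight, scrollTop, viewportHeight, hasMetadata) {
  const visible = visiblePages(tops, totalHeight, scrollTop, viewportHeight);
  const surrounding = surroundingPages(visible, tops.length, 2);
  const batches = new Set();
  for (const page of visible.concat(surrounding)) {
    if (!hasMetadata(page)) batches.add(Math.floor(page / BATCH_SIZE) * BATCH_SIZE);
  }
  return { visible, surrounding, batchesToFetch: Array.from(batches) };
}

// Visible pages are set up immediately; neighbours wait for a 1ms timeout
// so the browser can start fetching the visible images first.
function loadInOrder(plan, loadPage, defer = setTimeout) {
  plan.visible.forEach(loadPage);
  defer(() => plan.surrounding.forEach(loadPage), 1);
}
```

In a real reader, `loadPage` would create the IMG element for a page, and `batchesToFetch` would drive the Ajax metadata requests before any image is created.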

This lets the browser start loading the visible pages before adding the surrounding ones to the DOM. Also, if we update the dimensions of the page (because an initial estimate of height was a bit off), we re-cache the list of placeholder offsetTop values. Beyond that, the main UI piece is navigation. Next- and previous-page buttons are easy, along with a box where the user can type a page to jump to. Since the offsetTop of each page is already in an array, you can just scroll there and let the onScroll handler update as usual.
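The jump-to-page box and the re-caching step can be sketched in a few lines. The names below are hypothetical, and `applyHeightChange` shifts the cached values arithmetically as a stand-in for re-reading offsetTop from the DOM, which is what the post describes.

```javascript
// Scroll a container so the requested 1-based page sits at the top.
// `container` only needs a writable scrollTop, so a plain object works in tests.
function jumpToPage(container, tops, input) {
  const requested = Number.parseInt(input, 10);
  if (Number.isNaN(requested)) return null; // ignore non-numeric input
  // Clamp to the valid page range.
  const page = Math.min(Math.max(requested, 1), tops.length);
  container.scrollTop = tops[page - 1];
  return page;
}

// When a page's real height differs from the estimate by `delta` pixels,
// every later placeholder moves by the same amount.
function applyHeightChange(tops, pageIndex, delta) {
  return tops.map((top, i) => (i > pageIndex ? top + delta : top));
}

const fakeContainer = { scrollTop: 0 };
jumpToPage(fakeContainer, [0, 1020, 2240], '2');
console.log(fakeContainer.scrollTop); // 1020
```

In a browser, assigning scrollTop fires the scroll event, so the page number in the navigation bar updates through the same handler as manual scrolling.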

Gotchas

As usual, the Web has some quirks that complicate this simple concept. One is that the browser doesn’t render the DOM changes until JavaScript events finish. So we end up using that setTimeout trick more often, in particular right before updating the cache of offsetTop values. It’s always like this: setTimeout(updatePlaceholderCache, 1).

Another is that, because we load the images from Amazon S3, the URLs can expire if the user spends a long time reading a particularly long document. So we add an expire time to the page metadata and check it before loading the page image. If it’s expired, we have to re-request the page metadata before displaying the image.

Also, we want a page to be considered “current” when it’s scrolled most of the way into the visible area (when just the tail of the previous page is visible at the top), so whenever we check what the current page is, we add a slop amount to the container DIV’s scrollTop value.

Finally, since the container DIV may sometimes be taller than a single page of a document, it could be impossible to scroll down far enough that the last page is considered “current.” So we add a spacer at the bottom of the container, just tall enough to let the last page scroll into the current position. (While not strictly necessary, extra features such as a “link to this page” option wouldn’t work for the last page if it never was seen as the current page.)
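The last three gotchas come down to small calculations, sketched here under assumptions. The `expiresAt` field name, the function names, and the exact spacer formula are not from the original code; the formula is simply the smallest spacer for which the maximum scroll position plus the slop reaches the top of the last page.

```javascript
// True when a page's image URL should no longer be used.
// `expiresAt` is a hypothetical millisecond timestamp stored with the metadata.
function isExpired(meta, nowMs) {
  return nowMs >= meta.expiresAt;
}

// Current page with a slop allowance: a page counts as current once its top
// is within `slop` pixels below the top of the container.
function currentPageWithSlop(tops, scrollTop, slop) {
  for (let i = tops.length - 1; i > 0; i--) {
    if (tops[i] <= scrollTop + slop) return i;
  }
  return 0;
}

// Smallest bottom spacer that lets the last page become current.
// Max scrollTop is contentHeight - containerHeight, and we need
// maxScrollTop + slop >= top of the last page.
function bottomSpacerHeight(containerHeight, lastPageHeight, slop) {
  return Math.max(0, containerHeight - lastPageHeight - slop);
}

// Example: an 800px container, a short 300px last page, 100px of slop.
const tops = [0, 1000, 2000];
const spacer = bottomSpacerHeight(800, 300, 100);
const maxScrollTop = 2000 + 300 + spacer - 800;
console.log(spacer);                                       // 400
console.log(currentPageWithSlop(tops, maxScrollTop, 100)); // 2
```

Without the spacer in this example, the furthest scroll position would be 1500, and even with the slop the reader would still report the second page as current.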

A Line of Code is worth a Thousand Words?

For those interested in more of the details, we’ve put together a more generic version of the code, available here. No setup necessary and you’ll be able to explore the basic concept without the Nextpoint dependencies.

Thanks for reading and happy scrolling!

Posted by Dion Almaer at 6:56 am

3 Comments

Why would you want a scrollbar when your content is “thousands of pages” long? This renders the scrollbar useless. Move it a few pixels & you’re lost.
A drag system makes a lot more sense imo.
The rest looks interesting, though.

Comment by ProPuke — April 26, 2010

To check out a live document viewer that uses many of the same techniques as the Nextpoint viewer, take a look at the New York Times’ annotated emails from Goldman Sachs:

http://documents.nytimes.com/goldman-sachs-internal-emails

Comment by jashkenas — April 26, 2010

ScrollTo is a jQuery plugin which would help with something like this. You can call it on a scrollable object and pass it a selector (or a DOM element, a position, etc.). It will then scroll the object to the top of the first selector match. I’m using it to create something similar to the above, but with text rather than images in the div.

Comment by Skilldrick — May 18, 2010
