Monday, April 26th, 2010
This is a guest post from the folks at Nextpoint. We’ve previously mentioned their pioneering use of Ajax in the legal industry and open sourcing of Growl4Rails. Here they bring us some details on their scrollable-document interface — make sure to make it to the end of the post for a link to the sample code.
If your application involves reading large documents, books, or articles, there’s a good chance you’ve had to put some serious effort into building a nice reader. At Nextpoint, we build web-based litigation applications for discovery and evidence management, and our users often find themselves reading very long documents. We decided to build a scrollable reader because we believe it’s the most natural way to navigate large documents in a browser and because we needed other metadata and functionality to remain in static locations surrounding the document. You may have seen something similar at Google Books or Bing image search. With potentially thousands of pages and images in each document, some very interesting technical challenges arise, so we’ll walk through how we tackled them.
Documents can be thousands of pages long and consist of imaged pages, which makes them quite large. The pages vary in resolution and range in quality from poorly scanned images to rich graphical diagrams. Because many of the documents contain poor-quality images, we also implemented a zoom feature which leverages our Theater document presentation tool, previously covered here, to expand images on the fly. And finally, as always, the reader needs to be fast when scanning or following search hits throughout the documents.
Preparing the scroll area
There are two main methods for incrementally loading content: expand the scrollable container as you load more data (as Bing image search does), or pre-set the height of the container for the entire content (as Google Books does). We chose to pre-set the height by initially drawing a very simple placeholder for each page. Each placeholder consists of a “Page 123” label and a div with the actual or estimated page height (depending on whether we’ve loaded that page’s metadata yet), which will eventually act as a container for the page.
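As a rough sketch of the pre-set-height approach, the function below plans a label and a height for every page, using the real height where metadata has loaded and an estimate everywhere else. The names (`layoutPlaceholders`, `estimatedHeight`, `labelHeight`) and the idea of computing the tops arithmetically are assumptions made for illustration; in a browser you would more likely draw the placeholders and read the positions back from the DOM.

```javascript
// Hypothetical helper: plan placeholder geometry for a whole document.
// knownHeights maps a 1-based page number to its actual height in px, for
// pages whose metadata has loaded; every other page uses estimatedHeight.
function layoutPlaceholders(pageCount, knownHeights, estimatedHeight, labelHeight) {
  const placeholders = [];
  let top = 0;
  for (let page = 1; page <= pageCount; page++) {
    const known = Object.prototype.hasOwnProperty.call(knownHeights, page);
    const height = known ? knownHeights[page] : estimatedHeight;
    placeholders.push({
      page: page,
      label: "Page " + page,
      top: top,            // where this placeholder starts in the scroll area
      height: height,      // height of the div that will hold the page image
      estimated: !known    // true until real metadata replaces the guess
    });
    top += labelHeight + height; // each placeholder is a label plus a page-sized div
  }
  return { placeholders: placeholders, totalHeight: top };
}
```

Because the total height is known up front, the scrollbar reflects the full document from the first render, even though no page images have loaded yet.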
This allows the user to scroll through the document immediately, without the confusion of content jumping around as the scrollable area gets “taller.” We keep the placeholders as simple as possible so the initial setup and rendering of the page is quick. After drawing the placeholders, we cache the offsetTop of each one in an array so we can quickly determine the “current visible page” by comparing the scrollTop of the container DIV against this array. Finally, we set up handlers to watch the scroll position. We trap the onScroll event and use it to update the current page number in our navigation bar. Separately, a setInterval callback checks whether the user has “landed” on a particular page for long enough (~200ms) that we should load its images. (This allows for quick scrolling through the document without unnecessarily loading in-between images.)
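The two pieces described above can be sketched roughly as follows: a lookup over the cached offsetTop array, and a small "landed" check that could be polled from a setInterval callback. The function names, the binary search, and the way time is passed in as an argument are all assumptions for illustration, not the code Nextpoint shipped.

```javascript
// Binary search over the cached offsetTop values (ascending, one per page).
// Returns the 0-based index of the last page whose top is at or above scrollTop.
function findCurrentPage(offsets, scrollTop) {
  let lo = 0;
  let hi = offsets.length - 1;
  let current = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (offsets[mid] <= scrollTop) {
      current = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return current;
}

// Hypothetical "landed" detector. Call the returned function periodically
// with the current page and a timestamp in ms. It returns the page once the
// reader has stayed on it for at least settleMs, and null otherwise. Each
// landing is reported only once.
function createLandingDetector(settleMs) {
  let candidate = null; // page currently under observation
  let since = 0;        // time the candidate was first seen
  let reported = null;  // last page already reported as landed
  return function check(currentPage, now) {
    if (currentPage !== candidate) {
      candidate = currentPage;
      since = now;
      return null;
    }
    if (candidate !== reported && now - since >= settleMs) {
      reported = candidate;
      return candidate;
    }
    return null;
  };
}
```

In a page, the detector would be fed something like `findCurrentPage(offsets, container.scrollTop)` and `Date.now()` every 50ms or so, with image loading kicked off whenever it returns a page. Passing the time in, rather than reading the clock inside, keeps the logic easy to test.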
Loading content as you go
When the user lands on a particular page, we first try to load all the currently visible pages, then we preload a few of the surrounding pages so they will be ready when the user scrolls further. Loading a page starts with checking whether we’ve already loaded its metadata and, if not, making an Ajax call to load that plus the metadata for 20 or so nearby pages. Once we have metadata for the page, we update its dimensions and create DOM content for the labels, a checkered “loading…” background, and the page image itself. To help make sure the images for pages in the visible area load first, we use a bit of a trick: we set up all the visible pages immediately, and preload the surrounding pages in a separate method on a 1ms setTimeout callback.
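Here is a minimal sketch of those two steps, assuming hypothetical helpers `metadataBatch` and `loadVisibleThenNearby`. How the real code picks its "20 or so nearby pages" isn't specified, so the centered window below is a guess, and `loadPage` stands in for whatever actually builds the DOM for one page.

```javascript
// Hypothetical: choose which pages to request metadata for when `page` is
// missing it. Looks at a window of batchSize pages around `page` and returns
// the ones not already in the `loaded` Set.
function metadataBatch(page, pageCount, loaded, batchSize) {
  const half = Math.floor(batchSize / 2);
  const first = Math.max(1, page - half);
  const last = Math.min(pageCount, first + batchSize - 1);
  const batch = [];
  for (let p = first; p <= last; p++) {
    if (!loaded.has(p)) batch.push(p);
  }
  return batch;
}

// Set up the visible pages right away, then hand the surrounding pages to a
// deferred callback so the browser can start on the visible images first.
// `defer` is optional and defaults to a 1ms setTimeout.
function loadVisibleThenNearby(visible, nearby, loadPage, defer) {
  const schedule = defer || function (fn) { setTimeout(fn, 1); };
  visible.forEach(function (p) { loadPage(p); });
  schedule(function () {
    nearby.forEach(function (p) { loadPage(p); });
  });
}
```

The `defer` parameter exists only so the ordering can be exercised without a real timer; in the browser you would leave it out.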
Deferring the surrounding pages this way lets the browser start loading the visible pages before the surrounding ones are added to the DOM. Also, if we update the dimensions of a page (because the initial height estimate was a bit off), we re-cache the list of placeholder offsetTop values.
Beyond that, the main UI piece is navigation. Next- and previous-page buttons are easy, as is a box where the user can type a page to jump to. Since the offsetTop of each page is already in an array, you can just scroll there and let the onScroll handler update as usual.
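Jumping to a typed page number might look something like the sketch below. `scrollTargetForPage` is a made-up name, and clamping out-of-range numbers to the first or last page is an assumption about the desired behavior.

```javascript
// Hypothetical helper for the "jump to page" box: turn whatever the user
// typed into a scroll position, using the cached offsetTop array
// (index 0 holds page 1).
function scrollTargetForPage(input, offsets) {
  const page = parseInt(input, 10);
  if (Number.isNaN(page) || offsets.length === 0) return null; // ignore bad input
  const clamped = Math.min(Math.max(page, 1), offsets.length);
  return { page: clamped, scrollTop: offsets[clamped - 1] };
}
```

Assigning the returned `scrollTop` to the container's scrollTop fires the scroll handler, so the page indicator and image loading follow without any extra code.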
A few complications came up along the way. One was handled with a deferred call, setTimeout(updatePlaceholderCache, 1). Another is that, because we load the images from Amazon S3, the URLs can expire while the user is reading a particularly long document, so we add an expiry time to the page metadata and check it before loading the page image. If the URL has expired, we re-request the page metadata before displaying the image.
Also, we want a page to be considered “current” when it’s scrolled most of the way into the visible area (when just the tail of the previous page is visible at the top), so whenever we check what the current page is, we add a slop amount to the container DIV’s scrollTop value.
Finally, since the container DIV may sometimes be taller than a single page of a document, it could be impossible to scroll down far enough for the last page to be considered “current.” So we add a spacer at the bottom of the container, just tall enough to let the last page scroll into the current position. (While not strictly necessary, extra features such as a “link to this page” option wouldn’t work for the last page if it were never seen as the current page.)
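The expiry check, the slop adjustment, and the bottom spacer each come down to a line or two of arithmetic. The sketch below is one way to write them; the function names, the `expiresAt` field, and the safety margin are all assumptions for illustration, and the spacer calculation ignores the page labels for simplicity.

```javascript
// A page's image URL is treated as usable only if it will not expire within
// the next safetyMs milliseconds. meta.expiresAt is a ms timestamp
// (hypothetical field name).
function urlIsFresh(meta, now, safetyMs) {
  return meta.expiresAt - now > safetyMs;
}

// Current page with "slop": treat the view as if it were scrolled slop px
// further down, so a page becomes current while the tail of the previous
// page is still visible at the top.
function currentPageWithSlop(offsets, scrollTop, slop) {
  const position = scrollTop + slop;
  let current = 0;
  for (let i = 0; i < offsets.length; i++) {
    if (offsets[i] <= position) current = i;
  }
  return current;
}

// Height of the bottom spacer: just tall enough that the top of the last
// page can be scrolled to within slop px of the top of the container.
function bottomSpacerHeight(containerHeight, lastPageHeight, slop) {
  return Math.max(0, containerHeight - lastPageHeight - slop);
}
```

For example, with a 900px container, a 600px last page, and 100px of slop, the spacer comes out to 200px, which is exactly enough extra scroll range for the last page to register as current.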
A Line of Code is worth a Thousand Words?
For those interested in more of the details, we’ve put together a more generic version of the code, available here. No setup is necessary, and you’ll be able to explore the basic concept without the Nextpoint dependencies.
Thanks for reading and happy scrolling!
Posted by Dion Almaer at 6:56 am