Tuesday, May 18th, 2010
Scribd is my "favourite company of the month". First they showed off their move from Flash to HTML5, and now they are generously taking the time to share details of their implementation in a three-part series.
The first part delves into the bowels of @font-face, starting with the simple:
src: url("scrivano.svg") format('svg');
src: local('u263a'), url('scrivano.otf')
and moving to how they support angled text such as this:
How do you encode the diagonal text in this document in an HTML page?
Short of using element transformations (-moz-transform, DXImageTransform etc.) which we found to be rather impractical, we encode the above HTML with a custom font created by transforming the original font. Here’s how our generated font looks in FontForge:
From the font screenshot above, you can also see that we reduce fonts to only the characters that are actually used in the document; that helps save space and network bandwidth. Usually, fonts in PDFs are already reduced, so this is not always necessary.
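To make the subsetting idea concrete, here is a minimal Python sketch. It is not Scribd's code: the `font` dictionary is a hypothetical stand-in for a real font's character map, and the function names are invented for illustration. It only shows the principle of keeping the glyphs a document actually uses.

```python
from __future__ import annotations


def used_codepoints(text: str) -> list[int]:
    """Return the sorted, de-duplicated code points of the visible characters."""
    return sorted({ord(ch) for ch in text if not ch.isspace()})


def subset_glyphs(glyphs: dict[int, str], text: str) -> dict[int, str]:
    """Keep only the glyph entries whose code point occurs in `text`.

    `glyphs` is a hypothetical character map (code point -> glyph name);
    subsetting a real font file needs a dedicated font tool.
    """
    wanted = set(used_codepoints(text))
    return {cp: name for cp, name in glyphs.items() if cp in wanted}


# Toy character map covering the lowercase ASCII letters.
font = {cp: f"glyph_{chr(cp)}" for cp in range(ord("a"), ord("z") + 1)}
subset = subset_glyphs(font, "hello world")
print(len(font), "->", len(subset))  # prints "26 -> 7"
```

The saving grows with the size of the original font: a document that uses a few dozen distinct characters needs only those glyphs, however many the full font contains.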
Naturally, for fonts with diagonal characters every character needs to be offset to a different vertical position (we encode fonts as left-to-right). In fact, this is how other HTML converters basically work: they place every single character on the page using a div with position:absolute.
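As a rough illustration of the per-character approach, the Python sketch below emits one absolutely positioned div per character along a tilted baseline. The function name, the constant `advance`, and the pixel units are assumptions made for this example. It only offsets positions and does not rotate the glyph shapes themselves, which is the part Scribd handles with its transformed font.

```python
import math
from html import escape


def place_characters(text, x0, y0, advance, angle_deg):
    """Emit one absolutely positioned <div> per character along a tilted baseline.

    x0, y0    -- position of the first character in pixels
    advance   -- distance between characters along the baseline (assumed
                 constant here; real fonts have per-glyph advance widths)
    angle_deg -- baseline angle, counter-clockwise from horizontal
    """
    angle = math.radians(angle_deg)
    dx = advance * math.cos(angle)
    # Screen y grows downwards, so a rising baseline means a decreasing top.
    dy = -advance * math.sin(angle)
    divs = []
    for i, ch in enumerate(text):
        left = x0 + i * dx
        top = y0 + i * dy
        divs.append(
            f'<div style="position:absolute;left:{left:.1f}px;top:{top:.1f}px">'
            f"{escape(ch)}</div>"
        )
    return divs


for div in place_characters("Hi!", x0=100, y0=200, advance=10, angle_deg=45):
    print(div)
```

Every character costs a full element with its own inline style, which is why markup produced this way is bulky and awkward to select or search.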
At Scribd, we invested a lot of time in optimizing this, to the degree that we can now convert almost all documents to “nice” HTML markup. We detect character spacing, line-heights, paragraphs, justification and a lot of other attributes of the input document that can be encoded natively in the HTML. So a PDF document uploaded to Scribd may, in its HTML version, look like this (style attributes omitted for legibility):
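The detection step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Scribd's algorithm: the `Glyph` record, the `space_gap` threshold, and the rule that glyphs sharing a baseline form one line are all invented for the example.

```python
from __future__ import annotations

from html import escape
from typing import NamedTuple


class Glyph(NamedTuple):
    char: str
    x: float      # left edge
    y: float      # baseline
    width: float  # advance width


def glyphs_to_html(glyphs: list[Glyph], space_gap: float = 2.0) -> str:
    """Group positioned glyphs into lines and words, one <p> per line.

    Glyphs that share a baseline form a line; a horizontal gap wider
    than `space_gap` between neighbours is treated as a word space.
    Both rules are simplifying assumptions for illustration.
    """
    lines: dict[float, list[Glyph]] = {}
    for g in glyphs:
        lines.setdefault(round(g.y, 1), []).append(g)

    paragraphs = []
    for y in sorted(lines):
        row = sorted(lines[y], key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            if cur.x - (prev.x + prev.width) > space_gap:
                text += " "
            text += cur.char
        paragraphs.append(f"<p>{escape(text)}</p>")
    return "\n".join(paragraphs)


glyphs = [
    Glyph("H", 0, 10, 8), Glyph("i", 8, 10, 4),
    Glyph("y", 20, 10, 8), Glyph("o", 28, 10, 8), Glyph("u", 36, 10, 8),
    Glyph("O", 0, 30, 8), Glyph("K", 8, 30, 8),
]
print(glyphs_to_html(glyphs))
```

Seven positioned glyphs collapse into two short paragraphs, and the browser's own text layout takes over the spacing that the per-character approach had to spell out.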
Together with tags for graphic elements on pages, we can now represent every PDF document in HTML while preserving fonts, layout and style, with text selectability, searchability, and making full use of the optimized rendering engines built into browsers.
I am looking forward to parts 2 and 3!