Kyle Simpson has developed LABjs, a library that lets you define your JavaScript file dependencies, and then loads them as efficiently as possible.
Kyle told us:
This project is a simple little tool (1.6k compressed!) for loading JavaScript files dynamically. Like a lot of similar projects, the goal is to improve page load speed by allowing scripts to load in parallel. What it does slightly differently from most others like it is let you "block": load one or more scripts in parallel, wait for them to finish, and only then go on to something else, like loading more scripts.
What I wanted was a pattern where I could load scripts in parallel, just like with script tags, but also block and wait if there was an explicit ordering dependency that required it.
What most loaders fail to do well is let you define "dependencies" simply based on loading order. With regular script tags, the browser blocks for you, so you can make sure for instance that jquery.js loads before jqueryui.js. But imagine you've got 3 scripts that can download in parallel (not dependent on each other), and then two more that need to wait for those 3 to load. You can't do that with script tags, and you also can't do that very easily with a lot of the script loaders/frameworks that I've found.
Most of them rely on intrusive concepts to do "dependency" management. For instance, each child script has to "signal" the parent page (via a callback) that it's done loading. Or the parent script and child scripts have to explicitly declare dependencies using some framework or conventions. Also, some other loader libraries rely on attaching a separate load callback handler for EACH script. This makes it awkward or difficult to wait for several scripts to load before proceeding, since you as the author have to keep track of what has loaded yourself.
jsLAB lets you load pretty much any script file, whether you control it or not, with no intrusion or convention for dependencies, other than the order and blocking that you define. It keeps track of what you've asked for and what has downloaded, loads each unique script filename only once, and lets you define your handler just once for a set of scripts that will load together in parallel. The API style (with chaining) makes it very easy to convert a set of script tags in your page into code that loads them, without having to worry that race conditions will cause scripts to load in the wrong order when there are implicit dependencies involved.
In Kyle's example, "jquery.ui.js" and "myplugin.jquery.js" can load in parallel because there are no dependencies between them, but they will wait for "jquery.js" to load first, since they depend on it. Then "initpage.js" waits for all of them to load before it runs, so it can be sure all the code it will call is in place, similar to the $(document).ready(...) concept.
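The ordering Kyle describes (parallel groups separated by blocking points) can be sketched in a few lines of modern JavaScript. This is not the LABjs API: only script() echoes the .script(...) call mentioned in the post, while createLoader, wait() and done() are hypothetical names, and loadScript(src) is an assumed helper that returns a Promise resolving when a file has loaded (in a browser it might append a script element).

```javascript
// Hypothetical sketch of "load in parallel, then block" ordering (not LABjs itself).
// loadScript(src) is an assumed helper returning a Promise that settles on load.
function createLoader(loadScript) {
  const started = new Map();       // src -> Promise, so each unique file loads only once
  let barrier = Promise.resolve(); // resolves once every earlier wait() is satisfied
  let group = [];                  // loads queued since the last wait()

  function loadOnce(src) {
    if (!started.has(src)) started.set(src, loadScript(src));
    return started.get(src);
  }

  const api = {
    // Queue one or more files (strings or arrays of strings). Files in the same
    // group start together, as soon as the current barrier has resolved.
    script(...srcs) {
      const gate = barrier;
      for (const src of srcs.flat()) {
        group.push(gate.then(() => loadOnce(src)));
      }
      return api;
    },
    // Block: later script() calls wait until everything queued so far has loaded;
    // the optional callback runs once for the whole group.
    wait(callback) {
      const pending = group;
      group = [];
      barrier = barrier
        .then(() => Promise.all(pending))
        .then(() => { if (callback) callback(); });
      return api;
    },
    // Resolves when everything queued so far has finished loading.
    done() {
      return Promise.all([barrier, ...group]).then(() => undefined);
    }
  };
  return api;
}
```

With this sketch, the chain from the example would read createLoader(loadScript).script("jquery.js").wait().script("jquery.ui.js", "myplugin.jquery.js").wait().script("initpage.js"). Keeping one barrier Promise, rather than a callback per script, is what lets a single handler cover a whole parallel group.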
Kyle's page also shows a few other variations on the .script(...) signature. For instance, you don't have to make a separate script() call for each file (though I think it makes things more readable). You can pass as many scripts as you like, individually, as parameters to one script() call. You can also pass an array of scripts, and it will loop through them and load them in the same way. Lastly, you can pass in an object instead of a string, and the object literal can contain "src", "type", and "language" specifications, if for some reason you want to override the defaults of "text/javascript" and "Javascript".
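Accepting strings, arrays, and object literals usually comes down to normalizing every argument into one shape before loading. The sketch below assumes a hypothetical normalizeScriptArgs helper; it is not taken from the library, but it applies the defaults quoted above.

```javascript
// Hypothetical argument normalizer: strings, arrays of strings, and
// { src, type, language } objects all become full script descriptors.
const DEFAULT_TYPE = "text/javascript";
const DEFAULT_LANGUAGE = "Javascript";

function normalizeScriptArgs(...args) {
  const specs = [];
  for (const arg of args.flat()) {
    if (typeof arg === "string") {
      specs.push({ src: arg, type: DEFAULT_TYPE, language: DEFAULT_LANGUAGE });
    } else if (arg && typeof arg.src === "string") {
      // Object form: keep any overrides, fall back to the defaults otherwise
      specs.push({
        src: arg.src,
        type: arg.type || DEFAULT_TYPE,
        language: arg.language || DEFAULT_LANGUAGE
      });
    } else {
      throw new TypeError("expected a script URL or an object with a src property");
    }
  }
  return specs;
}
```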
Developers tend to tease MySpace for its look, but insiders are incredibly impressed by some of the engineering behind the scenes (e.g. their internal monitoring tools are said to be second to none).
They have surprised us again with their new tool MSFast which is "a browser plugin that help developers to improve their code performance by capturing and measuring possible bottlenecks on their web pages."
The IE 8 developer tools are getting better, but in general nothing has been able to touch Firebug (and the new WebKit Inspector improvements). This tool, though, is actually a plugin for IE! It captures a lot:
Measure the CPU hit and memory footprint of your pages as they render on the client’s browser
Review screen shots of the page while it renders
Review the rendered HTML on each point of the page’s lifecycle
Measure and show estimates of the time it takes to render each section of the page in different connection speeds
Validate the content of your page against a set of proven “best practice” rules of web development
Review downloaded files and show download time estimation on different bandwidths
That is some impressive data, and it is great to be able to test on IE, where it has been SO hard to do in the past.
The WebKit Open Source Project provides a JavaScript test suite dubbed SunSpider. According to the description on the SunSpider home page, “this benchmark tests the core JavaScript language only, not the DOM or other browser APIs. It is designed to compare different versions of the same browser, and different browsers to each other.” We at Medialets have found it to be one of the best attempts to measure real world JavaScript performance in a balanced and statistically sound way.
Medialets ran the SunSpider test suite in the following environments:
Safari 4.0.1 on a 2.0 GHz Intel Core 2 Duo White MacBook.
The MacBook results were used as a baseline for relative comparisons.
Mobile Safari on the iPhone 3G with iPhone OS v2.2.1
Mobile Safari on the iPhone 3G with iPhone OS v3.0
Mobile Safari on the iPhone 3GS with iPhone OS v3.0
The “Browser” app on the T-Mobile G1 with Android OS v1.5 (Cupcake)
The “Web” app on the Palm Pre with Web OS v1.0.2
Each device was fully restored and rebooted immediately before running the test suite. Every attempt was made to ensure that no atypical background tasks were executing while the tests were running. The SunSpider tests automatically run five times in sequence, and the mean of all five runs is reported. Network speed and latency have no effect on the results of the test.
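The reporting method described above (average five runs, then compare each device against the MacBook baseline) amounts to a mean and a ratio. The helpers below are a hypothetical sketch, and the run times in the test are made-up illustration values, not Medialets' results.

```javascript
// Hypothetical helpers: mean of repeated runs, and a device's slowdown
// relative to a baseline machine (1 = same speed, 10 = ten times slower).
function mean(runs) {
  if (runs.length === 0) throw new RangeError("need at least one run");
  return runs.reduce((sum, ms) => sum + ms, 0) / runs.length;
}

function relativeToBaseline(baselineRuns, deviceRuns) {
  return mean(deviceRuns) / mean(baselineRuns);
}
```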
We all know to beware of benchmarks, but it does show off how powerful these devices are getting!
There have been many tools to help make image spriting easier, by packaging up your images into one large image and splitting it up again via CSS.
Steve Souders just showed off a new little tool he created, SpriteMe, at the Velocity conference that kicked off today. He has made it easier to work with sprites by:
finds background images: SpriteMe generates a list of all background images in the page. Hovering over an image's URL displays the image. Each of the DOM elements that use that image is also listed. [DONE]
groups images: It's hard to figure out which images can be sprited together, and how they should be laid out. For example, background images that repeat horizontally must fill the entire width of the sprite. Background images positioned left bottom must be at the right top of the sprite if their container might be bigger than the image. SpriteMe determines which images should be sprited together based on these constraints. [IN PROCESS]
creates sprites: SpriteMe generates the sprite for each grouping of images. This is done using open source tools, such as CSS Sprite Generator. [TBD]
updates CSS: The final tricky part of using sprites is changing the CSS. Sometimes the CSS is a rule in a stylesheet. Or it might be a rule in an inline style block. Or it might be specified in an element's style attribute. Because SpriteMe runs inside your web page, it can find the CSS that needs to be updated. It makes the changes in realtime, so you can visually check to confirm the sprites look right. You can export the modified CSS to integrate back into your code. [TBD]
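The "creates sprites" and "updates CSS" steps boil down to placing each image and rewriting its background-position. Below is a hypothetical sketch of the simplest layout, a vertical stack. It ignores the repeat and positioning constraints described in the "groups images" step, and it is not how SpriteMe or CSS Sprite Generator actually work.

```javascript
// Hypothetical vertical-stacking sprite layout. Images are placed top to bottom
// with `gap` pixels between them; each gets the background-position its CSS rule
// would need. All sizes are CSS pixels.
function layoutVerticalSprite(images, gap = 0) {
  let y = 0;
  let width = 0;
  const placements = images.map((img) => {
    const placement = {
      name: img.name,
      x: 0,
      y,
      backgroundPosition: y === 0 ? "0 0" : `0 -${y}px`
    };
    y += img.height + gap;
    width = Math.max(width, img.width); // the sprite is as wide as its widest image
    return placement;
  });
  // The last image needs no trailing gap
  const height = images.length ? y - gap : 0;
  return { width, height, placements };
}
```

Because the sprite is as wide as its widest member, one wide image stretches the whole sheet, which is worth keeping in mind when deciding what to group.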
Great: a simple new bookmarklet for working with sprites. It is always a good idea to sprite up, right? Not exactly.
Vlad Vukićević, a leader in the Mozilla community (who brought us cool stuff like Canvas 3D!), has spoken up about browser internals, showing the trade-offs of the spriting approach:
The biggest problem with CSS sprites is memory usage. Unless the sprite image is carefully constructed, you end up with incredible amounts of wasted space. My favourite example is from WHIT TV's web site, where this image is used as a sprite. Note that this is a 1299x15,000 PNG. It compresses quite well — the actual download size is around 26K — but browsers don't render compressed image data. When this image is downloaded and decompressed, it will use almost 75MB in memory (1299 * 15000 * 4). If the image didn't have any alpha transparency, this could be maybe optimized to 1299 * 15000 * 3, though often at the expense of rendering speed. Even then, we'd be talking about 55MB. The vast majority of this image is blank; there is nothing there, no useful content whatsoever. Just loading the main WHIT page will cause your browser's memory usage to go up by at least 75+MB, just due to that one image.
That's not the right tradeoff to make for a website.
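Vlad's arithmetic is easy to reproduce: a decoded bitmap costs width × height × bytes per pixel, no matter how small the compressed download was. This small helper (a hypothetical name) checks his numbers. His "almost 75MB" and "55MB" correspond to binary megabytes.

```javascript
// Decoded (in-memory) size of a bitmap; the compressed download size is irrelevant
// once the browser has decoded the image for rendering.
function decodedImageBytes(width, height, bytesPerPixel = 4) {
  return width * height * bytesPerPixel;
}

const MIB = 1024 * 1024;
const rgba = decodedImageBytes(1299, 15000);   // 4 bytes per pixel (with alpha)
const rgb = decodedImageBytes(1299, 15000, 3); // 3 bytes per pixel (no alpha)
console.log((rgba / MIB).toFixed(1), (rgb / MIB).toFixed(1)); // prints "74.3 55.7"
```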
What alternatives are there? None right now, but hopefully they are on the way. Some folks have been talking about packaging images in zip files, so the browser could manage more than just the download process and also load only what it needs:
Many browsers have support for offline manifests already; it might be possible to extend that to allow downloading one file (like a jar/zip file) that contains a manifest of resources and equivalent URLs that are contained inside it.
Sprites have the advantage of working right now, but maybe there should be a way to serve up a multipart response with your sprite images as well. That would cut down on CSS rule count and maintenance, but still group the images in one HTTP request. Authors are already giving up the advantages of separate resources in return for speed, so maybe this is worth doing.
You can (in theory… haha) get some of these advantages with HTTP pipelining, but a multipart response would allow the server to optimize the response order as they do with sprites today.
Richard Rabbat and Bryan McQuade have introduced a new tool called Page Speed, a performance Firebug plugin that is fully open source (i.e. it has a public bug database, a process for accepting code contributions, a roadmap, etc.):
Page Speed is a tool we've been using internally to improve the performance of our web pages -- it's a Firefox Add-on integrated with Firebug. When you run Page Speed, you get immediate suggestions on how you can change your web pages to improve their speed. For example, Page Speed automatically optimizes images for you, giving you a compressed image that you can use immediately on your web site. It also identifies issues such as JavaScript and CSS loaded by your page that wasn't actually used to display the page, which can help reduce time your users spend waiting for the page to download and display.
People will obviously compare it to YSlow (I wish they called it GFast ;) but there are some differences beyond the fact that it is Open Source which will hopefully allow a community to grow:
Not only will the tool note that images can be optimized, but it will do the work for you!
It automatically minifies your JavaScript and outputs the result so you can use it
An enhanced net panel with better insight into cache hits and misses. I believe this is written in C++, so it gets around the "in process" limitations of Firebug's Net Panel.
Many different rules and checks
A must see performance tool to add to your belt. You can check out all of the rules and enjoy the tool!
Micah Snyder of Digg posted on DUI.Stream, an experimental library that implements a multipart XHR technique to bundle resources into one request and then breaks them out at the other end:
One of the ways that high-performance websites like Yahoo suggest speeding up load times is by reducing the number of HTTP requests per page. We started thinking about what we could do to reduce HTTP overhead, and where we could get the biggest benefits from it. Well, one thing led to another and the next thing we knew we were talking about writing a generalized framework for bundling files, sending them through a single request, then separating them for use once they head down the pipe.
We call this technique MXHR (short for Multipart XMLHttpRequests), and we wrote an addition to our Digg User Interface library called DUI.Stream to implement it. Specifically, DUI.Stream opens and reads multipart HTTP responses piece-by-piece through an XHR, passing each chunk to a JavaScript handler as it loads.
Why do this? Well, DUI.Stream will allow developers to drastically improve the speed of uncached page loads by bundling most of their resources into a single HTTP request, with a single time-to-first-byte and no request throttling by the user agent. Additionally, the size of the response has no effect on the rendering time of each chunk, as the client handles each piece of the response on the fly and can inject it into the DOM for rendering immediately, in the exact order you specify. On a high traffic, high-activity site like Digg, we have to display incredible amounts of data on each permalink — typically hundreds of user images within the first 50 comment threads on a page alone, not to mention the UI chrome and actual comment data. (You can see this for yourself: notice the number of HTTP requests that queue up when you expand a page of comments). So our primary use case for DUI.Stream is turning that first long, arduous page load on an empty cache into something nearly indistinguishable from a page of data with fully cached resources.
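The core of MXHR as described (read a multipart response piece by piece and hand each complete chunk to a handler as it arrives) can be sketched as an incremental splitter. The wire format here is an assumption, not DUI.Stream's documented one: each part starts with "--" plus a boundary, then header lines, a blank line, and the body. Lines end with "\n", and the stream ends with "--" + boundary + "--". MultipartStreamParser is a hypothetical name.

```javascript
// Hypothetical incremental multipart splitter (format assumed, see lead-in).
class MultipartStreamParser {
  constructor(boundary, onPart) {
    this.delimiter = "--" + boundary;
    this.onPart = onPart;
    this.buffer = "";
  }

  // Feed newly received text; every completed part is handed to onPart right away.
  push(text) {
    this.buffer += text;
    for (;;) {
      const start = this.buffer.indexOf(this.delimiter);
      if (start === -1) return;
      const next = this.buffer.indexOf(this.delimiter, start + this.delimiter.length);
      if (next === -1) return; // current part is not finished yet; wait for more data
      const raw = this.buffer.slice(start + this.delimiter.length, next);
      this.buffer = this.buffer.slice(next);
      this.emit(raw);
    }
  }

  // Split one raw part into its Content-Type header and body.
  emit(raw) {
    const text = raw.replace(/^\n/, "");
    const split = text.indexOf("\n\n");
    const headers = split === -1 ? "" : text.slice(0, split);
    const body = (split === -1 ? text : text.slice(split + 2)).replace(/\n$/, "");
    const type = /^Content-Type:\s*(.+)$/im.exec(headers);
    this.onPart({ type: type ? type[1].trim() : "text/plain", body });
  }
}
```

In a browser, the sketch assumes you would call push() with only the newly arrived tail of xhr.responseText each time readyState reaches 3 or 4, keeping track of how many characters you have already consumed.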
You can take a look at a demo in action; reloading the puppy demo shows how much the results vary from request to request.
Let’s talk a bit about the architectural benefits of implementing MXHRs with DUI.Stream. Back when the web was based largely on a page metaphor (i.e.: one central document with external references), whenever you loaded the page, the page requested its images, stylesheets, etc, then you were done. These days you’re just as often loading an application; the page progressively enhances into a stateful UI by loading extra stylesheets, scripts and a whole mess of UI chrome after the initial request. Yet, we’re still using the old model flow of get markup –> render markup –> request external resources –> load and display externals.
Take our modal login dialog box for example. In order to reduce requests we bundle its JavaScript in with the rest of the page, we put its CSS up in the header with the rest of the styles, then we request only the markup for the dialog box, render it, and let it fire its own HTTP requests for the images that make up its chrome. In this broken model, HTTP connections and rendering behaviors split our UI architecture up into different parts of the page that all render at different times at the browser’s discretion. Even if we put everything into one cohesive structure and loaded the CSS link, script tag and markup together, they’d still all fire their own HTTP requests and the images would still come in afterwards on the first page load. This just won’t do.
Now, let’s rethink how our login dialog could work using DUI.Stream. We can request a Stream that contains everything needed to render and use the dialog box. As each part comes in, it gets passed through to be built, and renders immediately with no image backfill or delayed JS behavior. The DUI.Stream framework can then pass those resources back into cacheable elements for our next page load, which can happily 302 its way quickly through the rendering process. Pretty sweet right? Right.
We have covered memoizers in the past, but John Hann has posted on a nice implementation that takes advantage of closures, arity, and recursion: three concepts/features that JavaScript was meant to use. Here is the tail end of his memoize function:
// JScript doesn't grok the arity property, but uses length instead
var arity = func.arity || func.length;
return memoizeArg(arity - 1);
}
He also offers this conclusion:
Yes, memoization is a neat concept. But why use it rather than just hand-coded caching mechanisms? It’s easy enough to write a caching routine, right? Here are a few good reasons:
hand-coded caching mechanisms obfuscate your code
multi-variate caching routines are bulky in Javascript
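Only the tail of John's function is quoted above, so here is a self-contained sketch of a memoizer built from the same three ingredients. It is a reconstruction in that spirit, not John Hann's code. The names memoize and memoizeArg come from the fragment, but the body is an assumption, and it uses a Map per argument position (an ES2015 feature) instead of a plain object.

```javascript
// Sketch of a multi-argument memoizer using closures, arity and recursion.
// Each argument position gets its own cache level: outer levels map an argument
// value to a nested memoized function, and the innermost level stores results.
function memoize(func, context) {
  function memoizeArg(argPos) {
    const cache = new Map();
    return function (...args) {
      const key = args[argPos];
      if (argPos === 0) {
        // Innermost level: cache the actual result
        if (!cache.has(key)) cache.set(key, func.apply(context, args));
        return cache.get(key);
      }
      // Outer level: cache a memoized function for the remaining positions
      if (!cache.has(key)) cache.set(key, memoizeArg(argPos - 1));
      return cache.get(key).apply(this, args);
    };
  }
  // Function.prototype.length is the number of declared parameters; treat a
  // zero-argument function as having one position so the recursion terminates.
  const arity = Math.max(func.length, 1);
  return memoizeArg(arity - 1);
}
```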
There are many tools that can track HTTP at various levels, but they each have their own format. What if we lived in a world where there was a common format which would enable the following:
Steve Souders: "Hey Dion, Facebook is doing something wacky on their category pages. Take a look at the waterfall that you see in this data that I exported from AOL PageTest"
Dion Almaer: "Interesting. I just imported it into Firebug, and I see what you mean."
These tools have various advantages over one another. For example, in-browser tools can easily group requests by page and analyze browser-cache usage, while network-level tools can easily gather low-level detailed info (e.g. HTTP compression). But in general, they all can be used to track HTTP traffic.
It would obviously be very beneficial to have a common export/import format that is used across all HTTP tracing tools and perhaps other projects. This would allow effective processing and analysis of data coming from various sources.
I have put together a document (first draft) that represents a proposal for an HTTP Trace Data export/import format (based on HTTPWatch's structure, but designed for JSON). Any comments and suggestions are greatly appreciated!
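To make the idea concrete, here is a hypothetical trace in a shared JSON shape, plus a summary that any tool reading that shape could compute no matter which tool recorded the data. The field names are borrowed from the later HAR (HTTP Archive) format, and the draft proposal may name things differently. The URLs and timings are illustration values only.

```javascript
// Hypothetical HTTP trace; field names follow HAR, values are made up.
const trace = {
  log: {
    version: "1.2",
    creator: { name: "example-tool", version: "0.1" },
    entries: [
      {
        startedDateTime: "2009-06-01T12:00:00.000Z",
        time: 120,
        request: { method: "GET", url: "http://example.com/" },
        response: { status: 200, bodySize: 5120 },
        timings: { dns: 10, connect: 20, send: 1, wait: 80, receive: 9 }
      },
      {
        startedDateTime: "2009-06-01T12:00:00.150Z",
        time: 45,
        request: { method: "GET", url: "http://example.com/a.css" },
        response: { status: 304, bodySize: 0 },
        timings: { dns: 0, connect: 0, send: 1, wait: 40, receive: 4 }
      }
    ]
  }
};

// A tool-independent summary computed from the shared shape.
function summarize(trace) {
  const entries = trace.log.entries;
  return {
    requests: entries.length,
    // HAR uses -1 for "unknown" sizes, so clamp at zero
    totalBodyBytes: entries.reduce((sum, e) => sum + Math.max(e.response.bodySize, 0), 0),
    slowestUrl: entries.length
      ? entries.reduce((a, b) => (b.time > a.time ? b : a)).request.url
      : null
  };
}
```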
There are two ways to include a stylesheet in your web page. You can use the LINK tag:
<link rel='stylesheet' href='a.css'>
Or you can use the @import rule:
<style>
@import url('a.css');
</style>
I prefer using LINK for simplicity—you have to remember to put @import at the top of the style block or else it won’t work. It turns out that avoiding @import is better for performance, too.
He shows that while using @import exclusively is actually okay, there are a number of scenarios where @import can jack you up:
LINK mixed with @import breaks parallel downloads in IE
using @import from within a LINKed stylesheet breaks parallel downloads in all browsers
LINK blocks @import embedded in other stylesheets in IE
@import causes resources to be downloaded out-of-order in IE
His conclusion:
It’s especially bad that resources can end up getting downloaded in a different order. All browsers should implement a small lookahead when downloading stylesheets to extract any @import rules and start those downloads immediately. Until browsers make these changes, I recommend avoiding @import and instead using LINK for inserting stylesheets.
See the full blog post for fancy charts and more detail.
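The "small lookahead" Steve asks browsers to implement means scanning the top of a stylesheet for @import rules as soon as its text arrives, so those downloads can start immediately. extractImports below is a hypothetical, regex-based sketch. It handles only simple cases (no escaped quotes inside URLs), skips comments and a leading @charset, and stops at the first non-@import rule, since @import is only valid at the top of a stylesheet.

```javascript
// Hypothetical lookahead: collect @import URLs from the top of a stylesheet.
function extractImports(cssText) {
  // Drop comments so commented-out rules are ignored
  let rest = cssText.replace(/\/\*[\s\S]*?\*\//g, "");
  // A leading @charset rule is allowed before @import
  rest = rest.replace(/^\s*@charset\s+"[^"]*"\s*;/i, "");
  // Matches @import url(...) or @import "..." with an optional media list
  const importRule = /^\s*@import\s+(?:url\(\s*(['"]?)([^'")]+)\1\s*\)|(['"])([^'"]+)\3)[^;]*;/i;
  const urls = [];
  let match;
  while ((match = importRule.exec(rest)) !== null) {
    urls.push(match[2] || match[4]);
    rest = rest.slice(match[0].length);
  }
  return urls;
}
```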
The Qooxdoo gang have created tests for Taskspeed with some surprising results:
On IE qooxdoo is by far the fastest framework.
Across browsers and frameworks, qooxdoo gained the highest ranks on all versions of IE (i.e. 6, 7 and 8), and made its lowest mark coming out third on Firefox 3.0. This exceptional IE performance also leads to the best overall score. The IE results are a big surprise, and we'll try to investigate what we do differently (better) than all the other JavaScript libraries.
As always, performance tests should be taken with a grain of salt. It's hard to judge whether all implementations are really equivalent. For example, in the jQuery tests John Resig implemented all tests in a pure jQuery way. There are obvious optimizations he consciously omitted, but it apparently reflects the genuine jQuery coding style. There is no official qooxdoo way to work with the DOM yet, so we modeled our tests closely after the Dojo and jQuery tests.
Fabian Jakobs analyzes why they've performed so well, speculating that because they built a GUI toolkit, they've been optimizing DOM operations from the beginning to keep it fast, and that because they use Sizzle, their lack of attention to CSS optimizations didn't kill them.
Fabian also mentions that these results encourage their intention to make qooxdoo's DOM API available stand-alone:
These results show that we have a good base and encourage us to move forward in this direction.
I didn't want the TaskSpeed library task test suite to be lost in the Dojo 1.3 announcement. Alex called it out:
Pete Higgins has been working on a new set of benchmarks with the help of other toolkit vendors (to ensure fairness) called “TaskSpeed”. Dojo 1.3 wins by a wide margin. Across all the reported browsers so far, Dojo is at least 2 times faster than other toolkits on common DOM operations. We’ve worked very hard over the years to make sure that Dojo’s APIs don’t encourage you to do things that will hurt you later, and TaskSpeed finally shows how much this philosophy pays off:
Given that DOM is the primary bottleneck in most apps, these tests demonstrate how Dojo’s approach to keeping things fast pays off not just on micro benchmarks like CSS selector speed, performance improvements to single toolkit functions, or even file size - but on aggregate performance where it really matters. Dojo’s modern, compact syntax for these common operations doesn’t slow it down, either. For instance, if you go check out the TaskSpeed reporting page, you’ll see that where browsers are slowest (IE6/7/8, etc.), Dojo’s focus on performance pays off most. Why use a toolkit that’s going to hurt you when it really counts, particularly when Dojo is so easy to get started with? Dojo’s Core has been designed from the ground up with APIs that encourage you to do things that are fast and keep you from doing things that are slow unless you really know what you’re doing. In some cases, we’ve made hard size-on-the-wire tradeoffs in order to keep actual app performance speedy. That hard engineering doesn’t show up in micro-benchmarks or single test release-over-release improvements or the “my toolkit is smaller” comparisons that some would prefer that web developers focus on. It’s easy to win rigged games, after all. It’s only when you see APIs composed together in real-world ways, across browsers, that you can start to see the real impact of a toolkit’s design philosophy. Dojo is designed to help you make things that are awesome for users, and that means they need to be FAST.
Other toolkits have released performance numbers of late, and most of them have been either reported badly or run without much rigor, so it’s exciting to see everyone finally pitching in to build end-to-end tests that show how library design decisions interact with real-world realities of browsers. The TaskSpeed tests have been designed to be both even-handed and reliable (no times below timer resolution, etc.). The reporting page is also designed to make the results understandable and put them in context. A lot of care has been taken to keep this benchmark honest. JavaScript developers have suffered at the hand of chart junk for far too long.
It is interesting indeed to see the browsers on the graph. I will let you guess which browsers are which, but the visual difference is astounding:
A repaint occurs when changes are made to an element's skin that change its visibility but do not affect its layout. Examples of this include outline, visibility, or background color. According to Opera, repaint is expensive because the browser must verify the visibility of all other nodes in the DOM tree. A reflow is even more critical to performance because it involves changes that affect the layout of a portion of the page (or the whole page). Reflow of an element causes the subsequent reflow of all child and ancestor elements as well as any elements following it in the DOM.
So what triggers them, and how can you avoid them?
So, if they’re so awful for performance, what causes a reflow?
Unfortunately, lots of things. Among them some which are particularly relevant when writing CSS:
Resizing the window
Changing the font
Adding or removing a stylesheet
Content changes, such as a user typing text in an input box
Activation of CSS pseudo classes such as :hover (in IE the activation of the pseudo class of a sibling)
How can you avoid reflows, or at least minimize their impact on performance?
Note: I’m limiting myself to discussing the CSS impact of reflows. If you are a JavaScripter, I’d definitely recommend reading my reflow links; there is some really good stuff there that isn’t directly related to CSS.
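On the JavaScript side, one common way to limit reflows is to avoid interleaving layout reads (such as offsetWidth) with style writes, because a read that follows a write can force the browser to reflow synchronously. The batcher below is a hypothetical sketch of that idea, not something from the post. It queues reads and writes separately and runs all reads before any writes. A real page would call flush() once per frame or from a timer.

```javascript
// Hypothetical read/write batcher: all queued layout reads run before any
// queued style writes, so no read has to wait on a write-triggered reflow.
function createLayoutBatcher() {
  const reads = [];
  const writes = [];
  return {
    read(fn) { reads.push(fn); },
    write(fn) { writes.push(fn); },
    flush() {
      while (reads.length) reads.shift()();
      // Reads queued by a write run on the next flush, not interleaved here
      while (writes.length) writes.shift()();
    }
  };
}
```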
Another guru that we know and love has been doing some very interesting work looking at the actual data behind all of this, and it may have some surprises. I hope to be able to post about that soon!
The main findings of the team were that eval() is not only evil but also very slow, whereas dynamic script nodes are fast but insecure. The solution was to parse a custom string format themselves rather than using JSON:
Having set the performance bar pretty high with the last approach, we dove into custom data formats. The challenge would be to create a format that we could parse ourselves, using JavaScript’s String and RegExp methods, that would also match the speed of JSON executed natively. This would allow us to use Ajax again, but keep the data restricted to our domain.
Since we had already discovered that some methods of string manipulation didn’t perform well on large strings, we restricted ourselves to a method that we knew to be fast: split(). We used control characters to delimit each contact, and a different control character to delimit the fields within each contact. This allowed us to parse the string into contact objects with one split, then loop through that array and split again on each string.
// that.contacts already holds one delimited string per contact (the result of the first split)
for (var n = 0, len = that.contacts.length, contactSplit; n < len; n++) {
  // split this contact's string into its fields on the field delimiter
  contactSplit = that.contacts[n].split("\\a");
  // replace the string with an object holding the fields in a fixed order
  that.contacts[n] = {
    n: contactSplit[0],
    e: contactSplit[1],
    u: contactSplit[2],
    r: contactSplit[3],
    s: contactSplit[4],
    f: contactSplit[5],
    a: contactSplit[6],
    d: contactSplit[7],
    y: contactSplit[8]
  };
}
Once this had been sped up, all they needed to use was the YUI AutoComplete control and voilà - fast client side searches even with massive datasets.
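The two-level split can be shown end to end in a self-contained form. The delimiters \u0001 and \u0002 are assumptions (the post only says control characters were used), parseContacts and matchContacts are hypothetical names, and the prefix filter is a simple stand-in for the kind of matching an autocomplete widget does, not the YUI AutoComplete API.

```javascript
// Hypothetical delimiters: one control character between contacts,
// another between the fields of a contact.
const CONTACT_SEP = "\u0001";
const FIELD_SEP = "\u0002";

// One split for the records, then one split per record for its fields.
function parseContacts(payload, fieldNames) {
  if (payload === "") return [];
  return payload.split(CONTACT_SEP).map((record) => {
    const values = record.split(FIELD_SEP);
    const contact = {};
    fieldNames.forEach((name, i) => { contact[name] = values[i]; });
    return contact;
  });
}

// Case-insensitive prefix match on one field.
function matchContacts(contacts, query, field) {
  const q = query.toLowerCase();
  return contacts.filter((c) => (c[field] || "").toLowerCase().startsWith(q));
}
```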
For most web sites, the possible performance gains from optimizing CSS selectors will be small, and are not worth the costs. There are some types of CSS rules and interactions with JavaScript that can make a page noticeably slower. This is where the focus should be. So I’m starting to collect real world examples of small CSS style-related issues (offsetWidth, :hover) that put the hurt on performance.
This comes from Steve Souders' latest post, where he gathers data showing, surprisingly, that optimizing selectors may not give you a huge bang for the buck.
How did this start off? Steve fills us in:
I read a series on Testing CSS Performance from Jon Sykes in three parts: part 1, part 2, and part 3. It’s fun to see how his tests evolve, so part 3 is really the one to read. This had me convinced that optimizing CSS selectors was a key step to fast pages.
But there were two things about the tests that troubled me. First, the large number of DOM elements and rules worried me. The pages contain 60,000 DOM elements and 20,000 CSS rules. This is an order of magnitude more than most pages. Pages this large make browsers behave in unusual ways (we’ll get back to that later).
A set of test cases later, and he has some data:
Why do the results from my tests suggest something different from what’s been said lately? One difference comes from looking at things at such a large scale. It’s okay to exaggerate test cases if the results are proportional to common use cases. But in this case, browsers behave differently when confronted with a 3 megabyte page with 60,000 elements and 20,000 rules. I especially noticed that my results were much different for IE 6&7. I wondered if there was a hockey stick in how IE handled CSS selectors. To investigate this I loaded the child selector and descendant selector pages with increasing number of anchors and rules, from 1000 to 20,000. The results, shown in the chart below, reveal that IE hits a cliff around 18,000 rules. But when IE 6&7 work on a page that is closer to reality, as in my tests, they’re actually the fastest performers.
I am curious to see the difference between the role of CSS selectors in a one-pass web page and their role when rules are applied to content changes in an Ajax application. Imagine Gmail adding a new message to the DOM: how does the CSS work change there?