Tuesday, January 6th, 2009
Category: Browsers, JavaScript, Performance

Nicholas Zakas decided to dive deep on everyone’s favorite sign that you’ve done something wrong:
Few web developers truly understand what triggers the long-running script dialog in various browsers, including myself. So I decided to sit down and figure out under what circumstances you’ll see this dialog. There are basically two different ways of determining that a script is long-running. First is by tracking how many statements have been executed and second is by timing how long the script takes to execute. Not surprisingly, the approach each browser takes is slightly different.
He finds that Internet Explorer’s warning is based on the total number of statements executed (5 million, and since it’s Windows, you can change the limit via the Registry), Firefox and Safari measure actual execution time (10 and 5 seconds, respectively), Chrome is a bit of a mystery, and Opera doesn’t appear to have such a mechanism at all (interestingly, it appears to run its UI on a different thread from page rendering and script execution).
Regardless of the details, the lesson remains the same (again quoting from Nicholas’ post):
Brendan Eich, creator of JavaScript, is quoted as saying, “[JavaScript] that executes in whole seconds is probably doing something wrong…” My personal threshold is actually much smaller: no script should take longer than 100ms to execute on any browser at any time. If it takes any longer than that, the processing must be split up into smaller chunks.
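Nicholas doesn't show code for splitting work up, but a common way to keep each run under his 100ms threshold is to process an array in time-boxed chunks and yield to the browser with setTimeout between them. The following is a minimal sketch under that assumption: `processInChunks`, its 50ms default budget, and the injectable `schedule` parameter are illustrative choices, not code from his post.

JAVASCRIPT:
```javascript
// Hypothetical helper: run processItem over every element of items, stopping
// after roughly budgetMs and yielding to the browser before continuing.
// Assumes each individual processItem call is itself short.
// schedule defaults to setTimeout(fn, 0); it is a parameter only for testing.
function processInChunks(items, processItem, onDone, budgetMs, schedule) {
  var budget = budgetMs == null ? 50 : budgetMs; // well under 100ms
  var next = schedule || function (fn) { setTimeout(fn, 0); };
  var index = 0;

  function runChunk() {
    var start = Date.now();
    while (index < items.length) {
      processItem(items[index], index);
      index++;
      // Out of time for this chunk: stop and let the UI respond.
      if (Date.now() - start >= budget) break;
    }
    if (index < items.length) {
      next(runChunk); // more work left: continue in a later tick
    } else if (onDone) {
      onDone();
    }
  }

  next(runChunk);
}
```

In a page you would call something like `processInChunks(rows, formatRow, showDone)`, where `rows`, `formatRow` and `showDone` stand in for your own data and callbacks.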
An interesting read!
Monday, December 29th, 2008
Category: Performance
Steve Souders has written up how to couple asynchronously loaded scripts with inline code, with examples that show the timings you can get. First he sets the scene:
One issue with async script loading is dealing with inline scripts that use symbols defined in the external script. If the external script is loading asynchronously without thought to the inlined code, race conditions may result in undefined symbol errors. It’s necessary to ensure that the async external script and the inline script are coupled in such a way that the inlined code isn’t executed until after the async script finishes loading.
There are a few ways to couple async scripts with inline scripts.
- window’s onload - The inlined code can be tied to the window’s onload event. This is pretty simple to implement, but the inlined code won’t execute as early as it could.
- script’s onreadystatechange - The inlined code can be tied to the script’s onreadystatechange and onload events. (You need to implement both to cover all popular browsers.) This code is lengthier and more complex, but ensures that the inlined code is called as soon as the script finishes loading.
- hardcoded callback - The external script can be modified to explicitly kick off the inlined script through a callback function. This is fine if the external script and inline script are being developed by the same team, but doesn’t provide the flexibility needed to couple 3rd party scripts with inlined code.
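The second option, tying the inline code to the script's onreadystatechange and onload events, might look roughly like this. It is a sketch, not Steve's code: `loadScriptThen` is a hypothetical name, the optional `doc` argument exists only so the function can be exercised without a browser, and the `readyState` values checked ('loaded', 'complete') are the ones older versions of IE report.

JAVASCRIPT:
```javascript
// Hypothetical helper: load src asynchronously and call callback exactly once
// when the script has finished loading. Most browsers fire onload; older IE
// fires onreadystatechange instead, so both handlers are attached.
// doc defaults to the page's document; it is a parameter only for testing.
function loadScriptThen(src, callback, doc) {
  doc = doc || document;
  var script = doc.createElement('script');
  var done = false;

  function finish() {
    if (done) return;
    // IE sets readyState; ignore intermediate states such as 'loading'.
    var state = script.readyState;
    if (state && state !== 'loaded' && state !== 'complete') return;
    done = true;
    // Drop the handlers so a second event can't call us again.
    script.onload = script.onreadystatechange = null;
    callback();
  }

  script.onload = finish;
  script.onreadystatechange = finish;
  script.src = src;
  doc.getElementsByTagName('head')[0].appendChild(script);
}
```

With sorttable, the inline code would become `loadScriptThen('sorttable-async.js', function () { sorttable.init(); });`.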
He then walks through a simple asynchronous loading example, followed by one that lazy-loads the script and couples it with the inline code using John Resig’s degrading script tags pattern, added to the end of the target script:
JAVASCRIPT:
// -- lazily load the script once the page has finished loading
window.onload = function() {
    var script = document.createElement('script');
    script.src = "sorttable-async.js";
    // the inline code travels inside the SCRIPT element itself;
    // browsers ignore the body of a SCRIPT that has a src
    script.text = "sorttable.init()";
    document.getElementsByTagName('head')[0].appendChild(script);
};

// -- at the end of sorttable-async.js:
// find the SCRIPT element that loaded this file and eval its inline body
var scripts = document.getElementsByTagName("script");
var cntr = scripts.length;
while ( cntr ) {
    var curScript = scripts[cntr-1];
    if ( -1 != curScript.src.indexOf('sorttable-async.js') ) {
        eval( curScript.innerHTML );
        break;
    }
    cntr--;
}
Finally, the conclusion:
Loading scripts asynchronously and lazyloading scripts improve page load times by avoiding the blocking behavior that scripts typically cause. This is shown in the different versions of adding sorttable to UA Profiler:
The times above indicate when the onload event occurred. For other web apps, improving when the asynchronously loaded functionality is attached might be a higher priority. In that case, the Asynchronous Script Loading version is slightly better (~400 ms versus 417 ms). In both cases, being able to couple inline scripts with the external script is a necessity. The technique shown here is a way to do that while also improving page load times.
Wednesday, December 24th, 2008
Category: Browsers, Performance
I posted on my personal blog about using the crowd to measure browser responsiveness, discussing how we could give developers information about responsiveness and about how add-ons affect it:
I have had some folks talk to me about responsiveness issues with Firefox 3. I have had a fantastic experience, and currently I run Mozilla nightlies / Minefield / Shiretoko (3.1.*) and WebKit nightlies side by side. I am very happy with the shape that Minefield is in.
Of course, the issue with the extension mechanism in Firefox is that you get a window to the entire world (which is also what has led to some amazing add-ons). This means a bad add-on can do a lot of damage.
Chrome does a good job of showing you basic info about a tab (memory etc). What if we did that, and more, for add-ons? Give me top for the browser.
Now, this is a lot of engineering away, so can we use the crowd to help out?
What if we created an add-on that would track responsiveness information and send it back (anonymously) to the cloud (say, to Weave)? We could use math to work out probable culprits and could even ship that information back to the people using the add-on. You might then find out that FooAddOn seems to be a culprit that slows down the browser. Maybe it could be called Vaccinate-addon.
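To make the "use math" step a little more concrete, here is one simple way the aggregated reports could be scored: compare the average lag in samples where an add-on is installed with the average in samples where it is not. The report format and `rankSuspects` are hypothetical, and a real system would need far more care with confounding factors (browser version, hardware, other add-ons).

JAVASCRIPT:
```javascript
// Hypothetical report shape, one per anonymous sample:
//   { addons: ['FooAddOn', ...], lagMs: number }
// Returns add-ons sorted by how much extra lag they are associated with.
function rankSuspects(reports) {
  var names = {};
  reports.forEach(function (r) {
    r.addons.forEach(function (a) { names[a] = true; });
  });

  return Object.keys(names).map(function (name) {
    var withSum = 0, withN = 0, withoutSum = 0, withoutN = 0;
    reports.forEach(function (r) {
      if (r.addons.indexOf(name) !== -1) { withSum += r.lagMs; withN++; }
      else { withoutSum += r.lagMs; withoutN++; }
    });
    // With no baseline (every sample has the add-on) we can't compare; use 0.
    var extra = withoutN ? withSum / withN - withoutSum / withoutN : 0;
    return { addon: name, extraLagMs: extra };
  }).sort(function (a, b) { return b.extraLagMs - a.extraLagMs; });
}
```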
Then I got talking with Dav Glass, who is working on a very interesting proof of concept, the BrowserPlus Profiler:
A service that analyzes the memory and cpu usage of a web browser. The service can take 1 sample or multiple samples at a specified interval. When sampling at intervals, at most 1,000 samples are taken. If you provide a callback function, your javascript will be called after every sample is taken. If no callback is provided, all samples are stored in an array and returned after start() completes or stop() is called.
The sample object is a map with the following keys (most values are floats):
- [sample] - the sample number (1-1,000)
- [time] - the string representation of the time the sample was taken
- [sys] - the percentage CPU "sys" processes are using
- [user] - the percentage CPU "user" processes are using
- [ffxcpu] - the percentage CPU Firefox is using, or -1.0 if it is not running
- [ffxmem] - the amount of memory Firefox is using, or -1.0 if it is not running
- [safcpu] - the percentage CPU Safari is using, or -1.0 if it is not running
- [safmem] - the amount of memory Safari is using, or -1.0 if it is not running
This is at a very early stage, and they are looking for good people and ideas on how to get good data across platforms (browsers and operating systems). I would love to see this.
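If samples arrive as plain objects keyed by the names listed above (an assumption; we haven't seen the service's actual JavaScript API), summarizing a run is straightforward. This sketch averages one CPU column and skips samples where that browser wasn't running (-1.0). `summarizeCpu` is a hypothetical helper, not part of BrowserPlus.

JAVASCRIPT:
```javascript
// Summarize one CPU column (e.g. 'ffxcpu' or 'safcpu') across samples,
// ignoring samples where that browser was not running (-1.0).
// Returns null if the browser never ran during the sampling window.
function summarizeCpu(samples, key) {
  var values = samples
    .map(function (s) { return s[key]; })
    .filter(function (v) { return v !== -1; });
  if (!values.length) return null;
  var total = values.reduce(function (a, b) { return a + b; }, 0);
  return {
    count: values.length,
    avg: total / values.length,
    max: Math.max.apply(null, values)
  };
}
```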
Category: Performance

Steve Souders has updated his UA Profiler tool that tracks the performance traits of various browsers. Being able to drill down and see the differences from build to build is great stuff, and here are all the new features:
- drilldown - Previously, I had one label for a browser. For example, Firefox 3.0 and 3.1 results were all lumped under “Firefox 3”. This week I added the ability to drill down to see more detailed data. The results can be viewed in five ways:
  - Top Browsers - The most popular browsers as well as major new versions on the horizon.
  - Browser Families - The full list of unique browser names: Android, Avant, Camino, Chrome, etc.
  - Major Versions - Grouped by first version number: Firefox 2, Firefox 3, IE 6, IE 7, etc.
  - Minor Versions - Grouped by first and second version numbers: Firefox 3.0, Firefox 3.1, Chrome 0.2, Chrome 0.3, etc.
  - All Versions - At most I save three levels of version numbers. Here you can see Firefox 3.0.1, Firefox 3.0.2, Firefox 3.0.3, etc.
- hiding sparse data - The result tables grew lengthy due to unusual User Agent strings with atypical version numbers. These might be the result of nightly builds or manual tweaking of the User Agent. Now, I only show browsers tested by at least two different people a total of four or more times. If you want to see all browsers, regardless of the amount of testing, check “show sparse results” at the top.
- individual tests - Several people asked to see the individual test results, that is, each test that was run for a certain browser. There were several motivations: Was there much variation for test X? What were the exact User Agent strings that were matched to this browser? When were the tests done (because that problem was fixed on such-and-such a date)? When looking at a results table, clicking on the Browser name will open a new table that shows the results for each test under that browser.
- sort - Once I sat down to do it, it took me ~5 minutes to make the results table sortable using Stuart Langridge’s sorttable. Now you can sort to your heart’s content. (This weekend I’ll write a post about how I made his code work when loaded asynchronously using a variation of John Resig’s Degrading Script Tags pattern.)
Thursday, December 18th, 2008
Category: Performance
Steve Souders has a nice performance roundup for 2008 that details some of the important utilities and knowledge that we gained this year.
His post gets even more interesting when he speculates about the future, including:
- Visibility into the Browser: Packet sniffers (like HTTPWatch, Fiddler, and WireShark) and tools like YSlow allow developers to investigate many of the “old school” performance issues: compression, Expires headers, redirects, etc. In order to optimize Web 2.0 apps, developers need to see the impact of JavaScript and CSS as the page loads, and gather stats on CPU load and memory consumption.
- Think “Web 2.0”: Web 2.0 pages are often developed with a Web 1.0 mentality. In Web 1.0, the amount of CSS, JavaScript, and DOM elements on your page was more tolerable because it would be cleared away with the user’s next action. That’s not the case in Web 2.0. Web 2.0 apps can persist for minutes or even hours. If there are a lot of CSS selectors that have to be parsed with each repaint - that pain is felt again and again. If we include the JavaScript for all possible user actions, the size of JavaScript bloats and increases memory consumption and garbage collection. Dynamically adding elements to the DOM slows down our CSS (more selector matching) and JavaScript (think getElementsByTagName). As developers, we need to develop a new way of thinking about the shape and behavior of our web apps in a way that addresses the long page persistence that comes with Web 2.0.
- Performance Standards: As the industry becomes more focused on web performance, a need for industry standards is going to arise. Many companies, tools, and services measure “response time”, but it’s unclear that they’re all measuring the same thing. Benchmarks exist for the browser JavaScript engines, but benchmarks are needed for other aspects of browser performance, like CSS and DOM. And current benchmarks are fairly theoretical and extreme. In addition, test suites are needed that gather measurements under more real world conditions. Standard libraries for measuring performance are needed, a la Jiffy, as well as standard formats for saving and exchanging performance data.
- JavaScript Help: Browsers also need to make it easier for developers to load JavaScript with less of a performance impact. I’d like to see two new attributes for the SCRIPT tag: DEFER and POSTONLOAD. DEFER isn’t really “new” - IE has had the DEFER attribute since IE 5.5. DEFER is part of the HTML 4.0 specification, and it has been added to Firefox 3.1. One problem is you can’t use DEFER with scripts that utilize document.write, and yet this is critical for mitigating the performance impact of ads. Opera has shown that it’s possible to have deferred scripts still support document.write. This is the model that all browsers should follow for implementing DEFER. The POSTONLOAD attribute would tell the browser to load this script after all other resources have finished downloading, allowing the user to see other critical content more quickly. Developers can work around these issues with more code, but we’ll see wider adoption and more fast pages if browsers can help do the heavy lifting.
Steve has been teaching a class on performance at Stanford this semester. You can check out his final and midterm to see if you could ace the exam. Also, you can see material from the guest lecturers. Check it out.
And what are you looking forward to in 2009 with respect to performance?
Tuesday, November 25th, 2008
Category: Browsers, Performance

We posted on Steve's UA Profiler tool, and John Resig has taken a nice look at the current results.
It actually now looks like Minefield (Firefox nightly) is getting 10 out of 11, and the other browsers are doing great too.
Jonas Sicking of Mozilla has a really nice comment that talks about what the engines are doing and some of the nuances. For example, if you have a CSS file followed by a JS file, do you block the script just in case it looks at CSS values (e.g. "in case there is a call to .offsetTop in the script")? Or do you look ahead? Browsers can look ahead, download away, and try to do the right thing. document.write() is another beast that seems to do a lot of harm; having the browser be smart about it ("they don't do that") will be good.
Back to John: he also discusses features that we can use as developers:
Prefetching
This is part of the HTML 5 specification and allows for pages to specify resources which should be opportunistically downloaded in case they should be used in the future (the common example of image rollovers could be used here).
There's a full page describing how to use them on the Mozilla developer wiki but it isn't that hard to get started. It's as simple as including a new link element in the top of your site:
HTML:
<link rel="prefetch" href="/images/big.jpeg">
And that resource will be downloaded preemptively.
Inline Images
The final case that the profiler tests for is the ability of a browser to support inline images using a data: URI. Data URIs give developers the ability to include the image data directly within the page itself. While this saves an extra HTTP request, it's important to note that the resource will not be cached (at least not as an external resource - it may be cached as part of the complete page). The use of this technique will vary on a case-by-case basis but having a browser support it is absolutely important.
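As a rough illustration of what "inline" means here, this Node.js sketch turns raw image bytes into a data: URI that could go into an IMG src or a CSS url(). `toDataUri` is a hypothetical build-step helper; in a browser you would reach for btoa rather than Node's Buffer. Note that base64 emits 4 characters for every 3 bytes, so the inlined image is roughly a third larger than the file, which is worth weighing against the saved request.

JAVASCRIPT:
```javascript
// Build a data: URI from raw bytes (an array of numbers or a Buffer).
// Uses Node's global Buffer for the base64 encoding.
function toDataUri(mimeType, bytes) {
  return 'data:' + mimeType + ';base64,' + Buffer.from(bytes).toString('base64');
}
```

For example, `toDataUri('image/gif', [0x47, 0x49, 0x46])` yields `data:image/gif;base64,R0lG` (just the "GIF" signature bytes, not a complete image).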
Thursday, November 13th, 2008
Category: JavaScript, Performance
This is officially the week of John. If he delivers top-notch posts for the rest of the week he wins an Ajaxian award or something. Maybe we need to bring back the "pack of cards" where each card is an Ajax personality and John gets to be Ace of Hearts or something.
I remember talking with some of the V8 team about how poor the state of timing is. Chrome's timing is a lot more accurate, which can do it a disservice in browser performance tests: where some browsers would respond with "0", Chrome would return "0.001", and it would suffer for it.
Add that to the flawed "just add up the total time for all tests" mentality of some test suites and you end up with very skewed results (you could do amazingly badly on one test that never matters in practice and really well on all the others, and it all gets lumped into one number).
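A small worked example shows how summing raw times skews things. With made-up numbers, one pathological test dominates the total, while a geometric mean of per-test ratios (a common alternative scoring method, not something proposed in the posts discussed here) gives every test equal weight. All names and timings below are illustrative.

JAVASCRIPT:
```javascript
// "Just add it all up" scoring: total of raw times in ms.
function totalTime(times) {
  return times.reduce(function (a, b) { return a + b; }, 0);
}

// Geometric mean of per-test ratios a[i] / b[i]. Below 1 means browser a
// is faster on balance; each test contributes equally to the score.
function geoMeanRatio(a, b) {
  var logSum = 0;
  for (var i = 0; i < a.length; i++) {
    logSum += Math.log(a[i] / b[i]);
  }
  return Math.exp(logSum / a.length);
}

// Hypothetical timings (ms) for three tests.
var browserA = [1000, 10, 10]; // terrible on one rarely-used operation
var browserB = [100, 40, 40];  // four times slower on the other two
```

Summed, browser A looks more than five times slower (1020ms vs 180ms), yet it wins two of the three tests, and its geometric-mean ratio against B comes out at about 0.86, i.e. faster on balance.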
Here comes John with a post on the accuracy of JavaScript timing which came out of a bad situation:
I was running some performance tests, on Internet Explorer, in the SlickSpeed selector test suite and noticed the result times drastically fluctuating. When trying to figure out if changes that you've made are beneficial, or not, it's incredibly difficult to have the times constantly shifting by 15 - 60ms every page reload.
This led him to test timing accuracy on various browsers and operating systems, and he has put up the raw data for you to check out.
He concludes:
Testing JavaScript performance on Windows XP and Vista is a crapshoot, at best. With the system times constantly being rounded down to the last queried time (each about 15ms apart) the quality of performance results is seriously compromised. Dramatically improved performance test suites are going to be needed in order to filter out these impurities, going forward.
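John's raw data is the real reference, but you can get a feel for the granularity problem with a quick probe: spin until the clock value changes and record the size of each jump. On a clock with the ~15ms granularity he describes, you would expect jumps of roughly that size rather than 1ms. This is a sketch; `measureTimerResolution` is a hypothetical name and the injectable `now` parameter exists only for testing.

JAVASCRIPT:
```javascript
// Estimate clock granularity: busy-wait until the clock's value changes and
// record each jump. now defaults to Date.now.
function measureTimerResolution(samples, now) {
  now = now || Date.now;
  var deltas = [];
  var last = now();
  while (deltas.length < samples) {
    var current = now();
    if (current !== last) {
      deltas.push(current - last);
      last = current;
    }
  }
  return deltas;
}
```

Calling `measureTimerResolution(20)` in a page and eyeballing the returned array is enough to see whether a browser/OS combination is reporting 1ms or ~15ms steps.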

Monday, November 10th, 2008
Category: Performance, Testing, Utility

Robert Kieffer has announced JSLitmus, a tool "designed specifically to allow you to quickly and easily write a JavaScript test (or test suite), run it on any modern browser, and document and share the results."
To see it in action, Robert writes a test on "++" and plots the results for different browsers, and then draws some conclusions.
The API for creating a test is simple:
JAVASCRIPT:
JSLitmus.test('Empty function test', function() {});
Once you define your tests, you can run them in the page (thanks to the popup), and JSLitmus will do its thing and give you a Google Chart of the results at the end.
A nice little tool.
Thursday, November 6th, 2008
Category: Performance

Coach Wei has updated Razor Optimizer, "a JavaScript optimization tool for reducing code footprint and increasing runtime performance. As a cross-browser web application itself, Razor Optimizer can be accessed either online as a service, or downloaded to run locally.
Razor Optimizer is based on a new approach for JavaScript optimization called "razor". While other optimization techniques such as JS minimization and concatenation are based on static lexical analysis, Razor uses dynamic runtime profile information to achieve breakthrough results of 60% to 90% savings."
How it works
Razor Optimizer itself is a web based JavaScript application that runs in any browser. It contains a server component and a client component. Razor Optimizer client is an Ajax application based on Dojo 1.1. Razor Optimizer Server is a Java web application that runs inside any Java Servlet container. The following figure shows the architecture of Razor Optimizer.
The Idea Behind Razor Optimizer
Razor is based on the following observations:
- JavaScript functions are the basic low level building blocks of JavaScript code. Though typical JavaScript applications are made up of JavaScript files, functions are at a lower level than files because each JavaScript file is composed of JavaScript functions. While current JavaScript optimization techniques operate at the “file” level, performing optimization at the function level could yield much better results;
- At any given moment, the browser needs only one function because only one JavaScript function is executing at a time.
- Theoretically, the application would work fine if we download only one function at a time, right before the function is going to be called. Other functions are not needed. They can stay on the server side without being downloaded until they are going to be called. There is no need to download all the code up front, and there is no need to download it all at once;
- If only one function needs to be downloaded and stay on the client side, we can achieve breakthrough savings in both download size as well as client memory/CPU footprint, resulting in significant performance improvements above any other techniques.
The basic idea of Razor is to “trim” the “not needed” functions and only download these functions that are necessary for a specific usage scenario. This “trimming” process is called “raze”. After the initial download, if a “razed” function is needed, Razor will download this function on demand in the background.
Wouldn't downloading one function at a time be very slow? Indeed. However, if you package a bunch of related functions together and if this one "package" is enough to fulfill one or more use scenarios, the user wouldn't notice any negative performance impact of incremental downloading.
So the key to this approach is to understand when/which function is called during different runtime scenarios. For example, if we know exactly which functions are called and when they are called during the initial application loading, we can trim all other code from the initial download without breaking the application. This would significantly save the initial download size and improve page loading performance.
The knowledge of “when/which function is executed” can be achieved by profiling the application. By recording the profile data, we can have accurate knowledge of the dynamic runtime behavior of the application beyond static lexical analysis for delivering breakthrough optimization results.
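The post doesn't describe Razor's internals beyond "profiling", so the following is only a guess at the simplest form of the idea: wrap each function so calls get recorded, run a usage scenario, then split the functions into those the scenario touched (ship up front) and those it never called (candidates to raze). `instrument`, `partition`, and the `app` object in the test are hypothetical; Razor works on real source files and certainly does far more.

JAVASCRIPT:
```javascript
// Wrap every function on module so each call records its name in calledNames.
function instrument(module, calledNames) {
  Object.keys(module).forEach(function (name) {
    var original = module[name];
    if (typeof original !== 'function') return;
    module[name] = function () {
      calledNames[name] = true;
      return original.apply(this, arguments);
    };
  });
  return module;
}

// After a profiling run, split function names into those the scenario used
// (download up front) and those it never touched (load on demand later).
function partition(module, calledNames) {
  var keep = [], raze = [];
  Object.keys(module).forEach(function (name) {
    if (typeof module[name] !== 'function') return;
    (calledNames[name] ? keep : raze).push(name);
  });
  return { keep: keep, raze: raze };
}
```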
What do you think of this approach?
Tuesday, October 28th, 2008
Category: JavaScript, Performance
Matt has a nice post on delaying JavaScript execution in a way that waits for certain events to finish:
If you're looking to execute javascript code whenever someone finishes (or stops temporarily) scrolling, moving the mouse, or resizing the page, you may find the following segment of code useful.
He shares the following boilerplate code:
JAVASCRIPT:
var onFooEndFunc = function() {
    var delay = 50; /* milliseconds - vary as desired */
    var executionTimer;

    return function() {
        // another event arrived before the delay elapsed: cancel the pending run
        if (executionTimer) {
            clearTimeout(executionTimer);
        }

        // run only once the events have stopped for `delay` milliseconds
        executionTimer = setTimeout(function() {
            // YOUR CODE HERE
        }, delay);
    };
}();
This can be useful in a variety of ways, but it got me thinking about downloading code lazily. For example, a friend told me about an app that would wait for a click and only then download the code to run that functionality. That was bad, as it made the app feel very slow indeed. Instead, the code could be split into a core (what has to be loaded as soon as possible), with the rest loaded while the user is idle, using this technique.
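One way to combine Matt's timer trick with that idea: reset a timer on every bit of user activity, and when it finally fires, request the non-core code exactly once. A sketch only: `createIdleLoader` is a hypothetical name, and the `timers` parameter exists so the logic can be tested without real timeouts.

JAVASCRIPT:
```javascript
// Returns an activity() function to hook up to scroll/mousemove/keydown.
// Once no activity has been seen for delay ms, loadDeferred runs exactly once.
// timers defaults to wrappers around setTimeout/clearTimeout (wrapped so they
// are always called as plain functions, which some browsers require).
function createIdleLoader(loadDeferred, delay, timers) {
  timers = timers || {
    set: function (fn, ms) { return setTimeout(fn, ms); },
    clear: function (id) { clearTimeout(id); }
  };
  var timer = null;
  var loaded = false;

  return function activity() {
    if (loaded) return;                 // deferred code already requested
    if (timer !== null) timers.clear(timer);
    timer = timers.set(function () {    // user went quiet: load the rest
      timer = null;
      loaded = true;
      loadDeferred();
    }, delay);
  };
}
```

In a page you might assign the returned function to `window.onscroll` and `document.onmousemove`, call it once at startup so the timer starts even if the user never moves, and have `loadDeferred` append a SCRIPT element for a hypothetical non-core file such as `extras.js`.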
Tuesday, October 14th, 2008
Category: Performance
Steve Souders posted about Runtime Page Optimizer (RPO), a tool that you can think of as a performance proxy. It sits on the server side and cleans up content before it is sent back to the browser.
What can it do? Steve lets us know:
RPO automatically implements many of the best practices from my book and YSlow, so the guys from Aptimize contacted me and showed me an early version. Here are the performance improvements RPO delivers:
- minifies, combines and compresses JavaScript files
- minifies, combines and compresses stylesheets
- combines images into CSS sprites
- inlines images inside the stylesheet
- turns on gzip compression
- sets far future Expires headers
- loads scripts asynchronously
RPO reduces the number of HTTP requests as well as reducing the amount of data that is transmitted, resulting in a page that loads faster. In doing this the big question is, how much overhead does this add at runtime? RPO caches the resources it generates (combined scripts, combined stylesheets, sprites). The primary realtime cost is changing the HTML markup. Static pages, after they are massaged, are also cached. Dynamic HTML can be optimized without a significant slowdown, much less than what’s gained by adding these performance benefits.
Steve had another couple of interesting posts recently:
- Say no to IE6 discusses how we need to do something to help IE6 users upgrade (even to IE7 is fine!)
- Raising the bar talks about results from Steve's UA Profiler tests and how new browsers are pushing forward
Wednesday, October 8th, 2008
Category: JavaScript, Performance
Ars Technica has a new columnist, John Resig. His first piece is on Extreme JavaScript Performance, something that has been arriving in abundance recently!
His article focuses on the latest updates to the fish, SquirrelFish Extreme:
A popular technique that is gaining traction amongst JavaScript engine implementers is that of optimizing the engine, while it's still processing the JavaScript code, to determine the "type" of the object that is being used. Since JavaScript doesn't include any sort of explicit type system JavaScript engines are frequently forced to check and re-check the values that they are handling, to ensure their integrity. SFX rounds out the collection of other modern JavaScript engines, namely V8 and TraceMonkey, to provide this form of polymorphic inline caching. Interestingly, the idea for this form of caching comes from the Self programming language, the origin of many of the ideas in JavaScript (such as using prototypal inheritance instead of the more-common classical form of object inheritance seen in languages like Java).
JavaScript engines are serving as the test bed for new forms of dynamic language optimization. No other language is seeing this level of competition and rapid improvement that JavaScript is. This is optimal considering that JavaScript is one of the most widely-deployed programming languages available.
The SquirrelFish Extreme release currently stands as the fastest JavaScript engine [based on SunSpider] (although that's certain to change as healthy competition continues).
Tuesday, October 7th, 2008
Category: JavaScript, Performance
Based on its performance on the regexes it does handle, WREC (WebKit Regular Expression Compiler) is indeed an awesome design. regexp-dna.js, however, is flawed and exaggerates SFX performance.
We could use nanojit to make a regex compiler for SpiderMonkey that would perform as well as WREC. But I don’t know if it’s worthwhile yet. Regex performance is much less important for today’s web than it is for SunSpider–I hope to link to a report on that in a future post.
That was the conclusion David Mandelin of the Tamarin project reached as he looked into how "SquirrelFish Extreme (SFX) is kicking our butts so badly on regexp-dna.js."
I love David's posts, as they go into the real meat of the tech:
Technical details: the design of WREC. There are two main ways to implement regular expressions: using a backtracking matching engine, or by transforming the regex to a finite automaton (NFA, aka “state machine”), which does not backtrack. Most Perl-type regex engines, including both SpiderMonkey’s and WREC, follow the backtracking design. I don’t know the exact history of that choice, but at present it is much easier to implement features like group capture and backreferences in the backtracking design. Also, although some regexes scale only if implemented as NFAs, my tests suggest that many simple regexes, including those in SunSpider, are faster with backtracking.
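To make the "backtracking" design concrete, here is a toy matcher in the style of Kernighan and Pike's classic one, supporting only literals, '.', '*', '^' and '$'. It is nothing like WREC's native-code generator; it just shows, as plain recursive JavaScript, the try-then-back-up search that the backtracking design performs.

JAVASCRIPT:
```javascript
// Toy backtracking regex matcher: literals, '.', 'c*', leading '^', trailing '$'.
function toyMatch(re, text) {
  if (re[0] === '^') return matchHere(re.slice(1), text);
  // Unanchored: try every starting position, including the empty suffix.
  for (var i = 0; i <= text.length; i++) {
    if (matchHere(re, text.slice(i))) return true;
  }
  return false;
}

// Does re match at the beginning of text?
function matchHere(re, text) {
  if (re.length === 0) return true;
  if (re[1] === '*') return matchStar(re[0], re.slice(2), text);
  if (re === '$') return text.length === 0;
  if (text.length > 0 && (re[0] === '.' || re[0] === text[0])) {
    return matchHere(re.slice(1), text.slice(1));
  }
  return false;
}

// c* followed by re: try zero copies of c, then one more at a time,
// backing up to the next option whenever the rest of the pattern fails.
function matchStar(c, re, text) {
  for (var i = 0; ; i++) {
    if (matchHere(re, text.slice(i))) return true;
    if (i >= text.length || (c !== '.' && text[i] !== c)) return false;
  }
}
```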
As of this writing, WREC’s implementation strategy is dirt simple (which is a good thing). There are no transformations or fancy optimizations on the regex. WREC simply generates native code that directly implements the backtracking search. Thus, within a single match operation, there are no function calls, no traversals of regular expression ASTs, and few option tests, so almost all of the overhead is eliminated.
WREC’s code is very easy to read, so if you want to know exactly how it works, just read it in WREC.cpp. It’s also great example code for anyone implementing a compiler for a simple language like regular expressions. The basic plan is to parse the regular expression with functions named things like parseDisjunction (the | operator). Those functions directly call functions like generateDisjunction that generate the native code using the same assembler that the call-threading interpreter uses. There’s also the oddly named “gererateParenthesesResetTrampoline”. Inexplicably preserved typo, or watermark to detect copying of WREC code?
Wednesday, October 1st, 2008
Category: Performance, Utility

Steve Souders is launching Hammerhead today at The Ajax Experience.
What is Hammerhead? I kinda think of it as continuous integration for performance. It is a Firebug plugin that you can set up to monitor the performance of your application. Imagine you add a new feature that you think will speed things up: this tool will let you know how performance was really affected.
There are also cool features for when you just want to fire it up in your own Firebug:
Even if you’re not hammering a site, other features make Hammerhead a useful add-on. The Cache & Time panel, shown in Figure 3, shows the current URL’s load time. It also contains buttons to clear the disk and memory cache, or just the memory cache. It has another feature that I haven’t seen anywhere else. You can choose to have Hammerhead clear these caches after every page view. This is a nice feature for me when I’m loading the same page again and again to see its performance in an empty or a primed cache state. If you forget to switch this back, it gets reset automatically next time you restart Firefox.
Finally, Steve Lamm posted on the Google Code blog about testing slower connections as well as the high speed one that you are probably on, and the techniques for doing that with Hammerhead.
Steve continues to come up with small useful tools for Web developers. Thanks Steve!