Saturday, September 20th, 2008

SquirrelFish Extreme: JIT comes to SquirrelFish with extreme results

Category: JavaScript, Performance, WebKit

While Ben and I were talking about JavaScript performance (and other things) at Web 2.0 Expo NYC, Maciej Stachowiak announced SquirrelFish Extreme, the new and improved version of WebKit's JavaScript engine, which appears to do very well on SunSpider (times in milliseconds; lower is better):

SquirrelFish Extreme: 943.3 ms
V8:                   1280.6 ms
TraceMonkey:          1464.6 ms

What makes it so fast?

SquirrelFish Extreme uses four different technologies to deliver much better performance than the original SquirrelFish: bytecode optimizations, polymorphic inline caching, a lightweight “context threaded” JIT compiler, and a new regular expression engine that uses our JIT infrastructure.

1. Bytecode Optimizations

When we first announced SquirrelFish, we mentioned that we thought that the basic design had lots of room for improvement from optimizations at the bytecode level. Thanks to hard work by Oliver Hunt, Geoff Garen, Cameron Zwarich, myself and others, we implemented lots of effective optimizations at the bytecode level.

One of the things we did was to optimize within opcodes. Many JavaScript operations are highly polymorphic – they have different behavior in lots of different cases. Just by checking for the most common and fastest cases first, you can speed up JavaScript programs quite a bit.
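To make that concrete, here is a rough, hedged sketch in C++ (a toy value representation and invented helper names, not JavaScriptCore's actual code) of what checking the cheap common case first looks like inside an add opcode:

```cpp
// Illustrative only: a toy tagged value and an "add" opcode handler that
// tries the common integer fast path before falling back to the fully
// general (and much slower) JavaScript addition semantics.
#include <cstdint>
#include <cstdio>
#include <string>

struct Value {
    bool isInt;          // real engines pack the tag into the pointer or NaN-box it
    int32_t intVal;
    std::string strVal;  // stand-in for "anything that is not a small int"
};

static Value slowAdd(const Value& a, const Value& b) {
    // General case: ToPrimitive, string concatenation, doubles, and so on.
    return { false, 0, a.strVal + b.strVal };
}

static Value opAdd(const Value& a, const Value& b) {
    if (a.isInt && b.isInt) {
        // Fast path: both operands are small integers, so a single add
        // (plus an overflow check in a real engine) covers most code.
        return { true, a.intVal + b.intVal, {} };
    }
    return slowAdd(a, b);  // rare/complex cases go through the runtime
}

int main() {
    Value two{ true, 2, {} }, three{ true, 3, {} };
    std::printf("%d\n", opAdd(two, three).intVal);  // prints 5 via the fast path
}
```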

In addition, we’ve improved the bytecode instruction set, and built optimizations that take advantage of these improvements. We’ve added combo instructions, peephole optimizations, faster handling of constants and some specialized opcodes for common cases of general operations.
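As a small illustration of the peephole idea (with made-up opcode names, and ignoring the operands and jump targets a real pass would have to rewrite), a compare followed by a conditional branch can be fused into one combo instruction, so a typical loop back-edge costs a single dispatch:

```cpp
// Toy peephole pass: rewrite the two-instruction sequence
// [OP_LESS, OP_JUMP_IF_FALSE] into one combined opcode.
#include <cstdio>
#include <vector>

enum Op { OP_LESS, OP_JUMP_IF_FALSE, OP_LESS_JUMP_IF_FALSE, OP_ADD, OP_RET };

static std::vector<Op> peephole(const std::vector<Op>& in) {
    std::vector<Op> out;
    for (size_t i = 0; i < in.size(); ++i) {
        if (i + 1 < in.size() && in[i] == OP_LESS && in[i + 1] == OP_JUMP_IF_FALSE) {
            out.push_back(OP_LESS_JUMP_IF_FALSE);  // emit the combo instruction
            ++i;                                   // consume both original opcodes
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

int main() {
    std::vector<Op> code = { OP_ADD, OP_LESS, OP_JUMP_IF_FALSE, OP_RET };
    std::printf("before: %zu ops, after: %zu ops\n",
                code.size(), peephole(code).size());  // 4 ops become 3
}
```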

2. Polymorphic Inline Cache

One of our most exciting new optimizations in SquirrelFish Extreme is a polymorphic inline cache. This is an old technique originally developed for the Self language, which other JavaScript engines have used to good effect.

Here is the basic idea: JavaScript is an incredibly dynamic language by design. But in most programs, many objects are actually used in a way that resembles more structured object-oriented classes. For example, many JavaScript libraries are designed to use objects with “x” and “y” properties, and only those properties, to represent points. We can use this knowledge to optimize the case where many objects have the same underlying structure – as people in the dynamic language community say, “you can cheat as long as you don’t get caught”.

So how exactly do we cheat? We detect when objects actually have the same underlying structure — the same properties in the same order — and associate them with a structure identifier, or StructureID. Whenever a property access is performed, we do the usual hash lookup (using our highly optimized hashtables) the first time, and record the StructureID and the offset where the property was found. Subsequent times, we check for a match on the StructureID – usually the same piece of code will be working on objects of the same structure. If we get a hit, we can use the cached offset to perform the lookup in only a few machine instructions, which is much faster than hashing.
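Here is a stripped-down C++ sketch of that idea. The names and layout are invented for illustration (the real thing is the StructureID class mentioned below); the point is that a property-access site remembers the structure it last saw and the offset it found, and only falls back to hashing on a mismatch:

```cpp
// Stripped-down sketch of a monomorphic inline cache keyed on a structure ID.
// Objects created with the same properties in the same order share a
// StructureID, so an access site can remember "for structure S, property 'x'
// lives at offset N" and skip the hash lookup next time.
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

using StructureID = int;

struct Object {
    StructureID structure;                        // shared by same-shaped objects
    std::unordered_map<std::string, int> offsets; // property name -> slot index
    std::vector<double> slots;                    // property storage
};

struct InlineCache {            // one of these lives at each property-access site
    StructureID structure = -1;
    int offset = -1;
};

static double getProperty(Object& obj, const std::string& name, InlineCache& ic) {
    if (obj.structure == ic.structure)
        return obj.slots[ic.offset];       // cache hit: a few machine instructions
    int offset = obj.offsets.at(name);     // cache miss: full hash lookup...
    ic.structure = obj.structure;          // ...then remember what we learned
    ic.offset = offset;
    return obj.slots[offset];
}

int main() {
    // Two "point" objects with the same shape share StructureID 1.
    Object p1{ 1, { {"x", 0}, {"y", 1} }, { 3.0, 4.0 } };
    Object p2{ 1, { {"x", 0}, {"y", 1} }, { 5.0, 12.0 } };
    InlineCache ic;                        // per-access-site cache
    std::printf("%g %g\n", getProperty(p1, "x", ic), getProperty(p2, "x", ic));
}
```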

Here is the classic Self paper that describes the original technique. You can look at Geoff’s implementation of the StructureID class in Subversion to see more details of how we did it.

We’ve only taken the first steps on polymorphic inline caching. We have lots of ideas on how to improve the technique to get even more speed. But already, you’ll see a huge difference on performance tests where the bottleneck is object property access.

3. Context Threaded JIT

Another major change we’ve made with SFX is to introduce native code generation. Our starting point is a technique called a “context threaded interpreter”, which is a bit of a misnomer, because this is actually a simple but effective form of JIT compiler. In the original SquirrelFish announcement, we described our use of direct threading, which is about the fastest form of bytecode interpretation short of generating native code. Context threading takes the next step and introduces some native code generation.

The basic idea of context threading is to convert bytecode to native code, one opcode at a time. Complex opcodes are converted to function calls into the language runtime. Simple opcodes, or in some cases the common fast paths of otherwise complex opcodes, are inlined directly into the native code stream. This has two major advantages. First, the control flow between opcodes is directly exposed to the CPU as straight-line code, so much of the dispatch overhead is removed. Second, many branches that were formerly between opcodes are now inline, and made highly predictable to the CPU’s branch predictor.
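To see why removing dispatch matters, here is a minimal, self-contained sketch of the underlying trick, not WebKit's actual JIT: a toy "bytecode" is turned into a run of native x86-64 call instructions into the runtime, so the CPU executes straight-line code instead of an interpreter loop. It assumes Linux on x86-64, where a writable and executable mapping is allowed; a real context-threaded JIT would also inline the simple opcodes rather than always calling out.

```cpp
// Minimal sketch: emit one native call instruction per opcode into an
// executable buffer, then run the buffer as a function.
#include <sys/mman.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

static long acc = 0;                    // toy VM state: one accumulator
static void op_load5()  { acc = 5; }    // runtime helpers, one per opcode
static void op_double() { acc *= 2; }
static void op_inc()    { acc += 1; }

int main() {
    std::vector<void (*)()> program = { op_load5, op_double, op_inc }; // "bytecode"

    uint8_t* buf = static_cast<uint8_t*>(
        mmap(nullptr, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    if (buf == MAP_FAILED) return 1;

    uint8_t* p = buf;
    // sub rsp, 8  -- keep the stack 16-byte aligned for the calls below
    *p++ = 0x48; *p++ = 0x83; *p++ = 0xEC; *p++ = 0x08;
    for (auto fn : program) {
        // mov rax, imm64 ; call rax  -- one native call per opcode
        *p++ = 0x48; *p++ = 0xB8;
        uint64_t addr = reinterpret_cast<uint64_t>(fn);
        std::memcpy(p, &addr, 8); p += 8;
        *p++ = 0xFF; *p++ = 0xD0;
    }
    // add rsp, 8 ; ret
    *p++ = 0x48; *p++ = 0x83; *p++ = 0xC4; *p++ = 0x08;
    *p++ = 0xC3;

    reinterpret_cast<void (*)()>(buf)();   // run the generated code
    std::printf("result = %ld\n", acc);    // prints 11
    munmap(buf, 4096);
}
```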

Here is a paper describing the basic idea of context threading. Our initial prototype of context threading was created by Gavin Barraclough. Several of us helped him polish it and tune the performance over the past few weeks.

One of the great things about our lightweight JIT is that there are only about 4,000 lines of code involved in native code generation. All the other code remains cross-platform. It’s also surprisingly hackable. If you thought compiling to native code was rocket science, think again. Besides Gavin, most of us had little prior experience with native code generation, but we were able to jump right in.

Currently the code is limited to x86 32-bit, but we plan to refactor and add support for more CPU architectures. CPUs that are not yet supported by the JIT can still use the interpreter. We also think we can get a lot more speedups out of the JIT through techniques such as type specialization, better register allocation and liveness analysis. The SquirrelFish bytecode is a good representation for making many of these kinds of transforms.

4. Regular Expression JIT

As we built the basic JIT infrastructure for the main JavaScript language, we found that we could easily apply it to regular expressions as well, and get up to a 5x speedup on regular expression matching. So we went ahead and did that. Not all code spends a bunch of time in regexps, but with the speed of our new regular expression engine, WREC (the WebKit Regular Expression Compiler), you can write the kind of text processing code you’d want to do in Perl or Python or Ruby, and do it in JavaScript instead. In fact we believe that in many cases our regular expression engine will beat the highly tuned regexp processing in those other languages.

Since the SunSpider JavaScript benchmark has a fair amount of regexp content, some may feel that developing a regexp JIT is an “unfair” advantage. A year ago, regexp processing was a fairly small part of the test, but JS engines have improved in other areas a lot more than on regexps. For example, most of the individual tests on SunSpider have gotten 5-10x faster in JavaScriptCore — in some cases over 70x faster than the Safari 3.0 version of WebKit. But until recently, regexp performance hadn’t improved much at all.

We thought that making regular expressions fast was a better thing to do than changing the benchmark. A lot of real tasks on the web involve a lot of regexp processing. After all, fundamental tasks on the web, like JSON validation and parsing, depend on regular expressions. And emerging technologies, like John Resig’s processing.js library, extend that dependency even further.

Major kudos to the entire SFX team for pulling this off. Now, to grab a new nightly…

Posted by Dion Almaer at 12:09 am

13 Comments »

I wish they would list build numbers on their charts. Their results seem to contradict the info here: http://weblogs.mozillazine.org/roadmap/archives/2008/09/tracemonkey_update.html

50% faster javascript is nice, but when they compare to Firefox, Firefox’s biggest problem is its rendering engine. Gecko crawls. Try just collapsing a 50-row table, row by row on a timer, in IE vs Safari vs Opera vs Chrome vs Firefox. Firefox loses by far, even with JIT turned on in 3.1.

Also, fast javascript is only useful to the users of that browser. For developers, you still have to deal with the slowest javascript of all your potential users, IE6 being the “modern” low end.

So IE6 is the new NS4, ugh. We’ll still be throwing in code to deal with the stragglers through 2010.

Comment by ck2 — September 20, 2008

I think the only way Mozilla can catch up is to restart from scratch and rewrite all the code. Firefox will still be slow if only the JavaScript engine gets updated; the rendering engine is still slower compared to WebKit.

Comment by rizqi — September 20, 2008

@rizqi:
FYI, more and more of the native C++ in Firefox is being moved over to JavaScript. Plus I guess you never heard of Thebes.

Comment by TNO — September 20, 2008

>>Also, fast javascript is only useful to the users of that browser. For developers, you still have to deal with the slowest javascript of all your potential users, IE6 being the “modern” low end.

Each developer has to deal with the disparity of speed differences between browsers. You don’t want to limit what you can do just because IE6 exists.

What I do is write to modern browsers and then adjust for IE when running on IE.

For instance, I update my canvas charts only once every four frames in IE, and I turn off some other functionality.

I’m also planning on giving my IE6 users the ability to download a (naturally faster, since it uses WebKit) version as an Adobe AIR app.

To encourage IE6 users to at least grab Chrome, Opera, or Firefox (if they cannot throw out IE6 for some reason, such as needing IE6 to run corporate intranet apps), IE6 users see the following message. I worked hard on it so as not to insult IE6 users, but I’m open to opinions on the wording.

“You are using Internet Explorer 6.0.

MYWEBSITENAME performs many calculations, so it works best with modern browsers such as IE7, IE8, Firefox, Opera, and Google Chrome.

If you have another browser, please switch to it now.

This version of MYWEBSITENAME does support IE6, but some operations will be very slow. Please be patient.”

Comment by Nosredna — September 20, 2008

I think that all benchmarks should use IE6 as their baseline, so that we can hammer home to IE6 users how far today’s browsers have advanced. So how does this compare to IE6 performance?

Comment by pugmaster — September 20, 2008

well… IE6 beats the crap out of Firefox 2 and 3 in String concatting (if using Array + join )

Comment by Montago — September 20, 2008

>>well… IE6 beats the crap out of Firefox 2 and 3 in String concatting (if using Array + join)

Heh. Yeah. Maybe Microsoft should use that in their advertising.

Comment by Nosredna — September 20, 2008

Bill Gates: What do you think about speed, Jerry?

Jerry Seinfeld: Woah, Bill. I’m already a little bit fidgety. I don’t think speed would suit me.

Bill: No Jerry, I was thinking about the fact that Internet Explorer beats Firefox when you’re concatenating strings using an array join.

Jerry: Are you a little hungry? I’m hungry.

Comment by Nosredna — September 20, 2008

> I think that all benchmarks should use IE6 as their baseline, so
> that we can hammer home to IE6 users, how far today’s
> browsers have advanced. So how does this compare to IE6 performance?

IE6 JS sucks, but it still has the fastest rendering performance when there are background images present during DHTML, compared to ALL other modern browsers that I have benchmarked (FF3.1, Safari 3, Opera 9.5).

So JS has come a long way, but rendering still has a way to go. I don’t understand why, with the advances in video cards, it’s still so slow, except that maybe it’s because it’s only 2D.

Comment by ck2 — September 21, 2008

@ck2: Could you provide a benchmark demonstrating this?

Comment by eyelidlessness — September 21, 2008

@eyelidlessness, forgive the quicky/poorly made code but this will prove my point – open IE6 and Firefox full screen and run this code:

http://pastebin.com/m2fce0d31

Firefox will be half the speed of IE. Then try removing the background images and Firefox is fine. I suspect this is related to the Gecko smooth-scrolling performance bugs, which were finally dealt with for FF3 but still slow down overall performance. (Note that window size is critical: if you make the window small, you won’t see the slowdown – but screen sizes have never been larger.)

There is probably a better way to demonstrate this issue, but this was the easiest thing I could whip up quickly.

Comment by ck2 — September 22, 2008

Where is the “fastest browser on earth”, Opera?

Comment by Jeria — September 22, 2008

By the way, Safari 4 for Windows is the first browser to run my little benchmark smoothly (fullscreen). Everything else, including Firefox 3.1 and even Chrome, is not smooth enough.

Looks like the last pastebin went dead, so here’s another copy if anyone else wants to see what I mean:

http://pastebin.com/f4cd4165d

Comment by ck2 — February 25, 2009
