Wednesday, February 4th, 2009

irregexp: New fast Regexp engine in Chrome

Category: Chrome, Performance

>We are seeing great work with faster JavaScript, but what about faster DOM? Or faster CSS? Or faster libraries? Or faster Regexp? Well, members of the Chrome team have announced irregexp, a faster Regexp engine for Chrome. They go into detail on what they had, what they did, and why they did it.

It is also well known that SunSpider tests Regexp that isn’t targeted to what is actually done on the Web. The V8 benchmark has updated tests that aim to do a better job here. Being Google, they also have this index laying around, so they took advantage of that to test against popular webpages:

During development we have tested Irregexp against one million of the most popular webpages to ensure that the new implementation stays compatible with our previous implementation and the web. We have also used this data to create a new benchmark which is included in version 3 of the V8 Benchmark Suite. We feel this is a good reflection of what is found on the web.

I would love to see this available for other browsers to test on :)

And now, here are the details on the changes:

While the V8 team has been working hard to improve JavaScript performance, one part of the language that we have so far not given much attention is regexps. Our previous implementation was based on the widely used PCRE library developed by Philip Hazel at the University of Cambridge. The version we used, known as JSCRE, was adapted and improved by the WebKit project for use with JavaScript. Using JSCRE gave us a regular expression implementation that was compatible with industry standards and has served us well. However, as we’ve improved other parts of the language, regexps started to stand out as being slower than the rest. We felt it should be possible to improve performance by integrating with our existing infrastructure rather than using an external library. The SquirrelFish team is following a similar approach with their JavaScript engine.

A fundamental decision we made early in the design of Irregexp was that we would be willing to spend extra time compiling a regular expression if that would make running it faster. During compilation Irregexp first converts a regexp into an intermediate automaton representation. This is in many ways the “natural” and most accessible representation and makes it much easier to analyze and optimize the regexp. For instance, when compiling /Sun|Mon/ the automaton representation lets us recognize that both alternatives have an ‘n’ as their third character. We can quickly scan the input until we find an ‘n’ and then start to match the regexp two characters earlier. Irregexp looks up to four characters ahead and matches up to four characters at a time.

After optimization we generate native machine code which uses backtracking to try different alternatives. Backtracking can be time-consuming so we use optimizations to avoid as much of it as we can. There are techniques to avoid backtracking altogether but the nature of regexps in JavaScript makes it difficult to apply them in our case, though it is something we may implement in the future.

Related Content:

Posted by Dion Almaer at 5:19 pm
5 Comments

++++-
4.4 rating from 26 votes

5 Comments »

Comments feed TrackBack URI

Cool, glad to know they’re making improvements anywhere they can.

Comment by Chiper — February 4, 2009

Mozilla’s progress for those interested (slightly dated):
https://bugzilla.mozilla.org/show_bug.cgi?id=406271
http://blog.mozilla.com/dmandelin/2008/10/06/squirrelfishing-in-regexp-dnajs/

Comment by TNO — February 4, 2009

Good to see that regexp is getting some attention but maybe even more important than making faster, how about making it complete? This page: http://www.regular-expressions.info/javascript.html has a list of features that are missing from the regexp in js. I’d like to see some of them added at some point. Of course, it’s useless if it happens in only one browser so it’s not really up to google, apple, mozilla, opera, or… seems like I’m forgetting somebody here.. no I guess not.

Comment by okonomiyaki3000 — February 4, 2009

FYI – the benchmark for v8 with regexp measurements has been posted and is available here:

http://v8.googlecode.com/svn/data/benchmarks/v3/run.html

Comment by mikebel — February 4, 2009

I wish they’d add regex lookbehind as well. :/

Yeah I know, I’m quite offtopic (since that mostly has to do with ECMAScript), but I felt like whining a bit :P

Comment by LeaVerou — February 4, 2009

Leave a comment

You must be logged in to post a comment.