Wednesday, February 4th, 2009
It is also well known that SunSpider tests Regexp that isn’t targeted to what is actually done on the Web. The V8 benchmark has updated tests that aim to do a better job here. Being Google, they also have this index laying around, so they took advantage of that to test against popular webpages:
During development we have tested Irregexp against one million of the most popular webpages to ensure that the new implementation stays compatible with our previous implementation and the web. We have also used this data to create a new benchmark which is included in version 3 of the V8 Benchmark Suite. We feel this is a good reflection of what is found on the web.
I would love to see this available for other browsers to test on :)
And now, here are the details on the changes:
A fundamental decision we made early in the design of Irregexp was that we would be willing to spend extra time compiling a regular expression if that would make running it faster. During compilation Irregexp first converts a regexp into an intermediate automaton representation. This is in many ways the “natural” and most accessible representation and makes it much easier to analyze and optimize the regexp. For instance, when compiling /Sun|Mon/ the automaton representation lets us recognize that both alternatives have an ‘n’ as their third character. We can quickly scan the input until we find an ‘n’ and then start to match the regexp two characters earlier. Irregexp looks up to four characters ahead and matches up to four characters at a time.
Posted by Dion Almaer at 5:19 pm