Monday, April 7th, 2008

JavaScript SAX Based Parser

Category: JavaScript, Library

Gregory Reimer fancies SAX, and wishes that the JavaScript host environment gave us a SAX parser. You can’t blame him for not loving DOM, but how about E4X? Or StAX? Anyway, Gregory decided to build a SAX-based parser in JavaScript itself, using simple search and replace:

After reading Search and Don’t Replace over at John Resig’s blog, it got me wondering if you could use that technique as the basis for a SAX parser in JavaScript. Of course there’s nothing stopping you from building a SAX parser from scratch in JavaScript, but (methinks) the string tokenizer part of it would be a bit of a beast. However, by taking advantage of the optimization built into JavaScript’s RegExp replacement engine, you might just be able to work a nice little souped-up tokenizing engine out of the deal.

So I thought I’d give it a try. What I came up with is nowhere near anything resembling a real-world, valid XML parser. All it knows how to deal with are elements, text nodes and character entities. And not all of the error messages are as helpful as a real-world implementation’s should be. And I’m sure there are plenty of bugs since I banged this out in less than an afternoon. But it ran like scalded cats on a 422kb file.

You get to use simple SAX-style handlers, a la:

javascript

  function doStartTag(name){alert("opening tag: "+name);}
  function doEndTag(name){alert("closing tag: "+name);}
  function doAttribute(name,val){alert("attribute: "+name+'="'+val+'"');}
  function doText(str){
      // Collapse whitespace runs and trim, so whitespace-only nodes become ''.
      // (The built-in str.normalize() only does Unicode normalization.)
      str=str.replace(/\s+/g,' ').replace(/^ | $/g,'');
      if(!str){str='[whitespace]';}
      alert("encountered text node: "+str);
  }
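
To make the "search and don't replace" idea concrete, here is a minimal sketch of how a tokenizer could drive handlers like these from a single String.prototype.replace call. This is not Gregory's code: the function name parseSax, the handlers object, and the regular expressions are all assumptions made for illustration. It skips entities, comments, CDATA and error reporting entirely, and anything that doesn't look like a tag or text is silently ignored.

```javascript
// Hypothetical sketch, not the actual parser from the post.
// parseSax and the shape of `handlers` are illustrative assumptions.
function parseSax(xml, handlers) {
  // One alternation: end tag | start tag (with raw attribute text) | text run.
  var token = /<\/([\w:.-]+)\s*>|<([\w:.-]+)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(\/?)>|([^<]+)/g;
  // Applied to the raw attribute text captured by a start tag.
  var attr = /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;

  // "Search and don't replace": the callback runs once per match and we
  // only care about its side effects; the resulting string is thrown away.
  xml.replace(token, function (match, endName, startName, attrs, selfClose, text) {
    if (endName) {
      handlers.doEndTag(endName);
    } else if (startName) {
      handlers.doStartTag(startName);
      attrs.replace(attr, function (m, name, dq, sq) {
        handlers.doAttribute(name, dq !== undefined ? dq : sq);
        return m;
      });
      // Treat <br/> as a start tag immediately followed by an end tag.
      if (selfClose) { handlers.doEndTag(startName); }
    } else if (text !== undefined) {
      handlers.doText(text);
    }
    return match;
  });
}
```

With the handler functions above in scope, something like parseSax(xmlString, {doStartTag: doStartTag, doEndTag: doEndTag, doAttribute: doAttribute, doText: doText}) would fire one alert per event. Whether attributes should be reported before or after the start tag is a design choice; the order here is just one option.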

Download the SAX parser.

Posted by Dion Almaer at 4:00 am

2 Comments »

A JS SAX parser has been available for a while from Mozilla’s JSLib:

http://jslib.mozdev.org/libraries/utils/sax.js.html

However, I personally only see SAX as a relatively minor step up from straight DOM code. I prefer something a little more high-level, such as a JS implementation of Apache Commons Digester:

http://books.google.com/books?id=tkARdW-sRoAC&pg=PA246&lpg=PA246&dq=jsdigester&source=web&ots=Yr1OCWMzCE&sig=MIR_i3TNQHLYDY-T0-5y_9-gkdw&hl=en#PPA231,M1

Or, for the Java enthusiasts:

http://javawebparts.sourceforge.net/javadocs/javawebparts/taglib/jstags/JSDigesterTag.html

I wonder if Gregory’s implementation might make JSDigester more performant? Might be an interesting experiment, if I can find the time.

Comment by fzammetti — April 7, 2008

SAX is a somewhat outdated XML parsing model; the latest and most advanced model is VTD-XML:

http://vtd-xml.sf.net

Comment by jzhang2009 — December 2, 2009
