Tuesday, April 14th, 2009

Browser storage: What is the correct API? SQL? JSON?

Category: Browsers

(The following is a repost from my personal blog.)


Ian Hickson: “I expect I’ll be reverse-engineering SQLite and speccing that, if nothing better is picked first. As it is, people are starting to use the database feature in actual Web apps (e.g. mobile GMail, iirc).”

When I read that comment on Vlad’s post on HTML 5 Web Storage, I gulped. This would basically make SQLite the HTML 5 standard for storage in the browser. You would have to be a little crazy to re-implement the exact semantics (including bugs) of SQLite and its SQL dialect. What if you couldn’t use the public domain code?

Gears led out strong by making a relational database part of the toolbox for developers. It embedded its own SQLite, in fact one that was customized to have the very cool full-text search ability. However, this raises the question: which SQLite do you standardize on?

The beauty of using SQL and SQLite is that many developers already know it. RDBMSes have been mainstream for donkey’s years; we have tools to manage SQL, to view the model, and to tweak for performance. It has stood the test of time.

However, SQL has always been at odds with many developers. Ted Neward brought up ORM as the Vietnam of computer science (which is going a touch far ;). I was just lamenting with a friend at Microsoft about how developers spend 90% of their time munging data. Our life is one of transformations, and that is why I am interested in a world of JavaScript on client and server AND database. We aren’t there yet, but hopefully we can make progress.

One of Vlad’s main questions is “Is SQL the right API for Web developers?”, and it is a valid one. I quickly found that for most of my tasks with the DB I just wanted to deal with JSON, and hence created a wrapper, GearsDB, to let me insert/update/select/delete against the database with a JSON view of the world. You probably wouldn’t want to do this in large production applications for performance reasons, but it works well for me.
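To give a flavor of what a wrapper like that does, here is a minimal sketch of the JSON-to-SQL translation involved. The function name and the `people` schema are invented for illustration; the real GearsDB API may well differ:

```javascript
// Hypothetical sketch of a GearsDB-style helper: turn a plain JSON
// object into a parameterized INSERT statement plus argument array.
function jsonToInsert(table, obj) {
  var cols = Object.keys(obj);
  var placeholders = cols.map(function () { return '?'; }).join(', ');
  return {
    sql: 'INSERT INTO ' + table + ' (' + cols.join(', ') +
         ') VALUES (' + placeholders + ')',
    args: cols.map(function (c) { return obj[c]; })
  };
}

var stmt = jsonToInsert('people', { name: 'Dion', role: 'author' });
// stmt.sql  -> "INSERT INTO people (name, role) VALUES (?, ?)"
// stmt.args -> ["Dion", "author"]
```

The `sql`/`args` pair is exactly what you would then hand to the underlying SQL engine, so the developer never has to write the SQL by hand.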

Nowadays, we have interesting APIs such as JSONQuery, which Persevere (and other databases) use. I would love to see Firefox and other browsers support something like this and let us live in JSON throughout the stack. It feels so much more Webby, and also, some of the reasons that made us stay with SQL don’t matter as much in the client-side world. For example, when OODBMS took off in some enterprises, I remember having all of these Versant-to-Oracle exports just so people could report on the darn data. On the client the database is used for a very different reason (local storage), so let’s use JSON!
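To make the contrast concrete, here is a plain-JavaScript approximation of the JSONQuery idea. This is not JSONQuery’s actual string syntax, just the underlying notion of querying JSON data with predicates instead of SQL:

```javascript
// Query a JSON array with a plain JavaScript predicate instead of SQL.
function query(data, predicate) {
  return data.filter(predicate);
}

var people = [
  { name: 'Alice', age: 31 },
  { name: 'Bob', age: 19 }
];

var adults = query(people, function (p) { return p.age >= 21; });
// adults -> [{ name: 'Alice', age: 31 }]
```

The appeal is that the data goes in as JSON and comes out as JSON, with no translation layer in between.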

That being said, at this point there are applications such as Gmail, MySpace search, Zoho, and many iPhone Web applications that use the SQL storage in browsers. In fact, if we had the API in Firefox I would have Bespin using it right now! We had a version of this that abstracted on top of stores, but it was a pain. I would love to just use HTML 5 storage and be done.
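For reference, the SQL storage API those applications use looks roughly like this. This is a sketch: the `db` handle would come from `openDatabase(...)` in a supporting browser, and the `notes` schema is invented for the example (here the handle is passed in so the function can be exercised outside a browser):

```javascript
// Sketch of the HTML 5 database API shape: db.transaction gives you a
// tx object, and tx.executeSql runs a parameterized statement.
// In a browser, db would come from something like:
//   var db = openDatabase('notes', '1.0', 'Notes', 1024 * 1024);
function saveNote(db, note) {
  db.transaction(function (tx) {
    tx.executeSql(
      'INSERT INTO notes (title, body) VALUES (?, ?)',
      [note.title, note.body]
    );
  });
}
```

Parameterized `?` placeholders keep user data out of the SQL string itself, which matters even more on the client where the inputs are untrusted.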

So, I think that Firefox should actually support this for practical reasons (and we have SQLite right there!), but should push JSON APIs and let developers decide. I hope that JSON wins; do you? I also hope that Hixie doesn’t have to spec SQLite :/

It was also interesting to read this post, Abusing Web Storage, via Sam Ruby:

Alberto Trivero: The aim of this white paper is to analyze security implications of the new HTML 5 client-side storage technology, showing how different attacks can be conducted in order to steal storage data in the client’s machine.

Posted by Dion Almaer at 5:12 am




I feel quite strongly that SQL is, at best, overkill and, at worst, the wrong hammer for the job. I would prefer a JSON-centric key/value storage system, with some kind of simple map/reduce search functionality. E.g. pass a callback to a search function that returns the value to include in the result set. On a server this would be a performance nightmare, as one server (or a cluster) would have to serve many requests, and performing a search of this kind would not scale. However on the client, I have the entire machine at my disposal (OK, it might be an iPhone, but that’s still quite a bit of raw CPU). Also, in my opinion it’s *unlikely* that we would be talking more than thousands of records stored on a client (i.e. not millions), due to the difficulty of initially downloading and syncing….
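The callback-style search described above can be sketched in a few lines (the names here are illustrative, not any real API):

```javascript
// Scan every stored record; a user-supplied matcher decides what goes
// into the result set by returning a value (or null/undefined to skip).
function search(store, matcher) {
  var results = [];
  for (var i = 0; i < store.length; i++) {
    var value = matcher(store[i]);
    if (value !== undefined && value !== null) results.push(value);
  }
  return results;
}

var store = [
  { id: 1, tag: 'work' },
  { id: 2, tag: 'home' },
  { id: 3, tag: 'work' }
];

var work = search(store, function (rec) {
  return rec.tag === 'work' ? rec.id : null;
});
// work -> [1, 3]
```

It is O(n) per query, which is exactly the trade-off being argued: acceptable for thousands of client-side records, a nightmare at server scale.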

Comment by aljames — April 14, 2009

The few times that I used persistent storage I used JSON. I cannot imagine that converting to SQL and back would make a web developer happy.

Comment by Jaaap — April 14, 2009

JSON is clearly the way to go.

Comment by andysky — April 14, 2009

SQLite and its features “forever”. I have used this little monster since version 2, both as a portable desktop application engine and as a website database (faster than MySQL in many SELECT cases).
With JSON, in a couple of months we will all be reinventing the wheel, creating libraries to search words in saved strings, order data, etc. etc.
If you need JSON, use JSON; if you need a database, SQLite is the best choice ever, IMO. The two things are completely different, and I hope nobody will seriously consider removing SQLite as a storage engine from browsers.

Comment by WebReflection — April 14, 2009

I think something like CouchDB would be interesting to look at as a model. It already does synchronization well and uses JavaScript as a query/view language. It’s document-oriented as opposed to relational or OO, but it’s an interesting thought.

However, the practical side of me says that this kind of thing is way too new, and that despite the difficulties, people would rather use what they know: SQL.
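The CouchDB view model mentioned above can be approximated in a toy re-implementation like this (in real CouchDB, `emit` is provided to the map function by the engine rather than passed as an argument, and results are materialized into an index rather than recomputed):

```javascript
// Rough sketch of CouchDB's map-view idea: a JavaScript function emits
// key/value pairs per document, and the engine keeps rows sorted by key.
function runView(docs, mapFn) {
  var rows = [];
  function emit(key, value) { rows.push({ key: key, value: value }); }
  docs.forEach(function (doc) { mapFn(doc, emit); });
  rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
  return rows;
}

var docs = [
  { date: '2009-04-14', title: 'storage' },
  { date: '2009-03-01', title: 'bespin' }
];

var byDate = runView(docs, function (doc, emit) { emit(doc.date, doc.title); });
// byDate[0].value -> 'bespin' (rows come back sorted by date key)
```

Note that the query language here is just JavaScript, which is the point being made: no SQL, yet still ordered, indexable output.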

Comment by genericallyloud — April 14, 2009

With JSON in the browser and JSON on the wire, JSON in the database is an obvious requirement! Up to now, my REST APIs have mostly converted DTOs from Java/Python/etc. to JSON and vice versa, after the translation between SQL/GQL/etc. and the programming language. So much time and CPU wasted :(

Will JSON be the final format? I hope not, because I would like to start carrying enhanced messages, à la RDF, with triples for example…

Comment by DomDerrien — April 14, 2009

I don’t see how you get decent filtering performance on anything but a SQL engine. I want live filtering, with the same abilities as where clauses (filter on dates, count(), …). Otherwise a client-side database just is not useful for the sort of work I would use it for.

Comment by Joeri — April 14, 2009

@Joeri: Believe it or not, it is possible to build indices without using SQL (I know it is difficult to imagine that efficient filtering can be done without a specific query syntax :P ). We’ll have a post coming out soon showing how Persevere (which uses JSON-style storage with the JSONQuery query language) can actually outperform a typical MySQL setup.
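The claim that indexing is independent of SQL is easy to demonstrate: a sorted array of [key, record] pairs over plain JSON records gives O(log n) lookups with ordinary binary search. This is an illustrative sketch, not Persevere’s actual implementation:

```javascript
// Build a sorted [key, record] index over a field of plain JSON records.
function buildIndex(records, field) {
  return records
    .map(function (r) { return [r[field], r]; })
    .sort(function (a, b) { return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0; });
}

// Standard binary search over the sorted index: O(log n), no SQL anywhere.
function lookup(index, key) {
  var lo = 0, hi = index.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (index[mid][0] === key) return index[mid][1];
    if (index[mid][0] < key) lo = mid + 1; else hi = mid - 1;
  }
  return null;
}

var idx = buildIndex([{ id: 7 }, { id: 2 }, { id: 5 }], 'id');
var rec = lookup(idx, 5);
// rec -> { id: 5 }
```

A real engine would keep the index updated incrementally rather than rebuilding it, but the access path is the same.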

@aljames: Map/Reduce is a cool technology for a specific set of problems, but it is a very poor fit for the broad general use cases of the web.

Comment by kriszyp — April 14, 2009

@kriszyp, is it better to be able to implement a JSON layer over a database engine with great SQL support, transactions, and full-text search, or to have only the JSON layer, without the possibility of natural database development?
I think we should have the possibility to use SQLite; the rest will be part of this or that library. I mean, what do library users care about the internals? Most of us do not even deeply know JavaScript or the DOM, just a lib API … SQLite is the right way, IMO.

Comment by WebReflection — April 14, 2009

@WebReflection: The point is that querying, transactions, and full-text search are not dependent on SQL; extensive DB capabilities do not require SQL to be the query language. Relational DBs/SQL are a poor fit for persisting JS data. Relational DBs within JS would almost always end up at ORM, which basically means you have lost: the DB didn’t fit the data, so it required translation. On the other hand, we have the opportunity to use a query language and storage engines that are actually designed for JavaScript, not just carried over from archaic traditional design.

Comment by kriszyp — April 14, 2009

JSON, like turtles, all the way down. Doesn’t have to be that under the covers, but that is what the interface should be.

Comment by mpcm — April 14, 2009

What amazes me is that XSL didn’t get the mass adoption it deserves. It’s XML-oriented and transformation-oriented, which is that 90%.

Comment by ilazarte — April 14, 2009

@kriszyp, my point is that I prefer “low level” layers rather than high-level ones with inevitable lacks of this or that feature. What you would like to have is possible via intermediate layers (and thanks to the recent JSON.parse), while having only JSON means our “fantasy” will be blocked by a missing layer: direct db manipulation via the SQL language.
Why are you guys so scared of a client db engine? The possibilities are numerous, and the hacks problem will be the same with whatever JSON layer we put in the middle. I do not even get the discussion above … SQLite is SQLite: it is not new, it is fast, it is reliable, it is transactional, full-text, etc. etc. … I wonder why you do not use JSON on the server side as well, since I cannot figure out any difference between these two environments, only difficulties syncing them because of people who do not want “to learn” SQL … again, where is the problem? It is like the getter/setter discussion: useless, IMO.

Comment by WebReflection — April 14, 2009

@WebReflection: I don’t have any issue with a low-level API, but SQL isn’t low-level; it only becomes so when you are forced to rebuild the stack on top of it. A low-level API that provided direct access to tables and indexes through iterators and binary search would be fine. Libraries could then implement query languages on top of it (SQL or JSONQuery). However, this isn’t entirely about SQL; it’s about using a relational DB to store data from an environment that is very object-oriented. If you look at all the functionality that ORMs provide, all of it is basically doing translation where data paradigms are mismatched. Why build a mismatched storage system when the data structure of the language is known (the JS object model)?
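A tiny example of the kind of low-level primitive described above: ordered iteration over index keys with no query language at all, on top of which libraries could layer SQL or JSONQuery (names invented for illustration):

```javascript
// Range scan over an already-sorted key array: the kind of low-level
// index access a storage API could expose directly.
function rangeScan(sortedKeys, from, to) {
  var out = [];
  for (var i = 0; i < sortedKeys.length; i++) {
    var k = sortedKeys[i];
    if (k > to) break;           // keys are sorted, so we can stop early
    if (k >= from) out.push(k);
  }
  return out;
}

var keys = [1, 3, 5, 8, 13];
var hits = rangeScan(keys, 3, 8);
// hits -> [3, 5, 8]
```

A WHERE clause like `id BETWEEN 3 AND 8` compiles down to exactly this kind of scan; the argument is over which layer the browser should expose.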

Comment by kriszyp — April 14, 2009

Dion, thanks for the post.

I must say that having SQL on the client is a good idea. Once JS on the server becomes the norm, the only concern will be identifying what part of the logic is private (server) and what is public (server/client). As for relational vs. object models, they both have their strengths and weaknesses, and there are tools to bridge the two. I wouldn’t want to lose the performance and legacy of SQL until a mature alternative is presented.

Comment by dimitryz — April 14, 2009


Nice Pratchett ref!

Comment by thnkfstr — April 14, 2009

@genericallyloud I agree with you mate, CouchDB could certainly be the way to go. The query interface is already JSON, and its document-centric nature means that it essentially avoids the object-relational mismatch.

It’s kinda frustrating that after so many years of work and technological advances, the best thing we’ve really come up with to address this mismatch is ORM. Part of me always resents having to fiddle with my domain model, even in small ways, just to satisfy the persistence strategy I’ve chosen.

Comment by GingerNinja — April 14, 2009

I have been working on the nextdb.net project, a hosted relational database with a JavaScript API that allows you to update, delete, insert, and execute queries that return pure JSON. The server returns JSON structured by the very definition of the queries, which are defined and securely compiled on the server. SQL is a great query language, but it was not designed for the web. Nextdb.net has a query language, called NextQuery, that addresses the security issues, making it safe and easy for JavaScript programming by allowing the user to sculpt the object structure of the JSON response within the query language itself. No ORM mismatch, no synchronization issues. The project is still in alpha but is getting ready for beta soon.

Comment by BH23 — April 15, 2009

@kriszyp I disagree entirely. Map/reduce at its core is just a function that returns a value (or not) for a given object in the set. This is actually a very intuitive way of working with persistent storage in object-based languages (particularly loosely typed ones). It has many advantages, the most significant of which is that you don’t need to learn both SQL and whatever language you are using. I am not talking about the complexity of parallel searching that is necessary on a large-scale cluster.

Comment by aljames — April 15, 2009

@aljames: Map/reduce is intuitive, but it doesn’t help with the general use cases of a database; it does nothing to provide the workhorse parameterized queries with O(log n) access to data that are the cornerstone of database querying. It is easy and does a great job of distributed processing for tasks that require analyzing large sets of data, but it works in O(n), so it will never compare to pre-indexed binary-tree access for the majority of database query operations.

Comment by kriszyp — April 15, 2009

I used JSON and implemented a query syntax like SQL:

create data: http://vnjs.net/?id=1000000069
insert: http://vnjs.net/?id=1000000077
update: http://vnjs.net/?id=1000000079
delete: http://vnjs.net/?id=1000000075
select: http://vnjs.net/?id=1000000078
and select an HTML element from the DOM:

And a link to download my framework: http://vnjs.net/www/src/kombai.rar

Comment by mnx2012 — April 15, 2009

@kriszyp Agreed that indexes are important (although remember that we are talking about *tiny* databases here), but pre-indexed O(log n) access is not the sole domain of SQL!
For instance, what’s the difference between “SELECT * FROM rows WHERE id>5” and “function(row){return (row.id>5) ? row : null; }”?

Both could be computed with an index on id, but instead of the query optimizer optimizing SQL, it would optimize JavaScript instead. So here we have the same as SQL, i.e. a small subset of possible queries are solvable in O(log n) using indexes, while others require a table scan.

In an ideal world, the indexing would be carried out behind the scenes, like in the Google App Engine developer kit. I.e. if you execute a query that could be solved with an index on field A enough times, the index gets generated, so only the first access is slower.
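The two forms being compared really do select the same rows (a trivial sketch; the SQL version is shown only as a comment):

```javascript
var rows = [{ id: 3 }, { id: 6 }, { id: 9 }];

// SQL form:        SELECT * FROM rows WHERE id > 5
// JavaScript form: the predicate below, over the same data
var over5 = rows.filter(function (row) { return row.id > 5; });
// over5 -> [{ id: 6 }, { id: 9 }]
```

Either form could be answered from an index on `id`; the only difference is which language the query planner has to understand.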

Comment by aljames — April 17, 2009
