As preparation for an upcoming tech talk about Placemaker I thought it would be good to take a bit of the pain out of the geolocation service by making an interface for it. Placemaker works the following way: you post some content or a URL to it, it goes through the content or gets the content from the URL and analyzes it. It then finds geographical locations in the text and disambiguates them (for example to skip names like "Jack London" and not consider it the city London). Finally you get it back as XML.
The annoying thing is that Placemaker only supports POST requests and does not have a JSON output - for now. GeoMaker allows non-developers to enter some text or a URL, filter the results (using YUI datatable) to remove false positives (no system is perfect) and get the embed code for a Yahoo Map or a list of microformatted locations as copy+paste. See the screencast to get the end user experience:
Of course, every time you build something like that, red-blooded developers will ask for an API to do the same thing (and pointing them to Placemaker wasn't enough). So here it is:
http://icant.co.uk/geomaker/api.php takes two parameters: url of the web document to load and output which could be map, kml, microformats, csv, or json (with callback for JSON-P). Using this you can analyze a url in JavaScript and get the data back as an array:
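A minimal sketch of building such a request. The url and output parameters come from the description above; that the JSON-P parameter is literally named callback is an assumption, and buildGeoMakerUrl and showPlaces are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: builds a GeoMaker API request URL.
// "url" and "output" are named in the post; "callback" as the
// JSON-P parameter name is an assumption.
function buildGeoMakerUrl(pageUrl, output, callbackName) {
  var query = 'url=' + encodeURIComponent(pageUrl) +
              '&output=' + encodeURIComponent(output);
  if (callbackName) {
    query += '&callback=' + encodeURIComponent(callbackName);
  }
  return 'http://icant.co.uk/geomaker/api.php?' + query;
}

var requestUrl = buildGeoMakerUrl('http://example.com/', 'json', 'showPlaces');

// In a browser you would then load the URL through a script node:
//   var s = document.createElement('script');
//   s.src = requestUrl;
//   document.getElementsByTagName('head')[0].appendChild(s);
```

Encoding the document URL matters here, as it will usually contain characters such as : and / that would otherwise clash with the query string.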
It includes compatibility tables, and will try to run the tests on your browser to give you feedback. It also includes sample code to check web browser support that you can use in your own projects.
When you provide a JSON output for developers it does make sense to also allow for a callback parameter. That way your code can be used in script nodes without having to use any backend at all. Commonly this is called JSON-P and has been covered here in the long long ago. One of the issues with JSON-P is that when the callback method is not defined it throws an error.
The Bing API is the first instance I have seen that works around this, as its output is this:
I have no clue what the /* pageview_candidate */ is about and frown upon omitting the {} of the if statement, but I must say I do like this. One issue is that while end users don't get annoyed with errors, developers don't have a clue what happened either as the error is silent. One proposal would be to use a console.log() when there is an error:
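One way that proposal could look, sketched from the server's point of view: the response checks that the callback exists before calling it and otherwise leaves a note in the console. wrapJsonp and showResults are hypothetical names, and this is not the actual Bing output.

```javascript
// Hypothetical server-side helper: wraps data in a guarded JSON-P call.
function wrapJsonp(callbackName, data) {
  // Only accept plain identifiers so the name cannot smuggle in code.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error('Invalid callback name: ' + callbackName);
  }
  var json = JSON.stringify(data);
  return 'if (typeof ' + callbackName + ' === "function") { ' +
           callbackName + '(' + json + '); ' +
         '} else if (typeof console !== "undefined") { ' +
           'console.log("JSON-P callback ' + callbackName + ' is not defined"); ' +
         '}';
}

// The string a script node would receive for ?callback=showResults
var script = wrapJsonp('showResults', { total: 2 });
```

End users still never see an error, but a developer who mistyped the callback name gets a hint in the console instead of silence.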
The latest beta of Persevere features a new native object storage engine called JavaScriptDB that provides high-end scalability and performance. Persevere now outperforms the common PHP and MySQL combination for accessing data via HTTP by about 40% and outperforms CouchDB by 249%. The new storage engine is designed and optimized specifically for persisting JavaScript and JSON data with dynamic object structures. It is also built for extreme scalability, with support for up to 9,000 petabytes of JSON/JS data in addition to any binary data.
This comparison isn't exactly apples-to-apples, as it turns out. For the web app use case, Persevere has a bunch of value-adds on top of data storage:
Persevere/JavaScriptDB goes further [than relational DBs] with the flexibility to evolve schemas and handle partial schemas. Persevere also provides integrated server side JavaScript (SSJS) with persistence, Comet-driven data change notifications, JSONQuery, standards based HTTP interface with content negotiation, JSON-RPC interface to SSJS, cross-domain handling, CSRF protection, and more. All of these things are additional features that one would have to add to the stack for other storage systems, making them even slower. Persevere includes this functionality out of the box, while still maintaining extremely fast performance.
Kris spends a bit of time in his post explaining his test setup, but then gets to the good stuff:
So how does Persevere achieve this level of performance with the JavaScriptDB storage? The dynamic object-oriented nature of the data that is stored in JavaScriptDB is much different than that of a traditional relational database, so a number of innovative approaches were employed.
He goes into quite a bit of detail explaining the implementation details behind JavaScriptDB. The summary (with lightly edited quotes) is:
Direct Data-Bound Object Representation: "In a traditional application stack, a record must have separate in-memory representations for [the database] and [the application] result set which then might be mapped to an object representation. With JavaScriptDB, the single in-memory object is reused for all result sets and data caching."
Shared Cache of Objects with Copy-on-Write
Append-based Database Storage: "Many traditional databases commit data to a transaction log before committing data to the table, requiring multiple writes. JavaScriptDB appends transactional data directly to the main storage file; writes can be committed with a single IO operation."
BrowserCouch is an attempt at an in-browser MapReduce implementation. It's written entirely in JavaScript and intended to work on all browsers, gracefully upgrading when support for better efficiency or feature set is detected.
Not coincidentally, this library is intended to mimic the functionality of CouchDB on the client-side, and may even support integration with CouchDB in the future.
Why?
This prototype is intended as a response to Vladimir Vukićević's blog post entitled HTML5 Web Storage and SQL. A CouchDB-like API seems like a nice solution to persistent storage on the Web because so many of its semantics are delegated out to the JavaScript language, which makes it potentially easy to standardize. Furthermore, the MapReduce paradigm also naturally takes advantage of multiple processor cores, something that is increasingly common in today's computing devices.
There is also an interactive CouchDB project that "is an emulator written in 100% JavaScript with tons of jQuery thrown in. It also implements the collation schemes as well as the map/reduce algorithms. While it doesn’t demonstrate replication, conflict management and a host of other capabilities in CouchDB, it does strive to illustrate concepts like schema-less JSON documents, map/reduce and how these things fit together."
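To give a feel for how schema-less JSON documents and map/reduce fit together, here is a toy in-memory sketch. It is not the BrowserCouch or CouchDB API; mapReduce, the emit signature and the sample documents are all made up for illustration.

```javascript
// Toy in-memory map/reduce over JSON documents.
function mapReduce(docs, map, reduce) {
  var groups = Object.create(null);   // key -> list of emitted values
  docs.forEach(function (doc) {
    // map() may call emit(key, value) zero or more times per document.
    map(doc, function emit(key, value) {
      if (!(key in groups)) {
        groups[key] = [];
      }
      groups[key].push(value);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduce(key, groups[key]);
  });
  return result;
}

var docs = [
  { title: 'Post one', tags: ['json', 'couchdb'] },
  { title: 'Post two', tags: ['json'] },
  { title: 'Untagged' }               // schema-less: no tags field at all
];

// Count how often each tag is used.
var tagCounts = mapReduce(docs,
  function (doc, emit) {
    (doc.tags || []).forEach(function (tag) { emit(tag, 1); });
  },
  function (key, values) {
    return values.reduce(function (sum, v) { return sum + v; }, 0);
  });
```

Because the map function is plain JavaScript, documents that lack a field are simply handled in code rather than by a schema.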
When Aaron talked about the Chrome extension API he mentioned how he would see if JSON Schema could help them have a JSON heavy API and allow them to easily validate.
He has quickly reported back and is happy with the results.
The main findings of the team were that eval() is not only evil but also very slow whereas dynamic script nodes are fast but insecure. The solution was to do a custom evaluation of string data rather than using JSON:
Having set the performance bar pretty high with the last approach, we dove into custom data formats. The challenge would be to create a format that we could parse ourselves, using JavaScript’s String and RegExp methods, that would also match the speed of JSON executed natively. This would allow us to use Ajax again, but keep the data restricted to our domain.
Since we had already discovered that some methods of string manipulation didn’t perform well on large strings, we restricted ourselves to a method that we knew to be fast: split(). We used control characters to delimit each contact, and a different control character to delimit the fields within each contact. This allowed us to parse the string into contact objects with one split, then loop through that array and split again on each string.
for (var n = 0, len = that.contacts.length, contactSplit; n < len; n++) {
    contactSplit = that.contacts[n].split("\\a");
    that.contacts[n] = {};
    that.contacts[n].n = contactSplit[0];
    that.contacts[n].e = contactSplit[1];
    that.contacts[n].u = contactSplit[2];
    that.contacts[n].r = contactSplit[3];
    that.contacts[n].s = contactSplit[4];
    that.contacts[n].f = contactSplit[5];
    that.contacts[n].a = contactSplit[6];
    that.contacts[n].d = contactSplit[7];
    that.contacts[n].y = contactSplit[8];
}
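The loop above relies on state defined elsewhere, so here is a self-contained sketch of the same two-level split() idea. The delimiter characters, the three-field layout and the sample data are assumptions for illustration, not the team's actual format.

```javascript
// Assumed delimiters: any characters that never occur in the data would do.
var CONTACT_SEP = '\u0001';
var FIELD_SEP = '\u0002';

function parseContacts(text) {
  if (text === '') {
    return [];
  }
  // First split: one string per contact.
  var contacts = text.split(CONTACT_SEP);
  for (var n = 0, len = contacts.length, fields; n < len; n++) {
    // Second split: the fields of a single contact.
    fields = contacts[n].split(FIELD_SEP);
    contacts[n] = { n: fields[0], e: fields[1], u: fields[2] };
  }
  return contacts;
}

// Made-up payload with two contacts of three fields each.
var payload = ['Ada', 'ada@example.com', 'ada99'].join(FIELD_SEP) +
              CONTACT_SEP +
              ['Bob', 'bob@example.com', 'bobby'].join(FIELD_SEP);

var contacts = parseContacts(payload);
```

The short property names mirror the original snippet; with thousands of contacts every saved byte and every avoided parsing step adds up.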
Once this had been sped up, all they needed to use was the YUI AutoComplete control and voilà - fast client side searches even with massive datasets.
JSONView is a new Firefox extension that gives you a nice way to view your JSON documents (JSONovich also does the trick).
Ben Hollis talks about the extension:
The extension itself is pretty simple. I wasn’t sure how to approach the problem of supporting a new content type for Firefox, so I followed the example of the wmlbrowser extension and implemented a custom nsIStreamConverter. What this means is that I created a new component that tells Firefox “I know how to translate documents of type application/json into HTML”. And that it does - parsing the JSON using the new native JSON support in Firefox 3 (for speed and security) and then constructing an HTML document that it passes along the chain. This seems to work pretty well, though there are some problems - some parts of Firefox forget the original type of the document and treat it as HTML, so “View Page Info” reports “text/html” instead of “application/json”, “Save as…” saves the generated HTML, Firebug sees the generated HTML, etc. Just recently I came across the nsIURLContentListener interface, which might offer a better way of implementing JSONView, but I’m honestly not sure - the Mozilla documentation is pretty sparse and it was hard enough to get as far as I did. I’m hoping some Mozilla gurus can give me some pointers now that it’s out in the open.
The native JSON API is part of the upcoming 3.1 revision of ECMAScript, so we should see it adopted in browsers pretty quickly. It’s also API compatible with json2.js, as you note, so many many web users will get the performance win without apps needing to update.
I suspect that the performance advantage for native JSON is even more pronounced on the encoding side, but I don’t have tests to hand to back it up.
This was a comment by Mike Shaver, VP of Engineering at Mozilla (disclaimer: where I work).
In case you haven’t heard, one of Firefox 3.1’s awesome new features will be native JSON support. This is totally sweet for two reasons:
eval’ing JSON in the browser is unsafe. Using native JSON parsing protects you against possible code execution.
Safely eval’ing JSON with a 3rd party library can be orders of magnitude slower. Native JSON parsing is much faster.
How does native JSON work compared to plain old eval? Simple:
var jsonString = '{"name":"Ryan", "address":"Mountain View, CA"}';
var person = JSON.parse(jsonString);
// 'person' is now a JavaScript object with 2 properties; name and address
Pretty easy huh? And here’s how to get a JSON string from an object:
var personString = JSON.stringify(person);
// 'personString' now holds the string '{"name":"Ryan", "address":"Mountain View, CA"}'
“But wait!”, you say. “How is it safer? How much faster is it compared to eval?”. Ok, I’ll show you.
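A small sketch of the safety half of that claim (it says nothing about speed): a string that is valid JavaScript but not valid JSON is rejected by JSON.parse instead of being run. The variable names are made up for the example.

```javascript
var sideEffect = false;

// Valid JavaScript, but not valid JSON.
var hostile = '(function () { sideEffect = true; return {}; })()';

var parseRejected = false;
try {
  JSON.parse(hostile);
} catch (e) {
  // JSON.parse refuses anything outside the JSON grammar.
  parseRejected = e instanceof SyntaxError;
}

// sideEffect is still false at this point.
// eval(hostile) would have executed the function and flipped it to true.
```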
Gotta love it when browsers implement something that just makes apps faster without the apps even having to know about it.
An almost non-lossy serialization format for sending XML as JSON (plain text in between elements is ignored). Uses the (element-name, attribute-dictionary, list-of-children) tuple format, which sadly means many common cases end up taking more bytes than the original XML. Still an improvement on serializations that behave differently when a list of children has only one item in it.
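To make the tuple format concrete, here is a sketch that turns [name, attributes, children] arrays back into XML. Allowing plain strings as children for text content is an assumption of this sketch, and toXml and escapeXml are hypothetical helper names.

```javascript
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// node is either a string (text, an assumption of this sketch)
// or a tuple: [elementName, attributeDictionary, listOfChildren]
function toXml(node) {
  if (typeof node === 'string') {
    return escapeXml(node);
  }
  var name = node[0], attrs = node[1], children = node[2];
  var xml = '<' + name;
  Object.keys(attrs).forEach(function (key) {
    xml += ' ' + key + '="' + escapeXml(attrs[key]) + '"';
  });
  if (children.length === 0) {
    return xml + '/>';
  }
  return xml + '>' + children.map(toXml).join('') + '</' + name + '>';
}

var doc = ['feed', { lang: 'en' }, [
  ['entry', { id: '1' }, ['First']],
  ['entry', { id: '2' }, []]
]];

var xml = toXml(doc);
// xml is: <feed lang="en"><entry id="1">First</entry><entry id="2"/></feed>
```

Children are always a list, whether there are zero, one or many of them, which is exactly the consistency praised above.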
Have you seen this path before? Hating on XML and going for JSON, but then trying to use JSON as XML, until at the end of it you just s/</{/g.
We don't often post about general APIs. I let John Musser handle that on Programmable Web, but this one strikes a chord with me.
Kiva, the distributed micro loan platform, has released a new developer API that gives third parties access to create innovative applications on top of the platform:
The initial release of the Kiva API is concerned with helping you find and present public data from Kiva. We want to help you find things like loans, lenders, and partners and help you represent those in your applications and products. We think there are a lot of really useful things that can be done just by helping the Kiva community find information in different ways or experience it in a new context. Moreover, understanding our core set of data is the most important building block to doing more complex operations in future versions of the API.
I have just started poking at the data available, as I suspect network analysis will be able to help predict rates of return.
When I knew that Skylar Woodward was in on the action I knew something cool was coming. The API so far is typically JSON and RESTful (and XML if you haaave to). I wonder if anyone has written a nice JavaScript wrapper around the API yet?
In our age of information and technology, there isn't as much mystery as there used to be. In that sense, short URLs (e.g., tinyurl.com/123) can be fun! Who knows where you'll wind up.
Some folks aren't as happy with uncertainty in hyperlinking; one of them, Darragh Curran, wrote in to tell us about his project: Long URL Please.
Long URL please (http://www.longurlplease.com) is a JSON webservice to
efficiently convert short urls (tinyurl.com/123) to their originals.
I've got a simple jquery plugin to take advantage of it, and a firefox
plugin. It's running on google app engine.
Darragh hates short URLs so much he's offering to contribute his time to help wipe them off the face of the web:
I'd love to see it used in apps like twhirl/tweetdeck/twitterific, on
microblogging sites and pretty much anywhere that's got lots of short
urls. In that respect I'll happily contribute my time to help those
people/teams integrate with the service.
I like getting data from the web and I love JSON - as it is easy to use. The issue is that not many things on the web come as JSON from the get-go. Hence we need converters. You can use cURL and beautiful soup or roll your own hell of regular expressions. Alternatively you can use Yahoo Pipes to build your converter. Pipes is the bomb but a lot of people complained that there is no version control and that you need to use the very graphical interface to get to your data (which was the point of Pipes but let's not go there).
Rejoice, for there is a solution now available and it is called YQL. YQL is a SQL-style language to get information from all kinds of web services, and - using OAuth - even Yahoo's social graph. There is a test console available for you to get to grips with all the information it gives you access to (which is a lot!):
Here comes the kicker though: for all the open services that don't need authentication you can use these YQL statements as a REST API with JSON output and an optional callback function for JSON-P by adding it to http://query.yahooapis.com/v1/public/yql?. For example to get the latest three headlines from Ajaxian's RSS feed as JSON and wrap it in a function called leechajaxian do the following:
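A hedged sketch of building such a request. The q, format and callback parameter names reflect my understanding of the public YQL endpoint, and the feed URL is a placeholder rather than Ajaxian's real feed address; buildYqlUrl is a hypothetical helper.

```javascript
// Builds a public YQL REST URL with JSON output and optional JSON-P callback.
function buildYqlUrl(statement, callbackName) {
  var url = 'http://query.yahooapis.com/v1/public/yql?q=' +
            encodeURIComponent(statement) + '&format=json';
  if (callbackName) {
    url += '&callback=' + encodeURIComponent(callbackName);
  }
  return url;
}

// Placeholder feed URL; swap in the real RSS feed you want to read.
var statement = 'select title from rss where url="http://example.com/feed.rss" limit 3';
var yqlUrl = buildYqlUrl(statement, 'leechajaxian');

// In a browser, point a script node at yqlUrl and define
// function leechajaxian(data) { ... } to receive the result.
```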
"abstract": "Introducing <b>JSON</b> <b>...</b> <b>JSON</b> (JavaScript Object Notation) is a lightweight data-interchange format. <b>...</b> <b>JSON</b> is a text format that is completely language <b>...</b>",
"title": "<b>JSON</b>",
"url": "http://www.json.org/"
},
{
"abstract": "The <b>JSON</b> format is specified in RFC 4627 by Douglas Crockford. <b>...</b> Although <b>JSON</b> was based on a subset of the JavaScript programming language <b>...</b>",
"title": "<b>JSON</b> - Wikipedia, the free encyclopedia",
"url": "http://en.wikipedia.org/wiki/JSON"
},
{
"abstract": "Matthew Morley has posted his PHP library for <b>JSON</b>-RPC 2.0. <b>...</b> I am think it is great to see the expanding <b>JSON</b> toolset available in JavaScript libraries. <b>...</b>",
"title": "<b>JSON</b>",
"url": "http://www.json.com/"
}
]
}
}
});
What about screenscraping? You can get data from any valid HTML document using XPath with select * from html. For example to get the first 3 tag links on my blog you can do the following:
The team is working on making this easier - while we run every page that is indexed through tidy there is still a lot of choking going on (if people wrote valid HTML that wouldn't happen).
YQL is a pretty easy but also versatile language. You can even use complex aggregation and filtering by for example hosting a lot of URLs in a spreadsheet and loading them one by one before aggregating. The example given in the console is "select * from rss where url in (select title from atom where url="http://spreadsheets.google.com/feeds/list/pg_T0Mv3iBwIJoc82J1G8aQ/od6/public/basic") and description like "Wall Street" LIMIT 10 | unique(field="title")"
BOSS - Build Your Own Search Service (the your is silent for reasons I cannot tell you as it would endanger the lives of our agents in the field) is a Yahoo! API to access their search index and get the data back either as XML or JSON. Whilst there is ample documentation available it can still be a bit daunting to use the API purely in JavaScript - especially when you want to have several asynchronous calls - for example when you want to search the images, news and web sites for a certain query.
I've had several complaints from hackers at the Open Hack Day in Brazil about this and wanted to make their lives easier by writing a wrapper for the API.
This wrapper is yboss and it was a big success at the Hack Day with the winning hack in the BOSS category actually being based on it.
To use the wrapper all you need to do is to embed it in your document
Then you have access to the get method of the wrapper which searches all the defined search options with the query you provided. You can define a callback method that will be called when all searches have been successfully performed.
The data is provided either as a JSON object with all the mandatory properties for a BOSS result display or - as shown here - as HTML lists that can be written out via innerHTML.
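The core idea, one callback that fires once every search has answered, can be sketched generically. This is not yboss's actual API: searchAll and the searcher functions are hypothetical, and the stand-in searchers answer immediately with canned data instead of making JSON-P calls.

```javascript
// searchers: map of name -> function (query, done), where done(data)
// is called once that search has its results.
function searchAll(query, searchers, onComplete) {
  var names = Object.keys(searchers);
  var results = {};
  var pending = names.length;
  if (pending === 0) {
    onComplete(results);
    return;
  }
  names.forEach(function (name) {
    searchers[name](query, function (data) {
      results[name] = data;
      pending -= 1;
      // Only report back once the last outstanding search has answered.
      if (pending === 0) {
        onComplete(results);
      }
    });
  });
}

// Stand-ins for real asynchronous calls.
var fakeSearchers = {
  web:    function (q, done) { done(['web result for ' + q]); },
  images: function (q, done) { done(['image result for ' + q]); },
  news:   function (q, done) { done(['news result for ' + q]); }
};

var combined = null;
searchAll('json', fakeSearchers, function (all) { combined = all; });
```

Counting down outstanding requests keeps the calling code free of nested callbacks, no matter in which order the individual searches return.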
Ben Lisbakken, an ex-colleague from Google and all-round good guy, has created a simple JSONP service (in the vein of json-time and html-whitelist) that calculates the user's language based on browser headers:
Sam Ruby has that way about him that sees things very clearly. He just took a peek at jQuery for the first time and was able to really put into words what I think jQuery enthusiasts like about the library:
The notable thing about this is that despite all of the asynchronous events taking place, the code is sequential (nested, but sequential), and that the JSON results of the AJAX call is immediately available to the function that is invoked when the selection changes.
This is based on the following code that he wrote: