Friday, December 12th, 2008

YQL – converting the web to JSON with mock SQL

Category: JavaScript, JSON

<p>I like getting data from the web and I love JSON – as it is easy to use. The issue is that not many things on the web come as JSON from the get-go. Hence we need converters. You can use cURL and beautiful soup or roll your own hell of regular expressions. Alternatively you can use Yahoo Pipes to build your converter. Pipes is the bomb but a lot of people complained that there is no version control and that you need to use the very graphical interface to get to your data (which was the point of Pipes but let’s not go there).

AlasRejoice for there is a solution now available and it is called YQL. YQL is a SQL-style language to get information from all kind of web services, and – using oAuth – even Yahoo’s social graph. There is a test console available for you to get to grips with all the information it gives you access to (which is a lot!):

The YQL console

Here comes the kicker though: for all the open services that don’t need authentication you can use these YQL statements as a REST API with JSON output and an optional callback function for JSON-P by adding it to http://query.yahooapis.com/v1/public/yql?. For example to get the latest three headlines from Ajaxian’s RSS feed as JSON and wrap it in a function called leechajaxian do the following:

http://query.yahooapis.com/v1/public/yql?select title from rss where url=”http://feeds.feedburner.com/ajaxian” limit 3

The result is:

javascript
< view plain text >
  1. leechajaxian({
  2.  "query": {
  3.   "count": "3",
  4.   "created": "2008-12-12T09:01:13Z",
  5.   "lang": "en-US",
  6.   "updated": "2008-12-12T09:01:13Z",
  7.   "uri": "http://query.yahooapis.com/v1/yql?q=select+title+from+rss+where+url%3D%22http%3A%2F%2Ffeeds.feedburner.com%2Fajaxian%22+limit+3",
  8.   "diagnostics": {
  9.    "url": {
  10.     "execution-time": "17",
  11.     "content": "http://feeds.feedburner.com/ajaxian"
  12.    },
  13.    "user-time": "22",
  14.    "service-time": "17",
  15.    "build-version": "2008.12.03.14:01"
  16.   },
  17.   "results": {
  18.    "item": [
  19.     {
  20.      "title": "Travians: Sims meets Cultures, with Ajax"
  21.     },
  22.     {
  23.      "title": "Cappuccino 0.6 Released"
  24.     },
  25.     {
  26.      "title": "The fundamental problems with CSS3"
  27.     }
  28.    ]
  29.   }
  30.  }
  31. });

You can also search the web with YQL: http://query.yahooapis.com/v1/public/yql?q=select title,abstract,url from search.web where query=”json” limit 3&format=json&callback=leechajaxian

javascript
< view plain text >
  1. leechajaxian({
  2.  "query": {
  3.   "count": "3",
  4.   "created": "2008-12-12T09:06:41Z",
  5.   "lang": "en-US",
  6.   "updated": "2008-12-12T09:06:41Z",
  7.   "uri": "http://query.yahooapis.com/v1/yql?q=select+title%2Cabstract%2Curl+from+search.web+where+query%3D%22json%22+limit+3",
  8.   "diagnostics": {
  9.    "url": {
  10.     "execution-time": "43",
  11.     "content": "http://boss.yahooapis.com/ysearch/web/v1/json?count=10&start=0&format=xml"
  12.    },
  13.    "user-time": "45",
  14.    "service-time": "43",
  15.    "build-version": "2008.12.03.14:01"
  16.   },
  17.   "results": {
  18.    "result": [
  19.     {
  20.      "abstract": "Introducing <b>JSON</b> <b>...</b> <b>JSON</b> (JavaScript Object Notation) is a lightweight data-interchange format. <b>...</b> <b>JSON</b> is a text format that is completely language <b>...</b>",
  21.      "title": "<b>JSON</b>",
  22.      "url": "http://www.json.org/"
  23.     },
  24.     {
  25.      "abstract": "The <b>JSON</b> format is specified in RFC 4627 by Douglas Crockford. <b>...</b> Although <b>JSON</b> was based on a subset of the JavaScript programming language <b>...</b>",
  26.      "title": "<b>JSON</b> - Wikipedia, the free encyclopedia",
  27.      "url": "http://en.wikipedia.org/wiki/JSON"
  28.     },
  29.     {
  30.      "abstract": "Matthew Morley has posted his PHP library for <b>JSON</b>-RPC 2.0. <b>...</b> I am think it is great to see the expanding <b>JSON</b> toolset available in JavaScript libraries. <b>...</b>",
  31.      "title": "<b>JSON</b>",
  32.      "url": "http://www.json.com/"
  33.     }
  34.    ]
  35.   }
  36.  }
  37. });

What about screenscraping? You can get data from any valid HTML document using XPATH with select * from html. For example to get the first 3 tag links on my blog you can do the following:

http://query.yahooapis.com/v1/public/yql?q=select * from html where url=”http://wait-till-i.com” and xpath=’//a[@rel="tag"]‘ limit 3&format=json&callback=leechajaxian

The team is working on making this easier – while we run every page that is indexed through tidy there is still a lot of choking going on (if people wrote valid HTML that wouldn’t happen).

YQL is a pretty easy but also versatile language. You can even use complex aggregation and filtering by for example hosting a lot of URLs in a spreadsheet and loading them one by one before aggregating. The example given in the console is “select * from rss where url in (select title from atom where url=”http://spreadsheets.google.com/feeds/list/pg_T0Mv3iBwIJoc82J1G8aQ/od6/public/basic”) and description like “Wall Street” LIMIT 10 | unique(field=”title”)”

Happy converting!

Posted by Chris Heilmann at 4:20 pm
15 Comments

++++-
4.3 rating from 52 votes

15 Comments »

Comments feed TrackBack URI

holy ****! Very forward thinking of the Y team. It’s clear they’ve been reading up on semantic querying technologies like SparQL.

My first impression is “awesome”.

Comment by ilazarte — December 12, 2008

YQL is very easy to use; here’s a prototype that supplies the missing del.icio.us full-text search API:

http://kentbrewster.com/delicious-search-exploder/

Comment by kentbrew — December 12, 2008

“Alas, there is a solution now available and it is called YQL. ”
.
Alas?
.
Why “alas?” What’s wrong with YQL that it merits an “alas?”

Comment by Nosredna — December 12, 2008

Now this is true evangelism. Way to go , @codepo8 !

Comment by vvladescu — December 12, 2008

Awesome! I think YQL is one of the better things to come out of Yahoo lately. Using the familiar SQL syntax instantly makes it more widely available to so many more programmers. This is why I also like JQuery (no offence to YUI!), which uses CSS-esque selectors. It makes it seem so familiar while learning it.

Comment by davidcalhoun — December 13, 2008

This is very similar to http://isixml.sourceforge.net/

Comment by NzO — December 13, 2008

Wow, this could be huge, especially if content providers adopt it and design their site such that it can be queried that way. I didn’t notice anywhere where it could be downloaded, does this mean a round trip through Yahoo to get the data. This is going to make it very easy to query their extensive finance site. Whoopee, No more scraping.

Comment by paulhan — December 13, 2008

“I like getting data from the web and I love JSON”

You need to get out more ;)

Comment by EdSpencer — December 13, 2008

So wait is this able to convert XML into JSON? Thats the big thing I rather be working with JSON than XML.

Comment by cnizz — December 13, 2008

@paulhan yes, there is also the idea of offering data providers to give us a schema/table info to be easily indexable without scraping. And yes, it will mean a conversion trip through Yahoo.
@cnizz exactly that – but this is only the tip of the iceberg of what you can do with it. on my blog I showed for example how to do an image search and show the pictures with a few lines of JavaScript.

Comment by Chris Heilmann — December 14, 2008

@EdSpencer beats waiting through commercial breaks for the next shortened episode of some TV show. We all waste our lives the way we choose to. :)

Comment by Chris Heilmann — December 14, 2008

This looks awesome and could be soon the next huge thing!

Comment by DamirSecki — December 14, 2008

@Chris Heilmann I was kidding, I love those things too ;) More seriously though, this looks awesome

Comment by EdSpencer — December 15, 2008

@Chris Heilmann: Is there a reason not to use JSONQuery/JSONPath (http://www.sitepen.com/blog/2008/07/16/jsonquery-data-querying-beyond-jsonpath/), since it actually the query language designed for JSON? It seems like it would have been the obvious choice. SQL is fraught with conceptually impedances when dealing with data structures like JSON that are vastly different than relational tables. The SQL paradigm is sufficient for very simple structures, but becomes incredibly difficult for more complex JSON structures, which is why we have abandoned it in favor of JSONQuery. SQL is simply not designed for JSON, and JSONQuery is. Plus SQL is an extremely dangerous on the web from a security perspective, anyone that wants to implement this query language (outside of Yahoo) faces terrible complexity in trying to maintain security.

Comment by kriszyp — December 15, 2008

@kriszyp: The language is only SQL like from the syntax. YQL operates on any structured object we can pull from the web (in same way Pipes does). We use a json dot-style syntax for addressing and projecting parts of the items coming back (its not rows, but structured entities). There are many differences to JSONQuery, most notably the ability to join between disparate services, and perform a lot of transformation away from the interpreter.

The docs have more on this http://developer.yahoo.com/yql/guide

Comment by JonathanT — December 15, 2008

Leave a comment

You must be logged in to post a comment.