Friday, May 22nd, 2009
Content extraction is still a hot topic on the web. We have lots of great text content but not much clue as to what those texts are about. To make this more obvious, we do term extraction for tagging, and also geolocation extraction to give the text a spatial reference.
A fairly new web service that does this for us is Yahoo's Placemaker. What it does is analyze a text (or the document defined by an HTML or feed URL) and give you back all the geographical locations that are mentioned in it. Pretty awesome, but the problem is that the API only allows for POST values and has either XML or RSS output. This means you can't do it in simple XHR because of the cross-domain problem and you can't use generated script nodes as there is no JSON output. You'd have to use a server-side proxy service. This is pretty easy with PHP and cURL as explained in this blog post but can be annoying, too.
Analyzing a text using JS-Placemaker is as easy as this:
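Placemaker.getPlaces('Hi I am Chris, I live in London. Originally I am from Germany',
  function (places) { console.log(places); }, // callback: log whatever the service returns
  'en-uk'); // locale of the text (this particular value is an assumption)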
The console output is an object or an array of the places the service found in the text.
The first parameter is the text you want to analyze (this could be a pointer to the innerHTML of a DOM element, for example), the second is the callback function and the third the locale of the text - the demo page shows that Placemaker groks several languages.
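Because the callback may receive either a single object or an array, it can help to normalise the result before looping over it. The following is only a minimal sketch: `toPlaceList` is a hypothetical helper, not part of JS-Placemaker, and the sample place objects are made up for illustration.

```javascript
// Hypothetical helper, not part of JS-Placemaker: always hand back an array,
// whether the callback received nothing, one place object or a list of them.
function toPlaceList(result) {
  if (result === null || result === undefined) {
    return [];
  }
  return Array.isArray(result) ? result : [result];
}

// Made-up sample data purely for illustration; the real property names
// depend on what the service returns.
var single = { name: 'London' };
var several = [{ name: 'London' }, { name: 'Germany' }];

console.log(toPlaceList(single).length);  // 1
console.log(toPlaceList(several).length); // 2
```

Inside a real callback you could then write `toPlaceList(places).forEach(...)` without first checking which shape came back.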
select * from geo.placemaker where documentURL="http://slashdot.org" and documentType="text/html" and appid="...the app id..."
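To run a statement like this from the browser, it has to be URL-encoded and sent to YQL with a JSON callback, so that a generated script node can load the result. Here is a rough sketch of building such a URL. It assumes the public YQL endpoint at `http://query.yahooapis.com/v1/public/yql` with `q`, `format` and `callback` parameters; `buildPlacemakerQuery`, `buildYqlUrl`, the callback name `handlePlaces` and the `YOUR-APP-ID` placeholder are all illustrative names, not part of any library.

```javascript
// Assumed endpoint of the public YQL web service (not stated in this post).
var YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql';

// Build the YQL statement shown above for a given document URL and app ID.
// Values containing double quotes are rejected so they cannot break the statement.
function buildPlacemakerQuery(documentUrl, appId) {
  if (documentUrl.indexOf('"') !== -1 || appId.indexOf('"') !== -1) {
    throw new Error('Values must not contain double quotes');
  }
  return 'select * from geo.placemaker where documentURL="' + documentUrl +
    '" and documentType="text/html" and appid="' + appId + '"';
}

// Turn a YQL statement into a request URL that asks for JSON wrapped in a
// call to the named callback function.
function buildYqlUrl(query, callbackName) {
  return YQL_ENDPOINT +
    '?q=' + encodeURIComponent(query) +
    '&format=json' +
    '&callback=' + encodeURIComponent(callbackName);
}

var query = buildPlacemakerQuery('http://slashdot.org', 'YOUR-APP-ID');
var url = buildYqlUrl(query, 'handlePlaces');
console.log(url);
```

In a page you would define a global `handlePlaces` function and set the resulting URL as the `src` of a generated script element. Depending on how the open table is registered, YQL may also need to be told where to find it; that step is left out here.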
Have a play with the YQL console using the open table, but you had better get your own AppID before this one exceeds its daily limit.
Posted by Chris Heilmann at 9:28 am