<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: YQL execute now allows you to convert scraped data with server side JavaScript</title>
	<atom:link href="http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/feed" rel="self" type="application/rss+xml" />
	<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript</link>
	<description>Cleaning up the web with Ajax</description>
	<lastBuildDate>Thu, 17 May 2012 07:43:39 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: sh1mmer</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273271</link>
		<dc:creator>sh1mmer</dc:creator>
		<pubDate>Fri, 01 May 2009 06:30:55 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273271</guid>
		<description>I looked at getting Sizzle running to do the CSS selectors before we launched YQL Execute and in order to use it you need a DOM. In order to get a DOM in Rhino you need env.js which currently runs to about 8k lines of code. 

This means in order to get Sizzle working you need about 9k lines of JS. CSS2XPath currently weighs in at under 100 lines of code. XPath is natively implemented in Rhino and doesn&#039;t require any additional code. 

So, from the perspective of speed it&#039;s 9k lines of interpretation vs 100 lines, and given the execution cycle limits YQL has, you can spend those cycles on processing data, not creating a DOM.</description>
		<content:encoded><![CDATA[<p>I looked at getting Sizzle running to do the CSS selectors before we launched YQL Execute and in order to use it you need a DOM. In order to get a DOM in Rhino you need env.js which currently runs to about 8k lines of code. </p>
<p>This means in order to get Sizzle working you need about 9k lines of JS. CSS2XPath currently weighs in at under 100 lines of code. XPath is natively implemented in Rhino and doesn&#8217;t require any additional code. </p>
<p>So, from the perspective of speed it&#8217;s 9k lines of interpretation vs 100 lines, and given the execution cycle limits YQL has, you can spend those cycles on processing data, not creating a DOM.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: postream</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273269</link>
		<dc:creator>postream</dc:creator>
		<pubDate>Fri, 01 May 2009 00:38:25 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273269</guid>
		<description>unfortunately you can&#039;t safely scrape everything on the web, because there are some conversion quirks

the main problem is that YQL returns well-formed XML, but the web is often a mess of both HTML and XHTML (also notice you can only scrape what&#039;s inside the body tag)

look at this sample page I made (valid HTML 4): &lt;a href=&#039;http://www.playquery.it/sandbox/yql/test3.html&#039; rel=&quot;nofollow&quot;&gt;http://www.playquery.it/sandbox/yql/test3.html&lt;/a&gt;

this is &lt;a&gt;how YQL parses it&lt;/a&gt;

some conversion errors:

- some HTML entities are converted to the corresponding character code (nbsp and reg), and some others are not (amp, lt, gt)

- an anchor with a &lt;em&gt;name=&quot;top&quot;&lt;/em&gt; now also has an &lt;em&gt;id=&quot;top&quot;&lt;/em&gt;

- the textarea has some whitespace inside, but in the YQL result it is empty

- the table really freaks out (some p tags added, the form on the bottom of the page is put inside a td tag, the table is moved under the main paragraph)

and, as I noted on James&#039; blog some days ago (http://james.padolsey.com/javascript/using-yql-with-jsonp/), if you are forced to use the JSONP format instead of XML, it is even worse

but, anyway, if you know the source of your query very well, and it&#039;s well-formed XHTML, I think YQL could be really awesome</description>
		<content:encoded><![CDATA[<p>unfortunately you can&#8217;t safely scrape everything on the web, because there are some conversion quirks</p>
<p>the main problem is that YQL returns well-formed XML, but the web is often a mess of both HTML and XHTML (also notice you can only scrape what&#8217;s inside the body tag)</p>
<p>look at this sample page I made (valid HTML 4): <a href='http://www.playquery.it/sandbox/yql/test3.html' rel="nofollow">http://www.playquery.it/sandbox/yql/test3.html</a></p>
<p>this is <a>how YQL parses it</a></p>
<p>some conversion errors:</p>
<p>- some HTML entities are converted to the corresponding character code (nbsp and reg), and some others are not (amp, lt, gt)</p>
<p>- an anchor with a <em>name=&#8221;top&#8221;</em> now also has an <em>id=&#8221;top&#8221;</em></p>
<p>- the textarea has some whitespace inside, but in the YQL result it is empty</p>
<p>- the table really freaks out (some p tags added, the form on the bottom of the page is put inside a td tag, the table is moved under the main paragraph)</p>
<p>and, as I noted on James&#8217; blog some days ago (<a href="http://james.padolsey.com/javascript/using-yql-with-jsonp/" rel="nofollow">http://james.padolsey.com/javascript/using-yql-with-jsonp/</a>), if you are forced to use the JSONP format instead of XML, it is even worse</p>
<p>but, anyway, if you know the source of your query very well, and it&#8217;s well-formed XHTML, I think YQL could be really awesome</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: infosage</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273263</link>
		<dc:creator>infosage</dc:creator>
		<pubDate>Thu, 30 Apr 2009 20:47:07 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273263</guid>
		<description>@Nosredna

Three words:
YQL honors robots.txt</description>
		<content:encoded><![CDATA[<p>@Nosredna</p>
<p>Three words:<br />
YQL honors robots.txt</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: WebReflection</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273258</link>
		<dc:creator>WebReflection</dc:creator>
		<pubDate>Thu, 30 Apr 2009 18:34:45 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273258</guid>
		<description>JonathanT, I partially agree about a better version, but I do not get the &quot;should be part of datatables.org&quot; part ... I mean, what&#039;s wrong with my or James&#039; website/project? I would rather see a specific one out of whatever box ... what do you think?</description>
		<content:encoded><![CDATA[<p>JonathanT, I partially agree about a better version, but I do not get the &#8220;should be part of datatables.org&#8221; part &#8230; I mean, what&#8217;s wrong with my or James&#8217; website/project? I would rather see a specific one out of whatever box &#8230; what do you think?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: JonathanT</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273256</link>
		<dc:creator>JonathanT</dc:creator>
		<pubDate>Thu, 30 Apr 2009 18:20:00 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273256</guid>
		<description>All,

One of the main reasons we made use of James&#039; CSS/XPath converter was to show how easy it is to plug useful JS functions and libraries into a table, to get new functionality that people want in YQL.

Why not create a better CSS selector open data table and submit it to GitHub for others to use and share? The sample ones aren&#039;t part of the community repository (datatables.org), so that seems a good place for a better version to go.

Jonathan</description>
		<content:encoded><![CDATA[<p>All,</p>
<p>One of the main reasons we made use of James&#8217; CSS/XPath converter was to show how easy it is to plug useful JS functions and libraries into a table, to get new functionality that people want in YQL.</p>
<p>Why not create a better CSS selector open data table and submit it to GitHub for others to use and share? The sample ones aren&#8217;t part of the community repository (datatables.org), so that seems a good place for a better version to go.</p>
<p>Jonathan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: JimmyP22</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273252</link>
		<dc:creator>JimmyP22</dc:creator>
		<pubDate>Thu, 30 Apr 2009 16:49:56 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273252</guid>
		<description>@Chris, Awesome work! I recommend using WebReflection&#039;s converter though; as mentioned it&#039;s more complete (and less buggy) than mine.</description>
		<content:encoded><![CDATA[<p>@Chris, Awesome work! I recommend using WebReflection&#8217;s converter though; as mentioned it&#8217;s more complete (and less buggy) than mine.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: WebReflection</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273249</link>
		<dc:creator>WebReflection</dc:creator>
		<pubDate>Thu, 30 Apr 2009 15:48:58 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273249</guid>
		<description>yep, tested just now, and James Padolsey&#039;s function is both incomplete and buggy (in its results as well) ... James, give me a shout if you read this.</description>
		<content:encoded><![CDATA[<p>yep, tested just now, and James Padolsey&#8217;s function is both incomplete and buggy (in its results as well) &#8230; James, give me a shout if you read this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nosredna</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273248</link>
		<dc:creator>Nosredna</dc:creator>
		<pubDate>Thu, 30 Apr 2009 15:42:31 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273248</guid>
		<description>Thanks for the answer Chris. Agreed that it&#039;s always been possible to scrape. It&#039;s the ease of doing it and the indirection through Yahoo! servers that I was thinking of.

The caching is nice.</description>
		<content:encoded><![CDATA[<p>Thanks for the answer Chris. Agreed that it&#8217;s always been possible to scrape. It&#8217;s the ease of doing it and the indirection through Yahoo! servers that I was thinking of.</p>
<p>The caching is nice.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: WebReflection</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273247</link>
		<dc:creator>WebReflection</dc:creator>
		<pubDate>Thu, 30 Apr 2009 15:37:29 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273247</guid>
		<description>I did not know James Padolsey&#039;s function, but it seems quite incomplete compared with the one I created for vice-versa.
Here is the specific function via experiments and &lt;a href=&quot;http://vice-versa.googlecode.com/svn/trunk/src/experiments.js&quot; rel=&quot;nofollow&quot;&gt;document.query.css2xpath function&lt;/a&gt;
Maybe James and I could collaborate to create a complete and stable function (mine at least passes every CSS selector used in the SlickSpeed test ;) )</description>
		<content:encoded><![CDATA[<p>I did not know James Padolsey&#8217;s function, but it seems quite incomplete compared with the one I created for vice-versa.<br />
Here is the specific function via experiments and <a href="http://vice-versa.googlecode.com/svn/trunk/src/experiments.js" rel="nofollow">document.query.css2xpath function</a><br />
Maybe James and I could collaborate to create a complete and stable function (mine at least passes every CSS selector used in the SlickSpeed test ;) )</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Heilmann</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273246</link>
		<dc:creator>Chris Heilmann</dc:creator>
		<pubDate>Thu, 30 Apr 2009 15:30:45 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273246</guid>
		<description>@Nosredna YQL access is limited to a cap that would prevent that: 
&lt;blockquote&gt;
YQL has the following API usage restrictions:
Per application limit (identified by your Access Key):
    * 100,000 calls per day.
Per IP limits:
    * /v1/public/* 1000 calls per hour
    * /v1/yql/* 10000 calls per hour
All rates are subject to change. In addition, you may also be subject to the underlying rate limits of other Yahoo and 3rd party web services.
&lt;/blockquote&gt;

However, what prevents me from curling his page every second? I don&#039;t need YQL to scrape people&#039;s pages. What YQL does, though, is cache the results, which actually means fewer hits for the scraped page.</description>
		<content:encoded><![CDATA[<p>@Nosredna YQL access is limited to a cap that would prevent that: </p>
<blockquote><p>
YQL has the following API usage restrictions:<br />
Per application limit (identified by your Access Key):<br />
    * 100,000 calls per day.<br />
Per IP limits:<br />
    * /v1/public/* 1000 calls per hour<br />
    * /v1/yql/* 10000 calls per hour<br />
All rates are subject to change. In addition, you may also be subject to the underlying rate limits of other Yahoo and 3rd party web services.
</p></blockquote>
<p>However, what prevents me from curling his page every second? I don&#8217;t need YQL to scrape people&#8217;s pages. What YQL does, though, is cache the results, which actually means fewer hits for the scraped page.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nosredna</title>
		<link>http://ajaxian.com/archives/yql-execute-now-allows-you-to-convert-scraped-data-with-server-side-javascript/comment-page-1#comment-273245</link>
		<dc:creator>Nosredna</dc:creator>
		<pubDate>Thu, 30 Apr 2009 15:16:46 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=6735#comment-273245</guid>
		<description>So I can use Yahoo!&#039;s servers to screen scrape anything I want?

That&#039;s terrific. But what prevents abuse, such as huge attacks on some poor guy&#039;s $5 a month data-limited hosted account?</description>
		<content:encoded><![CDATA[<p>So I can use Yahoo!&#8217;s servers to screen scrape anything I want?</p>
<p>That&#8217;s terrific. But what prevents abuse, such as huge attacks on some poor guy&#8217;s $5 a month data-limited hosted account?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

