Wednesday, April 28th, 2010

Telling robots about your crawl-able Ajax apps

Category: Ajax, Google

Weston Ruter wants to talk to the search robots out there and tell them about the URL format for crawling Ajax apps.

Google came out with a spec for doing this with hash-bang URLs such as #!portfolio/interactive/.
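The core of Google's scheme can be sketched in a few lines of JavaScript. This is a rough illustration, not the spec itself (the function name is made up, and the spec's exact escaping rules differ in some details): the crawler rewrites a pretty #! URL into an "ugly" URL that carries the fragment in an _escaped_fragment_ query parameter, which your server can answer with static HTML.

```javascript
// Illustrative sketch of the #! -> _escaped_fragment_ rewrite that
// Google's Ajax-crawling scheme describes. Simplified: the real spec
// has more detailed escaping rules.
function toEscapedFragment(url) {
  const i = url.indexOf('#!');
  if (i === -1) return url; // not an Ajax-crawlable URL; leave it alone
  const base = url.slice(0, i);          // everything before the hash-bang
  const fragment = url.slice(i + 2);     // everything after "#!"
  const sep = base.includes('?') ? '&' : '?';
  return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
}
```

So the crawler would fetch http://example.com/?_escaped_fragment_=portfolio%2Finteractive%2F instead of the hash-bang URL, and the server is expected to return the rendered content for that state.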

What if we could tell Google and others something like:

  1. <meta name="crawlable-fragment-prefix" content="/">

and Google would grok it and apply the conversion using that prefix instead of the hard-coded "!".
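To make the proposal concrete, here is a hypothetical sketch of what a crawler honoring such a meta tag might do. Nothing here is real spec behavior; the meta tag does not exist, and the function name is invented purely to illustrate the idea of a configurable fragment prefix.

```javascript
// Hypothetical: a crawler that honored a crawlable-fragment-prefix meta
// tag could use the declared prefix instead of the spec's hard-coded "!".
function toEscapedFragmentWithPrefix(url, prefix) {
  const marker = '#' + prefix;           // e.g. "#/" if the prefix is "/"
  const i = url.indexOf(marker);
  if (i === -1) return url;              // fragment doesn't match the prefix
  const base = url.slice(0, i);
  const fragment = url.slice(i + marker.length);
  const sep = base.includes('?') ? '&' : '?';
  return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
}
```

With content="/", a URL like http://example.com/#/portfolio/ would map to the same kind of _escaped_fragment_ request that a #! URL does today.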

Want flexibility? Or want simplicity and think we should all just hash bang away?

Posted by Dion Almaer at 12:37 am




What if we make websites that don’t necessarily need AJAX to work?
I just made this: if JS is enabled it will display links like /en/hotel/#/restaurant/ (the user started from 'hotel' and is now on 'restaurant'), or just /en/restaurant/ if JS is disabled or if the user starts from there, much like how Facebook works.

It’s really not much harder, I store the different pages in different files and include(php) or get(ajax) them as necessary.

Comment by bfred — April 28, 2010
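bfred's URL scheme can be sketched as a pair of small helpers. The function names and the /en/ language prefix are illustrative, taken from the example paths in the comment, not from any real codebase:

```javascript
// With JS: navigation appends the target page after a "#" on the current URL.
function enhancedUrl(currentPath, targetPath) {
  return currentPath + '#' + targetPath; // e.g. '/en/hotel/' + '#' + '/restaurant/'
}

// Without JS (or for a direct visit): the same page lives at its plain path.
function fallbackUrl(enhancedPath, langPrefix) {
  const hash = enhancedPath.split('#')[1];
  if (!hash) return enhancedPath;        // no fragment, already a plain path
  return langPrefix + hash.replace(/^\//, '');
}
```

The server side then just includes the same page partial whether it is requested over Ajax or rendered directly, which is why the approach is "really not much harder."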

I think we also need the ability to customize the “_escaped_fragment_=” convention that Google changes the Ajax URLs to. What if my web app already uses that parameter name for something else?

Comment by getify — April 28, 2010

I agree with igstan and with the advice from Google that this isn’t necessary if you are already using progressive enhancement and are intentional about your site and application design.

The primary benefit to this scheme is that Google can now link directly to your hashed URL in the search results.

It still feels like a dirty hack compared to progressive enhancement, though. And having ‘#!’ and ‘_escaped_fragment_’ hard-coded into the spec is a poor design decision. I’m not rushing to ‘hash-bang away,’ as I anticipate (or at least hope) that search engines will provide a more flexible alternative in the future.

Comment by ndluthier — April 28, 2010

The upside to Google’s approach is that when someone does a Google search they will see the AJAX-friendly, hash-tastic URL in the search results, instead of what they see now with progressively enhanced sites: the non-JavaScript URL. That’s huge!

The downside to Google’s approach is that if you can’t use their URI decorations and query string param, AJAX crawling won’t work at all, which can be a problem for already-existing web apps and JavaScript history libraries. That sucks!

Being able to specify Google AJAX crawling parameters via META tags will allow more flexibility and backwards-compatibility!

Comment by benalman — April 28, 2010

No no, progressive enhancement with AJAX does not have separate links for JS and non-JS versions; the links are exactly the same. If JS is enabled the app acts just as it should, utilizing (“hijacking,” a la Jeremy Keith) parameters on the query string which non-JS clients would otherwise use to render data with the back-end. PE is the way to go.

Comment by bbobek — April 28, 2010

@bbobek, et al. The biggest issue with progressive enhancement, which I like, is when someone posts a link with the hash into Twitter, etc. What Google sees via the Twitter link isn’t the content that the person posting to Twitter/Facebook intended for someone to see. This gives a consistent means of handling those cases.

The problem with this is consistently dealing with the use of the hash-bang and Google-esque responses. I’d say convention over configuration is better in this case. One *could* even go so far as to check whether the expected response is application/json vs. text/html, so your app returns JSON to the client-side scripts using the same endpoints that the web crawler (Google) uses under the covers.

Comment by tracker1 — April 28, 2010
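tracker1's single-endpoint idea boils down to content negotiation on the Accept header. A minimal sketch, with an invented helper name, of the decision a shared endpoint would make:

```javascript
// One endpoint, two representations: client-side scripts ask for JSON,
// browsers and crawlers get the rendered HTML. The helper name is
// illustrative, not from any real framework.
function chooseRepresentation(acceptHeader) {
  return typeof acceptHeader === 'string' && acceptHeader.includes('application/json')
    ? 'json'   // the Ajax client requested data
    : 'html';  // default: a full page for browsers and crawlers
}
```

An XHR client would send Accept: application/json and receive data, while Googlebot, sending a normal Accept header, would receive the same content as rendered HTML from the very same URL.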

Those arguing for progressive enhancement are missing the point. Googlebot is not JS-enabled; it won’t see what’s special about those enhanced URLs, so your users won’t see what’s special about your page when searching on Google.

For example, Facebook “speeds up” loading (not sure about this) by not re-loading everything on each click but rather putting the new URL in the hash of the current URL. That’s fine for browsers, and if one of your users shares a link in a forum then that link will work fine for people clicking through it. But since Googlebot doesn’t execute the JS, when it follows the link it will see something else and mis-categorize your content. For example, on Facebook you might share a link like facebook.com/#!/chessplay?v=wall, but Googlebot will not run the JS, so it will only see the part before the hash: the facebook.com front page.

Which is a very different page. In this example, the pages are thematically similar, but that doesn’t have to be the case. This is a tech solution for search results, not ajaxy goodness.

Comment by quixote218 — April 28, 2010
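The mechanics behind quixote218's point are simple: the fragment is never sent to the server, so any two URLs that differ only after the “#” fetch the same resource. A one-line sketch (the function name is illustrative):

```javascript
// A client that does not run JS requests only the part before "#".
// Everything after the hash stays in the browser and never reaches
// the server, which is why Googlebot sees a different page.
function requestTarget(url) {
  return url.split('#')[0];
}
```

This is exactly the gap the #! convention papers over: it gives the crawler a rule for turning the client-only fragment into something it can actually request.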

Please note that the Ajaxian linkifier failed to properly linkify the hash in my first sample link, and I didn’t use HTML to help it. You’ll need to cut-and-paste to see my point.

Comment by quixote218 — April 28, 2010

Progressive enhancement does not require a hash-bang convention (unless you need to track history for browser forward/back buttons); you can achieve the exact same effect, crawlable URLs that deep link into apps, with simple query string parameters. Paste them wherever, they’re not going to change.

Comment by bbobek — April 28, 2010

Yes, everyone arguing that progressive enhancement is enough really needs to read the Google spec to grok why this is important and different. The spec allows crawlers to index HTML content using the PE URL.

By using the #! you are telling googlebot to reference the non-enhanced page whenever it encounters an enhanced url in the wild.

Comment by chsnyder — April 29, 2010
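On the server side, honoring the scheme means recognizing when the crawler comes knocking with the rewritten URL. A small sketch using Node's standard URL API (the function name is made up; the base URL is only there to let relative paths parse):

```javascript
// A request carrying _escaped_fragment_ is the crawler's version of a
// #! URL; the app should respond with the static, non-enhanced HTML
// for that state.
function escapedFragmentOf(requestUrl) {
  const u = new URL(requestUrl, 'http://localhost'); // base only for relative paths
  return u.searchParams.get('_escaped_fragment_');   // null when absent
}
```

When this returns a value, the app serves the pre-rendered snapshot for that fragment; when it returns null, it serves the normal page and lets the client-side script take over.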
