Monday, July 18th, 2005

Making sure that crawlers like Google can grok your Ajaxian apps

Category: Editorial

Some of the concerns you have when you build an Ajaxian application revolve around usability and accessibility.

One item that I worry about is how crawlers like Google can keep up with the changes.

Google is used to thinking that one URL == one page to index. These days that isn’t always the case. You could have an Ajaxian application that has a whole set of actions within one “page”. How is Google going to grok that?

One simple example revolves around showing/hiding divs.

Take Eric’s page, which lets you click either a JSP or a VB.NET button to switch between two views of the same idea:

[Screenshot: div-changing.jpg]

The page defaults to VB.NET. Let’s say Google indexes both copies, since they just happen to be HTML in divs. If you then search for some of the JSP code, the result will point you to the main URL, which will show the default VB.NET view!

Ideally, you would be able to mark up the page/application so Google would know to go to http://domain/path/to/page?show=jsp, and the application itself would look for that parameter and show the JSP code instead of the default.
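
A minimal sketch of what that fallback could look like on the client side, assuming the two views live in divs with hypothetical ids "jsp" and "vbnet":

    // Minimal sketch: the "show" parameter name and div ids are assumptions, not from Eric's page.
    // On load, check the query string and reveal the requested view instead of the default.
    function showRequestedView() {
      var match = window.location.search.match(/[?&]show=(\w+)/);
      var view = match ? match[1] : "vbnet";  // VB.NET remains the default
      document.getElementById("jsp").style.display = (view === "jsp") ? "block" : "none";
      document.getElementById("vbnet").style.display = (view === "vbnet") ? "block" : "none";
    }
    window.onload = showRequestedView;

Google could then be pointed at ?show=jsp (via a plain link or a sitemap) and would index the JSP view at a URL that actually reproduces it.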

One of the GOOD things about Ajax is that it consists of HTML, which is easy for crawlers to index (vs. Flash, for example), but we need to come up with practices and ideas for getting behaviour that makes sense.

Posted by Dion Almaer at 2:10 am

7 Comments »


I guess you could put the real content at separate URLs in their own XHTML documents, structured in such a way that if you go to them directly the right thing will happen in a browser (e.g., in the worst case, redirect to the main page with a query argument). Load the content into the main page from these separate URLs via XMLHTTP. Have plain old links to the real content from the main page, probably styled such that they don’t show up in a browser.

Comment by Bob Ippolito — July 18, 2005
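
A rough sketch of the approach Bob describes, with hypothetical URLs and element ids: the main page keeps plain (CSS-hidden) links to the XHTML documents for crawlers to follow, and loads the same documents into itself via XMLHTTP.

    // Rough sketch: the URLs and element ids here are assumptions for illustration.
    function loadContent(url, targetId) {
      var xhr = window.XMLHttpRequest ? new XMLHttpRequest()
                                      : new ActiveXObject("Microsoft.XMLHTTP"); // older IE
      xhr.open("GET", url, true);
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          document.getElementById(targetId).innerHTML = xhr.responseText;
        }
      };
      xhr.send(null);
    }
    // e.g. loadContent("content/jsp.xhtml", "code-view");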

Hi!

Isn’t this typically information that doesn’t need to make use of Ajax? I think pages that contain different information should be put at unique URLs, both for usability, bookmarkability, and convenience. And then there won’t be any problems with search engines.

I see Ajax as a tool not for changing a page from one thing into another, but for doing operations within a page that serves one specific purpose.

There might be situations where this may pose problems, but I don’t think this exemplifies that.

Comment by Geir Berset — July 18, 2005

We’ve just published an 8-page technical article on search engine optimization for Single Page Interfaces & AJAX. It might be of interest to anyone publishing a content-rich website using a single URL, although we sometimes refer specifically to the Backbase AJAX Engine.

Download the PDF here (or click on my name below this comment): http://www.backbase.com/download/articles/DesigningRIAsForSearchEngineAccessibility.pdf

On our own website (www.backbase.com) we’re using an XSLT-based content management system, and it took us only a couple of hours to publish a static site that is optimized for indexing by search engines. This works similarly to the solution Bob suggested in the comment above (details are in the PDF).

Jep Castelein
Backbase

Comment by Jep Castelein — July 18, 2005

I think Google Sitemaps (http://www.google.com/webmasters/sitemaps/) could be handy with AJAX. It lets you list all the URLs at a site to crawl.

Comment by Eric Wettstein — July 18, 2005

Creating unique URLs with Ajax …

http://www.contentwithstyle.co.uk/Articles/38/fixing-the-back-button-and-enabling-bookmarking-for-ajax-apps

The problem here is that it relies on fragment identifiers (#), specifically so the page isn’t reloaded, and most robots probably won’t follow fragments. However, you could make it flexible, so that it supports both # and something else in the URL, e.g. a query string (?).

Then, you can set up a site map with all the possible URLs that you’d want indexed.

Comment by Michael Mahemoff — July 18, 2005
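
A rough sketch of that dual-URL idea, with hypothetical parameter, fragment, and function names: the markup carries plain ?view=... links that robots can follow, while the click handler swaps the content in place and records the state in the fragment.

    // Sketch only: "view", the fragment scheme, and showView() are assumptions.
    function ajaxify(link) {
      link.onclick = function () {
        var view = link.href.replace(/.*[?&]view=/, "");
        window.location.hash = view;   // bookmarkable, no page reload
        showView(view);                // hypothetical function that swaps the visible content
        return false;                  // cancel the normal navigation in script-capable browsers
      };
    }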

Interesting problem.

I haven’t seen an AJAX-powered blog yet, but that would be a great testing ground for this.
It seems that supporting indexing would be the same as supporting non-javascript browsers.

For example, the links would have default, non-javascript hrefs, but javascript handlers. This way, regular browsers get the rich experience and indexers and non-javascript browsers get an old-style experience.

The question is what framework can make it easy to write a site like that, one that supports both partial/delta updates of pages (AJAX) and regular full reloads of pages.

Comment by Julien Couvreur — July 18, 2005
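
A small sketch of the pattern Julien describes, with a hypothetical URL, parameter, and element ids: the plain href serves indexers and non-JavaScript browsers, while the onclick handler gives script-capable browsers a partial update instead of a full reload.

    // Sketch only: "page?show=jsp", "fragment=true", and the element ids are assumptions.
    var link = document.getElementById("jsp-link");   // e.g. <a href="page?show=jsp">JSP</a>
    link.onclick = function () {
      var xhr = window.XMLHttpRequest ? new XMLHttpRequest()
                                      : new ActiveXObject("Microsoft.XMLHTTP");
      xhr.open("GET", link.href + "&fragment=true", true);  // ask the server for just the delta
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          document.getElementById("code-view").innerHTML = xhr.responseText;
        }
      };
      xhr.send(null);
      return false;                                         // skip the full-page reload
    };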

Thanks for picking on me! lol. The one flaw I see in a lot of Ajax-based apps is: how in the world do I get back here? I really do not want to follow those 10 steps to get here again. Searches are famous for this. (At least the default for that blog without JavaScript is to show both versions of the source code! lol! I was going to set it up to show both by default, but I changed it to what you see now. At least the code I am posting tomorrow is only .NET!!)

Comment by Eric Pascarello — July 18, 2005
