Friday, October 17th, 2008

MAMA, who is using Web standards?

Category: Opera

Ars discussed the new Opera initiative MAMA, the study that found only 4.13% of the Web is standards-compliant, whatever that means :)

It is cool to see Opera doing this kind of work, with “Metadata Analysis and Mining Application (MAMA), a tool that crawls the web and indexes the markup and scripting data from approximately 3.5 million pages.”
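
For readers who like to see the shape of such a crawl, here is a tiny sketch of the idea: fetch a page and tally a couple of markup features. This is not MAMA's code, just an illustration; the class name, the features counted, and the example URL are all made up.

```python
# A tiny sketch of a markup-indexing crawl step: fetch one page and tally a
# few features. Not MAMA's code; names, features, and URL are illustrative.
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class MarkupIndexer(HTMLParser):
    """Counts element names and notes whether any doctype was declared."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()
        self.has_doctype = False

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE html>.
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        self.elements[tag] += 1


def index_page(url):
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    indexer = MarkupIndexer()
    indexer.feed(html)
    return {"doctype": indexer.has_doctype,
            "top_elements": indexer.elements.most_common(5)}


if __name__ == "__main__":
    # Example URL only; MAMA's real URL set is discussed below.
    print(index_page("https://www.opera.com/"))
```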

Ian Hickson did some great work using the Google index to look at how developers use HTML, which led to a lot of the HTML 5 features.

There were a couple of reports:

The URL set

MAMA’s analysis total is a mere fraction of even a single percent of such a daunting total. It seems odd to say that 3.5 million of anything is insignificant. So let us assume for a moment that it is not. We are just not able to look at every Web page, so we must choose a smaller group of URLs to look at and justify that this is representative of the whole Web. One option is to choose a set of URLs selected at random. I had some conversations with Rene Saarsoo (author of an excellent previous study on coding practices), and he brought up many excellent points about the structure of the Web and choices in URL sets—some of which I have tried to paraphrase here.
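
As a toy illustration of the sampling question raised here, drawing a uniform random subset from some master URL list might look like the sketch below; the file name, sample size, and seed are invented, and this says nothing about how MAMA actually selected its URLs.

```python
# A toy illustration of "choose a set of URLs selected at random": draw a
# uniform random subset from a master URL list. File name, sample size, and
# seed are made up for the example.
import random


def sample_urls(path="url-list.txt", k=3_500_000, seed=42):
    with open(path) as f:
        urls = [line.strip() for line in f if line.strip()]
    random.seed(seed)
    return random.sample(urls, min(k, len(urls)))
```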

Markup Validation Report

Web standards are good for the Web! Most of the readers of this site will understand why this statement holds true—ease of maintenance, cross platform compatibility, access by people with disabilities, the list goes on!

But how does the reality of the Web hold up to these ideals? Surely with so many good reasons to code using open Web standards, the majority of sites should validate? Not so—Opera’s MAMA project has gathered a lot of quite shocking statistics showing that very few of the sites surveyed actually exhibit markup that validates.
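
If you want to put your own pages through the same kind of test, a per-URL validity check can look roughly like this. It is a sketch against the present-day W3C Nu HTML Checker's JSON interface, not the validation pipeline MAMA itself used in 2008, and the query parameters and response fields are my reading of that public service.

```python
# A hedged sketch of a per-page "does it validate?" check against the W3C Nu
# HTML Checker (https://validator.w3.org/nu/). Endpoint, parameters, and
# response shape are assumptions about that public service, not MAMA's setup.
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def validation_errors(page_url):
    query = urlencode({"doc": page_url, "out": "json"})
    req = Request("https://validator.w3.org/nu/?" + query,
                  headers={"User-Agent": "markup-survey-sketch/0.1"})
    with urlopen(req, timeout=30) as resp:
        report = json.load(resp)
    # Each message carries a "type"; "error" entries mean the page fails.
    return [m for m in report.get("messages", []) if m.get("type") == "error"]


if __name__ == "__main__":
    errors = validation_errors("https://example.com/")
    print("validates" if not errors else "%d errors" % len(errors))
```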

Key Findings Report

Analysis showing:

  • Web servers used: Apache: 67.72%, IIS: 25.91% (see the sketch after this list)
  • Document structure and size
  • Flash detection
  • CSS styles used
  • Scripting
  • XMLHttpRequest object detection
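
On the web-server figure specifically, the usual way to classify a site is the HTTP Server response header; whether MAMA derived its Apache/IIS split exactly this way is an assumption on my part. A minimal sketch:

```python
# A minimal sketch of per-URL web-server detection via the HTTP Server
# response header. That MAMA used exactly this signal is my assumption,
# not something stated in the report excerpts above.
from urllib.request import Request, urlopen


def server_header(url):
    req = Request(url, method="HEAD",
                  headers={"User-Agent": "server-survey-sketch/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.headers.get("Server", "unknown")

# Tallying the product token (e.g. "Apache", "Microsoft-IIS/7.0") over a URL
# set is what yields Apache-vs-IIS percentages like the ones quoted above.
```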

Of course, it is easy to point to some potential flaws. Looking for “XMLHttpRequest” on a page doesn’t account for today’s reality of how XHR is actually used, for example.

Doron Rosenberg pointed to some of these issues:

While it is no surprise that very few websites are completely standards-compliant, their methodology seems flawed. Websites are doing more and more dynamic JavaScript stuff after page load, which could affect such stats. Also, major JS libs are linked in via CDNs, so it is unclear how useful their XMLHttpRequest usage stat is.

But I can totally verify that Chinese websites love Flash. I still have nightmares from the AOL China Gecko testing days. Flashing, scrolling, floating ads are scary.
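
To make that concern concrete, here is a rough, hypothetical illustration of why a markup-level search for the string “XMLHttpRequest” undercounts real XHR use: a page that loads a library from a CDN never has to mention the object inline. The helper names and example URL below are made up.

```python
# Hypothetical illustration: a markup-only check vs. what actually ships XHR.
import re

XHR = re.compile(r"XMLHttpRequest")
SCRIPT_SRC = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def xhr_mentioned_inline(html):
    """What a markup crawl sees: the object named directly in the page source."""
    return bool(XHR.search(html))


def external_scripts(html):
    """Externally hosted scripts (e.g. a CDN copy of a JS library); these would
    also have to be fetched and scanned before concluding a page avoids XHR."""
    return SCRIPT_SRC.findall(html)


page = '<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js"></script>'
print(xhr_mentioned_inline(page))  # False, though the page pulls in an XHR-based library
print(external_scripts(page))      # the CDN-hosted script a deeper crawl would need to scan
```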

That being said, it is great to see some data out there, and it gives us a place to start the conversation. I would love to see more of this from Google, Microsoft, Yahoo!, and any provider that has a nice index of the Web.

8 Comments »

But on the same note, look at how many web sites are using DOCTYPEs compared to how many used to.

Comment by TNO — October 17, 2008

Using a doctype means you have fewer browser bugs to worry about. I suspect for most web designers it has less to do with a desire to be standards-compliant than with a desire to get pages built faster.

Comment by Joeri — October 17, 2008

Quite a hard issue. The Web is not “standardized” yet, and when it is we’ll have another large, hard step: making it strict/X[HT]ML. This is the technology we have in hand, and the one I’m trying hard to follow, but it takes much more time and knowledge, which explains why the Web is not “fixed” yet. Besides all the benefits we know, will people really follow that, 100%, in the next years? Quirks HTML was so easy for beginners to learn, and XHTML has so many tricks… I wish all this had been designed in some other fashion.

Comment by blagus — October 17, 2008

I think this is great news and shows that Opera (and especially Håkon) is truly dedicated to following standards and evangelizing them. Standards hold a value all by themselves, and this helps us spread that value…

Comment by ThomasHansen — October 17, 2008

Am I the only one thinking the huge discrepancy between those figures (68% for Apache and 26% for IIS) and Netcraft’s figures (50% Apache, 35% IIS) makes the whole thing highly suspicious?

Comment by BertrandLeRoy — October 17, 2008

(I mean they do mention the Netcraft data but I didn’t find any explanation of the difference)

Comment by BertrandLeRoy — October 17, 2008

One thing I think would be handy (and one of the big search vendors would likely have to do it) is to generate a sample set of web pages that were created since, say, 2004 and modified in the last year. The study likely included some pages that have existed for a decade or more. I’d like to see this kind of analysis on pages created in the era of IE6+, Firefox 1.5+, WebKit 0+.

Comment by tack — October 17, 2008

@BertrandLeRoy
There are many ways to count web-server usage; I know some studies also count parked domains. And when GoDaddy switched from Apache to IIS for parked domains (this is a fact), the studies that used parked domains as a basis for their counting saw IIS usage increase by more than 10-15 percent, with the same decrease in Apache usage…

Many people are speculating today about whether that was a conscious move (by your employer and GoDaddy) to make the world believe that IIS was more widely used than it actually was at that point, to foster adoption…

Though for a guy with “Tales from the evil empire” as the name of his blog, this is probably not news… ;)

Comment by ThomasHansen — October 18, 2008
