Friday, October 17th, 2008
Ars discussed the new Opera initiative MAMA, the study that found that only 4.13% of the web is standards-compliant, whatever that means :)
It is cool to see Opera doing this kind of work, with “Metadata Analysis and Mining Application (MAMA), a tool that crawls the web and indexes the markup and scripting data from approximately 3.5 million pages.”
Ian Hickson did some great work using the Google index to look at how developers use HTML, which led to a lot of the HTML 5 features.
There were a couple of reports:
MAMA’s analysis total is a mere fraction of even a single percent of such a daunting total. It seems odd to say that 3.5 million of anything is insignificant. So let us assume for a moment that it is not. We are just not able to look at every Web page, so we must choose a smaller group of URLs to look at and justify that this is representative of the whole Web. One option is to choose a set of URLs selected at random. I had some conversations with Rene Saarsoo (author of an excellent previous study on coding practices), and he brought up many excellent points about the structure of the Web and choices in URL sets—some of which I have tried to paraphrase here.
Web standards are good for the Web! Most of the readers of this site will understand why this statement holds true—ease of maintenance, cross platform compatibility, access by people with disabilities, the list goes on!
But how does the reality of the Web hold up to these ideals? Surely with so many good reasons to code using open Web standards, the majority of sites should validate? Not so—Opera’s MAMA project has gathered a lot of quite shocking statistics showing that very few of the sites surveyed actually exhibit markup that validates.
- Web servers used: Apache: 67.72%, IIS: 25.91%
- Document structure and size
- Flash detection
- CSS styles used
- XMLHttpRequest object detection
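Of course, it is easy to point to some potential flaws. For example, looking for “XMLHttpRequest” on a page doesn’t account for how XHR is actually used today, which is often through a JavaScript library loaded from an external file rather than directly in the page.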
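To make the XHR point concrete, here is a rough Python sketch of what a naive string match would and wouldn't catch. To be clear, this is not how MAMA works internally (I don't know its implementation). The two page snippets, the `/js/library.js` path, and the helper names `mentions_xhr` and `external_scripts` are all made up for illustration.

```python
import re

# Hypothetical page sources, invented for illustration only.
PAGE_DIRECT = """
<script>
  var req = new XMLHttpRequest();
  req.open("GET", "/data.json", true);
  req.send(null);
</script>
"""

PAGE_VIA_LIBRARY = """
<script src="/js/library.js"></script>
<script>
  // The library wraps XHR internally; the page never names the object.
  $.get("/data.json", function (data) { render(data); });
</script>
"""


def mentions_xhr(markup: str) -> bool:
    """Naive check: does the page source contain the literal identifier?"""
    return "XMLHttpRequest" in markup


def external_scripts(markup: str) -> list[str]:
    """Collect script src values, i.e. files a page-only scan never reads."""
    return re.findall(r'<script[^>]*\bsrc="([^"]+)"', markup)


print(mentions_xhr(PAGE_DIRECT))           # True
print(mentions_xhr(PAGE_VIA_LIBRARY))      # False
print(external_scripts(PAGE_VIA_LIBRARY))  # ['/js/library.js']
```

The second page makes an Ajax request just like the first, but a scan of the page source alone would count it as not using XHR. A crawler would have to fetch and inspect the external scripts as well to get closer to real usage.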
Doron Rosenberg pointed to some of these issues:
But I can totally verify that Chinese websites love Flash. I still have nightmares from the AOL China Gecko testing days. Flashing, scrolling, floating ads are scary.
That being said, it is great to see some data out there, and to give us a place to communicate. I would love to see more of this from Google, Microsoft, Yahoo!, and any provider that has a nice index of the Web.
Posted by Dion Almaer at 7:17 am