Monday, May 11th, 2009
Ian Hickson has written about a new addition to HTML5, “microdata”, aimed at this use case:
Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community.
He goes on to detail a number of scenarios, including these:
- A group of users want to mark up their iguana collections so that they can write a script that collates all their collections and presents them in a uniform fashion.
- A scholar and teacher wants other scholars (and potentially students) to be able to easily extract information about what he teaches to add it to their custom applications.
- The list of specifications produced by W3C, for example, and various lists of translations, are produced by scraping source pages and outputting the result. This is brittle. It would be easier if the data was unambiguously obtainable from the source pages. This is a custom set of properties, specific to this community.
and then shows how one could take:
```html
<p>Hedral is a male american domestic shorthair, with a fluffy black
fur with white paws and belly.</p>
<img src="hedral.jpeg" alt="" title="Hedral, age 18 months">
```
and extract:
Cat name: "Hedral"
Description: "Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly."
Image: "http://example.org/hedral.jpeg"
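The extracted record is not a verbatim copy of the markup: the whitespace from the wrapped paragraph is normalized, and the relative src="hedral.jpeg" turns into an absolute URL. Here is a minimal Python sketch of those two steps, assuming (hypothetically) that the page is served from http://example.org/cats.html; cat_record is an illustrative name, not part of any proposal:

```python
from urllib.parse import urljoin

# Hypothetical page URL: the post only says the image ends up at
# http://example.org/hedral.jpeg, not where the page itself lives.
PAGE_URL = "http://example.org/cats.html"


def cat_record(name: str, desc: str, img_src: str, page_url: str = PAGE_URL) -> dict:
    """Turn raw values pulled out of the markup into a name/description/image record."""
    return {
        "Cat name": name,
        # Collapse newlines and indentation left by line-wrapped markup into
        # single spaces, so words on either side of a line break stay separated.
        "Description": " ".join(desc.split()),
        # A relative src is resolved against the URL of the page it appears on.
        "Image": urljoin(page_url, img_src),
    }


record = cat_record(
    "Hedral",
    "Hedral is a male american domestic shorthair, with a fluffy black\n"
    "  fur with white paws and belly.",
    "hedral.jpeg",
)
print(record)
```

urljoin applies the standard relative-reference rules, so a root-relative src such as /images/erwin.jpeg would resolve against the host root instead of the page's directory.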
Here is where the fun begins, as Ian walks through the issues with the microformat-esque approach, namely the overloaded “class” attribute:
there is no way for a parser to know which classes are properties of cats and which are just for styling (e.g. ‘photo’ used in this example).
I have to admit that I think the baby could be thrown out with the bathwater here. I would hate to see class="cat" type="cat", for example!
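To make Hixie's point about the overloaded class attribute concrete, here is a minimal Python sketch of a naive class-based extractor built on the standard library's html.parser. The class NaiveClassExtractor and the markup it reads are hypothetical illustrations, not taken from his post: a token meant as data ("name") and tokens meant only for the stylesheet ("photo", "rounded") arrive in exactly the same form.

```python
from html.parser import HTMLParser


class NaiveClassExtractor(HTMLParser):
    """Records every class token it sees, as a class-based scraper has to."""

    def __init__(self):
        super().__init__()
        self.tokens = []  # (tag, class token) pairs in document order

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            if attr == "class" and value:
                self.tokens.extend((tag, token) for token in value.split())


# Hypothetical markup: "name" is meant as data, while "photo" and "rounded"
# exist only for styling, yet nothing in the HTML distinguishes them.
MARKUP = """
<section class="cat">
  <h1 class="name">Hedral</h1>
  <img class="photo rounded" src="hedral.jpeg" alt="">
</section>
"""

extractor = NaiveClassExtractor()
extractor.feed(MARKUP)
print(extractor.tokens)
```

Any rule for telling the two kinds apart (a whitelist of property names, a naming prefix) has to live outside the markup, which is the gap the dedicated attributes are meant to close.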
Many iterations later, we arrive at:
Page 1:

```html
<section item="com.damowmow.cat">
 <h1 property="com.damowmow.name">Hedral</h1>
 <p property="com.damowmow.desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img property="com.damowmow.img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"/>
</section>
```

Page 2:

```html
<body item="com.damowmow.cat">
 <p>I love my cats. My oldest cat is <span property="com.damowmow.name">Silver</span>.
 <span property="com.damowmow.desc">Silver is <span property="com.damowmow.age">11</span>
 years old and refuses to eat alone, always waiting for either Yellow or Blue to eat
 with …</span></p>
</body>
```

Page 3:

```html
<h2>My Cats</h2>
<dl>
 <dt>Schrödinger</dt>
 <dd item="com.damowmow.cat">
  <meta property="com.damowmow.name" content="Schrödinger">
  <meta property="com.damowmow.age" content="9">
  <p property="com.damowmow.desc">Orange male.</p>
 </dd>
 <dt>Lord Erwin</dt>
 <dd item="com.damowmow.cat">
  <meta property="com.damowmow.name" content="Lord Erwin">
  <meta property="com.damowmow.age" content="3">
  <p property="com.damowmow.desc">Siamese color-point.</p>
  <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg"/>
 </dd>
</dl>
```
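To get a feel for what a consuming script does with this syntax, here is a minimal Python sketch that pulls items out of markup like the three pages above, using only the standard library's html.parser. Its value rules are assumptions made for illustration (meta takes its content attribute, img takes its src, any other element takes its text content), not the draft spec's exact processing model, and DraftMicrodataParser and parse_items are hypothetical names.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DraftMicrodataParser(HTMLParser):
    """Collects items marked up with the draft item/property attributes."""

    def __init__(self):
        super().__init__()
        self.items = []  # every item found, in document order
        self.stack = []  # currently open non-void elements

    def _owner(self):
        # New properties belong to the nearest enclosing element with item="...".
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    @staticmethod
    def _add(item, name, value):
        item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        owner = self._owner()
        prop = a.get("property") if owner is not None else None
        item = None
        if a.get("item"):
            item = {"type": a["item"], "properties": {}}
            self.items.append(item)
            if prop:  # the property's value is the nested item itself
                self._add(owner, prop, item)
                prop = None
        elif prop and tag == "meta":  # value comes from content="..."
            self._add(owner, prop, a.get("content", ""))
            prop = None
        elif prop and tag == "img":  # value comes from src="..."
            self._add(owner, prop, a.get("src", ""))
            prop = None
        if tag not in VOID:
            # Any remaining property takes the element's text, recorded at its end tag.
            self.stack.append({"tag": tag, "item": item, "prop": prop,
                               "owner": owner, "text": []})

    def handle_data(self, data):
        # Text counts toward every open text-valued property, so nesting works.
        for frame in self.stack:
            if frame["prop"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or all(f["tag"] != tag for f in self.stack):
            return
        while self.stack:  # also closes elements left implicitly open, like <p>
            frame = self.stack.pop()
            if frame["prop"] is not None:
                text = " ".join("".join(frame["text"]).split())
                self._add(frame["owner"], frame["prop"], text)
            if frame["tag"] == tag:
                break


def parse_items(markup):
    parser = DraftMicrodataParser()
    parser.feed(markup)
    parser.close()
    while parser.stack:  # flush anything never explicitly closed
        parser.handle_endtag(parser.stack[-1]["tag"])
    return parser.items


PAGE_3 = """
<h2>My Cats</h2>
<dl>
 <dt>Schrödinger</dt>
 <dd item="com.damowmow.cat">
  <meta property="com.damowmow.name" content="Schrödinger">
  <meta property="com.damowmow.age" content="9">
  <p property="com.damowmow.desc">Orange male.</p>
 </dd>
 <dt>Lord Erwin</dt>
 <dd item="com.damowmow.cat">
  <meta property="com.damowmow.name" content="Lord Erwin">
  <meta property="com.damowmow.age" content="3">
  <p property="com.damowmow.desc">Siamese color-point.</p>
  <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg">
 </dd>
</dl>
"""

for cat in parse_items(PAGE_3):
    print(cat["type"], cat["properties"])
```

Because text is collected for every open property, a nested property such as Silver's age on Page 2 is recorded on its own and still remains part of the surrounding description.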
I don’t miss the com.* world of Java. I hate the verbosity: “com.mycompany.foo.cat” looks so ugly next to plain “cat”. Is it just me?
Hixie then concludes:
To address this use case and its scenarios, I’ve added to HTML5 a simple syntax (three new attributes) based on RDFa. It doesn’t have the full power of RDF, because that didn’t seem to be necessary to address the use cases. It doesn’t really have anything in common with Microformats; I didn’t find the Microformats syntax to be very convenient. (This was also the experience with eRDF.)

I expect the syntax will need adjustments over the coming weeks to address issues that I overlooked. I look forward to such feedback.
@hixie @kidehen URLs are useful, as they resolve. all else is stamp collecting.
@kidehen In practice few people really understand the subtleties of URN vs URI vs IRI vs URL vs Web Address vs Hypertext Reference vs…
All this crap about HTML5 “gatekeepers” is hiLARious. For 6 years, they BEGGED the W3C to work on HTML+1. The W3C said no.
Posted by Dion Almaer at 6:13 am