Monday, July 14th, 2008

HTML 5 data- elements to store private values

Category: HTML

John Resig goes into more detail on the HTML 5 data- elements that give developers a valid place to store metadata:

This allows you to write valid HTML markup (passing an HTML 5 validator) while, simultaneously, embedding data within your page. A quick example:

  <li class="user" data-name="John Resig" data-city="Boston"
      data-lang="js" data-food="Bacon">
    <b>John says:</b> <span>Hello, how are you?</span>
  </li>

Which you can get at via some simple code:

  var user = document.getElementsByTagName("li")[0];
  var pos = 0, span = user.getElementsByTagName("span")[0];

  var phrases = [
    {name: "city", prefix: "I am from "},
    {name: "food", prefix: "I like to eat "},
    {name: "lang", prefix: "I like to program in "}
  ];

  user.addEventListener( "click", function(){
    // Cycle through the phrases, wrapping around after the last one
    var phrase = phrases[ pos++ % phrases.length ];
    // Use the .dataset property
    span.innerHTML = phrase.prefix + user.dataset[ phrase.name ];

    // OR, to work with old browsers
    //span.innerHTML = phrase.prefix + user.getAttribute("data-" + phrase.name );
  }, false);
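
The two ways of reading a value shown in that listing (the .dataset property and the getAttribute() fallback for old browsers) can be folded into one helper. This is only a rough sketch, not something from John's post: the name readData is made up for illustration, and the two stand-in objects below are plain objects rather than real DOM elements, so the snippet can run outside a browser.

```javascript
// Hypothetical helper (not from the original post): read a data-* value,
// preferring .dataset and falling back to getAttribute().
function readData(el, key) {
  // Where .dataset exists, data-first-name is exposed as dataset.firstName.
  if (el.dataset) {
    return el.dataset[key];
  }
  // Otherwise rebuild the attribute name: firstName -> data-first-name.
  var attr = "data-" + key.replace(/[A-Z]/g, function (c) {
    return "-" + c.toLowerCase();
  });
  var value = el.getAttribute(attr);
  // getAttribute() returns null for a missing attribute; normalise that
  // to undefined so both code paths behave alike.
  return value === null ? undefined : value;
}

// Stand-ins for elements (assumptions for this sketch only).
var modern = { dataset: { city: "Boston" } };
var legacy = {
  getAttribute: function (name) {
    return name === "data-first-name" ? "John" : null;
  }
};

console.log(readData(modern, "city"));
console.log(readData(legacy, "firstName"));
```

In a page you would pass the real element instead, e.g. readData(user, "city"), and the click handler would not need to care which browser it is running in.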

Using data- is a very practical solution, but people in the comments of John’s post would much prefer a more “pure” solution. There is the “just use XML namespaces” crowd, the “define the data outside of the page” group, and the “use an XML island” folk. Which are you?

Posted by Dion Almaer at 7:45 am

11 Comments

Strictly speaking, it’s a “data attribute” rather than a “data element”.

Comment by skippyK — July 14, 2008

Well yes, the problem is that data-anything has semantically no meaning.

For example, what happens if the page is somehow auto-generated, and there are two sources, both of them containing a “city” attribute? Like one is a “person card” (profile), meaning the place of living, the other is a geographic location, meaning something different.

When speaking about programs, one can easily emulate at least part of namespaces by using underscores or similar, like: org_php_classname or such.

But this is a document. It means it can contain data from various, undefined sources, and as we have seen with the collisions which happened between ajax libraries, these data tags can be as easily mixed.

So, I’d vote for namespaces, and perhaps something easier than DOM to reach them. But that’s only my two cents.

Comment by Adam Nemeth — July 14, 2008

I would argue for the use of an external XML/JSON data service – this allows extreme flexibility for developers that might not want/have direct interaction with the DOM. In the case of a standalone web page, I would say a separate XML data island is an acceptable use – inline data attributes seem to needlessly clutter the DOM.

The above meta-data example just seems out of place. Why not just have an XML document, and use server-side or client-side XSLT to transform it to HTML compliant markup?

Comment by matanlurey — July 14, 2008

Oh, and for the record: I’m a purist, but I’m down with this. There’s absolutely no reason why the data contents shouldn’t have semantic meaning, they just don’t have semantics that are easily or elegantly expressible within the constraints of the existing elements and attributes. Granted, the data attributes can be used to store entirely inappropriate data, but so can elements. Semantic meaning doesn’t come from HTML alone, it comes from HTML used properly by a knowledgeable author.

And in some cases, even though it may hold non-meaningful data (that is, non-meaningful outside the context of javascript or something like that), the real world demands that it be there. Given the alternatives (classname abuse, extra markup, extraneous scripting, etc.) this is probably the least of many evils.

Comment by skippyK — July 14, 2008

I like the “data-” attribute idea so much, I’m using it in production applications already. Often the simplest solution to a markup/javascript problem is a custom attribute. This gives us a way to tuck away those attributes to ensure they never conflict with legit ones.

Comment by veselosky — July 14, 2008

I confess I’m an XSL junkie, so I’m firmly in the namespace camp. I agree that this may not be semantically meaningful, and I would say that namespacing attributes is the shortest route to making them meaningful.

Comment by ashooner — July 14, 2008

“Well yes, the problem is that data-anything has semantically no meaning.”
.
Because div and span mean so much. You know, those elements you use for just about everything because there isn’t a semantic element for 99% of what you do? Much less the abused ul/ol, p, br (!) and so on. And b/i which seem to survive just about anything for no apparent reason. Or the use of alt as title. Or h1-6, which aren’t assigned a hierarchical value and have no restriction on their order or placement in the structure of a document.
.
In fact, I’d argue that semantics-agnostic is the way to go for this. Just as it’s the way to go for div/span.
.
“For example, what happens if the page is somehow auto-generated, and there are two sources, both of them containing a “city” attribute? Like one is a “person card” (profile), meaning the place of living, the other is a geographic location, meaning something different.”
.
The horror! So the attribute doesn’t, then, have an immutable meaning, but a contextual one. This allows authors/developers to define the semantics of the attribute. What is wrong with that?
.
And actually, that’s a pretty poor example too. Both are a “geographic location”, each just happens to, also, correspond with (shocking, I know) their respective contexts. Isn’t that, um, correct?
.
“But this is a document. It means it can contain data from various, undefined sources, and as we have seen with the collisions which happened between ajax libraries, these data tags can be as easily mixed.”
.
So do like all of us do and mix an at-least-healthy skepticism of any external content/data with some efforts to avoid collision like… prefixes. And if you allow scripts from an external source to run, don’t include sensitive data. I don’t understand why these rules are suddenly out the window the moment a common and correct-by-the-spirit-of-the-standard* practice like custom attributes becomes formalized.
.
“and perhaps something easier than DOM to reach them”
.
Because it’s so hard. Especially if you use a selector library, which, um, if you’re not doing, and haven’t written your own… you’re doing it wrong.
.
* And here I mean, custom attributes, insofar as they are actually metadata for the tag structure to which they’re attached, belong just as much as standard attributes which are… metadata for the tag structure to which they’re attached.
.
* * *
.
“The above meta-data example just seems out of place. Why not just have an XML document, and use server-side or client-side XSLT to transform it to HTML compliant markup?”
.
Because that seems more out of place? Look, for those of us who actually do write or template HTML and serve it as HTML… the HTML document is the correct place for the data it contains. Adding an abstraction there makes HTML a meaningless middleman. HTML is a document format that is meant to structure data.
.
* * *
.
This is a great aspect of HTML5. Arbitrary attributes, used properly, can greatly enhance a web page or application. Used properly.

Comment by eyelidlessness — July 14, 2008

I’m already using the data-* set of attributes and I must say that this solution couldn’t really be simpler for what it is intended. I hate having to add different class names... that’s really non-semantic.

I see data-* attributes as perfectly semantic ones, since they represent exactly what they are meant to: data. The fact that they are freely extensible by the developer makes them suitable for a lot of situations that even a cluttered XML data island could not solve easily.

I think simplicity is essential for conventions and standards to be adopted. If we insist on more “correct” solutions to the problem, then what will happen is that few people will use them, simply because they are not easy to understand (consider the time it could take to explain to someone what an XML data island is vs. the time to explain the attribute and its correct usage).

Simple things like the “innerHTML” DOM property, for example, help us accelerate the development process, while still allowing us to do things correctly in spite of the existence of some “standard” procedure that (I would say) is sometimes more oriented towards the browser vendors than towards real-world developers. Don’t get me wrong, I really love standards... but I hate unnecessary clutter... just make the simplest possible thing the standard thing!

Comment by sideral — July 14, 2008

I’m with the namespace camp (yes, I actually do like XML ;-) ). It makes perfect sense to me to draw a line between markup and data this way.

It would be sufficient to specify that browsers should automatically ignore (I mean not display) anything from elements with foreign namespaces, only allow full DOM access — no new attributes or elements are necessary…

Comment by danielkvasnicka — July 15, 2008

@eyelidlessness:
.
divs usually have a “class” or “id” attribute. You can think of divs as named and unnamed array-like structures, some of them won’t be named probably, but most of them have a meaning in modern web development, even if not noted.
.
I wouldn’t go into details that a “paragraph” has a very well defined meaning in writing, I think it even has its own wikipedia page, and that “this is an unordered list of elements” also has meanings. I also won’t try to argue that when writing text, it’s more practical to use b or i, than span class=”emphasizedforquicklook” or class=”personname”.
.
The thing is that, currently, these auto-generated things, like ext and dojo widgets use their own namespace: ext-something and dojo-something. I also won’t try to argue, that a namespace isn’t necessarily something with a : or that “javascript doesn’t have namespaces”. Namespace is a much broader term for me.
.
I think that what the overloading of $ sign did is enough for everybody to not try to do this again – I believe it wasn’t that good.
.
I strongly believe that data- in itself is a prefix, and namely, a wrong one. If you agree that using $ for different purposes in different libraries is a wrong thing, then I think it’s logical that the same goes for this.
.
If we are speaking of creating an HTML5 standard API for something like data, I can’t see why we aren’t creating a simpler mechanism to reach namespaces than DOM. Of course you can now use SQL queries with Gears, but it will be a part of HTML 5 too. The same could go for selector libraries.
.
In general: even if it’s more convenient to use $ in jQuery and Prototype than a much longer name (like jquery), I would by no means standardize it, and I believe it was, in a much broader sense than the personal convenience of the developers using just one of them, an unhealthy solution (and here I agree with Crockford).
.
Standardizing such a thing would exchange the rationale for cheap tricks.

Comment by Adam Nemeth — July 15, 2008

‘divs usually have a “class” or “id” attribute.’
.
So why is that “semantically meaningful” and data-* isn’t?
.
“You can think of divs as named and unnamed array-like structures”
.
Huh? For most uses, they’re unnamed or misleadingly-named non-structures.
.
“but most of them have a meaning in modern web development, even if not noted”
.
Which is… not. semantically. meaningful.
.
“I wouldn’t go into details that a “paragraph” has a very well defined meaning in writing”
.
If used properly.
.
‘“this is an unordered list of elements” also has meanings.’
.
But it doesn’t mean… navigation menu. Or any other manner of things even a good author might use it for, for lack of better elements.
.
“I also won’t try to argue that when writing text, it’s more practical to use b or i, than span class=”emphasizedforquicklook” or class=”personname”.”
.
The use of b/i over strong/em is the issue here. Why use spans at all?? Sure, class your strong/em if they have more meaning than emphasis (although the structure should probably then be span.something strong or similar).
.
“I strongly believe that data- in itself is a prefix, and namely, a wrong one. If you agree that using $ for different purposes in different libraries is a wrong thing, then I think it’s logical that the same goes for this.”
.
I don’t think using $ for different purposes in different libraries is wrong. They’re different libraries, and Javascript isn’t… a semantic structured document language. Using it this way might be considered obtrusive however, and I think jQuery deserves accolades on this one (though I use Prototype primarily) for allowing both the syntax of $() and the ability to free up window['$'] for other libraries to use.
.
“If we are speaking of creating an HTML5 standard API for something like data, I can’t see why we aren’t creating a simpler mechanism to reach namespaces than DOM. Of course you can now use SQL queries with Gears, but it will be a part of HTML 5 too. The same could go for selector libraries.”
.
What does that have to do with embedding metadata into a document? The metadata itself in the document itself is of value itself, without being able to query it in the DOM.
.
“In general: even if it’s more convenient to use $ in jQuery and Prototype than a much longer name (like jquery), I would by no means standardize it, and I believe it was, in a much broader sense than the personal convenience of the developers using just one of them, an unhealthy solution (and here I agree with Crockford).”
.
But for all intents and purposes, there’s nothing wrong with jQuery’s behavior for its ability to interoperate with other scripts. It can free up anything global it creates outside the window['jquery'] “namespace”. You might argue that the use of any globals is harmful, but then you pretty much must write your entire script in a closure. I can sort of see the value of that except then there’s no interoperability between scripts at all, no libraries and APIs at all!
.
“Standardizing such a thing would exchange the rationale for cheap tricks.”
.
I don’t know what you’re talking about. What does standardizing a metadata-in-the-document attribute prefix have to do with standardizing the $() function[s]? How is data-* a cheap trick? It’s a defined metadata structure. It is a deliberate mechanism to let authors arbitrarily define their own metadata semantics. It’s neither cheap nor a trick. As John Resig points out, it’s even already available albeit informally. I don’t see the harm, and you haven’t explained it.

Comment by eyelidlessness — July 15, 2008
