Monday, March 10th, 2008

Internal IE-HTML DOM still isn’t XHTML compliant

Category: IE

Jon Davis was glad to see XHTML compliance on the feature list for IE 8, but was surprised by what he found when he looked closer:

XHTML compliance exists in parsing and rendering only. Microsoft is still using an internal IE-HTML DOM that is not XHTML-compliant, even in XHTML documents. All you have to do to prove this out is, in script, alert(document.documentElement.outerHTML); and what do you see? The most obvious observation is a total disregard for XHTML 1.0 § 4.2, which reads, “Element and attribute names must be in lower case; XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive e.g. <li> and <LI> are different tags.” Why does this matter? It matters because of DHTML. It matters because there is an implemented and oft-used setter on DOM elements’ innerHTML. It matters because people actually use the DOM programmatically, both in evaluating and assigning markup. It matters because the browser has a Content-Editable mode that is often used with online content editing whereby the innerHTML contents are posted to the server for viewing as content. It matters because Internet Explorer has a COM interface that can be, and often is, used to parse and tidy HTML markup, or to provide a WYSIWYG rich text editor for applications. It matters because it’s broken, has been all along, and has never been deemed acceptable.

Should IE be looking to fix this?

Posted by Dion Almaer at 6:48 am




Of course IE should fix this. IE should be striving to adhere to the standards above everything else. If they want to introduce new garbage like “slices” then they dang sure better get the basics right before wasting time.

Comment by Jon Hartmann — March 10, 2008


Comment by aw2xcd — March 10, 2008

*sigh*, come on now, other browsers do similar things.

Open firebug on an xhtml page.

Run this…

var node = document.createElement("div");

No trailing slash on the img. I don’t know how they get away with it.
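
The snippet above appears to have lost its remaining lines. A minimal sketch of the kind of round trip being described, assuming the markup is assigned to innerHTML and then read straight back, might look like the following. The helper name roundTripMarkup and its doc parameter are illustrative and do not come from the original comment.

```javascript
// Hypothetical helper: assign markup to a detached element and read it back.
// `doc` is any object exposing createElement, e.g. the browser's `document`.
function roundTripMarkup(doc, markup) {
  var node = doc.createElement("div");
  node.innerHTML = markup;   // the browser parses the markup into its DOM
  return node.innerHTML;     // and re-serializes it in its own style
}

// In a browser console (e.g. Firebug) on an XHTML page served as text/html:
// roundTripMarkup(document, '<img src="a.png" alt="" />');
```

Comparing the returned string with the input shows whether details such as the trailing slash survive the trip through the browser's DOM.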


Comment by jaffathecake — March 10, 2008

Is that a problem of IE8, or of the developer toolbar? Will document.getElementsByTagName('HTML') return something? What about the same behaviour in other browsers?

Comment by urandom — March 10, 2008

Gee, what a surprise…

Comment by qqrq — March 10, 2008

I’d suspect it’s because they are using the HTML DOM (which should be uppercase) as IE doesn’t support application/xhtml+xml (unless something has changed in IE8).

It’s all a moot point really since no one serves xhtml correctly as application/xhtml+xml so browsers will be processing any xhtml as plain-old-html.

Comment by adrianlynch — March 10, 2008

It’s extremely frustrating because rather than scrapping the old IE4/5/6/7 codebase, they’ve clearly just made a few fixes and improvements to the rendering engine to make the CSS work properly(ish) and tacked a bunch of new features on top. This is how they achieve IE6/7 compatibility – or conversely the requirement of being IE6/7 compatible has forced them to do this. It’s also the reason they inherit horrible legacy issues such as the totally ropey internal DOM.

Obviously I don’t actually have proof, but I’m fairly certain that they’ve just worked on improving the old engine. There’s a little more insight here:

Comment by chickerino — March 10, 2008

So basically all of the developer headaches (other than cleaning up their CSS 2.1 implementation) that we have been complaining about, begging for fixes for, and building huge hacks around for 6ish years are still not going to be fixed.

Wow, what a release for developers!!!!!!!!!



Comment by ozonecreations — March 10, 2008

Have you guys read the Improved Namespace Support whitepaper?
The XHTML namespace does not trigger the “namespace support” (neither does any “default namespace declaration” on the element or any other “known” element).

The xmlns= attribute is parsed as any other unknown attribute so is accessible in the DOM, but it’s not a namespace declaration.

Note that this behavior is compatible with HTML5.

Comment by tbroyer — March 10, 2008

“The xmlns= attribute is parsed as any other unknown attribute so is accessible in the DOM, but it’s not a namespace declaration.”

True, but since both the tag and its xmlns attribute are parsed into the DOM tree and “activated”, per se, the fact that there is no tag in the XHTML default namespace (there is only an tag) means that this is not the same tag that is declared in the referenced namespace.

Comment by stimpy77 — March 10, 2008

bah my {html}/{HTML} tags got stripped.

Anyway, if xmlns isn’t treated as a namespace declaration, it’s not a true XHTML parsing engine, is it?

Comment by stimpy77 — March 10, 2008


Heh heh… I’m not a hater, I’m a ridiculously passionate Microsoft enthusiast. Passion just also makes me intolerant. :) For every complaint I have two or three praises of Microsoft, when they earn it.

Comment by stimpy77 — March 10, 2008

“For every complaint I have two or three praises of Microsoft, when they earn it.”

Have they earned one since IE4 leap-frogged Netscape technology-wise? Ohh they created the XMLHttpRequest….. uhhhh 8 years ago?

I have a dream…….. code once, call it a day
I have reality……… code once, spend all day fixing it in IE

Somewhere over the rainbow…………..

Comment by ozonecreations — March 10, 2008

Yeah, they earned praises with finally getting around to CSS 2.1 compliance for rendering.

Comment by stimpy77 — March 10, 2008

Fix this? Of course not. The whole point of IE was to add as much confusion and as many interoperability problems to the Internet as possible. Microsoft’s desktop dominance would assure it of large market acceptance. Interoperability issues are good because they make web development harder. And that is good, because if web apps become ubiquitous, poof, there goes the desktop cash cow.

If you are a Microsoft shareholder you want the interoperability pain to continue as much and as long as possible. And try to come up with some alternatives to the web of course, that require Microsoft software to work.

Guys, the last thing Microsoft is interested in is solving your problems. That’s just wishful thinking. No one will put at risk the billions raked in every year from the platform and Office sales.

Comment by Berend de Boer — March 10, 2008

“And that is good, because [if] web apps become ubiquitous, poof, there goes the desktop cash cow.”

I think Microsoft proved that this isn’t their way of thinking when they made Silverlight 2 support RIAs on all popular OS’s rather than hoarding all that CLR goodness just for WPF.

Comment by stimpy77 — March 10, 2008

@Berend, you’re clearly a conspiracy theorist of epic proportions. Please stop drinking the koolaid and come back to reality.

Comment by Jeff Howden — March 10, 2008

I knew something would be exposed. I knew they would never get done the things we need. IE is lost. Forever. At least for me.

Comment by britneyfreek — March 11, 2008

@stimpy77: read the whitepaper. This is *not* an XHTML parser, it’s explicitly called out. Actually, even for those “namespaced elements”, IE doesn’t use an XML parser (you can omit quotes from your attributes and some end tags; the major difference is that tags ending in /> are parsed as start-end tags, contrary to the “HTML parsing mode” where the slash is just ignored and the “emptiness” is based on a list of known elements).

Actually, IE8 is not much different from IE7, IE6 or IE5 wrt namespaces. IE8 just allows you to:
a) omit the “object” and “?import?” (they’ve been “moved” to the windows registry; but IE8 will add them to the DOM, it’s just “syntactic sugar”) and
b) use “default namespaces” (i.e. unprefixed elements)

Comment by tbroyer — March 11, 2008

“I think Microsoft proved that this isn’t their way of thinking when they made Silverlight 2 support RIAs on all popular OS’s rather than hoarding all that CLR goodness just for WPF.”

To play devil’s advocate, that doesn’t prove anything. MS could simply be trying to replace HTML with a proprietary alternative that they can then drop cross-platform support for because “nobody wants cross-platform”.

Let me put it this way: if Microsoft isn’t trying to protect the Windows hegemony on end user applications, they are not protecting shareholder value, and IIRC that is illegal.

Comment by Joeri — March 11, 2008

tbroyer: You’re missing the point. I actually don’t care whether an XHTML document is being read as an XHTML namespaced XML document, nor do I care how Internet Explorer is maintaining its DOM.

What I care about is element.innerHTML. IE keeps converting valid XML (XHTML) to XML-unfriendly SGML (HTML 4). Not only are element names upper-cased, attributes get their quotation marks clipped off. When you take a content-editable block and pass its innerHTML to the server, the server has to reprocess the SGML crap back to the XHTML that it started with.

The least Microsoft can do is provide an alternate element.innerXHTML property that outputs clean XHTML.
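
As a rough illustration of what such a property might produce, here is a minimal sketch of a serializer that walks the DOM and emits lower-case element names, quoted attributes and self-closed empty elements. It is not stimpy77's implementation and not a real browser API: the function names, the list of empty elements and the escaping rules are all assumptions made for this example, and it ignores comments, CDATA and namespaces.

```javascript
// Hypothetical sketch of an innerXHTML-style serializer (not a real DOM API).
// It relies only on nodeType, nodeName, nodeValue, attributes and childNodes.
var VOID_ELEMENTS = {
  area: 1, base: 1, br: 1, col: 1, hr: 1,
  img: 1, input: 1, link: 1, meta: 1, param: 1
};

function escapeText(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

function outerXHTML(node) {
  if (node.nodeType === 3) return escapeText(node.nodeValue); // text node
  if (node.nodeType !== 1) return "";                         // skip everything but elements
  var name = node.nodeName.toLowerCase();
  var out = "<" + name;
  var attrs = node.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    // older IE versions also list attributes that were never set
    if (attrs[i].specified === false) continue;
    out += " " + attrs[i].name.toLowerCase() + '="' + escapeAttr(attrs[i].value) + '"';
  }
  if (VOID_ELEMENTS.hasOwnProperty(name)) return out + " />";
  return out + ">" + innerXHTML(node) + "</" + name + ">";
}

function innerXHTML(node) {
  var out = "";
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) out += outerXHTML(children[i]);
  return out;
}
```

In a page, innerXHTML(someElement) would be called in place of someElement.innerHTML before posting content-editable markup to the server.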

Comment by stimpy77 — March 11, 2008

Joeri, please let’s not get off-topic here. This isn’t about Microsoft as a company, it’s about IE as a software product.

Comment by stimpy77 — March 11, 2008

@stimpy77: If you are serving your xhtml as text/html then you are actually sending the browser invalid html. For the browser to treat your xhtml as proper xhtml (i.e. as XML) you need to serve the doc as application/xhtml+xml.

So until IE supports application/xhtml+xml (and I doubt it will be anytime soon) they will probably never give you what you want.

Of course the other problem is everyone serves xhtml with the wrong mimetype anyway, so even if IE did start proper support for application/xhtml+xml they would still treat xhtml as broken plain old html.
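
One common workaround is to pick the MIME type per request from the browser's Accept header. The sketch below is a simplified illustration of that idea, not something from the original discussion: the function name pickContentType is hypothetical, and it only checks whether application/xhtml+xml is listed and not explicitly refused with q=0, without ranking the q-values against each other.

```javascript
// Hypothetical helper: choose a Content-Type from the request's Accept header.
function pickContentType(acceptHeader) {
  var accept = String(acceptHeader || "").toLowerCase();
  var parts = accept.split(",");
  for (var i = 0; i < parts.length; i++) {
    var fields = parts[i].split(";");
    var type = fields[0].replace(/^\s+|\s+$/g, "");
    if (type !== "application/xhtml+xml") continue;
    // honour an explicit q=0, which means "not acceptable"
    var rejected = false;
    for (var j = 1; j < fields.length; j++) {
      var m = /^\s*q\s*=\s*([0-9.]+)\s*$/.exec(fields[j]);
      if (m && parseFloat(m[1]) === 0) rejected = true;
    }
    if (!rejected) return "application/xhtml+xml";
  }
  // browsers that do not advertise XHTML support get plain text/html
  return "text/html";
}
```

A server-side script would set the response's Content-Type header to the returned value, so that browsers advertising XHTML support get the XML parser and everything else falls back to text/html.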

Comment by adrianlynch — March 11, 2008

@adrian, XHTML is identified by the DOCTYPE, not by the MIME type.

[!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Transitional//EN” “”]

Comment by stimpy77 — March 12, 2008
Comment by stimpy77 — March 12, 2008

[!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Strict//EN” “”]

Comment by stimpy77 — March 12, 2008

@stimpy77 Sorry no you are wrong – if you send with a mimetype of text/html ALL browsers treat your lovely formatted xhtml as INCORRECTLY formatted html. All browsers completely ignore the doctype. For a round up of the problem see:

Comment by adrianlynch — March 13, 2008

@stimpy77 err I can understand why you are thinking this way.

But the reality is xhtml is not formatted correctly at all. This is why XHTML 2 is proposed to diverge the way it does, and why HTML 5’s “xhtml” version exists: one to serve the true rendering of an XML version of xhtml, and the other to continue the pseudo-xhtml route that the fault-tolerant behaviour of all browsers has now got us into.

The doctype declaration is no longer used by browsers the way it was intended. It is now just a render and translation (DOM-wise) checker, that is all. The focus on SGML compliance etc. has been lost in the murky waters of backwards compatibility and fault tolerance.

Ever considered why you are coding in xhtml, when maybe html 4.01 would do just fine? :)

Comment by CannedTuna — March 13, 2008

You guys are still missing the point!!

The only reason people would use .innerHTML is if they are either tidying or capturing HTML markup, whether by ContentEditable or for something else. When you go and reuse that markup, a lot of DAMAGE is done when the markup conforms to a different standard (IE HTML) from where it originated or from where it is going.

I would NOT CARE about this at all if there was an .innerXHTML property. But there isn’t one.

Meanwhile, yesterday I created innerXHTML() and outerXHTML() myself.

Comment by stimpy77 — March 13, 2008
