Friday, December 14th, 2007

The problem with innerHTML

Category: Articles, JavaScript, Tip, Yahoo!

<p>Some like to DOM their way around. Others prefer the simplicity of innerHTML. Julien Lecomte, of Yahoo!, wrote up his thoughts on the problems with innerHTML.

Julien first points out some issues:

  • Improper handling of the innerHTML property can enable script-injection attacks on Internet Explorer when the HTML string contains a script tag marked as defered: <script defer>…<script>
  • Setting innerHTML will destroy existing HTML elements that have event handlers attached to them, potentially creating a memory leak on some browsers.
  • You don’t get back a reference to the element(s) you just created, forcing you to add code to retrieve those references manually (using the DOM APIs…)
  • You can’t set the innerHTML property on all HTML elements on all browsers (for instance, Internet Explorer won’t let you set the innerHTML property of a table row element)

He then puts together Douglas Crockfords purge and some code to clean up script tags, arriving at:

javascript
< view plain text >
  1. YAHOO.util.Dom.setInnerHTML = function (el, html) {
  2.     el = YAHOO.util.Dom.get(el);
  3.     if (!el || typeof html !== 'string') {
  4.         return null;
  5.     }
  6.  
  7.     // Break circular references.
  8.     (function (o) {
  9.  
  10.         var a = o.attributes, i, l, n, c;
  11.         if (a) {
  12.             l = a.length;
  13.             for (i = 0; i < l; i += 1) {
  14.                 n = a[i].name;
  15.                 if (typeof o[n] === 'function') {
  16.                     o[n] = null;
  17.                 }
  18.             }
  19.         }
  20.  
  21.         a = o.childNodes;
  22.  
  23.         if (a) {
  24.             l = a.length;
  25.             for (i = 0; i < l; i += 1) {
  26.                 c = o.childNodes[i];
  27.  
  28.                 // Purge child nodes.
  29.                 arguments.callee(c);
  30.  
  31.                 // Removes all listeners attached to the element via YUI's addListener.
  32.                 YAHOO.util.Event.purgeElement(c);
  33.             }
  34.         }
  35.  
  36.     })(el);
  37.  
  38.     // Remove scripts from HTML string, and set innerHTML property
  39.     el.innerHTML = html.replace(/<script[^>]*>((.|[\r\n])*?)< \\?\/script>/ig, "");
  40.  
  41.     // Return a reference to the first child
  42.     return el.firstChild;
  43. };

Posted by Dion Almaer at 7:41 am
18 Comments

+++--
3.8 rating from 72 votes

18 Comments »

Comments feed TrackBack URI

Script tags are just one out of many ways to inject malicious code. Stripping only the script tag is sub par. You must also be careful of IFRAMEs, OBJECT, EMBED, IMG … and even some attributes ( some browsers still allow script execution in style=”background:url(javascript:…);”.

The only way to sanitize properly markup to be injected is to use a whitelist of tags and attributes … or to simply not inject markup using innerHTML.

Comment by Mathieu \'p01\' Henri — December 14, 2007

Using innerHMTL also has the problem of getting an “Unknow runtime error” in Internet Explorer sometimes.

Comment by Rodrigo — December 14, 2007

Concerning innerHTML of the in IE it’s a real problem.
My solution is to receive data via AJAX, and then use DOM functions to build new rows of the table:

document.createElement(“TR”);
// ….

etc.

Comment by Snowcore — December 14, 2007

since DOM is the recommendation it really shouldn’t be an issue

Comment by Site Smart — December 14, 2007

Some of the issues with event listeners and innerHTML can easily be overcome.

Chris Heilmann (unless my memory fails me) wrote an excellent article on using event delegates instead of event handlers in javascript. Chris: can you link it?

LowPro for Prototype, by Dan Webb, has the brilliant Event.addBehavior method, that also overcomes problems with dom updates and event handlers.

Comment by Morgan Roderick — December 14, 2007

I’ve run up against a number of problems using innerHTML recently: document.body.innerHTML += ‘your stuff’ seems to break the DOM tree so that any node collections and event handlers are lost. Also, when using innerHTML to parse a json feed (in a form – asp.net) makes ie randomly crash

Comment by Pete B — December 14, 2007

@Mathieu

I could not agree more. The setInnerHTML function barely normalizes the script tag execution behavior across all A-grade browsers.

@Morgan

Reducing the number of event handlers definitely helps alleviating some of the associated memory leaks, and event delegation is a well-known pattern that helps doing just that.

Comment by jlecomte — December 14, 2007

sometimes innerHTML is only the right way
http://www.quirksmode.org/dom/innerhtml.html

Comment by Anonymous — December 14, 2007

@mdmadph

One of the best use case for the innerHTML property is when it comes to modifying the document tree efficiently. Using innerHTML is a lot faster than using the DOM APIs.

Comment by jlecomte — December 14, 2007

Julien,

Thanks for the article. I’m also glad that you brought up the fact that innerHTML is extremely fast!

Oliver

Comment by skypoet — December 14, 2007

The Taconite Ajax framework does not use the innerHTML property**, instead relying on DOM manipulation methods. All you do is specify the markup you want, *as markup*, and Taconite does the renderning for you.

Ryan
Taconite Lead Developer

** The innerHTML property is used in a few spots to try to overcome some IE quirks, but it is used quite sparingly.

Comment by Ryan — December 14, 2007

In my experience, if you build a DOM structure offline (createDocumentFragment/createElement and so on,) finally inserting to the “live” document with a single appendChild() call, the browser shouldn’t have to do any more reflow work than if innerHTML were used.
 
I’m also a fan of having “templates” in HTML wherever possible (avoiding creating HTML using JS is always good, thus separating layers etc.), cloning that template, modifying and appending to the DOM. Again if you do all of this work offline and append to the live document once, I think the performance is pretty solid.

Comment by Schill — December 14, 2007

* Setting innerHTML will destroy existing HTML elements that have event handlers attached to them, potentially creating a memory leak on some browsers.

That’s a implementation problem, and not an argument, neither in a favorably or contrary way to innerHTML.

* You can’t set the innerHTML property on all HTML elements on all browsers (for instance, Internet Explorer won’t let you set the innerHTML property of a table row element)

Until IE isn’t dead, cross-browser solutions still as a set of work around. Ok, it’s a good argument, but… we all know that there are millions of further things which IE doesn’t allow. We’re furious, however, not amazed about.

@Schill
When using AJAX, sometimes i throw the results on a XSLT processor before be attached on the response. So, i use innerHTML without merging the layers of my applications, pasting directly the XSLT results on a target node.

Comment by HudsonTavares — December 14, 2007

One thing concerning innerHTML that never seems to be mentioned is that it will not work when content is served as application/xhtml+xml (or any other flavor of true xml for that matter)

Comment by posaune — December 15, 2007

Actually, interestingly enough, I’ve noted that you can write to innerHTML in at least a limited fashion with Firefox 2/3 and Safari 3 in this case; my personal site is served with the correct MIME type when applicable, and I am (unfortunately) writing to innerHTML in a few spots. I was thinking I’d have to rewrite those bits to use proper DOM calls, but it turns out I was able to get away with a few things. This was not the case in 2004 when I was “researching” application/xhtml+xml, Firefox would throw errors when any innerHTML “write” call was attempted. I get the feeling they decided to allow some form of writes because the technique is used (and overly-so, in my opinion,) so often.

Comment by Schill — December 15, 2007

I personaly never liked using regular expressions to filter HTML. Unless you use a grammar you won’t really be able to parse anything.
el.innerHTML = html.replace(/]*>((.|[\r\n])*?)/ig, “”);
looks good, but what if instead of terminating the script with “” I terminate it with “” ? Or better yet, how about
html = “void(0);pt defer=\”true\”>alert(‘foo’);”

If this is about security and disabling script injection attacks, it still has a lot that isn’t done.

Comment by MiguelVentura — December 16, 2007

Sorry for my previous post… I was hoping to get “readable version here.

Comment by MiguelVentura — December 16, 2007

kschneid: that’s the one. That article inspired me greatly awhile back, and has caused a small shift in how I write my javascript :-)

Comment by Morgan Roderick — December 18, 2007

Leave a comment

You must be logged in to post a comment.