<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: toStaticHTML: Sanitize your HTML in IE 8</title>
	<atom:link href="http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/feed" rel="self" type="application/rss+xml" />
	<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8</link>
	<description>Cleaning up the web with Ajax</description>
	<lastBuildDate>Thu, 17 May 2012 07:43:39 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: gonchuki</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267595</link>
		<dc:creator>gonchuki</dc:creator>
		<pubDate>Tue, 23 Sep 2008 01:57:34 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267595</guid>
		<description>@pmontrasio:
You are missing the point: having that function in JS is completely useless. User input should be sanitized on the server side when it is submitted, so everything that hits the DB is already sanitized and cleared for safe output.</description>
		<content:encoded><![CDATA[<p>@pmontrasio:<br />
You are missing the point: having that function in JS is completely useless. User input should be sanitized on the server side when it is submitted, so everything that hits the DB is already sanitized and cleared for safe output.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pmontrasio</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267170</link>
		<dc:creator>pmontrasio</dc:creator>
		<pubDate>Wed, 03 Sep 2008 16:25:14 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267170</guid>
		<description>Because maybe you have a site like Ajaxian where people can type in comments, like I&#039;m doing. If you don&#039;t sanitize input, you might end up serving rogue HTML crafted by an attacker that does all sorts of nasty things to your visitors&#039; browsers and PCs.</description>
		<content:encoded><![CDATA[<p>Because maybe you have a site like Ajaxian where people can type in comments, like I&#8217;m doing. If you don&#8217;t sanitize input, you might end up serving rogue HTML crafted by an attacker that does all sorts of nasty things to your visitors&#8217; browsers and PCs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: wwwmarty</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267161</link>
		<dc:creator>wwwmarty</dc:creator>
		<pubDate>Wed, 03 Sep 2008 15:28:15 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267161</guid>
		<description>Why would I need this?</description>
		<content:encoded><![CDATA[<p>Why would I need this?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267148</link>
		<dc:creator>Jerome</dc:creator>
		<pubDate>Wed, 03 Sep 2008 11:03:28 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267148</guid>
		<description>Yeah, I&#039;m convinced now that proper parsing and whitelisting of tags and their attributes is the only way you stand a chance.</description>
		<content:encoded><![CDATA[<p>Yeah, I&#8217;m convinced now that proper parsing and whitelisting of tags and their attributes is the only way you stand a chance.</p>
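<p>A minimal JavaScript sketch of the whitelisting step, assuming a real HTML parser has already split the input into tag names and attribute maps (with character references such as &amp;#106; already decoded). The tag and scheme tables and the names filterElement and isSafeUrl are hypothetical choices for illustration; nothing here describes how IE 8&#8217;s toStaticHTML works internally.</p>

```javascript
// Hypothetical allowlist: tag name -> attributes permitted on that tag.
// Anything not listed (script elements, onclick, style, ...) is dropped.
const ALLOWED_TAGS = new Map([
  ["a", ["href", "title"]],
  ["b", []],
  ["i", []],
  ["p", []],
  ["img", ["src", "alt"]],
]);

// Attributes whose values are URLs, and the URL schemes allowed in them.
const URL_ATTRS = new Set(["href", "src"]);
const SAFE_SCHEMES = new Set(["http", "https", "mailto"]);

function isSafeUrl(value) {
  // Browsers ignore tabs/newlines inside a URL and leading spaces/control
  // characters, so "java\tscript:" still runs. Strip all of them first;
  // this errs on the side of rejecting.
  const compact = value.replace(/[\x00-\x20]/g, "").toLowerCase();
  const scheme = /^([a-z][a-z0-9+.-]*):/.exec(compact);
  // No scheme means a relative URL, which this sketch allows.
  return scheme === null || SAFE_SCHEMES.has(scheme[1]);
}

// Returns the attributes to keep, or null if the element itself is not allowed.
function filterElement(tagName, attrs) {
  const allowed = ALLOWED_TAGS.get(tagName.toLowerCase());
  if (allowed === undefined) return null;
  const kept = {};
  for (const [name, value] of Object.entries(attrs)) {
    const key = name.toLowerCase();
    if (!allowed.includes(key)) continue;
    if (URL_ATTRS.has(key) && !isSafeUrl(value)) continue;
    kept[key] = value;
  }
  return kept;
}
```

<p>Because anything not explicitly listed is rejected, handler attributes nobody thought to enumerate (unquoted values, new on* events) are dropped by default. What to do with the children of a rejected element, drop them or keep their text, is left to the caller in this sketch.</p>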
]]></content:encoded>
	</item>
	<item>
		<title>By: nate</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267142</link>
		<dc:creator>nate</dc:creator>
		<pubDate>Wed, 03 Sep 2008 05:43:20 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267142</guid>
		<description>Sigh, my first post and already hit by escaping issues. &lt;code&gt;s/&lt;/&amp;lt;/g&lt;/code&gt;, nate. The second snippet (which turned into a link labeled &quot;test&quot;) was supposed to read:

&lt;code&gt;&lt;a href=&quot;javascript:alert(&#039;ok&#039;);&quot;&gt;test&lt;/a&gt;&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>Sigh, my first post and already hit by escaping issues. <code>s/&lt;/&amp;lt;/g</code>, nate. The second snippet (which turned into a link labeled &#8220;test&#8221;) was supposed to read:</p>
<p><code>&lt;a href="javascript:alert('ok');"&gt;test&lt;/a&gt;</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nate</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267141</link>
		<dc:creator>nate</dc:creator>
		<pubDate>Wed, 03 Sep 2008 05:40:55 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267141</guid>
		<description>It is important to know that it is extremely difficult, if not impossible, to sanitize all scripts out of a string using regexes (especially ones as simple as those posted here).

The only way to do this properly is to parse and tokenize the HTML. You can&#039;t assume that &quot;on___&quot; means JavaScript; you need to know the context in which the &quot;on___&quot; is found. Take the following example regex provided by Jerome above:

&lt;code&gt;/&lt;script[\s\S]+?&lt;\/script&gt;&#124;(&lt;[^&gt;]+)\son\w+=([&#039;&quot;])[\s\S]+?\2/gi&lt;/code&gt;

Apply that regex to the following two sample snippets:

&lt;code&gt;&lt;input type=&quot;text&quot; value=&quot; only=&#039;text&#039; &quot; /&gt;&lt;/code&gt;
&lt;code&gt;&lt;a href=&quot;javascript:alert(&#039;ok&#039;);&quot;&gt;test&lt;/a&gt;&lt;/code&gt;

The first should not be affected by sanitization, but it is, because of what its &quot;value&quot; attribute contains. The second one makes it through sanitization when it obviously shouldn&#039;t (though I don&#039;t know whether this IE8 function removes javascript:-prefixed URIs).

If you attempt to hack HTML sanitization with regexes, you are going to forget to plug at least one hole, and someone is bound to find and exploit it. At one time or another every developer seems to get it into their head that they &quot;know what they&#039;re doing&quot; when it comes to sanitizing HTML of malicious tags. So far I haven&#039;t seen a single, simple regex solution that hasn&#039;t been an utterly insecure hack.</description>
		<content:encoded><![CDATA[<p>It is important to know that it is extremely difficult, if not impossible, to sanitize all scripts out of a string using regexes (especially ones as simple as those posted here).</p>
<p>The only way to do this properly is to parse and tokenize the HTML. You can&#8217;t assume that &#8220;on___&#8221; means JavaScript; you need to know the context in which the &#8220;on___&#8221; is found. Take the following example regex provided by Jerome above:</p>
<p><code>/&lt;script[\s\S]+?&lt;\/script&gt;|(&lt;[^&gt;]+)\son\w+=(['"])[\s\S]+?\2/gi</code></p>
<p>Apply that regex to the following two sample snippets:</p>
<p><code>&lt;input type="text" value=" only='text' " /&gt;</code><br />
<code>&lt;a href="javascript:alert('ok');"&gt;test&lt;/a&gt;</code></p>
<p>The first should not be affected by sanitization, but it is, because of what its &#8220;value&#8221; attribute contains. The second one makes it through sanitization when it obviously shouldn&#8217;t (though I don&#8217;t know whether this IE8 function removes javascript:-prefixed URIs).</p>
<p>If you attempt to hack HTML sanitization with regexes, you are going to forget to plug at least one hole, and someone is bound to find and exploit it. At one time or another every developer seems to get it into their head that they &#8220;know what they&#8217;re doing&#8221; when it comes to sanitizing HTML of malicious tags. So far I haven&#8217;t seen a single, simple regex solution that hasn&#8217;t been an utterly insecure hack.</p>
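<p>The two cases can be checked directly. This JavaScript sketch runs Jerome&#8217;s final regex (from his &#8220;Last try&#8221; comment) over both snippets; the only change to the pattern is that the less-than sign before &#8220;script&#8221; is written as \x3c so the snippet is not read as a tag when posted as HTML. The names jeromeToStaticHTML, benign and hostile are just labels for this example.</p>

```javascript
// Jerome's final pattern. \x3c is a less-than sign, escaped so this line
// survives being embedded in HTML; otherwise the pattern is unchanged.
const jeromeRegex = /\x3cscript[\s\S]+?<\/script>|(<[^>]+)\son\w+=(['"])[\s\S]+?\2/gi;

// Same call as in his toStaticHTML fallback: $1 puts back the tag text
// that preceded the removed on*= attribute.
function jeromeToStaticHTML(inputHtml) {
  return inputHtml.replace(jeromeRegex, "$1");
}

const benign = `\x3cinput type="text" value=" only='text' " />`;
const hostile = `\x3ca href="javascript:alert('ok');">test\x3c/a>`;

// A harmless value is edited: " only='text'" looks like an on* handler.
console.log(jeromeToStaticHTML(benign) === benign);   // false
// The javascript: link has no on*= attribute, so it passes through untouched.
console.log(jeromeToStaticHTML(hostile) === hostile); // true
```

<p>Run as-is, the first check prints false because the input comes back with its value attribute reduced to a single space, and the second prints true because the javascript: link is returned unchanged.</p>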
]]></content:encoded>
	</item>
	<item>
		<title>By: savetheclocktower</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267125</link>
		<dc:creator>savetheclocktower</dc:creator>
		<pubDate>Tue, 02 Sep 2008 22:40:32 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267125</guid>
		<description>Why on earth is this a global method? Why not make it an instance method (or even a static method) on String? It&#039;d be a proprietary augmentation of String, yes, but is that any worse than encroaching on window?</description>
		<content:encoded><![CDATA[<p>Why on earth is this a global method? Why not make it an instance method (or even a static method) on String? It&#8217;d be a proprietary augmentation of String, yes, but is that any worse than encroaching on window?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeromew</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267119</link>
		<dc:creator>jeromew</dc:creator>
		<pubDate>Tue, 02 Sep 2008 21:07:11 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267119</guid>
		<description>This makes for interesting reading: http://refactormycode.com/codes/333-sanitize-html</description>
		<content:encoded><![CDATA[<p>This makes for interesting reading: <a href="http://refactormycode.com/codes/333-sanitize-html" rel="nofollow">http://refactormycode.com/codes/333-sanitize-html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeromew</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267115</link>
		<dc:creator>jeromew</dc:creator>
		<pubDate>Tue, 02 Sep 2008 20:47:11 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267115</guid>
		<description>@genericallyloud - I&#039;d actually say that the browser is often the worst candidate for sanitization: if you&#039;re sending data back to the server and expecting the browser to have sanitized it, you&#039;re in for a world of hurt.
.
Fair comment about the simplistic regex though :) I&#039;d be interested to hear about ways to work around it.
.
e.g.
- the regex currently expects the onXXX value to be surrounded by [double] quotes
- need to match and remove: href=&quot;javascript:xxx&quot;
- need to match and remove: behavior: xxx in a &lt;style&gt; tag
- other things...?</description>
		<content:encoded><![CDATA[<p>@genericallyloud &#8211; I&#8217;d actually say that the browser is often the worst candidate for sanitization: if you&#8217;re sending data back to the server and expecting the browser to have sanitized it, you&#8217;re in for a world of hurt.<br />
.<br />
Fair comment about the simplistic regex though :) I&#8217;d be interested to hear about ways to work around it.<br />
.<br />
e.g.<br />
- the regex currently expects the onXXX value to be surrounded by [double] quotes<br />
- need to match and remove: href="javascript:xxx"<br />
- need to match and remove: behavior: xxx in a &lt;style&gt; tag<br />
- other things&#8230;?</p>
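<p>The holes in the list above are easy to confirm. This JavaScript sketch feeds Jerome&#8217;s final regex a few probe inputs: an unquoted handler, a javascript: href, an IE behavior rule in a style block, and a tag with two handlers (greedy matching only removes the last one). The probe strings and the crude stillDangerous check are illustrative assumptions, not a test suite; \x3c is a less-than sign, escaped so the snippet survives being embedded in HTML.</p>

```javascript
// Jerome's final pattern, as used in his toStaticHTML fallback.
const jeromeRegex = /\x3cscript[\s\S]+?<\/script>|(<[^>]+)\son\w+=(['"])[\s\S]+?\2/gi;
const jeromeToStaticHTML = (html) => html.replace(jeromeRegex, "$1");

// Hypothetical probes, one per hole.
const probes = {
  unquotedHandler: "\x3cimg src=x onerror=alert(1)>",
  javascriptHref: '\x3ca href="javascript:alert(1)">x\x3c/a>',
  styleBehavior: "\x3cstyle>p { behavior: url(x.htc) }\x3c/style>",
  twoHandlers: '\x3cb onclick="a()" onmouseover="b()">x\x3c/b>',
};

// Crude check for these probes only: does anything script-like remain?
const stillDangerous = (html) => /\son\w+\s*=|javascript:|behavior\s*:/i.test(html);

// Names of the probes that get through the regex.
const survivors = Object.keys(probes).filter((name) =>
  stillDangerous(jeromeToStaticHTML(probes[name]))
);
console.log(survivors.length); // 4
```

<p>Each of these needs its own special case in the pattern, and the list is open-ended, which is the main argument for parsing plus an allowlist instead.</p>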
]]></content:encoded>
	</item>
	<item>
		<title>By: genericallyloud</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267112</link>
		<dc:creator>genericallyloud</dc:creator>
		<pubDate>Tue, 02 Sep 2008 20:06:50 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267112</guid>
		<description>@Morgan - seriously? Not that it&#039;s a perfect solution, but HTML sanitization is a real problem. As Jerome has just demonstrated, HTML sanitization is often done poorly (sorry Jerome, but that&#039;s a very simplistic solution). There are so many crazy ways for people to get script code into HTML and mount XSS attacks, because browsers are so loose about where they allow scripts. Who better to sanitize than a browser?!</description>
		<content:encoded><![CDATA[<p>@Morgan &#8211; seriously? Not that it&#8217;s a perfect solution, but HTML sanitization is a real problem. As Jerome has just demonstrated, HTML sanitization is often done poorly (sorry Jerome, but that&#8217;s a very simplistic solution). There are so many crazy ways for people to get script code into HTML and mount XSS attacks, because browsers are so loose about where they allow scripts. Who better to sanitize than a browser?!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MorganRoderick</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267104</link>
		<dc:creator>MorganRoderick</dc:creator>
		<pubDate>Tue, 02 Sep 2008 18:20:44 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267104</guid>
		<description>Clearly MS has not learnt any lessons from past endeavours, and is still creating needless, proprietary extensions to its browser.

All this at a time when its whole focus should be on catching up with the rest of the market&#039;s support for existing web standards.

Practical as the single function may be, it just helps to further delay the demise of IE6.</description>
		<content:encoded><![CDATA[<p>Clearly MS has not learnt any lessons from past endeavours, and is still creating needless, proprietary extensions to its browser.</p>
<p>All this at a time when its whole focus should be on catching up with the rest of the market&#8217;s support for existing web standards.</p>
<p>Practical as the single function may be, it just helps to further delay the demise of IE6.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rumble</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267103</link>
		<dc:creator>Rumble</dc:creator>
		<pubDate>Tue, 02 Sep 2008 18:16:24 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267103</guid>
		<description>I wonder if this will also clean up some of the terrible markup generated by Microsoft Office?

I recently had a bash at writing an HTML sanitizer/cleaner in JavaScript, using regexes to check against a whitelist of allowed tags and attributes.

The result works OK but is a bit slow and could probably do with some optimization love. I must get around to sharing the code online soon: it totally pwned Office&#039;s dodgy HTML and I&#039;m sure others would find it useful too.</description>
		<content:encoded><![CDATA[<p>I wonder if this will also clean up some of the terrible markup generated by Microsoft Office?</p>
<p>I recently had a bash at writing an HTML sanitizer/cleaner in JavaScript, using regexes to check against a whitelist of allowed tags and attributes.</p>
<p>The result works OK but is a bit slow and could probably do with some optimization love. I must get around to sharing the code online soon: it totally pwned Office&#8217;s dodgy HTML and I&#8217;m sure others would find it useful too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267101</link>
		<dc:creator>Jerome</dc:creator>
		<pubDate>Tue, 02 Sep 2008 17:01:32 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267101</guid>
		<description>OK, so I&#039;m really sorry for spamming this article. Last try - I hope this helps someone. This should work:
.
if (typeof toStaticHTML == &quot;undefined&quot;) {
	window.toStaticHTML = function(inputHtml) {
		// Drop script elements and quoted on* handler attributes; $1 keeps the tag text before the handler.
		return inputHtml.replace(/&lt;script[\s\S]+?&lt;\/script&gt;&#124;(&lt;[^&gt;]+)\son\w+=([&#039;&quot;])[\s\S]+?\2/gi, &quot;$1&quot;);
	};
}</description>
		<content:encoded><![CDATA[<p>OK, so I&#8217;m really sorry for spamming this article. Last try &#8211; I hope this helps someone. This should work:<br />
.<br />
if (typeof toStaticHTML == &quot;undefined&quot;) {<br />
	window.toStaticHTML = function(inputHtml) {<br />
		// Drop script elements and quoted on* handler attributes; $1 keeps the tag text before the handler.<br />
		return inputHtml.replace(/&lt;script[\s\S]+?&lt;\/script&gt;|(&lt;[^&gt;]+)\son\w+=(['&quot;])[\s\S]+?\2/gi, &quot;$1&quot;);<br />
	};<br />
}</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267100</link>
		<dc:creator>Jerome</dc:creator>
		<pubDate>Tue, 02 Sep 2008 16:29:52 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267100</guid>
		<description>Triple oops. That&#039;s only going to work in .NET or some other language that supports look-behinds. I&#039;ll get my coat.</description>
		<content:encoded><![CDATA[<p>Triple oops. That&#8217;s only going to work in .NET or some other language that supports look-behinds. I&#8217;ll get my coat.</p>
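<p>For comparison, here is Jerome&#8217;s lookbehind pattern next to the capture-group form from his later &#8220;Last try&#8221; comment, as a JavaScript sketch (the /gi flags are assumed, since the .NET version did not state them). JavaScript had no lookbehind at the time; it was added in ES2018, so this runs in current engines. \x3c is a less-than sign, escaped so the snippet survives being embedded in HTML.</p>

```javascript
// Lookbehind form: (?<=...) requires the attribute to sit inside a tag
// without including the tag text in the match, so it can be replaced by "".
const withLookbehind = /\x3cscript[\s\S]+?<\/script>|(?<=<[^>]+)\son\w+=(['"])[\s\S]+?\1/gi;

// Capture-group form for engines without lookbehind: group 1 holds the tag
// text and "$1" writes it back. When the script branch matches, group 1 did
// not participate and "$1" expands to an empty string.
const withCapture = /\x3cscript[\s\S]+?<\/script>|(<[^>]+)\son\w+=(['"])[\s\S]+?\2/gi;

const stripWithLookbehind = (html) => html.replace(withLookbehind, "");
const stripWithCapture = (html) => html.replace(withCapture, "$1");

const sample = '\x3cscript>alert(1)\x3c/script>\x3cb onclick="x()">hi\x3c/b>';
console.log(stripWithLookbehind(sample) === stripWithCapture(sample)); // true
```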
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267098</link>
		<dc:creator>Jerome</dc:creator>
		<pubDate>Tue, 02 Sep 2008 16:14:04 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267098</guid>
		<description>Double oops. Should have HTML encoded before posting:
.
&lt;script[\s\S]+?&lt;/script&gt;&#124;(?&lt;=&lt;[^&gt;]+)\son\w+=([&#039;&quot;])[\s\S]+?\1</description>
		<content:encoded><![CDATA[<p>Double oops. Should have HTML encoded before posting:<br />
.<br />
&lt;script[\s\S]+?&lt;/script&gt;|(?&lt;=&lt;[^&gt;]+)\son\w+=(['&quot;])[\s\S]+?\1</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267096</link>
		<dc:creator>Jerome</dc:creator>
		<pubDate>Tue, 02 Sep 2008 16:09:26 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267096</guid>
		<description>Oops. To be clear: use the regex with replace() and an empty string as the replacement.</description>
		<content:encoded><![CDATA[<p>Oops. To be clear: use the regex with replace() and an empty string as the replacement.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267095</link>
		<dc:creator>Jerome</dc:creator>
		<pubDate>Tue, 02 Sep 2008 16:07:07 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267095</guid>
		<description>This regular expression should do something similar:
.
&lt;script[\s\S]+?&lt;/script&gt;&#124;(?&lt;=&lt;[^&gt;]+)\son\w+=([&#039;&quot;])[\s\S]+?\1
.
I just knocked it up and ran a quick test on the example above though, so possibly not production quality :)</description>
		<content:encoded><![CDATA[<p>This regular expression should do something similar:<br />
.<br />
&lt;script[\s\S]+?&lt;/script&gt;|(?&lt;=&lt;[^&gt;]+)\son\w+=(['"])[\s\S]+?\1<br />
.<br />
I just knocked it up and ran a quick test on the example above though, so possibly not production quality :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nosredna</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267090</link>
		<dc:creator>Nosredna</dc:creator>
		<pubDate>Tue, 02 Sep 2008 14:31:48 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267090</guid>
		<description>Anyone have a solid JavaScript version of this?</description>
		<content:encoded><![CDATA[<p>Anyone have a solid JavaScript version of this?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: genericallyloud</title>
		<link>http://ajaxian.com/archives/tostatichtml-sanitize-your-html-in-ie-8/comment-page-1#comment-267089</link>
		<dc:creator>genericallyloud</dc:creator>
		<pubDate>Tue, 02 Sep 2008 14:07:18 +0000</pubDate>
		<guid isPermaLink="false">http://ajaxian.com/?p=4307#comment-267089</guid>
		<description>Can we get this function for text inputs?</description>
		<content:encoded><![CDATA[<p>Can we get this function for text inputs?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

