Monday, March 31st, 2008

Using a hash property for security and caching

Category: Performance, Security

Hash Browns

Douglas Crockford would like to see a hash= attribute to aid security and performance:

Any HTML tag that accepts a src= or href= attribute should also be allowed to take a hash= attribute. The value of a hash attribute would be the base 32 encoding of the SHA of the object that would be retrieved. This does a couple of useful things.

First, it gives us confidence that the file that we receive is the one that we asked for, that it was not replaced or tampered with in transit.

Second, browsers can cache by hash code. If the cache contains a file that matches the requested hash=, then there is no need to go to the network regardless of the url. This would improve the performance of Ajax libraries because you would only have to download the library once for all of the sites you visit, even if every site links to its own copy.
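A minimal sketch of the two uses described above: computing a hash= value for a file, checking a retrieved file against it, and looking a file up in a cache by hash alone. The proposal only says "the SHA" and "base 32", so SHA-256, the RFC 4648 base 32 alphabet, lowercase output with padding stripped, and the names `hash_attribute`, `matches` and `HashCache` are all assumptions made for illustration.

```python
import base64
import hashlib


def hash_attribute(content: bytes) -> str:
    """Return a base 32 encoding of the SHA-256 digest of content.

    SHA-256 and the RFC 4648 alphabet are assumptions; the proposal
    does not name a specific SHA variant or base 32 alphabet.
    """
    digest = hashlib.sha256(content).digest()
    # Lowercase and strip '=' padding so the value is attribute-friendly.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def matches(content: bytes, expected: str) -> bool:
    """Check that retrieved bytes match the hash= value from the markup."""
    return hash_attribute(content) == expected.lower()


class HashCache:
    """Toy in-memory cache keyed by content hash only, ignoring the URL."""

    def __init__(self):
        self._store = {}

    def put(self, content: bytes) -> str:
        """Store content and return the hash it is filed under."""
        key = hash_attribute(content)
        self._store[key] = content
        return key

    def get(self, expected: str):
        """Return cached bytes for this hash, or None on a cache miss."""
        return self._store.get(expected.lower())
```

With something like this, a page would carry the output of `hash_attribute` in its hash= attribute, and a browser would call `get` before going to the network and `matches` after any download.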

Posted by Dion Almaer at 8:35 am
25 Comments


A question for those who know more than I do: Mr. Crockford has made a lot of noise about web security, specifically in JavaScript. Is he solving actual problems that actual people are having, or is he addressing theoretical possibilities? E.g., this hash idea: has anyone reported that their content has been tampered with between the server and the client?

Comment by slaniel — March 31, 2008

He is basically trying to find ways to prevent XSS, which is a big concern.

Comment by Zsolt — March 31, 2008

I’m sure I’m missing something obvious, but if one assumes the ability to “replace or tamper with” a file sent by the server, should one not also assume that the hash attribute, being nothing but text in a file sent by the server, is suspect?

Comment by jemmons — March 31, 2008

I wouldn’t be surprised if this caused browser compatibility problems and got mixed up with window.location.hash. Lots of AJAX and Flash apps use the window location hash (#products) to capture page changes without changing the page, and I hope that if this hash HTML attribute gets implemented, developers don’t confuse the two.

Comment by joelpittet — March 31, 2008

@slaniel: He is attempting to address both current, active security concerns, and look at where the web is going in the future, and attempting to plot various options for the evolution of the web down the pike.

@jemmons: Yep. What’s more, that might be even more dangerous, because it could provide a false sense of security.

I’m just going on the fly, but I’m thinking certificates for each site, with the site providing hash tables for verification of the requested content upon authentication from the client….no that wouldn’t be practical… crap, I dunno…need more coffee. ;)

Comment by Carbon43 — March 31, 2008

It could open up a route to secure XSS.

Comment by Chris Phillips — March 31, 2008

I don’t believe caching by the content of a html property is a good idea, as this property will probably be forgotten in every second update to a page or library, when not created automatically. A user might change the link to a library or image and the browser wouldn’t load the new target when the hash isn’t calculated anew.

Comment by hd42 — March 31, 2008

A microformat syntax to implement this has been discussed before, and may be more practical than adding an attribute to HTML tags.

Comment by brondsem — March 31, 2008

Awesome idea!!
Too important for a microformat only.

Comment by nuxodin — March 31, 2008

But .. how do I hash the page in a good way?
SHA returns completely different results if the server does not deliver the same page for the same URL, so I think this is not a good idea for dynamic web pages. It’s useful only for scripts, CSS and images. They are static in most cases.

And .. do I really want to compute a hash for every object?

Comment by webEater — March 31, 2008

webEater: why should the server return a different page for the same URL ? I thought these days were gone.

Comment by p01 — March 31, 2008

Surely this is better off as an extension to HTTP rather than HTML. If the file could be tampered with in transit, so could the transmission of the hash – the point is moot. If you don’t want your stuff being tampered, there’s already a great solution: SSL.

There is no need for it to be in HTML, but it could work as a HTTP extension for the purposes of caching.

Comment by coob — March 31, 2008

This is ridiculous. As has already been pointed out, if someone can tamper with the resource they can tamper with the original HTML document. SSL will at least give you some security, although if someone can listen in on your network connection you’re practically compromised and SSL can only do so much.

As for caching, SHA-0/1 have both been cracked. SHA-2 will probably be cracked in 5 years. There is a growing belief in the cryptography community that no message digest hash function can be completely secure. I can just imagine the fun of a malicious website generating a file that matched the hash of a popular library and using it to steal personal info. At least same-domain policy prevents something like that from happening.

Comment by morbiuswilters — March 31, 2008

Pure genius.

The protocol is:
1/ We agree that you will deliver something
2/ You deliver that thing when I ask. But often I don’t ask because I cached it.
3/ You’d better not deliver something else because I will detect it thanks to the hash I computed when we agreed.
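The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: SHA-256 hex digests stand in for whatever encoding the real attribute would use, `fetch` is a hypothetical callable that returns the delivered bytes, and `AgreedResource` and `HashMismatch` are made-up names.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when delivered bytes do not match the agreed hash."""


class AgreedResource:
    def __init__(self, agreed_hash: str, fetch):
        self.agreed_hash = agreed_hash  # step 1: the hash we agreed on
        self._fetch = fetch             # hypothetical callable returning bytes
        self._cached = None
        self.fetch_count = 0            # how many times we actually asked

    def get(self) -> bytes:
        # Step 2: often there is no need to ask, because we cached it.
        if self._cached is not None:
            return self._cached
        self.fetch_count += 1
        body = self._fetch()
        # Step 3: detect a delivery that differs from what was agreed.
        if hashlib.sha256(body).hexdigest() != self.agreed_hash:
            raise HashMismatch("delivered content differs from agreed hash")
        self._cached = body
        return body
```

What should happen on a mismatch (refuse, retry, warn) is a policy choice; raising an exception here is just one option.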

Now, is it secure? I don’t know 100% but as far as I know it is not trivial to generate a modified file with a same hash or is it?

Comment by JeanHuguesRobert — March 31, 2008

@Jean,

This may be useful if you’re calling a JS file from a remote site. But if you’re storing the file on your own site and your site gets compromised, it would not be too hard for the attacker to update the hash to match the new file and have things go undetected.

This may work as security for a brief while, until attackers catch on: “oh, I now need to look for where this JS file is being called and update the hash….”

Comment by shypht — March 31, 2008

Obviously without SSL this isn’t any more secure, but this would be a big benefit to caching and delivery of common JS libraries, etc. You get to host the file locally to guarantee availability, but still get much of the benefit of a CDN.

Comment by mrclay — March 31, 2008

link to the original:

http://blog.360.yahoo.com/blog-TBPekxc1dLNy5DOloPfzVvFIVOWMB0li?p=789

I’ll save extended comment for a blog post but the short form of this is that we need an HTTP semantic which correlates to any such scheme and a way to specify a “super” version of any hash-augmented script. Also, the privacy concerns which any such preferential caching scheme imply are non-trivial and without a serious resolution could easily be deal-breakers.

Regards

Comment by slightlyoff — March 31, 2008

On second thought, Doug’s proposal is good enough. I had a longer thing queued up about an alternate source URL plus durable caching, but Doug’s proposal implicitly keeps the same cache clearing behavior which addresses the privacy issues that a different scheme would raise.

Jean: the point of Doug specifying the SHA family of hashes is that it is cryptographically strong. That is to say, the odds of generating a different file with the same hash value are so low as to require a brute-force search over the entire hash value space. Long story short, you can’t do it.

The best part of the proposal is that you only need to compute the hash once on the client. Subsequent fetches are simply a check against the value (although Doug’s proposal doesn’t outline a mismatch policy).

Comment by slightlyoff — March 31, 2008

morbiuswilters:

I think you misunderstand the proposal. The hash is a stand-in for content being able to pre-ordain HTTP caching headers…nothing more. Any sane implementation MUST use both the absolute URL and the hash value as the key for the cache, not simply the hash value. That doesn’t provide arbitrary content the ability to inject itself into a global cache under an assumed name, and reduces the back-pressure on the SHA family of hashes as a security token (which, in this scheme, it’s not). Only a “valid” originating server could ever serve content at a particular URL/hash pair, and the worst effect is to aggravate DNS attacks or other forms of server compromise (in which case you’re hosed anyway…thanks for playing).
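As a rough sketch of that keying rule, the toy cache below files content under the pair (absolute URL, hash) instead of the hash alone. SHA-256 hex digests and the name `UrlHashCache` are assumptions for illustration only.

```python
import hashlib


class UrlHashCache:
    """Toy cache keyed by the (absolute URL, hash) pair."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def digest(content: bytes) -> str:
        # SHA-256 is an assumption; the thread only says "SHA".
        return hashlib.sha256(content).hexdigest()

    def put(self, url: str, content: bytes) -> None:
        """File content under the URL it came from plus its hash."""
        self._store[(url, self.digest(content))] = content

    def get(self, url: str, expected_hash: str):
        """Hit only if both the URL and the hash match; otherwise None."""
        return self._store.get((url, expected_hash))
```

The cost of this choice is that identical bytes served from two different URLs are stored and fetched separately, so content from one origin can never satisfy a request for another.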

From a security perspective, this proposal is neutral. If it had suggested a different caching strategy than the normal HTTP cache, there would be other concerns to worry about, but as it is you can think of it as a way of (from content) saying “set the expires header to a long time from now”.

Regards

Comment by slightlyoff — March 31, 2008

I don’t see the benefit of this at all. There are much better ways to add security. Like some other people that have commented here, I’d rather see an extension to HTTP than HTML.
On a side note, what’s with the recent “I wish” posts from Ajaxian? There was the one about OpenID in the browser, then Jabber, and now this?

Comment by musicfreak — March 31, 2008

musicfreak:

The primary benefit here is performance, not security. Furthermore, there are already HTTP-level semantics to handle this use case. The primary problem is that much content can’t take advantage of those semantics, either because of ignorance or a lack of control over the deployment environment. Anything that raises the average here will be beneficial in getting us over the hump in the next couple of years while we wait for browser competition (and Gears) to do its thing.

Regards

Comment by slightlyoff — March 31, 2008

@slightlyoff: thanks for the explanation on SHA :)

Comment by JeanHuguesRobert — March 31, 2008

I think this proposal is a good one if you want to test how difficult it is to generate the same SHA1 signatures for different documents.

As we have seen with md5, all we have to do is wait for the day until someone proves this is actually not that hard. And boom, we suddenly have the biggest security hole ever.

This proposal adds very little to what we have and opens a can of worms.

Comment by Berend de Boer — April 1, 2008

Wow. This proposal is acceptable in a crowd that’s gung-ho on modularity? (modularity == remaining blissfully ignorant of the precise implementation of the modules you include)

*Use a content distribution network*. You’ll get the same positive network performance effect, and you’ll be similarly crippled and unable to update your static files without notifying every page that includes them.

Crockford should know this; he works at Yahoo: where they evangelize the concept of renaming your CSS/JS files every time you edit them, and rewriting the files that include them, for the sake of cache optimization.
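One way to get the rename-on-edit behaviour described above is to derive part of the file name from the file's content, so that any edit yields a new name. This is a sketch of that general idea, not a description of Yahoo's actual tooling; `versioned_name`, the SHA-256 choice and the 8-character tag length are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content digest before the file extension.

    The name changes whenever the content changes, so a far-future
    cache lifetime never serves a stale copy under the new name.
    """
    p = PurePosixPath(path)
    tag = hashlib.sha256(content).hexdigest()[:length]
    return str(p.with_name(f"{p.stem}.{tag}{p.suffix}"))
```

Every page that includes the file still has to be rewritten to point at the new name, which is the maintenance cost the comment is pointing at.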

Comment by trav1m — April 1, 2008

Bahhhh. Why on earth can’t I add coherent HTML code in a comment? pfff

[a href="dynamic.php?page=1" hash="foo_1"]
[a href="dynamic.php?page=2" hash="foo_2"]
[form action="login.php" method="POST" hash="foo_3"]

This stuff is not realistic for dynamically generated content, not realistic for any server redirection, etc.

I hope it will NEVER go live; it would be a total mess for devs and would bring a false sense of security to users.

Bad Idea.

Comment by Laurent V. — July 22, 2008
