Tuesday, April 27th, 2010

Steve’s call to improve browser caching

Category: Browsers

Steve Souders was at the Mozilla Web caching summit that we posted on recently. At the event he led the charge on increasing the default size of the cache, and has written up a call to improve browser caching.

He created a browser survey form to capture information on the size of people’s caches, and one result jumps out at you:

The data shows that 55% of people surveyed have a cache that’s over 90% full.

And we wonder why our servers are so often serving up content to empty caches. Here Steve tells the story:

In 2007 Tenni Theurer and I ran an experiment to measure browser cache stats from the server side. Tenni’s write up, Browser Cache Usage – Exposed, is the stuff of legend. There she reveals that while 80% of page views were done with a primed cache, 40-60% of unique users hit the site with an empty cache at least once per day. 40-60% seems high, but I’ve heard similar numbers from respected web devs at other major sites.

Why do so many users have an empty cache at least once per day?

I’ve been racking my brain for years trying to answer this question. Here are some answers I’ve come up with:

  • first time users – Yeah, but not 40-60%.
  • cleared cache – It’s true: more and more people are likely using anti-virus software that clears the cache between browser sessions. And since we ran that experiment back in 2007 many browsers have added options for clearing the cache frequently (for example, Firefox’s privacy.clearOnShutdown.cache option). But again, this doesn’t account for the 40-60% number.
  • flawed experiment – It turns out there was a flaw in the experiment (browsers ignore caching headers when an image is in memory), but this would only affect the 80% number, not the 40-60% number. And I expect the impact on the 80% number is small, given the fact that other folks have gotten similar numbers. (In a future blog post I’ll share a new experiment design I’ve been working on.)
  • resources got evicted – hmmmmm

OK, let’s talk about eviction for a minute. The two biggest influencers for a resource getting evicted are the size of the cache and the eviction algorithm. It turns out, the amount of disk space used for caching hasn’t kept pace with the size of people’s drives and their use of the Web. Here are the default disk cache sizes for the major browsers:

  • Internet Explorer: 8-50 MB
  • Firefox: 50 MB
  • Safari: everything I found said there isn’t a max size setting (???)
  • Chrome: < 80 MB (varies depending on available disk space)
  • Opera: 20 MB
Those defaults are too small. My disk drive is 150 GB of which 120 GB is free. I’d gladly give up 5 GB or more to raise the odds of web pages loading faster.

Even with more disk space, the cache is eventually going to fill up. When that happens, cached resources need to be evicted to make room for the new ones. Here’s where eviction algorithms come into play. Most eviction algorithms are LRU-based – the resource that was least recently used is evicted. However, our knowledge of performance pain points has grown dramatically in the last few years. Translating this knowledge into eviction algorithm improvements makes sense. For example, we’re all aware how much costlier it is to download a script than an image. (Scripts block other downloads and rendering.) Scripts, therefore, should be given a higher priority when it comes to caching.
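As a sketch of the LRU-based eviction described above, here is a toy byte-budgeted cache that adds the hypothetical twist of keeping scripts longer than other resources. The `LRUCache` class and its `resource_type` tags are illustrative only, not any browser’s actual implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Toy byte-budgeted cache: least recently used entries are evicted
    first, but scripts are kept longer than other resource types."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # url -> (size, resource_type), LRU first

    def get(self, url):
        if url not in self.entries:
            return None
        self.entries.move_to_end(url)  # mark as most recently used
        return self.entries[url]

    def put(self, url, size, resource_type):
        if url in self.entries:
            self.used -= self.entries[url][0]
        self.entries[url] = (size, resource_type)
        self.entries.move_to_end(url)
        self.used += size
        while self.used > self.capacity and self.entries:
            self._evict_one()

    def _evict_one(self):
        # Evict the least recently used non-script entry; fall back to
        # evicting a script only when nothing else is left.
        for url, (size, rtype) in self.entries.items():
            if rtype != "script":
                del self.entries[url]
                self.used -= size
                return
        url, (size, _) = self.entries.popitem(last=False)
        self.used -= size
```

A real browser cache also weighs freshness headers and revalidation cost; this only captures the "scripts are costlier to refetch" heuristic.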

Time to bump the default size, and get some smarter algorithms in place. Or, you can always run a cache server like polipo ;).


Posted by Dion Almaer at 11:43 am


7 Comments »


I would use a few different things for caching… first off, cache with a key based on filename + crc32, so that multiple sites using the same JS library can share it. Second, give a certain weight to JS, and to images based on size… something similar to the “frecency” algorithm that Mozilla uses in its superbar. What would be nice would be more dynamic applications that utilize the If-Modified-Since header, yielding input/output via a message pump instead of blocks of independent requests. I find that merging/minifying the CSS and JS files and supporting the If-Modified-Since header offers a pretty nice boost over the current trend of dynamically loading libraries and modules (even via CDN).

Comment by tracker1 — April 27, 2010
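The filename + CRC32 key from the comment above could be sketched like this. The `cache_key` helper is hypothetical, purely to illustrate how identical copies of a library served from different sites could map to one cache entry:

```python
import zlib

def cache_key(filename, content):
    """Build a content-addressed cache key from the file name plus the
    CRC32 of the bytes, so byte-identical copies of the same library
    collide onto a single shared cache entry."""
    checksum = zlib.crc32(content) & 0xFFFFFFFF
    return "%s:%08x" % (filename, checksum)
```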

In Safari the browser cache may cause the browser to run slower:

http://www.ehow.com/how_2033308_delete-memory-cache.html

Comment by Ilidio — April 27, 2010

In Firefox you can change the browser cache settings in

about:config

browser.cache.disk.capacity
browser.cache.disk.enable
image.cache.size

Comment by Ilidio — April 27, 2010

I’d say a lot of people are looking at porn and do not wish to have their porn watching habits caught. So, porn appreciators either look at porn in their browsers private browsing mode or clear their page history afterward. This would result in a not insignificant number of empty caches.

Comment by Patmania — April 27, 2010

@tracker1: good suggestions

@Ilidio: I wrote a page with instructions for finding and changing browser cache settings: http://stevesouders.com/cache.php

@Patmania: And yet, 50%+ of people who have completed the survey have a full cache. In a way, this isn’t surprising, esp. given the small size of caches – if someone browses the Web their cache should fill up. But when I saw the high empty cache rates in 2007, I always wondered if it was because of people clearing their cache. Now I believe the bigger reason is caches filling up and pushing out resources.

Comment by souders — April 27, 2010

@Souders, I re-read your post and agree completely.

Comment by Patmania — April 28, 2010

Hey,
Nice article! I agree with it and it fits precisely with this one: http://devblog.xing.com/frontend/browser-caching-why-is-it-not-a-good-standalone-solution/

Cheers

Comment by gekkstah — April 30, 2010
