Tuesday, April 27th, 2010
Steve Souders was at the Mozilla Web caching summit that we posted on recently. At the event he led the charge on the default size of the cache, and has written up a call to improve browser caching.
He created a browser survey form to capture information on the size of people's caches, and this jumps out at you:
The data shows that 55% of people surveyed have a cache that’s over 90% full.
And we wonder why our servers are often serving up content to empty caches? Here Steve tells the story:
In 2007 Tenni Theurer and I ran an experiment to measure browser cache stats from the server side. Tenni’s write up, Browser Cache Usage – Exposed, is the stuff of legend. There she reveals that while 80% of page views were done with a primed cache, 40-60% of unique users hit the site with an empty cache at least once per day. 40-60% seems high, but I’ve heard similar numbers from respected web devs at other major sites.
Why do so many users have an empty cache at least once per day?
I’ve been racking my brain for years trying to answer this question. Here are some answers I’ve come up with:
- first time users – Yea, but not 40-60%.
- cleared cache – It’s true: more and more people are likely using anti-virus software that clears the cache between browser sessions. And since we ran that experiment back in 2007 many browsers have added options for clearing the cache frequently (for example, Firefox’s privacy.clearOnShutdown.cache option). But again, this doesn’t account for the 40-60% number.
- flawed experiment – It turns out there was a flaw in the experiment (browsers ignore caching headers when an image is in memory), but this would only affect the 80% number, not the 40-60% number. And I expect the impact on the 80% number is small, given the fact that other folks have gotten similar numbers. (In a future blog post I’ll share a new experiment design I’ve been working on.)
- resources got evicted – hmmmmm
OK, let’s talk about eviction for a minute. The two biggest influencers for a resource getting evicted are the size of the cache and the eviction algorithm. It turns out, the amount of disk space used for caching hasn’t kept pace with the size of people’s drives and their use of the Web. Here are the default disk cache sizes for the major browsers:
- Internet Explorer: 8-50 MB
- Firefox: 50 MB
- Safari: everything I found said there isn’t a max size setting (???)
- Chrome: < 80 MB (varies depending on available disk space)
- Opera: 20 MB
Those defaults are too small. My disk drive is 150 GB of which 120 GB is free. I’d gladly give up 5 GB or more to raise the odds of web pages loading faster.
Even with more disk space, the cache is eventually going to fill up. When that happens, cached resources need to be evicted to make room for the new ones. Here’s where eviction algorithms come into play. Most eviction algorithms are LRU-based – the resource that was least recently used is evicted. However, our knowledge of performance pain points has grown dramatically in the last few years. Translating this knowledge into eviction algorithm improvements makes sense. For example, we’re all aware how much costlier it is to download a script than an image. (Scripts block other downloads and rendering.) Scripts, therefore, should be given a higher priority when it comes to caching.
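To make the eviction idea concrete, here is a rough Python sketch of an LRU cache that looks at resource type before deciding what to throw out. It is not how any browser actually implements its cache; the `WeightedLRUCache` class, the `TYPE_WEIGHT` values, and the byte budget are all assumptions invented for illustration.

```python
from collections import OrderedDict

# Hypothetical weights: a higher number means "costlier to re-fetch",
# so those entries are evicted later. The values are illustrative only.
TYPE_WEIGHT = {"script": 2, "stylesheet": 1, "image": 0}


class WeightedLRUCache:
    """Toy cache that evicts the least recently used entry of the lowest weight."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # url -> (resource_type, size); iteration order runs from
        # least recently used to most recently used.
        self.entries = OrderedDict()

    def get(self, url):
        if url not in self.entries:
            return None
        self.entries.move_to_end(url)  # mark as most recently used
        return self.entries[url]

    def put(self, url, resource_type, size):
        if size > self.max_bytes:
            return False  # larger than the whole cache, so never stored
        if url in self.entries:
            # Replacing an existing entry: release its old size first.
            self.used_bytes -= self.entries.pop(url)[1]
        while self.used_bytes + size > self.max_bytes:
            self._evict_one()
        self.entries[url] = (resource_type, size)
        self.used_bytes += size
        return True

    def _evict_one(self):
        # min() returns the first minimal item it sees, and the OrderedDict
        # iterates from least recently used, so ties within a weight are
        # broken in plain LRU order. Unknown types get weight 0.
        victim = min(
            self.entries,
            key=lambda url: TYPE_WEIGHT.get(self.entries[url][0], 0),
        )
        self.used_bytes -= self.entries.pop(victim)[1]
        return victim


cache = WeightedLRUCache(max_bytes=100)
cache.put("a.js", "script", 40)
cache.put("b.png", "image", 40)
cache.put("c.png", "image", 40)  # over budget: an image goes, the script stays
```

With plain LRU the oldest entry, `a.js`, would have been the one evicted here; with the weights it is `b.png` instead. A strict priority like this can let a stale script sit in the cache indefinitely as long as there are images left to evict, so a real design would probably blend age and weight rather than rank by type alone.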
Time to bump the default size, and get some smarter algorithms in place. Or, you can always run a cache server like polipo ;).