Cache in Chromium 
Disk Cache & Overall cache flow
Chang W. Doh 
GDG Korea WebTech Organizer 
HTML5Rocks/KO Contributor/Coordinator
Before stepping into Cache: 
Network stack
Network Stack: 
What’s ‘Network Stack’? 
● A mostly single-threaded cross-platform 
library primarily for resource fetching 
○ URLRequest 
■ represents the request for a URL 
○ URLRequestContext 
■ contains all the associated context needed to fulfill 
the ‘URL request’ 
● e.g. cookies, host resolver, cache
Before stepping into Cache: 
Network stack 
● Some code layouts 
○ /net/base 
■ shared utilities for /net modules 
○ /net/disk_cache 
■ Cache for web resources 
○ /net/url_request 
■ URLRequest, URLRequestContext, ...
Typical request-flow
[Diagram: request flow between HttpCache, HttpCache::Transaction, and Cache (aka Disk Cache); arrow labels: check, not exist, notify, notify, Cache hit!]
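The check / not exist / Cache hit! flow can be sketched roughly as below. This is a toy model in Python, not Chromium code: `fetch`, `fake_network`, and the dict used as the cache are hypothetical stand-ins for HttpCache, the network layer, and the disk cache.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the disk cache: key (URL) -> response body.
Cache = Dict[str, bytes]


def fetch(url: str, cache: Cache,
          fetch_from_network: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return (body, source), where source is 'cache' or 'network'."""
    # 1. Check the cache first.
    if url in cache:
        return cache[url], "cache"       # cache hit
    # 2. The entry does not exist: use the network, then store the result.
    body = fetch_from_network(url)
    cache[url] = body
    return body, "network"


calls: List[str] = []


def fake_network(url: str) -> bytes:
    """Hypothetical network fetch that records each call."""
    calls.append(url)
    return b"payload for " + url.encode()


cache: Cache = {}
first = fetch("http://www.google.com/favicon.ico", cache, fake_network)
second = fetch("http://www.google.com/favicon.ico", cache, fake_network)
```

The second call is served from the cache, so the network is touched only once.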
Disk Cache: 
(a.k.a. Cache)
DiskCache: 
What’s DiskCache? 
● Cache 
○ Stores resources fetched from the web 
○ A part of ‘/net’ 
■ location: /net/disk_cache 
■ This means ‘DiskCache’ controls the cache flow 
for network fetches. 
● NOTE: 
○ Android uses ‘Simple cache’. 
■ location: /net/disk_cache/simple
DiskCache: 
Characteristics 
● Main characteristics: 
○ The cache should not grow unbounded 
■ An algorithm decides when to remove old entries 
○ It is not critical to lose some data 
■ But discarding the whole cache should be minimized 
○ Access should be possible through sync or 
async operations 
○ The design should avoid ‘cache thrashing’
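As an illustration of "should not grow unbounded", here is a minimal sketch of a size-bounded cache. The slides do not name the eviction algorithm, so least-recently-used (LRU) eviction is an assumption here, and `BoundedCache` is a hypothetical class, not part of Chromium.

```python
from collections import OrderedDict
from typing import Optional


class BoundedCache:
    """Toy size-bounded cache that evicts the least recently used entry."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)    # mark as most recently used
        return self._entries[key]

    def put(self, key: str, data: bytes) -> None:
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        self._entries[key] = data
        self.used_bytes += len(data)
        # Evict the oldest entries until the cache fits its budget again.
        while self.used_bytes > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = BoundedCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")             # "a" is now more recent than "b"
cache.put("c", b"12345")   # over budget, so "b" is evicted
```

Only single entries are dropped when the budget is exceeded, which matches the goal of losing some data without discarding the whole cache.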
DiskCache: 
Characteristics 
● Main characteristics: 
○ It should be possible to remove an entry from 
the cache 
■ and keep working with that entry while it is inaccessible to other requests 
○ Shouldn’t use explicit multithread synchronization 
■ Always called from the same thread 
■ However, callbacks must be issued through the message 
loop to avoid reentrancy
DiskCache: 
External interfaces 
● /net/disk_cache/disk_cache.h 
● 2 Interfaces 
○ disk_cache::Backend 
■ manages entries on the cache 
○ disk_cache::Entry 
■ handles operations specific to a given resource
External interfaces: 
Backend 
● An entry is identified by its key 
○ e.g. http://www.google.com/favicon.ico 
● Once an entry is created, the data is stored in 
separate chunks or data streams: 
○ HTTP headers 
○ Actual resource data 
● The index of the required stream is passed as an argument to 
the methods: 
○ Entry::ReadData 
○ Entry::WriteData 
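The key / stream-index model described above can be sketched as follows. This is a hypothetical in-memory Python analogue of disk_cache::Backend and disk_cache::Entry; the method names, signatures, and the assignment of stream 0 to headers and stream 1 to resource data are assumptions for illustration, not the real C++ API.

```python
from typing import Dict, List

HEADERS_STREAM = 0   # assumption: stream 0 holds the HTTP headers
BODY_STREAM = 1      # assumption: stream 1 holds the resource data


class Entry:
    """Toy analogue of disk_cache::Entry: one key, several data streams."""

    def __init__(self, key: str, num_streams: int = 2) -> None:
        self.key = key
        self._streams: List[bytearray] = [bytearray() for _ in range(num_streams)]

    def write_data(self, index: int, offset: int, data: bytes) -> int:
        stream = self._streams[index]
        if len(stream) < offset:
            stream.extend(b"\x00" * (offset - len(stream)))   # zero-fill any gap
        stream[offset:offset + len(data)] = data
        return len(data)

    def read_data(self, index: int, offset: int, length: int) -> bytes:
        return bytes(self._streams[index][offset:offset + length])


class Backend:
    """Toy analogue of disk_cache::Backend: manages entries by key."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def create_entry(self, key: str) -> Entry:
        if key in self._entries:
            raise KeyError(f"entry already exists: {key}")
        self._entries[key] = Entry(key)
        return self._entries[key]

    def open_entry(self, key: str) -> Entry:
        return self._entries[key]


backend = Backend()
entry = backend.create_entry("http://www.google.com/favicon.ico")
entry.write_data(HEADERS_STREAM, 0, b"HTTP/1.1 200 OK")
entry.write_data(BODY_STREAM, 0, b"<icon bytes>")
```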
Very Simple Cache 
(a.k.a. Simple Cache)
Simple cache: 
What is “Simple Cache”? 
● Proposed as a new backend for the disk cache 
○ Conforms to the Disk Cache interface 
○ Very simple 
■ Uses 1 file per cache entry + an index file 
■ Deals with I/O bottlenecks
Comparison to Blockfile Backend?
Simple cache: 
Benefits and goals 
● Comparison to blockfile cache 
○ More resilient to corruption caused by a 
system crash 
■ Periodically flushes its entire index 
■ Swaps the index in atomically 
■ After a system crash, starts with the stale 
cache 
● NOTE: With the blockfile cache, Chrome drops the 
whole cache by default
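"Flush the entire index, then swap it in atomically" is commonly done by writing to a temporary file and renaming it over the old one. The sketch below shows that pattern under stated assumptions: the JSON format, the file name `index`, and the functions `flush_index` and `load_index` are hypothetical, and the rename is atomic on POSIX filesystems.

```python
import json
import os
import tempfile


def flush_index(index: dict, index_path: str) -> None:
    """Write the whole index to a temp file, then swap it in atomically."""
    directory = os.path.dirname(index_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="index.tmp.")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(index, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())          # push the bytes to the disk
        os.replace(tmp_path, index_path)    # rename over the old index
    except BaseException:
        os.unlink(tmp_path)                 # never leave a partial temp file
        raise


def load_index(index_path: str) -> dict:
    with open(index_path) as f:
        return json.load(f)


cache_dir = tempfile.mkdtemp()
index_path = os.path.join(cache_dir, "index")
flush_index({"0a1b2c3d4e": {"size": 512}}, index_path)
flush_index({"0a1b2c3d4e": {"size": 512}, "ffeeddccbb": {"size": 64}}, index_path)
loaded = load_index(index_path)
```

A crash during a flush leaves either the previous index or the new one on disk, never a half-written file, which is why a restart can continue with a stale but usable cache.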
Simple cache: 
Benefits and goals 
● Comparison to blockfile cache (cont’d) 
○ Doesn’t delay launching network requests 
■ Eliminates delay factors 
● No context switching 
● Does not block on disk I/O before using the network 
■ Blockfile adds an average delay of 14~25ms to requests 
● On Android, slower flash controllers make the delays significantly longer.
Simple cache: 
Benefits and goals 
● Comparison to blockfile cache (cont’d) 
○ Lower resident set pressure & fewer I/O operations 
■ Blockfile disk format has 
● 256~512B of entry records per entry 
● + rankings & index information (~100B) per entry 
● Heavily used entries are not all stored contiguously 
■ Simple cache 
● stores only SMALLNUM bytes per entry in memory 
● doesn’t access the disk when not required
Simple cache: 
Benefits and goals 
● Comparison to blockfile cache (cont’d) 
○ Simpler 
■ Shorter and easier to follow than blockfile, because it explicitly 
avoids implementing a filesystem
Simple cache: 
Non-goals; Simple cache is 
● Not a log-structured cache system 
○ I/O performed by Simple cache is mostly 
sequential, but NOT log-structured 
■ It would be log-structured only on a filesystem that is itself 
log-structured. 
● Not a filesystem 
○ Disk cache delegates to the filesystem. 
■ means “Simple cache uses the abstract interface 
of Disk Cache instead of implementing its own 
filesystem”.
Simple cache: 
Structure on Disk 
● Entry hash 
○ A 40-bit SHA-2 hash of the URL 
○ 2 entries with the same entry hash (EH) can’t both be stored 
● Stored in a single directory 
○ ONE index file 
○ Each entry stored in a single file 
■ named HexEntryHash_StreamNumber
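A minimal sketch of the naming scheme follows. The slides only say "40 bit SHA-2", so using SHA-256, taking the first 5 bytes of the digest, and formatting the hash as 10 lowercase hex digits are all assumptions; `entry_hash` and `entry_file_name` are hypothetical helper names.

```python
import hashlib


def entry_hash(url: str) -> int:
    """Return a 40-bit hash of the URL (first 5 bytes of SHA-256)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:5], "big")    # 5 bytes = 40 bits


def entry_file_name(url: str, stream_number: int) -> str:
    """Build a name of the form HexEntryHash_StreamNumber."""
    return f"{entry_hash(url):010x}_{stream_number}"


name = entry_file_name("http://www.google.com/favicon.ico", 0)
```

Because the file name is derived only from the hash, two URLs that collide on all 40 bits would map to the same file, which is why two entries with the same entry hash cannot both be stored.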
Simple cache: 
Structure on Disk 
● A file ‘00index’ 
○ contains data for initializing the in-memory index 
● Index (in memory) 
○ used for faster cache performance 
○ consists of entry hashes for records & simple 
eviction information
Simple cache: 
Structure on Disk 
● Formats of entry file 
○ Simple file header 
■ magic_number, version, key_length, key_hash 
○ Simple file EOF 
■ final_magic_number, flags, data_crc32, stream_size 
○ Simple file sparse range header 
■ sparse_magic_number, offset, length, data_crc32
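To make the header and EOF records concrete, here is a sketch that packs and checks them with Python's `struct` and `zlib`. The field names come from the slide; everything else is an assumption labeled in the code: the field widths and byte order, the placeholder magic numbers, the flag bit, the use of CRC32 for `key_hash`, and the header + key + data + EOF ordering.

```python
import struct
import zlib

# Assumed layouts: little-endian, 64-bit magic, 32-bit remaining fields.
HEADER = struct.Struct("<QIII")   # magic_number, version, key_length, key_hash
EOF = struct.Struct("<QIII")      # final_magic_number, flags, data_crc32, stream_size

MAGIC = 0x1122334455667788         # placeholder, not the real magic number
FINAL_MAGIC = 0x8877665544332211   # placeholder, not the real magic number
FLAG_HAS_CRC32 = 1                 # hypothetical flag bit


def build_entry_file(key: bytes, data: bytes, version: int = 1) -> bytes:
    """Lay out one stream as header + key + data + EOF record (assumed order)."""
    header = HEADER.pack(MAGIC, version, len(key), zlib.crc32(key))
    eof = EOF.pack(FINAL_MAGIC, FLAG_HAS_CRC32, zlib.crc32(data), len(data))
    return header + key + data + eof


def parse_entry_file(blob: bytes):
    """Return (key, data), raising ValueError if a check fails."""
    magic, _version, key_length, _key_hash = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad header magic")
    final_magic, flags, data_crc32, stream_size = EOF.unpack_from(
        blob, len(blob) - EOF.size)
    if final_magic != FINAL_MAGIC:
        raise ValueError("bad EOF magic")
    key_start = HEADER.size
    data_start = key_start + key_length
    key = blob[key_start:data_start]
    data = blob[data_start:data_start + stream_size]
    if flags & FLAG_HAS_CRC32 and zlib.crc32(data) != data_crc32:
        raise ValueError("data corrupted")
    return key, data


blob = build_entry_file(b"http://www.google.com/favicon.ico", b"icon bytes")
key, data = parse_entry_file(blob)
```

The CRC in the EOF record lets a reader detect a stream that was only partly written before a crash, without consulting the index.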
Term: 
Sparse file
Simple cache: 
Implementation 
● I/O thread operations 
○ The public API is called on the I/O thread 
○ The index is updated on the I/O thread 
● Worker pool operations 
○ All I/O operations are performed asynchronously on 
the worker pool. 
○ The cache keeps a pool of new entries ready 
to be moved into their final place.
Simple cache: 
Implementation 
● Index flushing & consistency checking 
○ The index is flushed 
■ on shutdown 
■ periodically 
● Operation without index 
○ The cache can operate without the I/O thread index by 
directly opening files in the directory. 
○ This avoids slow startup and extra I/O costs
