The document discusses Chromium's caching system. It describes the overall network stack and cache flow, including the disk cache which stores web resources on disk. It then focuses on the "simple cache" implementation, a new backend for disk cache that uses one file per cache entry and an index file for faster lookups. The simple cache aims to be more resilient to corruption, reduce delays, and have lower memory and disk usage than the existing blockfile backend.
4. Network Stack:
What’s ‘Network Stack’?
● A mostly single-threaded cross-platform
library primarily for resource fetching
○ URLRequest
■ represents the request for a URL
○ URLRequestContext
■ contains all the associated context needed to fulfill
the ‘URL request’
● e.g. cookies, host resolver, cache
5. Before stepping into Cache:
Network stack
● Some code layouts
○ /net/base
■ shared utilities for /net modules
○ /net/disk_cache
■ Cache for web resources
○ /net/url_request
■ URLRequest, URLRequestContext, ...
9. DiskCache:
What’s DiskCache?
● Cache
○ Stores resources fetched from the web
○ A part of ‘/net’
■ location: /net/disk_cache
■ This means ‘DiskCache’ controls the cache flow
for network fetches.
● NOTE:
○ Android uses the ‘Simple cache’.
■ location: /net/disk_cache/simple
10. DiskCache:
Characteristics
● Main characteristics:
○ The cache should not grow unbounded
■ An algorithm decides when to remove old entries
○ Losing some data is not critical
■ But discarding the whole cache should be minimized
○ Access should be possible through synchronous or
asynchronous operations
○ The design should avoid ‘cache thrashing’
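The "should not grow unbounded" requirement can be illustrated with a small sketch. The slide only says that some algorithm decides when to remove old entries; the least-recently-used policy below is an assumption chosen for brevity, and `BoundedCache` is a hypothetical name, not a Chromium class.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy size-bounded cache. LRU is an assumed policy, not Chromium's."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> bytes, least recently used first

    def put(self, key, data):
        # Replacing an existing key must not count its old size twice.
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        self._entries[key] = data
        self.used_bytes += len(data)
        # Evict the least recently used entries until the cache fits again.
        # The newest entry is always kept, even if it alone exceeds the limit.
        while self.used_bytes > self.max_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)

    def get(self, key):
        data = self._entries.get(key)
        if data is not None:
            self._entries.move_to_end(key)  # mark as recently used
        return data
```

Only the oldest entries are dropped, one at a time, which matches the goal of losing some data without discarding the whole cache.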
11. DiskCache:
Characteristics
● Main characteristics:
○ It should be possible to remove an entry from
the cache
■ and keep working with that entry while it is inaccessible to other requests
○ Should not use explicit multithread synchronization
■ Always called from the same thread
■ However, callbacks must be issued through the message
loop to avoid reentrancy
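A minimal single-threaded sketch of these two points follows. All names (`MessageLoop`, `CacheEntry`, `CacheBackend`, `doom_entry`) are hypothetical stand-ins, not the real Chromium classes. The sketch shows a removed ("doomed") entry that stays usable by its holder while new lookups no longer see it, and callbacks that are posted to a loop rather than invoked inline.

```python
from collections import deque


class MessageLoop:
    """Hypothetical task queue standing in for the thread's message loop."""

    def __init__(self):
        self._tasks = deque()

    def post(self, fn, *args):
        self._tasks.append((fn, args))

    def run_until_idle(self):
        while self._tasks:
            fn, args = self._tasks.popleft()
            fn(*args)


class CacheEntry:
    def __init__(self, key):
        self.key = key
        self.data = b""
        self.doomed = False


class CacheBackend:
    def __init__(self, loop):
        self._loop = loop
        self._entries = {}

    def create_entry(self, key):
        entry = CacheEntry(key)
        self._entries[key] = entry
        return entry

    def open_entry(self, key, callback):
        # Never call back inline: posting the callback means the caller's
        # stack has unwound before it runs, which avoids reentrancy.
        self._loop.post(callback, self._entries.get(key))

    def doom_entry(self, key):
        # Unlink the entry from the backend; existing holders keep their
        # reference and can continue to use it.
        entry = self._entries.pop(key, None)
        if entry is not None:
            entry.doomed = True
```

Because everything runs on one thread and results arrive only through posted tasks, no locks are needed in this model.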
12. DiskCache:
External interfaces
● /net/disk_cache/disk_cache.h
● 2 Interfaces
○ disk_cache::Backend
■ manages entries on the cache
○ disk_cache::Entry
■ handles operations specific to a given resource
13. External interfaces:
● An entry is identified by its key
○ e.g. http://www.google.com/favicon.ico
● Once an entry is created, the data is stored in
separate chunks or data streams:
○ HTTP headers
○ Actual resource data
● The index of the required stream is passed as an argument to
the methods:
○ Entry::ReadData
○ Entry::WriteData
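The key-plus-streams model can be sketched as follows. This is a toy in-memory model, not the C++ interface in disk_cache.h: `ToyBackend`, `ToyEntry`, the synchronous method signatures, and the assignment of index 0 to headers and index 1 to the resource data are all assumptions made for illustration.

```python
HEADERS_STREAM = 0  # assumed index for the HTTP headers stream
BODY_STREAM = 1     # assumed index for the actual resource data


class ToyEntry:
    """One cached resource, holding several independent data streams."""

    def __init__(self, key, num_streams=2):
        self.key = key
        self._streams = [bytearray() for _ in range(num_streams)]

    def write_data(self, index, offset, data):
        stream = self._streams[index]
        # Zero-fill any gap between the current end and the write offset.
        if offset > len(stream):
            stream.extend(b"\x00" * (offset - len(stream)))
        stream[offset:offset + len(data)] = data
        return len(data)

    def read_data(self, index, offset, length):
        return bytes(self._streams[index][offset:offset + length])


class ToyBackend:
    """Manages entries, each identified by its key."""

    def __init__(self):
        self._entries = {}

    def create_entry(self, key):
        if key in self._entries:
            raise KeyError(f"entry already exists: {key}")
        entry = ToyEntry(key)
        self._entries[key] = entry
        return entry

    def open_entry(self, key):
        return self._entries.get(key)
```

A caller would create the entry for `http://www.google.com/favicon.ico`, write the headers with `write_data(HEADERS_STREAM, 0, ...)` and the body with `write_data(BODY_STREAM, 0, ...)`, then read either stream back independently.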
15. Simple cache:
What is “Simple Cache”?
● Proposed as a new backend for the disk cache
○ Conforms to the Disk Cache interface
○ Very simple
■ Uses 1 file per cache entry + an index file
■ Addresses I/O bottlenecks
17. Simple cache:
Benefits and goals
● Comparison to blockfile cache
○ More resilient to corruption caused by a
system crash
■ Periodically flushes its entire index
■ Swaps the new index in atomically
■ After a system crash, it starts with the stale
cache
● NOTE: With the blockfile cache, Chrome drops
the whole cache by default
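The "flush the whole index, then swap it in atomically" idea can be sketched with the common write-to-temporary-file-then-rename pattern. This is an assumption about how such a swap is typically done, not a description of Chromium's code; the JSON serialization and the helper names `flush_index` and `load_index` are hypothetical.

```python
import json
import os
import tempfile


def flush_index(index, directory, name="00index"):
    """Write the whole index to a temp file, then atomically swap it in."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(index, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk first
        # os.replace is atomic when source and target are on one filesystem:
        # readers see either the old index or the new one, never a mix.
        os.replace(tmp_path, os.path.join(directory, name))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_index(directory, name="00index"):
    """Load the last successfully flushed index (possibly stale)."""
    try:
        with open(os.path.join(directory, name)) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

If the process dies mid-flush, the previous index file is still intact, so the cache can start from slightly stale data instead of being discarded.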
18. Simple cache:
Benefits and goals
● Comparison to blockfile cache (cont’d)
○ Doesn’t delay launching network requests
■ Eliminates the delay factors
● No context switching
● Does not block on disk I/O before using the network
■ Blockfile has an average 14~25 ms delay on requests
● On Android, slower flash controllers make the delays significantly longer.
19. Simple cache:
Benefits and goals
● Comparison to blockfile cache (cont’d)
○ Lower resident set pressure & fewer I/O operations
■ The blockfile disk format has
● 256~512B per-entry records
● + rankings & index information (~100B) per entry
● Heavily used entries are not all stored contiguously
■ Simple cache
● stores only SMALLNUM bytes per entry in memory
● doesn’t access the disk when not required
20. Simple cache:
Benefits and goals
● Comparison to blockfile cache (cont’d)
○ Simpler
■ Shorter and easier than blockfile’s code, because it
explicitly avoids implementing a filesystem
21. Simple cache:
Non-goals; Simple cache is
● Not a log-structured cache system
○ I/O performed by Simple cache is mostly
sequential, but it is NOT log-structured
■ It only behaves that way when the underlying
filesystem is itself log-structured
● Not a filesystem
○ Disk cache delegates to the filesystem
■ That is, Simple cache uses the abstract interface
of Disk Cache instead of implementing its own
filesystem
22. Simple cache:
Structure on Disk
● Entry hash
○ A 40-bit SHA-2 hash of the URL
○ Two entries with the same entry hash can’t both be stored
● Stored in a single directory
○ ONE index file
○ Each entry stored in a single file
■ named by HexEntryHash_StreamNumber
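The naming scheme can be sketched as below. The slide says only "40 bit SHA-2"; using SHA-256 as the SHA-2 variant and keeping the leading 40 bits of the digest are assumptions, and `entry_hash` and `entry_file_name` are hypothetical helper names.

```python
import hashlib

HASH_BITS = 40  # from the slide; 40 bits print as 10 hex digits


def entry_hash(url):
    """Return the leading 40 bits of SHA-256(url) as an integer (assumed scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - HASH_BITS)


def entry_file_name(url, stream_number):
    """Build a file name of the form HexEntryHash_StreamNumber."""
    return f"{entry_hash(url):010x}_{stream_number}"
```

Because the file name is derived only from the hash, two URLs that collide on the 40-bit value would map to the same file, which is why both cannot be stored at once.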
23. Simple cache:
Structure on Disk
● A file ‘00index’
○ contains data for initializing the in-memory index
● Index (in memory)
○ used for faster cache performance
○ consists of entry hashes for records & simple
eviction information
24. Simple cache:
Structure on Disk
● Formats of entry file
○ Simple file header
■ magic_number, version, key_length, key_hash
○ Simple file EOF
■ final_magic_number, flags, data_crc32, stream_size
○ Simple file sparse range header
■ sparse_magic_number, offset, length, data_crc32
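To make the header and EOF records concrete, here is a sketch that packs and validates a single-stream entry file. The field names come from the slide; everything else is an assumption made for illustration: the field widths, little-endian byte order, the placeholder magic values, the `FLAG_HAS_CRC32` bit, and the header + key + data + EOF layout.

```python
import struct
import zlib

# Assumed widths: one 64-bit magic number followed by three 32-bit fields.
HEADER = struct.Struct("<QIII")  # magic_number, version, key_length, key_hash
EOF = struct.Struct("<QIII")     # final_magic_number, flags, data_crc32, stream_size

MAGIC = 0x1122334455667788        # placeholder value, not Chromium's
FINAL_MAGIC = 0x8877665544332211  # placeholder value, not Chromium's
FLAG_HAS_CRC32 = 1                # hypothetical flag bit


def build_entry_file(key, data, key_hash=0, version=1):
    key_bytes = key.encode("utf-8")
    header = HEADER.pack(MAGIC, version, len(key_bytes), key_hash)
    eof = EOF.pack(FINAL_MAGIC, FLAG_HAS_CRC32, zlib.crc32(data), len(data))
    return header + key_bytes + data + eof


def parse_entry_file(blob):
    magic, _version, key_length, _key_hash = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad header magic")
    final_magic, flags, crc, size = EOF.unpack_from(blob, len(blob) - EOF.size)
    if final_magic != FINAL_MAGIC:
        raise ValueError("bad EOF magic")
    data_start = HEADER.size + key_length
    key = blob[HEADER.size:data_start].decode("utf-8")
    data = blob[data_start:data_start + size]
    # The CRC lets a reader detect a corrupted entry and drop just that entry.
    if flags & FLAG_HAS_CRC32 and zlib.crc32(data) != crc:
        raise ValueError("data CRC mismatch")
    return key, data
```

The magic numbers at both ends and the CRC in the EOF record are what allow a truncated or corrupted file to be rejected on its own, without affecting other entries.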
26. Simple cache:
Implementation
● I/O thread operations
○ public API is called on the I/O thread
○ The index is updated in the I/O thread
● Worker pool operations
○ All I/O operations are performed asynchronously on
the worker pool.
○ The cache keeps a pool of new entries ready
to be moved into their final place.
27. Simple cache:
Implementation
● Index flushing & consistency checking
○ The index is flushed
■ on shutdown
■ periodically
● Operation without index
○ The cache can operate without the I/O thread index by
directly opening files in the directory.
○ This avoids startup delays & I/O costs
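A rough sketch of a lookup that works with or without the in-memory index is shown below. The function name `lookup`, the use of a Python set of hex hashes as the index, and reading stream 0 are assumptions for illustration. With an index, a miss is answered without touching the disk; without one, the file is opened directly by name.

```python
import os


def lookup(directory, hex_entry_hash, index=None):
    """Return the bytes of stream 0 for an entry, or None on a miss.

    index: optional set of known hex entry hashes (the in-memory index).
    """
    # With an index loaded, a miss is decided in memory, with no disk access.
    if index is not None and hex_entry_hash not in index:
        return None
    # Without an index (e.g. right after startup), open the file directly.
    path = os.path.join(directory, f"{hex_entry_hash}_0")
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

Since the file name is computable from the key alone, requests can be served before the index has finished loading.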