Back in 2000, home Internet was slow
MODEM data rate:
33.6Kbps or 56Kbps
round trip latency:
2 minutes to load a
Today, Internet isn’t always fast
Satellite link (e.g. Iridium)
◦ high latency
◦ $1.35 per minute
2G cellular data (e.g. H2O Wireless)
◦ high latency
◦ low bandwidth
◦ $0.30 per MB
Web contents are redundant
Screenshots of http://quotes.wsj.com/index/CN/SHCOMP during a trading day. The quote changes, but the rest of the page stays the same.
Web contents are often uncached
Web authors don’t want you to cache
their contents, because:
◦ Contents are dynamic. Stock price may
change at any time. News articles are
posted throughout the day.
◦ Contents are personalized. Your Facebook
homepage is different from anyone else’s.
◦ Access count must be accurate. Advertising
revenue is calculated per thousand
response headers of http://www.dailyfinance.com/
strings into tokens
contents of both caches must be consistent
Cache: holds most recent packets
◦ admission policy: admit all
◦ replacement policy: FIFO
Indexed by representative fingerprints of the packets it holds
◦ maps each fingerprint to the most recent packet in which it appears
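A minimal sketch of such a cache, assuming packets are identified by a running counter and that the caller supplies each packet's representative fingerprints as (fingerprint, offset) pairs. The class name `PacketCache` and its methods are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


class PacketCache:
    """FIFO packet store indexed by representative fingerprints (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of packets held
        self.packets = OrderedDict()      # packet_id -> payload, oldest first
        self.index = {}                   # fingerprint -> (packet_id, offset)
        self.next_id = 0

    def add(self, payload, fingerprints):
        """Admit every packet; fingerprints is a list of (fingerprint, offset)."""
        if len(self.packets) >= self.capacity:
            # FIFO replacement: drop the oldest packet and its index entries.
            old_id, _ = self.packets.popitem(last=False)
            self.index = {fp: loc for fp, loc in self.index.items()
                          if loc[0] != old_id}
        packet_id = self.next_id
        self.next_id += 1
        self.packets[packet_id] = payload
        for fp, offset in fingerprints:
            # A newer occurrence overwrites an older one.
            self.index[fp] = (packet_id, offset)
        return packet_id

    def lookup(self, fingerprint):
        """Return (payload, offset) for the most recent packet, or None."""
        loc = self.index.get(fingerprint)
        if loc is None:
            return None
        packet_id, offset = loc
        return self.packets[packet_id], offset
```

Rebuilding the index on every eviction is linear in the index size. That keeps the sketch short; an implementation that cares about speed would more likely record which fingerprints belong to each packet.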
window size: β
select one in 2^γ fingerprints
fingerprint space: M
1. Calculate rolling Rabin fingerprints for sequences of β bytes, mod M.
2. Select fingerprints ending with γ zeros as representative fingerprints.
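The two steps above can be sketched as follows. The sketch uses a polynomial rolling hash in the style of Rabin fingerprinting; the multiplier `p`, the default values of `beta` and `gamma`, and the function name are assumptions for illustration, and "ending with γ zeros" is read here as the γ low-order bits being zero.

```python
def representative_fingerprints(data, beta=64, gamma=5, p=1048583, m=2**60):
    """Return [(fingerprint, offset)] for the selected beta-byte windows.

    The fingerprint of a window t[0..beta-1] is
    (t[0]*p**(beta-1) + ... + t[beta-1]) mod m, updated in O(1) per byte.
    """
    if len(data) < beta:
        return []
    top = pow(p, beta - 1, m)        # weight of the byte leaving the window
    mask = (1 << gamma) - 1          # selects the gamma low-order bits

    # Step 1a: fingerprint of the first window, computed directly.
    fp = 0
    for byte in data[:beta]:
        fp = (fp * p + byte) % m

    selected = []
    offset = 0
    while True:
        # Step 2: keep fingerprints whose gamma low-order bits are zero.
        if fp & mask == 0:
            selected.append((fp, offset))
        if offset + beta >= len(data):
            break
        # Step 1b: roll the window one byte to the right.
        fp = ((fp - data[offset] * top) * p + data[offset + beta]) % m
        offset += 1
    return selected
```

Because selection depends only on the window's bytes, the same content produces the same representative fingerprints wherever it appears in a packet.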
Rabin fingerprints are not cryptographically secure. Algorithm should not
Rabin fingerprints are used for finding similar documents, not for chunking.
Sender:
◦ look up fingerprints in cache
◦ add packet to cache, evict oldest packet if necessary
◦ verify no collision
◦ expand to the left and to the right, byte-by-byte
◦ convert matched regions into tokens, each holding:
  • the fingerprint
  • # bytes expanded to the left
  • # bytes expanded to the right
◦ send encoded, smaller packet
Receiver:
◦ look up tokens in cache
◦ add packet to cache, evict oldest packet if necessary
◦ deliver original packet
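A compact sketch of the sender and receiver sides under simplifying assumptions: the cache is a plain dictionary with no eviction, fingerprints are computed directly rather than with a rolling update, and an encoded packet is a Python list mixing literal byte strings with (fingerprint, left, right) tuples. All names and constants here are hypothetical.

```python
BETA, GAMMA, P, M = 8, 2, 1048583, 2**60   # small illustrative values


def fingerprints(data):
    """(fingerprint, offset) for windows whose GAMMA low-order bits are zero."""
    out = []
    for off in range(len(data) - BETA + 1):
        fp = 0
        for b in data[off:off + BETA]:
            fp = (fp * P + b) % M
        if fp % (1 << GAMMA) == 0:
            out.append((fp, off))
    return out


def remember(index, packet):
    """Add a packet to the cache index (no eviction in this sketch)."""
    for fp, off in fingerprints(packet):
        index[fp] = (packet, off)


def encode(index, packet):
    """Sender: replace regions found in the cache with tokens."""
    out, pos = [], 0                     # pos = first byte not yet emitted
    for fp, off in fingerprints(packet):
        if off < pos or fp not in index:
            continue
        cached, coff = index[fp]
        if cached[coff:coff + BETA] != packet[off:off + BETA]:
            continue                     # fingerprint collision: bytes differ
        left = 0                         # expand to the left, byte by byte
        while (off - left > pos and coff - left > 0
               and packet[off - left - 1] == cached[coff - left - 1]):
            left += 1
        right = 0                        # expand to the right, byte by byte
        while (off + BETA + right < len(packet)
               and coff + BETA + right < len(cached)
               and packet[off + BETA + right] == cached[coff + BETA + right]):
            right += 1
        if off - left > pos:
            out.append(packet[pos:off - left])   # literal bytes before match
        out.append((fp, left, right))            # the token
        pos = off + BETA + right
    if pos < len(packet):
        out.append(packet[pos:])
    remember(index, packet)
    return out


def decode(index, encoded):
    """Receiver: expand tokens from the cache and rebuild the packet."""
    parts = []
    for item in encoded:
        if isinstance(item, bytes):
            parts.append(item)
        else:
            fp, left, right = item
            cached, coff = index[fp]     # KeyError: caches have diverged
            parts.append(cached[coff - left:coff + BETA + right])
    packet = b"".join(parts)
    remember(index, packet)
    return packet
```

Both sides call `remember` on the original packet, in the same order, so the two indexes stay identical as long as no packet is lost in between.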
Contents of sender cache and receiver cache must be consistent.
Why might the caches be inconsistent?
◦ The network channel isn’t reliable. A packet that entered the sender cache but was lost on the
channel will not be present in the receiver cache.
How to detect cache inconsistency?
◦ Fingerprints! If there’s no collision, receiving an unrecognized fingerprint indicates
caches are inconsistent.
What happens if caches are inconsistent?
◦ Receiver cannot reconstruct original packet.
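A small sketch of how a receiver could detect the problem while expanding a token. The index layout (fingerprint mapped to a cached payload, the window's offset, and the window size) and the names `CacheInconsistency` and `resolve_token` are assumptions for illustration; what the receiver does after detection is not covered here.

```python
class CacheInconsistency(Exception):
    """Raised when a token names a fingerprint the receiver never indexed."""


def resolve_token(receiver_index, token):
    """Expand a (fingerprint, left, right) token into the bytes it stands for."""
    fingerprint, left, right = token
    entry = receiver_index.get(fingerprint)
    if entry is None:
        # Unrecognized fingerprint: the sender cached a packet we never saw.
        raise CacheInconsistency(f"unknown fingerprint {fingerprint:#x}")
    cached, offset, window = entry
    return cached[offset - left:offset + window + right]
```

For example, with `{0x10: (b"hello world", 3, 4)}` as the index, the token `(0x10, 1, 2)` expands to `b"llo wor"`, while any token carrying a fingerprint other than `0x10` raises `CacheInconsistency`.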
The algorithm is implemented as a user-level process to analyze a trace.
Fingerprint space: M = 2^60
◦ collision almost impossible
Penalty for each matching region: 12 octets
◦ to represent the space needed for the token
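The effect of the penalty on measured savings can be sketched as simple arithmetic. The function name is hypothetical, and skipping matches that do not exceed the penalty is an assumption of this sketch, not something the slides state.

```python
TOKEN_PENALTY = 12  # octets charged per matching region


def net_savings(match_lengths, penalty=TOKEN_PENALTY):
    """Bytes saved after charging the per-region penalty.

    Assumption: a match no longer than the penalty is left unencoded,
    so it contributes nothing rather than a loss.
    """
    return sum(length - penalty for length in match_lengths if length > penalty)
```

For matched regions of 64, 10, and 200 bytes, `net_savings([64, 10, 200])` is (64 - 12) + (200 - 12) = 240 bytes; the 10-byte match is not worth a token.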
Window size β and fingerprint selection frequency (one in 2^γ)
large β: better “quality” of matches, fewer potential bytes saved
small β: worse “quality” of matches (shorter matches in more recent packets)
small γ: more likely to find a match, larger index (=less memory for cached packets)
large γ: less likely to find a match, less memory usage
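The memory side of the γ tradeoff can be estimated with a back-of-the-envelope calculation. It assumes roughly one window per cached byte, fingerprints that are uniformly distributed (so one window in 2^γ is selected), and a hypothetical 16 bytes per index entry.

```python
def index_overhead(cache_bytes, gamma, entry_bytes=16):
    """Estimate (number of index entries, index size in bytes).

    Assumes one candidate window per cached byte and a selection
    probability of 1 / 2**gamma; entry_bytes is an illustrative guess.
    """
    entries = cache_bytes >> gamma
    return entries, entries * entry_bytes
```

Under these assumptions a 1MB cache (2^20 bytes) has about 32768 index entries at γ = 5 and about 4096 at γ = 8, so each step up in γ halves the index at the cost of fewer chances to find a match.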
45Mbps on a PC with Pentium Ⅲ-550 and 1GB memory
This work is designed for slow links.
Later work on redundancy elimination:
◦ universal redundancy elimination
◦ SmartRE: coordinated network-wide redundancy elimination
◦ EndRE: end-system redundancy elimination
How much redundancy is there?
Amount of redundancy (figure: Internet => corporate and corporate => Internet traffic)
◦ with just 1MB of memory, at least 10% redundant
Redundancy by protocol
traffic amount (%)
HTTP, Telnet, POP, ASF have a high percentage of repeated strings.
HTTPS, FTP-data, Napster, RTSP, NNTP have a low percentage of repeated strings.
The redundancy elimination algorithm is protocol-independent, so it can save bytes on non-Web traffic too.
Comparison with HTTP caching
works better than HTTP caching and compression