Network Redundancy Elimination


Published in: Technology

  1. 1. Network Redundancy Elimination. Junxiao Shi, 2013-11-05. Slides © 2013, Creative Commons BY-NC 3.0. Reference: Neil T. Spring and David Wetherall. 2000. A protocol-independent technique for eliminating redundant network traffic. SIGCOMM Comput. Commun. Rev. 30, 4 (August 2000), 87-95. DOI=10.1145/347057.347408
  2. 2. Problem
  3. 3. Back in 2000, home Internet was slow. Modem data rate: 33.6 Kbps or 56 Kbps. Round-trip latency: >100 ms. 2 minutes to load a webpage.
  4. 4. Today, the Internet isn’t always fast. Satellite link (e.g. Iridium): ◦ high latency ◦ 2.4KB/s ◦ $1.35 per minute. 2G cellular data (e.g. H2O Wireless): ◦ high latency ◦ low bandwidth ◦ $0.30 per MB
  5. 5. Web content is redundant. Screenshots taken during a trading day: the quotes change, but the rest of the page remains the same.
  6. 6. Web content is often uncached. Web authors don’t want you to cache their content, because: ◦ Content is dynamic. A stock price may change at any time. News articles are posted throughout the day. ◦ Content is personalized. Your Facebook homepage is different from anyone else’s. ◦ Access counts must be accurate. Advertising revenue is calculated per thousand impressions. [Figure: response headers]
  7. 7. To the naïve user -
  8. 8. Design
  9. 9. Architecture. Sender side (with cache): convert repeated strings into tokens. Bandwidth-constrained channel. Receiver side (with cache): reconstruct original packet. Operates at the network layer, protocol-independent. Contents of both caches must be consistent.
  10. 10. The Cache. Cache: holds the most recent packets ◦ admission policy: admit all ◦ replacement policy: FIFO. Indexed by representative fingerprints of the packets it holds ◦ maps each fingerprint to the most recent packet in which it appears
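The cache described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `PacketCache`, counting capacity in packets rather than bytes, and the lazy removal of stale index entries are all assumptions made for the example.

```python
from collections import deque


class PacketCache:
    """FIFO packet store plus a fingerprint index (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed unit: number of packets (>= 1)
        self.order = deque()       # packet ids, oldest first
        self.payloads = {}         # packet id -> payload bytes
        self.index = {}            # fingerprint -> (packet id, offset)
        self.next_id = 0

    def add(self, payload, fingerprints):
        """Admit every packet; `fingerprints` is a list of (offset, fp)."""
        if len(self.order) >= self.capacity:
            # FIFO replacement: drop the oldest packet.
            del self.payloads[self.order.popleft()]
        pid = self.next_id
        self.next_id += 1
        self.order.append(pid)
        self.payloads[pid] = payload
        for offset, fp in fingerprints:
            # The newest packet containing a fingerprint overwrites older ones.
            self.index[fp] = (pid, offset)
        return pid

    def lookup(self, fp):
        """Return (payload, offset) for the newest packet holding fp, else None."""
        entry = self.index.get(fp)
        if entry is None:
            return None
        pid, offset = entry
        if pid not in self.payloads:
            # The packet was evicted; clean up the stale entry (assumed policy).
            del self.index[fp]
            return None
        return self.payloads[pid], offset
```

Overwriting the index entry on every `add` is what makes a fingerprint resolve to the most recent packet that contains it.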
  11. 11. Representative fingerprints. Parameters: window size β; fingerprint space M; select one in 2^γ fingerprints. 1. Calculate rolling Rabin fingerprints for sequences of β bytes, mod M. 2. Select fingerprints ending with γ zero bits as representative fingerprints. Rabin fingerprints are not cryptographically secure, so the algorithm should not assume they are collision-free. Rabin fingerprints are used here for finding similar documents, not for chunking.
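The two steps above can be sketched as a rolling polynomial hash over β-byte windows. This is a hedged illustration: the base `P` is a hypothetical choice, the function name is invented for the example, and the values of `M`, `BETA` and `GAMMA` are the ones the deck's Parameters slide gives (taking M as 2^60).

```python
M = 1 << 60      # fingerprint space (2^60)
BETA = 64        # window size in bytes
GAMMA = 5        # keep fingerprints whose low GAMMA bits are zero
P = 1048583      # assumed base for the polynomial hash (hypothetical choice)


def representative_fingerprints(data, beta=BETA, gamma=GAMMA, p=P, m=M):
    """Return [(offset, fingerprint)] for the selected beta-byte windows."""
    if len(data) < beta:
        return []
    mask = (1 << gamma) - 1
    top = pow(p, beta - 1, m)          # weight of the byte leaving the window

    # Fingerprint of the first window, computed directly.
    fp = 0
    for b in data[:beta]:
        fp = (fp * p + b) % m

    selected = []
    if fp & mask == 0:
        selected.append((0, fp))

    # Slide the window one byte at a time, updating the fingerprint in O(1).
    for i in range(beta, len(data)):
        fp = ((fp - data[i - beta] * top) * p + data[i]) % m
        if fp & mask == 0:
            selected.append((i - beta + 1, fp))
    return selected
```

Because selection depends only on the window's content, the same byte sequence yields the same representative fingerprints regardless of where it sits in a packet.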
  12. 12. Sender process: 1. Generate representative fingerprints. 2. Look up fingerprints in cache index. 3. Verify no collision. 4. Expand to the left and to the right, byte-by-byte. 5. Convert matched regions into tokens. 6. Send encoded, smaller packet. 7. Add packet to cache, evicting oldest packet if necessary. Token format: • the fingerprint • # bytes expanded to the left • # bytes expanded to the right
  13. 13. Receiver process: 1. Look up tokens in cache index. 2. Reconstruct original packet. 3. Generate representative fingerprints. 4. Add packet to cache, evicting oldest packet if necessary. 5. Deliver original packet.
  14. 14. Cache consistency. Contents of sender cache and receiver cache must be consistent. Why might the caches be inconsistent? ◦ The network channel isn’t reliable. A packet that entered the sender cache but was lost on the channel will not be present in the receiver cache. How to detect cache inconsistency? ◦ Fingerprints! If there’s no collision, receiving an unrecognized fingerprint indicates the caches are inconsistent. What happens if the caches are inconsistent? ◦ The receiver cannot reconstruct the original packet.
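The collision check and the byte-by-byte expansion in the sender process might look like the sketch below. The function names and the tuple layout of the token are assumptions made for illustration; only the three token fields come from the slide.

```python
def expand_match(new, new_off, old, old_off, beta):
    """Verify a fingerprint hit and grow it in both directions.

    new[new_off:new_off+beta] and old[old_off:old_off+beta] produced the same
    fingerprint. Returns (left, right), the number of extra matching bytes on
    each side, or None if the windows differ (a fingerprint collision).
    """
    if new[new_off:new_off + beta] != old[old_off:old_off + beta]:
        return None

    # Expand to the left while both packets still have bytes and they agree.
    left = 0
    while (new_off - left > 0 and old_off - left > 0
           and new[new_off - left - 1] == old[old_off - left - 1]):
        left += 1

    # Expand to the right in the same way.
    right = 0
    new_end, old_end = new_off + beta, old_off + beta
    while (new_end + right < len(new) and old_end + right < len(old)
           and new[new_end + right] == old[old_end + right]):
        right += 1
    return left, right


def make_token(fingerprint, left, right):
    """Token fields as listed on the slide; the tuple layout is assumed."""
    return (fingerprint, left, right)
```

The matched region in the new packet is then `new[new_off - left : new_off + beta + right]`, which the sender replaces with the token.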
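Reconstruction on the receiver side can be sketched as below. The encoded-packet layout (a sequence of literal byte strings interleaved with `(fingerprint, left, right)` tokens), the plain-dict index, and the `CacheMiss` exception are assumptions for this example; the slides do not specify a wire format.

```python
class CacheMiss(Exception):
    """A token names a fingerprint that the receiver's index does not hold."""


def decode(items, index, beta):
    """Rebuild the original payload from literals and tokens.

    items: sequence of literal bytes or tokens (fingerprint, left, right).
    index: dict mapping fingerprint -> (cached payload, offset of the window).
    """
    out = bytearray()
    for item in items:
        if isinstance(item, (bytes, bytearray)):
            out += item                      # literal bytes pass through as-is
            continue
        fingerprint, left, right = item
        entry = index.get(fingerprint)
        if entry is None:
            # Without the cached packet the region cannot be restored.
            raise CacheMiss(fingerprint)
        cached, offset = entry
        # The window plus the bytes the sender expanded on each side.
        out += cached[offset - left:offset + beta + right]
    return bytes(out)
```

For example, with `index = {42: (b"abcHELLO WORLDyy", 7)}` and `beta = 4`, decoding `[b"__", (42, 4, 3), b"!!"]` yields `b"__HELLO WORLD!!"`.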
  15. 15. Implementation
  16. 16. Trace analyzer: the algorithm is implemented as a user-level process to analyze a trace.
  17. 17. Parameters. Fingerprint space: M=2^60 ◦ collision almost impossible. Penalty for each matching region: 12 octets ◦ to represent the space needed for the token. Window size β and fingerprint selection frequency 2^γ: ◦ large β: better “quality” of matches, fewer potential bytes saved ◦ small β: worse “quality” of matches (shorter matches in more recent packets) ◦ small γ: more likely to find a match, larger index (= less memory for cached packets) ◦ large γ: less likely to find a match, less memory usage ◦ values used: γ=5, β=64
  18. 18. Performance: 45 Mbps on a PC with a Pentium III-550 and 1 GB memory. This work is designed for slow links.
  19. 19. Follow-up work. Later work by other researchers: ◦ universal redundancy elimination ◦ SmartRE: coordinated network-wide redundancy elimination ◦ EndRE: end-system redundancy elimination
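As a rough worked example of what these parameters imply, assume fingerprints are uniformly distributed (an idealization, not a claim from the slides) and take a hypothetical packet of n = 1500 bytes. With β = 64 and γ = 5, the expected number of representative fingerprints per packet and the net saving of one matched region are approximately:

```latex
\[
E[\text{representative fingerprints}]
  \approx \frac{n - \beta + 1}{2^{\gamma}}
  = \frac{1500 - 64 + 1}{2^{5}}
  = \frac{1437}{32}
  \approx 44.9
\]
\[
\text{net saving of a matched region of } \ell \text{ octets} = \ell - 12
\]
```

Since every verified match covers at least β = 64 octets, each one saves at least 52 octets after the 12-octet penalty.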
  20. 20. Traffic Analysis: how much redundancy is there?
  21. 21. Amount of redundancy. Internet => corporate: 30% redundant. With just 1 MB of memory for cache+index: at least 10% redundant. Corporate => Internet: 50% redundant.
  22. 22. Redundancy by protocol. HTTP, Telnet, POP, ASF have a high percentage of repeated strings. HTTPS, FTP-data, Napster, RTSP, NNTP have a low percentage of repeated strings. The redundancy elimination algorithm is protocol-independent, so we can save bytes on non-Web traffic. [Chart: redundant traffic and traffic amount (%) per protocol: HTTP, RTSP, Napster, Lotus, HTTPS, FTP-data, NNTP, DNS, ASF, AOL, SMTP, POP, Telnet, Other]
  23. 23. Comparison with HTTP caching. Redundancy elimination works better than HTTP caching and compression. [Chart: traffic (%) for Squid, gzip, Squid+gzip, RE, Squid+RE]