The document discusses how workload characteristics affect the performance of cache replacement policies. It analyzes traces of web requests and finds that most files are requested only a few times, file popularity follows a Zipf distribution, file sizes have a heavy-tailed distribution, and there is temporal locality in requests. It evaluates the performance of LRU, LFU-Aging and GDS cache replacement policies based on hit rate and byte hit rate. The goal is to examine how sensitive these policies are to different workload characteristics like the percentage of "one-timers" or files requested only once.
4. One Timers
• Most files are extremely unpopular.
• Over 90% of the distinct files are requested only a few times.
• There is no benefit in caching one-timers (files requested only once).
• 90% of the requests go to only 2%-4% of the files (concentration of references).
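As a rough illustration of how one-timers can be measured, the sketch below counts them in a request trace. The function name `one_timer_stats` and the toy trace are hypothetical and not taken from the slides.

```python
from collections import Counter

def one_timer_stats(trace):
    """Return (share of distinct files requested once, share of requests they receive)."""
    counts = Counter(trace)
    one_timers = [f for f, c in counts.items() if c == 1]
    file_share = len(one_timers) / len(counts)
    # Each one-timer contributes exactly one request to the trace.
    request_share = len(one_timers) / len(trace)
    return file_share, request_share

# Hypothetical toy trace: one hot file plus several one-timers.
trace = ["a", "b", "a", "c", "a", "d", "a", "e"]
file_share, request_share = one_timer_stats(trace)
print(file_share, request_share)
```

In this toy trace most distinct files are one-timers, yet the single popular file still receives half of all requests, which is the concentration effect described above.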
5. File Popularity
• Some Web files are more popular than others.
• Popularity: the number of times a file was requested.
• File popularity follows Zipf's law.
Files fall into three groups: extremely popular files (the top 1% of the unique files received 39% of all client requests), moderately popular files (the top 37% received 78% of the requests) and unpopular files (one-timers).
Files are sorted in decreasing order of the number of times they were requested. Rank 1 is given to the file with the most references and rank N to the file with the fewest requests.
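Under Zipf's law the request count of a file is roughly proportional to rank^(-slope), so the slope can be estimated with a straight-line fit on a log-log rank/frequency plot. The following is a minimal sketch of that fit; `zipf_slope` and the synthetic trace are illustrative assumptions, not the method used in the slides.

```python
import math
from collections import Counter

def zipf_slope(trace):
    """Least-squares slope of log(count) against log(rank), returned as a positive number."""
    counts = sorted(Counter(trace).values(), reverse=True)  # rank 1 = most requested
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return -num / den

# Hypothetical trace in which the file at rank r is requested about 1000 / r times.
trace = []
for rank in range(1, 51):
    trace += [f"file{rank}"] * round(1000 / rank)

slope = zipf_slope(trace)
print(round(slope, 2))
```

For this synthetic trace the estimate should come out close to 1; real proxy traces typically need more care, for example with the flat tail produced by one-timers.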
6. File Size
• Files on the Web vary in size.
• File sizes follow a heavy-tailed distribution.
• The probability of obtaining extremely large values is non-negligible.
1) Small Files (100B – 10KB) 20%
2) Medium Files (10 – 15KB) 65%
3) Large Files (15 – KB) 15%
90% of files were HTML or images, but these objects account for only 50% of the total size.
40% of the total size is due to a few large files (audio, video).
Pareto: many small observations mixed in with a few large ones.
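To get a feel for what a heavy tail means, the sketch below draws file sizes from a Pareto distribution and measures how much of the total volume sits in the largest files. The tail index 1.4, the 1000-byte scale, and the helper name `top_share` are assumptions chosen for illustration only.

```python
import random

def top_share(sizes, top_fraction):
    """Share of total bytes held by the largest `top_fraction` of files."""
    ordered = sorted(sizes, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)

# Assumed parameters: tail index 1.4 and a minimum size of 1000 bytes.
rng = random.Random(0)
sizes = [1000 * rng.paretovariate(1.4) for _ in range(10_000)]

share = top_share(sizes, 0.01)
print(f"largest 1% of files hold {share:.0%} of the bytes")
```

With a tail index this small, a handful of very large files usually accounts for a disproportionate share of the bytes, which is why file hit rate and byte hit rate can diverge.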
7. Temporal Locality
• Files that have recently been referenced are likely to be re-referenced in the near future.
• There is a temporal correlation between recent-past and near-future references.
• 30% of all re-references to a file occurred within an hour of the previous reference to the same file.
• 60% of all re-references occurred within 24 hours of the previous request.
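Figures such as "30% within an hour" can be obtained by tracking, for each file, the time since its previous reference. A minimal sketch follows, assuming a trace of (timestamp in seconds, file id) pairs in time order; the function name and the sample trace are hypothetical.

```python
def rereference_share(trace, window_seconds):
    """Fraction of re-references that occur within `window_seconds` of the previous one."""
    last_seen = {}
    within = total = 0
    for ts, file_id in trace:
        if file_id in last_seen:
            total += 1
            if ts - last_seen[file_id] <= window_seconds:
                within += 1
        last_seen[file_id] = ts
    return within / total if total else 0.0

# Hypothetical trace: (timestamp in seconds, file id), sorted by time.
trace = [(0, "a"), (600, "b"), (1800, "a"), (7200, "b"), (90000, "a")]

within_hour = rereference_share(trace, 3600)
within_day = rereference_share(trace, 24 * 3600)
print(within_hour, within_day)
```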
8. Performance Metrics
• File Hit Rate (HR): percentage of requested files found in the cache.
HR = 70%: 7 of every 10 file requests are fulfilled from the proxy cache.
• Byte Hit Rate (BHR): percentage of requested bytes found in the cache.
BHR = 70%: 7 of every 10 bytes are returned from the cache; the remaining 3 bytes are retrieved across the external network.
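The two metrics can be computed side by side from a log of requests. The sketch below assumes each request is recorded as a (size in bytes, hit or miss) pair; the record layout and the sample values are illustrative, not from the slides.

```python
def hit_rates(requests):
    """requests: list of (size_bytes, was_hit). Returns (HR, BHR) in percent."""
    total_files = len(requests)
    total_bytes = sum(size for size, _ in requests)
    hit_files = sum(1 for _, hit in requests if hit)
    hit_bytes = sum(size for size, hit in requests if hit)
    return 100 * hit_files / total_files, 100 * hit_bytes / total_bytes

# Hypothetical log: 7 hits on small files, 3 misses on large files.
requests = [(1_000, True)] * 7 + [(10_000, False)] * 3
hr, bhr = hit_rates(requests)
print(f"HR = {hr:.1f}%, BHR = {bhr:.1f}%")
```

Here the file hit rate is 70% while the byte hit rate is below 20%, because the hits are all on small files and the misses are all on large ones.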
9. Tradeoff HR-BHR
File Hit Rate | Byte Hit Rate
Maximized by caching many small files | Maximized by caching a few large files
Reduces the load on the Web server | Reduces network traffic
10. Web Replacement
• LRU: evicts the file that has not been accessed for the longest time (temporal locality). The most recently referenced files are the most likely to be referenced again in the near future.
• LFU-Aging: evicts the file with the lowest reference count (file popularity).
• GDS: associates a value H = 1/s with each file, where s is the file size. Evicts the file with the lowest H value, H(min), and reduces the H values of all other files by H(min). This policy therefore considers both file size and temporal locality.
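The GDS rule described above can be sketched as follows. This is a deliberately naive version that subtracts H(min) from every remaining file on each eviction, exactly as the bullet states. It also assumes that a hit restores the file's H to 1/s, as in the usual GreedyDual-Size formulation; the slides do not spell this step out. The class name `GDSCache` and its interface are hypothetical.

```python
class GDSCache:
    """Naive GDS sketch: H = 1/size, evict the minimum H, age the rest by H(min)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.h = {}     # file_id -> current H value
        self.size = {}  # file_id -> size in bytes

    def access(self, file_id, size):
        """Return True on a hit, False on a miss (the file is then cached if it fits)."""
        if file_id in self.h:
            # Assumption: a hit restores H to its initial value 1/s.
            self.h[file_id] = 1.0 / self.size[file_id]
            return True
        if size > self.capacity:
            return False  # larger than the whole cache, never stored
        while self.used + size > self.capacity:
            victim = min(self.h, key=self.h.get)
            h_min = self.h.pop(victim)
            self.used -= self.size.pop(victim)
            for other in self.h:  # reduce all remaining H values by H(min)
                self.h[other] -= h_min
        self.h[file_id] = 1.0 / size
        self.size[file_id] = size
        self.used += size
        return False

cache = GDSCache(capacity_bytes=100)
cache.access("big", 80)
cache.access("small", 10)
cache.access("mid", 20)   # does not fit, so "big" (lowest H) is evicted
print(sorted(cache.h))
```

Because H starts at 1/s, large files begin with low values and are evicted first, while files that keep getting hit have their value refreshed. Practical implementations usually avoid the per-eviction loop by keeping a running offset instead; that optimization is left out here to stay close to the description above.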
11. Comparison of Web Replacements
• Higher HR is achieved using size-based replacement policies, because these policies store a large number of small files.
• Higher BHR is achieved using frequency-based replacement policies, because these policies keep the most popular files, regardless of size.
12. How SENSITIVE are the Web Cache Replacements to Workload Characteristics?
13. TARGET
• The goal is to examine the sensitivity of proxy caching to certain workload characteristics.
• Generate proxy workloads, with a generator tool, that differ in one chosen characteristic, and investigate the sensitivity of the cache replacement policies to each characteristic.
Characteristic | Trace 1 | Trace 2
Zipf Slope | 0.80 | 0.80
Tail Index | 1.4 | 1.4
Percentage of One-timers | 60% | 80%
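The idea of varying a single characteristic can be illustrated with a toy generator. This is not the generator tool used in the study. It is a minimal sketch that fixes the one-timer percentage and gives the remaining files Zipf-like request counts. The function `make_trace`, the file-count and `top_count` values are all assumptions, and file sizes (and therefore the tail index) are left out.

```python
import random

def make_trace(n_files, one_timer_pct, zipf_slope, top_count, seed=0):
    """Build a shuffled list of file ids with a fixed percentage of one-timers."""
    rng = random.Random(seed)
    n_one = round(n_files * one_timer_pct / 100)
    n_popular = n_files - n_one
    trace = []
    for rank in range(1, n_popular + 1):
        # Zipf-like count; at least 2 so popular files never become one-timers.
        count = max(2, round(top_count / rank ** zipf_slope))
        trace += [f"p{rank}"] * count
    trace += [f"o{i}" for i in range(n_one)]  # each one-timer appears once
    rng.shuffle(trace)  # random order, so no temporal locality is modelled
    return trace

# Two traces that differ only in the percentage of one-timers.
trace_1 = make_trace(n_files=1000, one_timer_pct=60, zipf_slope=0.80, top_count=500)
trace_2 = make_trace(n_files=1000, one_timer_pct=80, zipf_slope=0.80, top_count=500)
print(len(trace_1), len(trace_2))
```

Replaying both traces through the same cache policy and comparing HR and BHR would then show how sensitive that policy is to the one-timer percentage alone.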