One-Timers
• Most files are extremely unpopular.
• Over 90% of the distinct files are requested only a few times.
• There is no benefit in caching one-timers.
• 90% of the requests go to only 2%-4% of the files (concentration of references).
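The two figures on this slide can be measured on any request trace. The sketch below is only an illustration: it assumes the trace is a plain list of file identifiers, one per request, and the function names are hypothetical.

```python
from collections import Counter

def one_timer_share(trace):
    """Percentage of distinct files that were requested exactly once."""
    counts = Counter(trace)
    one_timers = sum(1 for c in counts.values() if c == 1)
    return 100.0 * one_timers / len(counts)

def top_files_request_share(trace, top_fraction):
    """Percentage of all requests that go to the most popular files.

    top_fraction is the share of distinct files to keep, e.g. 0.02 for 2%.
    """
    counts = sorted(Counter(trace).values(), reverse=True)
    k = max(1, round(top_fraction * len(counts)))
    return 100.0 * sum(counts[:k]) / len(trace)

# Toy trace: one popular file and four one-timers.
trace = ["a"] * 6 + ["b", "c", "d", "e"]
print(one_timer_share(trace))               # 80.0
print(top_files_request_share(trace, 0.2))  # 60.0
```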
File Popularity
• Some Web files are more popular than others.
• Popularity: the number of times a file was requested.
• File popularity follows Zipf's law. Files are sorted in decreasing order of the number of times they were requested: rank 1 is given to the file with the most references and rank N to the file with the fewest requests.
• Three classes emerge: extremely popular files (the top 1% of the unique files received 39% of all client requests), moderately popular files (the top 37% received 78% of the requests), and unpopular files (one-timers).
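Under Zipf's law the request count of the file at rank r is roughly proportional to 1/r^a, so the ranked counts fall on a straight line in a log-log plot. A minimal sketch of how the slope a could be estimated from a trace is shown here; the least-squares fit and the function name are assumptions made for illustration, not a method taken from the slides.

```python
import math
from collections import Counter

def zipf_slope(trace):
    """Estimate the Zipf slope by a least-squares fit of log(count) on log(rank).

    Needs at least two distinct files in the trace.
    """
    counts = sorted(Counter(trace).values(), reverse=True)  # rank 1 first
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the fitted line falls, so negate to get a positive slope

# Toy trace whose counts are exactly 120 / rank, i.e. a slope of 1.
trace = []
for rank in range(1, 6):
    trace += [f"file{rank}"] * (120 // rank)
print(round(zipf_slope(trace), 6))
```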
File Size
• Files on the Web vary in size.
• File sizes follow a heavy-tailed distribution.
• The probability of obtaining extremely large values is non-negligible.
  1) Small files (100 B – 10 KB): 20%
  2) Medium files (10 – 15 KB): 65%
  3) Large files (15 – KB): 15%
• 90% of the files were HTML or images, yet these objects account for only 50% of the total size.
• 40% of the total size is due to a few large files (audio, video).
• Pareto: many small observations mixed in with a few large ones.
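The heavy-tail effect can be reproduced with a few lines of code. The sketch below draws file sizes from a Pareto distribution using Python's standard `random.paretovariate` and reports how many of the bytes belong to the largest files. The minimum size of 100 bytes, the tail index of 1.4, and the function names are assumptions chosen for illustration.

```python
import random

def pareto_sizes(n, tail_index, min_size, seed=0):
    """Draw n file sizes from a Pareto distribution (every size >= min_size)."""
    rng = random.Random(seed)
    return [min_size * rng.paretovariate(tail_index) for _ in range(n)]

def byte_share_of_largest(sizes, fraction):
    """Percentage of all bytes held by the largest `fraction` of the files."""
    ordered = sorted(sizes, reverse=True)
    k = max(1, int(fraction * len(ordered)))
    return 100.0 * sum(ordered[:k]) / sum(ordered)

sizes = pareto_sizes(10_000, tail_index=1.4, min_size=100)
# With a heavy tail, the largest 10% of the files hold far more than 10% of the bytes.
print(byte_share_of_largest(sizes, 0.10))
```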
Temporal Locality
• Files that have recently been referenced are likely to be re-referenced in the near future.
• There is a temporal correlation between references in the recent past and in the near future.
• 30% of all re-references to a file occurred within an hour of the previous reference to the same file.
• 60% of all re-references occurred within 24 hours of the previous request.
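Figures such as "30% within an hour" come from measuring the gap between consecutive references to the same file. A minimal sketch follows, assuming the trace is a time-ordered list of (timestamp in seconds, file identifier) pairs; the function name is hypothetical.

```python
def rereference_share(trace, window_seconds):
    """Percentage of re-references that fall within window_seconds
    of the previous reference to the same file."""
    last_seen = {}          # file id -> time of its most recent reference
    rereferences = within = 0
    for timestamp, file_id in trace:
        if file_id in last_seen:
            rereferences += 1
            if timestamp - last_seen[file_id] <= window_seconds:
                within += 1
        last_seen[file_id] = timestamp
    return 100.0 * within / rereferences if rereferences else 0.0

trace = [(0, "a"), (100, "b"), (1800, "a"),
         (50100, "b"), (90100, "a"), (90200, "b")]
print(rereference_share(trace, 3600))    # 25.0 (within one hour)
print(rereference_share(trace, 86400))   # 75.0 (within 24 hours)
```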
Performance Metrics
• File Hit Rate (HR): percentage of requested files found in the cache. HR = 70% means that 7 of every 10 requested files are served from the proxy.
• Byte Hit Rate (BHR): percentage of requested bytes found in the cache. BHR = 70% means that 7 of every 10 bytes are returned from the cache; the remaining 3 bytes are retrieved across the external network.
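Both metrics can be computed in one pass over a request log. The sketch below is simplified: it assumes a fixed cache content (a set of file identifiers) rather than a cache that changes over time, and the function name is hypothetical.

```python
def hit_rates(requests, cached):
    """Return (HR, BHR) in percent.

    requests: list of (file_id, size_in_bytes), one entry per request.
    cached:   set of file ids assumed to be in the cache for the whole run.
    """
    hits = hit_bytes = total = total_bytes = 0
    for file_id, size in requests:
        total += 1
        total_bytes += size
        if file_id in cached:
            hits += 1
            hit_bytes += size
    return 100.0 * hits / total, 100.0 * hit_bytes / total_bytes

requests = [("a", 1000), ("b", 1000), ("a", 1000), ("c", 7000)]
print(hit_rates(requests, cached={"a", "b"}))   # (75.0, 30.0)
```

In this toy log the cache holds only the small files, so three of four requests hit while only 30% of the bytes are served locally; HR and BHR can diverge widely on the same workload.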
Tradeoff HR-BHR
• File Hit Rate: maximized by caching many small files; reduces the load on the Web server.
• Byte Hit Rate: maximized by caching a few large files; reduces network traffic.
Web Replacement
• LRU: evicts the file that has not been accessed for the longest time (temporal locality). The most recently referenced files are the most likely to be referenced again in the near future.
• LFU-Aging: evicts the file with the lowest reference count (file popularity).
• GDS: associates a value H = 1/s with each file, where s is the file size. It evicts the file with the lowest value, H(min), and reduces the H values of all other files by H(min). This policy therefore considers both file size and temporal locality.
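The GDS rule can be sketched directly from the description above. This is a toy illustration, not a tuned implementation: the class and method names are hypothetical, and resetting H to 1/s on a hit is an assumption taken from the usual description of GreedyDual-Size rather than from the slide. That reset is what lets recently used files outlive files that have only aged.

```python
class GDSCache:
    """Toy GDS cache: every file carries H = 1/size."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.size = {}   # file id -> size in bytes
        self.h = {}      # file id -> current H value

    def access(self, file_id, size):
        """Process one request; return True on a hit, False on a miss."""
        if file_id in self.h:
            self.h[file_id] = 1.0 / size   # assumption: a hit restores H to 1/s
            return True
        if size > self.capacity:
            return False                   # too large to cache at all
        while self.used + size > self.capacity:
            self._evict()
        self.size[file_id] = size
        self.h[file_id] = 1.0 / size
        self.used += size
        return False

    def _evict(self):
        victim = min(self.h, key=self.h.get)   # file with the lowest H
        h_min = self.h.pop(victim)
        self.used -= self.size.pop(victim)
        for file_id in self.h:                 # age the survivors by H(min)
            self.h[file_id] -= h_min

cache = GDSCache(capacity_bytes=100)
cache.access("big", 60)
cache.access("small", 30)
cache.access("mid", 40)        # no room left: "big" has the lowest H and is evicted
print(sorted(cache.h))         # ['mid', 'small']
```

Subtracting H(min) from every entry costs time proportional to the number of cached files per eviction; the loop is kept here only because it mirrors the wording of the slide.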
Comparison of Web Replacements
• Higher HR is achieved with size-based replacement policies, because they store a large number of small files.
• Higher BHR is achieved with frequency-based replacement policies, because they keep the most popular files regardless of size.
How SENSITIVE are the Web Cache Replacements to Workload Characteristics?
TARGET
• The goal is to examine the sensitivity of proxy caching to certain workload characteristics.
• Generate proxy workloads, with a generator tool, that differ in one chosen characteristic, and investigate the sensitivity of the cache replacement policies to each characteristic.

Characteristic     Trace 1   Trace 2
Zipf slope         0.80      0.80
Tail index         1.4       1.4
Pct. one-timers    60%       80%
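To make the idea of two traces that differ in a single characteristic concrete, here is a rough sketch of a trace generator. It is not the generator tool the slides refer to. The function name, its parameters, and the trick of giving every popular file two guaranteed references (so that the one-timer percentage comes out exact) are all assumptions made for illustration. File sizes and the tail index are left out.

```python
import random

def synthetic_trace(n_files, one_timer_pct, zipf_slope, n_requests, seed=0):
    """Build a request trace with a fixed percentage of one-timers and
    Zipf-like popularity among the remaining files."""
    rng = random.Random(seed)
    n_one = round(n_files * one_timer_pct / 100)
    n_popular = n_files - n_one
    base = n_one + 2 * n_popular      # one-timers once, popular files twice
    if n_popular == 0 or n_requests < base:
        raise ValueError("need at least one popular file and enough requests")
    one_timers = [f"one{i}" for i in range(n_one)]
    popular = [f"pop{i}" for i in range(n_popular)]
    # Popularity weight of the file at rank r is 1 / r**zipf_slope.
    weights = [1.0 / rank ** zipf_slope for rank in range(1, n_popular + 1)]
    trace = one_timers + popular * 2
    trace += rng.choices(popular, weights=weights, k=n_requests - base)
    rng.shuffle(trace)                # interleave the requests
    return trace

# Two traces that differ only in the percentage of one-timers.
trace1 = synthetic_trace(100, one_timer_pct=60, zipf_slope=0.80, n_requests=1000)
trace2 = synthetic_trace(100, one_timer_pct=80, zipf_slope=0.80, n_requests=1000)
```

Replaying both traces through the same replacement policy and comparing HR and BHR would then isolate the effect of the one-timer percentage.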