Evaluation of Caching Strategies Based on Access Statistics on Past Requests
- 1. Evaluation of Caching Strategies
based on Access Statistics on Past Requests
Gerhard Haßlinger, Konstantinos Ntougias
gerhard.hasslinger@telekom.de; kostas_ntougias@yahoo.gr
Commercial in Confidence
Least Recently Used (LRU): Simple Standard Method
- Analysis, Simulation: Deficits of LRU Cache Hit Rate
Statistics-based Caching Strategies
- Window: over the last K Requests
- Geometrical Aging: Geom. Decreasing Weight per Request
- Criteria: Hit Rate and Effort for Alternative Strategies
Summary on hit rates and effort of web caching strategies
© 2013 The SmartenIT Consortium
- 2. Cache Efficiency for YouTube Video Traces
[Figure: Cache hit rate (0–60%) vs. cache size as the fraction of videos in the cache (0.0078% … 2%, logarithmic scale). Curves: optimal cache strategy (most popular data in cache), Zipf law approximation 0.004·R^(−5/8), and the LRU (Least Recently Used) cache strategy.]
Evaluation of 3.7 billion accesses on 1.65 million YouTube files
Sources: M. Cha et al., I tube, you tube, everybody tubes: Analyzing the world’s largest user
generated content video system, Internet measurement conference IMC, San Diego, USA (2007)
Efficiency of caching for IP-based Content Delivery (G. Haßlinger, O. Hohlfeld, ITC 2010)
Results confirmed by N. Megiddo and S. Modha, Outperforming LRU with an adaptive
replacement cache algorithm, IEEE Computer, (Apr. 2004) 4-11
© 2013 The SmartenIT Consortium
- 3. Cache Strategies incl. Statistics on Past Requests
Sliding Window: the cache holds the objects with the highest request
frequency over a sliding window of the last K requests
Geometric Fading: the cache holds the objects that have the highest
sum of weights for past requests, where the k-th request in the
past has a geometrically decreasing weight r^k (0 < r < 1).
- 4. Statistics over window of the last K requests
Converges to caching of the most popular objects for large K
Reacts to dynamic changes in popularity, after a delay until
requests to a new item become relevant in the statistics
Implementation:
The request sequence in the window has to be stored;
with each new request, one request falls out of the window
and has to be removed from the statistics
2 objects change their statistics score per new request:
updates in the cache still have constant effort per request,
although more than for LRU
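A minimal Python sketch of the sliding-window bookkeeping described above (the class and method names are our own; a production version would keep the objects in a sorted structure for constant effort per request rather than recompute the top-M set):

```python
from collections import Counter, deque
import heapq

class SlidingWindowCache:
    """Keep the M objects with the highest request frequency
    over a sliding window of the last K requests (sketch)."""

    def __init__(self, M, K):
        self.M, self.K = M, K
        self.window = deque()   # the last K requests, oldest first
        self.freq = Counter()   # request counts within the window

    def request(self, obj):
        """Record a request; at most 2 objects change their score."""
        hit = obj in self.cache()
        self.window.append(obj)
        self.freq[obj] += 1                # new request enters the statistics
        if len(self.window) > self.K:
            old = self.window.popleft()    # oldest request falls out
            self.freq[old] -= 1
            if self.freq[old] == 0:
                del self.freq[old]
        return hit

    def cache(self):
        # Recomputed here for clarity; a real implementation maintains
        # this set incrementally as the two affected scores change.
        return set(heapq.nlargest(self.M, self.freq, key=self.freq.get))
```

For example, with M = 2, K = 5 and the request sequence a, a, b, a, c, b, the window finally holds a, b, a, c, b and the cache is {a, b}.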
- 5. Statistics with geometrical aging
The k-th request in the past is weighted by ρ^k (ρ < 1)
The weight of an object is the sum of the weights of its requests
Objects are ordered according to their weights
Implementation:
In principle, all weights should be multiplied by ρ for each
request; instead, the weight of each new request can be multiplied
by 1/ρ (> 1), i.e. the k-th request has weight (1/ρ)^k
One object changes rank per request;
effort for updating the rank in a sorted list: O(ln(M))
Faster approximations: the requested object steps up only one rank;
or rank updates only e.g. once per hour or per day
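The 1/ρ trick above can be sketched in Python as follows (names are ours; the periodic rescaling needed against floating-point overflow is only noted in a comment):

```python
import heapq

class GeometricFadingCache:
    """Keep the M objects with the highest geometrically aged score (sketch).

    Instead of multiplying every stored weight by rho on each request,
    the t-th request is given the growing weight (1/rho)**t, which
    produces the same ranking of the objects."""

    def __init__(self, M, rho=0.999):
        self.M = M
        self.growth = 1.0 / rho   # 1/rho > 1
        self.w = 1.0              # weight assigned to the next request
        self.score = {}

    def request(self, obj):
        hit = obj in self.cache()
        self.score[obj] = self.score.get(obj, 0.0) + self.w
        self.w *= self.growth     # later requests count more
        # In a long run, rescale all scores (divide by self.w) from time
        # to time to avoid floating-point overflow.
        return hit

    def cache(self):
        # Recomputed for clarity; a sorted list updated in O(ln(M))
        # per request matches the effort stated above.
        return set(heapq.nlargest(self.M, self.score, key=self.score.get))
```

With ρ = 0.5 the requests a, b, b, a get weights 1, 2, 4, 8, so a scores 9 and b scores 6: recency outweighs the equal request counts.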
- 6. Basic Assumptions on Cache Modeling & Evaluation
We assume
a set of N objects and a cache for M (< N) objects of fixed size
(objects of different size are handled as k unit-size chunks;
bin-packing problems are almost irrelevant in large caches)
Random independent requests with static popularity
pk: request probability of object k, indexed in the order of popularity
⇒ The optimum strategy holds the most popular objects in the cache
Static popularity is favourable for the cache hit rate, since
unforeseen changes in popularity detract from cache efficiency
Measurement traces of requests to YouTube show only slowly
varying popularity; a few percent of new top-100 items appear
per day/week
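Under these assumptions the optimum hit rate is simply the summed popularity of the M most popular objects; a small Python sketch (function names are ours):

```python
def zipf_probs(N, beta):
    """Zipf-like popularity: p_k proportional to k**(-beta),
    normalized over N objects."""
    w = [k ** -beta for k in range(1, N + 1)]
    s = sum(w)
    return [x / s for x in w]

def optimal_hit_rate(probs, M):
    """For static i.i.d. requests the optimum strategy permanently keeps
    the M most popular objects, so the hit rate is their summed probability."""
    return sum(sorted(probs, reverse=True)[:M])
```

E.g. for N = 4, β = 1 the weights 1, 1/2, 1/3, 1/4 give an optimum hit rate of 1.5 / (25/12) = 0.72 for a cache of size M = 2.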
- 7. Results on LRU Caching Strategy
An LRU cache is implemented as a stack of depth M;
a new request puts the object on top
LRU is simple and frequently used (Squid, DropBox etc.)
Analysis of the hit rate for a static distribution is possible:
\[
h_{LRU}(M) \;=\; \sum_{k_1=1}^{N} p_{k_1}
\sum_{\substack{k_2=1 \\ k_2 \neq k_1}}^{N} \frac{p_{k_2}}{1 - p_{k_1}}
\sum_{\substack{k_3=1 \\ k_3 \neq k_1, k_2}}^{N} \frac{p_{k_3}}{1 - p_{k_1} - p_{k_2}}
\;\cdots
\sum_{\substack{k_M=1 \\ k_M \neq k_1, \ldots, k_{M-1}}}^{N}
\frac{p_{k_M}}{1 - \sum_{j=1}^{M-1} p_{k_j}}
\;\sum_{j=1}^{M} p_{k_j}\,.
\]
but the evaluation is complex and feasible only for small cache sizes M < 15
Approximations by Towsley et al. (1999), Ha. & Ho. (2010), and
Fricker, Robert, Roberts (2011) seem to be good for arbitrary
static request distributions, but are verified only by simulation
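The exact formula can be evaluated by brute force over all ordered M-tuples of cached objects, which illustrates why it is feasible only for small N and M (a sketch with our own naming):

```python
from itertools import permutations

def lru_hit_rate_exact(p, M):
    """Exact LRU hit rate for i.i.d. requests with probabilities p.
    Sums over all ordered M-tuples of distinct cached objects, i.e.
    N!/(N-M)! terms -- feasible only for small N and M."""
    N = len(p)
    h = 0.0
    for state in permutations(range(N), M):
        prob = 1.0   # P(LRU stack holds exactly this tuple, top first)
        mass = 0.0   # popularity mass already placed in the stack
        for k in state:
            prob *= p[k] / (1.0 - mass)
            mass += p[k]
        h += prob * mass   # a hit occurs iff the next request is cached
    return h
```

For M = 1 this reduces to Σ p_k², e.g. lru_hit_rate_exact([0.5, 0.3, 0.2], 1) yields 0.38, matching the p1² worst-case observation on the next slide.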
- 8. Worst Case Analysis of LRU Caching Strategy
Cache size M = 1 with only one popular object:
p1 >> ε > p2, … When the most popular item is always kept in cache
⇒ optimum hit rate: p1; the LRU hit rate is smaller: p1².
Arbitrary cache size M with a set IPop of M popular objects:
p1 = p2 = … = pM = p >> ε ≥ pM+1, pM+2, …
pLRU(j, k): probability that j popular items from the set IPop are found
in an LRU cache of size k. We can analyse pLRU(j, k) iteratively:
\[
p_{LRU}(j,k) \;=\; p_{LRU}(j,k-1)\,
\frac{1 - Mp - (k-1-j)\,\varepsilon}{1 - jp - (k-1-j)\,\varepsilon}
\;+\; p_{LRU}(j-1,k-1)\,
\frac{(M-j+1)\,p}{1 - (j-1)\,p - (k-j)\,\varepsilon}\,.
\]
⇒ LRU hit rate hLRU = Σj pLRU(j, M) [ j p + (M − j) ε ].
[Diagram: An LRU cache of size k = top object XTop + a cache of size k−1, where XTop is the last requested object X not in the cache of size k−1. Starting from pLRU(j, k−1): XTop ∈ IPop leads to pLRU(j+1, k); XTop ∉ IPop leads to pLRU(j, k).]
⇒ Exact analysis of LRU worst case hit rate is feasible
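The recursion and the resulting worst-case hit rate can be computed with a few lines of Python (a sketch with our own naming; letting ε → 0 gives the worst case):

```python
def lru_worst_case_hit_rate(M, p, eps=0.0):
    """Worst-case LRU scenario: M popular objects of probability p each,
    all further objects with probability eps (eps -> 0 in the worst case).
    P[j][k] = probability that j popular objects sit in an LRU cache of size k."""
    P = [[0.0] * (M + 1) for _ in range(M + 1)]
    P[0][0] = 1.0
    for k in range(1, M + 1):
        for j in range(0, k + 1):
            v = 0.0
            if j <= k - 1:  # position k is filled by a non-popular object
                v += P[j][k - 1] * (1 - M * p - (k - 1 - j) * eps) \
                     / (1 - j * p - (k - 1 - j) * eps)
            if j >= 1:      # position k is filled by a further popular object
                v += P[j - 1][k - 1] * (M - j + 1) * p \
                     / (1 - (j - 1) * p - (k - j) * eps)
            P[j][k] = v
    # hit rate: j cached popular objects at p each, M - j others at eps
    return sum(P[j][M] * (j * p + (M - j) * eps) for j in range(M + 1))
```

For M = 1 and p = 0.5 this returns 0.25 = p1², versus the optimum hit rate 0.5, i.e. the 25% absolute deficit of the size-1 worst case.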
- 9. Worst Case Analysis of LRU Caching
[Figure: Worst-case LRU cache hit rate (0–100%) vs. the probability of a request to the set of popular objects (0 to 1). Curves: most popular items in cache (optimum) and LRU worst case for caches of size 1, 2, 10 and 50. Annotation: 28.9% maximum absolute deficit; severe relative deficits for small cache hit rates.]
- 10. Simulation Results for Caching Strategies
Hit rate of the caching strategies (N = 1000 objects; K = 1000)
for Zipf-distributed requests A(R) = α·R^(−β) (β = 0.6; α = 2.7%)
[Figure: Cache hit rate (0–40%) for cache sizes M = 5, 10, 20, 50, 100. Curves: most popular objects in the cache, geometrical fading, sliding window, LRU approximation, LRU simulation.]
- 11. Simulation Results for Caching Strategies
Hit rate of the caching strategies (N = 1000; K = 1000)
for Zipf-distributed requests A(R) = α·R^(−β) (β = 0.99; α = 13.9%)
[Figure: Cache hit rate (0–70%) for cache sizes M = 5, 10, 20, 50, 100. Curves: optimum, geometrical fading, sliding window, LRU approximation, LRU simulation.]
- 12. Simulation Results for Caching Strategies
Hit rate of the caching strategies (N = 1000)
for Zipf-distributed requests A(R) = α·R^(−β) (β = 0.99; α = 6.5%)
[Figure: Cache hit rate (35–60%) for K = 1, 4, 16, 64, 128, 256, 512, 1024, 2048. Curves: optimum for i.i.d. requests, geometrical fading, sliding window, LRU.]
Sliding window and geometrical fading:
hit rate depending on the window size K and on ρ = K/(K + 1)
- 13. Conclusions on Cache Replacement Strategies
LRU seems to be the most often used strategy in web caches (Squid, DropBox)
For static popularity, LRU stays below the maximal hit rate by
- 28.9% in the worst case
- 10–20% for large content sites (YouTube; Zipf-like requests)
LRU performance is poor especially for small caches
Statistics over a fixed-size window and geometric aging
can converge to the optimum hit rate of the static popularity case
Implementation:
- Statistics over a window need some storage and
have constant update effort per request, but more than LRU
- Geometric aging has effort O(ln(M))
Zipf-law popularity makes (small) caches efficient