RL-Cache: Learning-Based Cache Admission for Content Delivery
Sergey Gorinsky
IMDEA Networks Institute, Madrid, Spain
Joint work with Vadim Kirilin, Aditya Sundarrajan, and Ramesh K. Sitaraman
TEWI-Kolloquium, University of Klagenfurt, Austria
12 November 2021

Content Delivery Network (CDN)
• Object delivery from edge servers
▪ low latency
▪ reduced traffic
[Diagram: end user, access network, edge server, CDN, origin server, content provider]
Sergey Gorinsky, RL-Cache @ TEWI-Kolloquium, University of Klagenfurt
12.11.2021

Genius of CDN Economics
• Object delivery from edge servers
▪ low latency
▪ reduced traffic
All stakeholders have incentives to deploy
[Diagram: end user, access network, edge server, CDN, origin server, content provider]

The Good: Cache Hit
[Diagram: end user, edge server, origin server; a request from the end user results in a hit at the edge server]

The Bad: Cache Miss
• Cold misses
• Limited cache capacity
▪ one-time wonders
▪ object churn
[Diagram: end user, edge server, origin server; a request from the end user results in a miss at the edge server]

Caching Problem
• Which of the fetched objects are to be cached at any given time?
▪ constraint: fixed cache size
▪ objective: maximize the hit rate
[Diagram: edge server, origin server; miss]

Caching Problem: Timing Constraints
• Real-time but neither tight nor hard
▪ latency of fetching the object
▪ asynchronous with serving the object to the end user
[Diagram: edge server, origin server; miss]

Caching Challenges: Object Diversity
• Different traffic classes
▪ web, image, video
• Different object sizes within a class
An intractable knapsack problem

Caching Challenges: Online Operation
• No optimal solution even for equal-size objects
▪ unlike László Bélády’s algorithm for the offline problem
• Heuristics based on historical features
▪ request recency, frequency, object size
[Diagram: timeline with the present marked "Now" and the unknown future beyond it]
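For contrast with the online setting, the offline rule attributed to Bélády can be sketched in a few lines for equal-size objects. This is a minimal illustration, not code from the talk: the function name `belady_hits` is hypothetical, every missed object is assumed to be admitted, and the whole request sequence is assumed to be known in advance.

```python
from typing import Hashable, List, Sequence


def belady_hits(requests: Sequence[Hashable], capacity: int) -> int:
    """Count hits of Belady's offline policy for equal-size objects.

    On a miss with a full cache, evict the cached object whose next
    request lies farthest in the future (or never comes).
    """
    if capacity <= 0:
        return 0
    n = len(requests)
    # next_use[i] = index of the next request for requests[i], or n if none.
    next_use: List[int] = [n] * n
    last_seen = {}
    for i in range(n - 1, -1, -1):
        next_use[i] = last_seen.get(requests[i], n)
        last_seen[requests[i]] = i

    cache = {}  # cached object -> index of its next request
    hits = 0
    for i, obj in enumerate(requests):
        if obj in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=cache.get)  # farthest next request
            del cache[victim]
        cache[obj] = next_use[i]
    return hits
```

The rule needs `next_use`, which is exactly the knowledge of the future that an online cache lacks.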

Prior Solutions
Edge cache: admission and eviction
• Admission
▪ AdmitAll
▪ SecondHit
▪ AdaptSize
• Eviction
▪ much larger body of policies
▪ Least Recently Used (LRU), predominant in production
Our goal: A simple admission front end for an LRU server
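The split between admission and eviction can be illustrated with a byte-capacity LRU cache that consults a pluggable admission function on every miss. This is a simplified sketch under stated assumptions: the names `LRUCache`, `admit_all`, `make_second_hit`, and `hit_rate` are hypothetical, and the SecondHit variant here remembers every object ever requested, which a production policy would bound.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Tuple

Request = Tuple[str, int]  # (object id, object size in bytes)


class LRUCache:
    """Byte-capacity LRU cache with a pluggable admission policy."""

    def __init__(self, capacity: int, admit: Callable[[str, int], bool]):
        self.capacity = capacity
        self.admit = admit
        self.used = 0
        self.items: "OrderedDict[str, int]" = OrderedDict()

    def request(self, obj: str, size: int) -> bool:
        """Serve one request; return True on a hit."""
        if obj in self.items:
            self.items.move_to_end(obj)  # mark as most recently used
            return True
        if size <= self.capacity and self.admit(obj, size):
            while self.used + size > self.capacity:
                _, freed = self.items.popitem(last=False)  # evict the LRU object
                self.used -= freed
            self.items[obj] = size
            self.used += size
        return False


def admit_all(obj: str, size: int) -> bool:
    """AdmitAll: cache every missed object."""
    return True


def make_second_hit() -> Callable[[str, int], bool]:
    """SecondHit (simplified): admit an object only if it was requested before."""
    seen = set()

    def admit(obj: str, size: int) -> bool:
        if obj in seen:
            return True
        seen.add(obj)
        return False

    return admit


def hit_rate(trace: Iterable[Request], cache: LRUCache) -> float:
    """Fraction of requests in the trace that are served as hits."""
    hits = total = 0
    for obj, size in trace:
        hits += cache.request(obj, size)
        total += 1
    return hits / total if total else 0.0
```

On a toy trace such as `a, x, a, y, a` with room for a single object, the AdmitAll cache keeps replacing `a` with objects requested only once, while the SecondHit filter keeps `a` in place.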

Caching by Reinforcement Learning (RL)
• Natural fit between cache admission and RL
▪ state: request features
▪ action: admit or not admit
▪ reward: hit or miss
• Limited prior RL-based work
▪ prefetching based on popularity as a proxy metric [DeepCache]
▪ equal-size objects and synthetic workloads

Alternatives within the RL Paradigm
• Q-Learning and other Temporal Difference (TD) methods
▪ weak correlation between an action (admission decision upon a miss) and immediate reward (hit on the next request)
▪ imprecise estimation of action values under noisy reward signals
• Monte Carlo (MC) algorithms
▪ action-value update based on the average return from all sampled long sequences of state-action pairs
▪ large overhead and vulnerability to high variance in the returns
• Direct Policy Search (DPS) techniques
▪ individual returns of long state-action sequences
▪ direct learning of a policy on a high-return subset of sequences
▪ best performance in our preliminary evaluation

RL-Cache: Overview
• DPS-based algorithm for cache admission
• 8 features of 3 common types
▪ request recency, frequency, object size
• Feedforward neural network
▪ outputs the probability of admitting the requested object
• Training
▪ periodic in the cloud
• Inference
▪ real-time in the edge server

RL-Cache: Features

RL-Cache: Neural Network
• Fully connected network with L2 regularization
• Input layer
▪ n inputs, up to 160
▪ 8 features, their historical versions, up to 10 quantization bins
• 5 hidden layers
▪ 5(6-l)n neurons in hidden layer l
▪ ELU activation functions
• Output layer
▪ softmax activation function
▪ admitting probability and non-admitting probability
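The layer sizes follow directly from the formula on the slide: for n = 160 inputs, the hidden layers have 4000, 3200, 2400, 1600, and 800 neurons. The NumPy sketch below only illustrates the shape of the network and its forward pass. The weight initialization, the order of the two outputs, and all function names are assumptions, and L2 regularization, which affects training only, is not shown.

```python
import numpy as np


def hidden_sizes(n: int) -> list:
    """Widths of the 5 hidden layers: 5(6-l)n neurons in hidden layer l."""
    return [5 * (6 - l) * n for l in range(1, 6)]


def elu(x: np.ndarray) -> np.ndarray:
    """ELU with alpha = 1: x for x > 0, exp(x) - 1 otherwise."""
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def softmax(x: np.ndarray) -> np.ndarray:
    z = x - x.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_weights(n: int, rng: np.random.Generator) -> list:
    """Random (weights, bias) pairs; the initialization scheme is an assumption."""
    sizes = [n] + hidden_sizes(n) + [2]
    return [
        (rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out)),
         np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x: np.ndarray, weights: list) -> np.ndarray:
    """Return shape (batch, 2); column 0 is assumed to be the admitting probability."""
    h = x
    for w, b in weights[:-1]:
        h = elu(h @ w + b)
    w, b = weights[-1]
    return softmax(h @ w + b)
```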

RL-Cache: Training
• Window of K requests from the trace
• Sampling
▪ m samples of admission decisions for the window
▪ admission decisions with discounted rewards for L extra requests
• Selection
▪ p samples with highest hit rates
• Learning
▪ weights w
▪ not converged: back to sampling
▪ converged: sliding to the next window
• Refilling the cache every q windows
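The sampling and selection steps for a single window can be sketched as follows. This is an illustration under several assumptions rather than the authors' implementation: admission decisions are drawn independently from per-request admission probabilities, each sample is scored by replaying the window through an LRU cache that starts empty, and the L extra requests with discounted rewards are omitted. The names `lru_hit_rate` and `sample_and_select` are hypothetical.

```python
from collections import OrderedDict

import numpy as np


def lru_hit_rate(window, decisions, capacity):
    """Replay a window through a byte-capacity LRU cache under fixed admission decisions."""
    cache, used, hits = OrderedDict(), 0, 0
    for (obj, size), admit in zip(window, decisions):
        if obj in cache:
            cache.move_to_end(obj)  # hit: refresh recency
            hits += 1
        elif admit and size <= capacity:
            while used + size > capacity:
                _, freed = cache.popitem(last=False)  # evict the LRU object
                used -= freed
            cache[obj] = size
            used += size
    return hits / len(window)


def sample_and_select(window, admit_probs, capacity, m, p, rng):
    """Draw m samples of admission decisions and keep the p with the highest hit rates."""
    admit_probs = np.asarray(admit_probs)
    # Sampling: each row is one vector of admit (True) or not-admit (False) decisions.
    samples = rng.random((m, len(window))) < admit_probs
    rates = np.array([lru_hit_rate(window, s, capacity) for s in samples])
    # Selection: indices of the p best samples, best first.
    best = np.argsort(rates)[::-1][:p]
    return samples[best], rates[best]
```

In the learning step, the selected decisions would serve as training targets for the network weights w, and the loop would repeat on the same window until convergence.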

RL-Cache: Inference
• Real-time in the edge server
• Maintenance of a database with feature statistics
▪ for computing the frequency and recency metrics
• Rounding the inferred probability to 1 or 0
▪ to decide whether to admit or not to admit the requested object
• Request batching
▪ for reducing the per-request computation overhead
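Rounding and batching can be sketched as below. The helper names `decide_batch` and `batches` are hypothetical, and treating a probability of exactly 0.5 as "admit" is an assumption, since the slide does not say how ties are handled.

```python
import numpy as np


def decide_batch(admit_probs) -> np.ndarray:
    """Round each inferred admission probability to 1 (admit) or 0 (do not admit)."""
    # Assumption: a probability of exactly 0.5 counts as "admit".
    return (np.asarray(admit_probs) >= 0.5).astype(np.int8)


def batches(items, batch_size):
    """Group incoming requests into lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch
```

A single network evaluation per batch amortizes the fixed inference cost over all requests in that batch.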

Evaluation: Initial Datasets
• Image, video, and web traces from Akamai’s production server

Active Bytes
• Active object
▪ between its first and last requests in the trace
• Active bytes
▪ total size of the active objects
[Figure panels: Image, Web]
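The definition translates into a short computation over a trace. The sketch below assumes a trace given as (object id, size) pairs and counts an object as active at both its first and its last request; the function name `active_bytes` is hypothetical.

```python
from typing import List, Sequence, Tuple


def active_bytes(trace: Sequence[Tuple[str, int]]) -> List[int]:
    """Active bytes at each request position of the trace.

    An object is active from its first to its last request (inclusive).
    """
    first, last, size_of = {}, {}, {}
    for i, (obj, size) in enumerate(trace):
        first.setdefault(obj, i)
        last[obj] = i
        size_of[obj] = size

    # Difference array: add the size at the first request,
    # subtract it just after the last request.
    delta = [0] * (len(trace) + 1)
    for obj, size in size_of.items():
        delta[first[obj]] += size
        delta[last[obj] + 1] -= size

    result, running = [], 0
    for d in delta[:-1]:
        running += d
        result.append(running)
    return result
```

For the trace `a(10), b(5), a(10), c(7), b(5)`, the active bytes per request are 10, 15, 15, 12, and 5.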

Hit-Rate Performance of RL-Cache
• Successfully learns different strategies for different cache sizes
▪ e.g., admit all for abundant cache sizes
• Outperforms or matches the existing algorithms
▪ across the cache sizes and traffic classes
[Figure panels: Image, Web]

Evaluation: Additional Datasets
• Additional geographic locations

Impact of Larger Request Intensity
RL-Cache outperforms the baselines
[Figure panel: EU-video]

Robustness in the Same Geographic Region
RL-Cache can be trained in one location and executed in other locations

Robustness across Geographic Regions
Robustness across the continents is weak

Processing Overhead
Per-request processing time is only 64 μs on a CPU and 4 μs on a GPU with a batch size of 1024 requests
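As a back-of-the-envelope reading of these figures, the arithmetic below converts the per-request times into the time for one batch and an implied throughput. It assumes that batches are processed one after another on a single device, which the slide does not state; the function names are hypothetical.

```python
BATCH_SIZE = 1024                       # requests per batch (from the slide)
PER_REQUEST_US = {"CPU": 64, "GPU": 4}  # per-request time in microseconds (from the slide)


def batch_time_us(per_request_us: int, batch_size: int = BATCH_SIZE) -> int:
    """Time to process one full batch, in microseconds."""
    return per_request_us * batch_size


def requests_per_second(per_request_us: int) -> int:
    """Implied throughput, assuming batches are processed back to back."""
    return 1_000_000 // per_request_us


for device, us in PER_REQUEST_US.items():
    print(f"{device}: {batch_time_us(us) / 1000:.3f} ms per batch, "
          f"{requests_per_second(us)} requests/s")
```

Under these assumptions, a batch takes about 65.5 ms on the CPU and about 4.1 ms on the GPU.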

Pearson Correlation for Feature Pairs
Grouping by feature class
[Figure panel: EU-video]

Feature Importance: Random Forests
Features of all 3 classes are important
[Figure panels: EU-video, US1-image]

Reducing the Feature Set
The full feature set provides the best performance
[Figure panel: EU-video]

Hyperparameter Sensitivity: K and L

Hyperparameter Sensitivity: p and q

Conclusion
• RL-Cache
▪ RL-based cache admission in a CDN edge server
▪ 8 features of the recency, frequency, and size types
▪ training via direct policy search
▪ real-time inference with low batching-enabled overhead
• Evaluation on Akamai’s production CDN traces
▪ active bytes as a measure of caching requirements for a trace
▪ successful adaptability in a variety of settings
▪ robustness across traffic classes in the same geographic region
▪ relevance of selected features and hyperparameters
• Open software
▪ https://github.com/WVadim/RL-Cache
