Object Placement in Video Content Distribution Networks
1. Object Placement in Video Content Distribution Networks
Mohammad Faraji, Kianoosh Mokhtarian
Department of Electrical and Computer Engineering
University of Toronto
December 2011
2. Background
8 years' worth of video content added to YouTube every day
Terabytes a day; petabytes a year
Trend expected to accelerate further
Higher-quality video streams (currently only 10% are HD)
Content distribution infrastructure
Several datacentres around the world
User requests sent to the closest datacentre (via DNS/HTTP redirection)
3. Motivation
Store video files across datacentres (DCs)
Generously replicate all videos on every DC?
Not viable
Data volume grows much faster than storage cost drops
4. Good News from Measurement Studies
Video popularity depends on geographic location
More than half of the time, only an initial fraction of a video is downloaded
=> Place (partial) video files in selected locations
5. Modeling
Input: history of user requests (video v for IP address i)
Distance from i to each datacentre?
Use an Internet Coordinate System (ICS)
Delay(i, j) = Euclidean_distance[ ICS(i), ICS(j) ]
Make tracking of requests scalable
Cluster user IPs into regions in the Euclidean space of the ICS
Popularity matrix P[region, video]
Distance matrix D[region, datacentre]
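A minimal sketch of this modeling step, assuming every IP address and datacentre already has coordinates from some ICS (the ICS itself is not shown) and using plain k-means as a stand-in for the unspecified clustering method. The function names are hypothetical; `math.dist` needs Python 3.8+.

```python
import math
import random


def delay(a, b):
    """Estimated delay = Euclidean distance between two ICS coordinates."""
    return math.dist(a, b)


def cluster_regions(coords, k, iters=20, seed=0):
    """Plain k-means over ICS coordinates (an assumed choice of clustering)."""
    rng = random.Random(seed)
    centroids = rng.sample(coords, k)

    def assign():
        # label each point with the index of its nearest centroid
        return [min(range(k), key=lambda j: delay(p, centroids[j])) for p in coords]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assign(), centroids


def build_matrices(requests, ip_coords, dc_coords, k, num_videos):
    """requests: list of (video, ip) pairs; ip_coords: dict ip -> ICS coordinate.

    Returns P[region][video] (request counts) and D[region][dc] (delays
    measured from each region's centroid).
    """
    ips = sorted(ip_coords)
    labels, centroids = cluster_regions([ip_coords[ip] for ip in ips], k)
    region_of = dict(zip(ips, labels))
    P = [[0] * num_videos for _ in range(k)]
    for v, ip in requests:
        P[region_of[ip]][v] += 1
    D = [[delay(c, dc) for dc in dc_coords] for c in centroids]
    return P, D
```

Tracking per region rather than per IP keeps both matrices at `k` rows regardless of how many distinct IPs appear in the request history.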
6. Partial Video Files
First minute of a video downloaded many more times than the rest
Store partial video files
More effective caching
Lower start-up delay
Partial popularity assumed independent of region
Download reports: (v, 1MB), (v, 2.3MB), (v, 0.5MB), ...
Compress into a few entries for each video (dynamic algorithm)
PP[v] = (0...1MB, 100 times), (1MB...end, 50 times)
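The bookkeeping behind PP[v] can be sketched as follows, assuming the segment boundaries are already fixed (the dynamic algorithm that chooses them is not reproduced here) and that a download counts toward a segment if it went past that segment's start. The helper name is hypothetical.

```python
from bisect import bisect_left


def partial_popularity(download_sizes_mb, segment_starts_mb):
    """Count downloads per video segment (hypothetical helper).

    download_sizes_mb: amount downloaded in each report for one video, in MB.
    segment_starts_mb: sorted segment start offsets, beginning with 0.0;
    segment i spans [start_i, start_{i+1}) and the last one runs to the end.
    """
    n = len(segment_starts_mb)
    # reached[k]: number of downloads that touched exactly the first k segments
    reached = [0] * (n + 1)
    for size in download_sizes_mb:
        # bisect_left gives the number of segment starts strictly below `size`
        reached[bisect_left(segment_starts_mb, size)] += 1
    counts = [0] * n
    running = 0
    # a download that touched k segments counts toward segments 0 .. k-1
    for i in range(n - 1, -1, -1):
        running += reached[i + 1]
        counts[i] = running
    return counts
```

For the three reports above, `partial_popularity([1.0, 2.3, 0.5], [0.0, 1.0])` returns `[3, 1]`, i.e. PP[v] = (0...1MB, 3 times), (1MB...end, 1 time). Only one counter per segment is kept, so the storage cost per video does not grow with the number of reports.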
7. Problem Statement
Assign (part of) each video to one or more DCs
Minimize the distance between videos and their users (regions), given:
The distance matrix D[region, datacentre]
The expected download pattern P[region, video]
Partial popularity PP[video]
The storage limitation of each DC
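One way to write this as an optimization problem is sketched below. The notation beyond P, D and PP is assumed for illustration: each video v is split into chunks c of size s_{v,c}, PP[v,c] is the fraction of v's downloads that reach chunk c, C_d is the storage of DC d, and x_{v,c,d} is a hypothetical 0/1 variable marking that chunk c of v is stored on d. Treating the origin copy as a DC with unlimited capacity keeps the inner minimum well defined.

```latex
\min_{x} \; \sum_{r} \sum_{v} P[r,v] \sum_{c} PP[v,c] \,
  \min_{d \,:\, x_{v,c,d} = 1} D[r,d]
\quad \text{s.t.} \quad
\sum_{v,c} s_{v,c}\, x_{v,c,d} \le C_d \;\; \forall d, \qquad
\sum_{d} x_{v,c,d} \ge 1 \;\; \forall v, c, \qquad
x_{v,c,d} \in \{0, 1\}
```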
8. Problem Hardness
Even simpler variants are hard
Store one video file on a few selected DCs
NP-Complete (min set cover, max coverage)
Store multiple video files on one DC
NP-Complete (knapsack)
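The single-DC case can be illustrated directly: if each video has an integer size and a benefit (for example, the delay saved by serving it from this DC), choosing which videos to keep under the DC's capacity is exactly a 0/1 knapsack. A minimal dynamic-programming sketch, assuming integer sizes; the function name and numbers are hypothetical.

```python
def best_single_dc(sizes, benefits, capacity):
    """Maximum total benefit of videos that fit on one DC (0/1 knapsack)."""
    # best[cap]: best benefit achievable using at most `cap` units of storage
    best = [0] * (capacity + 1)
    for size, benefit in zip(sizes, benefits):
        # iterate capacities downward so each video is used at most once
        for cap in range(capacity, size - 1, -1):
            best[cap] = max(best[cap], best[cap - size] + benefit)
    return best[capacity]
```

For example, `best_single_dc([3, 4, 2], [30, 50, 15], 6)` returns `65` (the videos of size 4 and 2). The O(#videos × capacity) running time is only pseudo-polynomial, which is consistent with the problem being NP-complete.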
9. Solution
Maintain a utility matrix U[v, d]
Utility of replicating "the next chunk of" video v on DC d
Auxiliary priority queues
1. Find the highest-utility video v*
2. Place the next chunk of v* on the best DC d*
3. Update row v* of U, and what the next chunk is for v*
Complexity: O( (total video replicas) × (log #videos + log #DCs + log max chunks/video) )
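The greedy loop can be sketched under several assumptions that the slide does not state: all chunks have the same size, each DC stores a prefix of a video (so the "next chunk" of v on d is the first chunk not yet there), a chunk held by no DC is served from an origin with a fixed delay, and U[v, d] is the total request-weighted delay saved by that placement. Instead of the per-row auxiliary queues, this sketch uses one heap with lazy invalidation, so refreshing row v* costs O(#DCs × #regions) per placement rather than the logarithmic bound above. All names are hypothetical.

```python
import heapq


def greedy_place(P, D, PP, capacity, chunk_mb, origin_delay):
    """Greedy chunk placement (hypothetical sketch).

    P[r][v]: expected requests for video v from region r.
    D[r][d]: delay from region r to DC d.
    PP[v][c]: fraction of v's downloads that reach chunk c.
    capacity[d]: storage of DC d in MB; every chunk is chunk_mb MB.
    origin_delay: assumed delay for a chunk stored on no DC.
    Returns the placements in order as (video, chunk, dc) triples.
    """
    R, V, ND = len(D), len(PP), len(capacity)
    # best[v][c][r]: current delay for region r to fetch chunk c of video v
    best = [[[origin_delay] * R for _ in PP[v]] for v in range(V)]
    prefix = [[0] * ND for _ in range(V)]  # number of v's chunks already on DC d
    free = list(capacity)
    version = [0] * V  # bumped whenever row v of U changes
    placement = []

    def utility(v, d):
        """U[v, d]: weighted delay saved by putting v's next chunk on d."""
        c = prefix[v][d]
        if c >= len(PP[v]) or free[d] < chunk_mb:
            return 0.0
        return sum(P[r][v] * PP[v][c] * max(0.0, best[v][c][r] - D[r][d])
                   for r in range(R))

    heap = [(-utility(v, d), v, d, 0) for v in range(V) for d in range(ND)]
    heapq.heapify(heap)
    while heap:
        neg_u, v, d, ver = heapq.heappop(heap)
        if ver != version[v]:
            continue  # stale: row v was recomputed after this entry was pushed
        if neg_u >= 0.0:
            break  # no remaining placement reduces delay
        if free[d] < chunk_mb:
            continue  # DC d filled up since this entry was computed
        c = prefix[v][d]
        placement.append((v, c, d))
        free[d] -= chunk_mb
        prefix[v][d] += 1
        for r in range(R):
            best[v][c][r] = min(best[v][c][r], D[r][d])
        version[v] += 1
        for d2 in range(ND):  # refresh row v of U
            heapq.heappush(heap, (-utility(v, d2), v, d2, version[v]))
    return placement
```

Only row v* needs recomputing after a placement because the other videos' utilities change only through free space on d*, and the capacity check at pop time already covers that.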
10. Evaluation (in Progress): Data
File size and length of ~200K videos from [Cheng 2010]
Distances in Internet
Pairwise delay between 2500 nodes from [Wong 2005]
Video popularities
Global: Zipf-distributed (as repeatedly reported)
Local: synthetic
Partial video popularities
Generated according to [Qiu 2010]
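A sketch of how such a synthetic workload might be generated, assuming a Zipf exponent alpha (1.0 by default here) and, for the local model, independent uniform random regional shares per video. Both the exponent and the local model are placeholders, not the parameters used in this evaluation; the function names are hypothetical.

```python
import random


def zipf_popularity(num_videos, total_requests, alpha=1.0):
    """Global popularity: the video of rank k gets weight 1 / k**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_videos + 1)]
    norm = sum(weights)
    return [total_requests * w / norm for w in weights]


def local_popularity(global_pop, num_regions, seed=0):
    """Split each video's global requests across regions (assumed model).

    Returns P[region][video]; each column sums to the video's global count.
    """
    rng = random.Random(seed)
    P = [[0.0] * len(global_pop) for _ in range(num_regions)]
    for v, g in enumerate(global_pop):
        # 1 - random() lies in (0, 1], so every share is strictly positive
        shares = [1.0 - rng.random() for _ in range(num_regions)]
        total = sum(shares)
        for r in range(num_regions):
            P[r][v] = g * shares[r] / total
    return P
```

Normalizing the regional shares keeps the global distribution exactly Zipf while letting each region rank videos differently, which is the geographic effect the placement is meant to exploit.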
11. Evaluation (in Progress): Results
Total delay, given our placement
Delay with and without partial file storage
Comparison to simple threshold-based distributed caching
Running time
Estimated communication overhead
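The first metric, total delay for a given placement, can be computed directly from P, D and PP. A sketch, assuming placements are (video, chunk, DC) triples, PP[v][c] is the fraction of v's downloads that reach chunk c, and chunks stored nowhere are served from an origin at a fixed, hypothetical delay:

```python
def total_delay(P, D, PP, placement, origin_delay):
    """Request-weighted delay summed over regions, videos and chunks."""
    holders = {}  # (video, chunk) -> list of DCs storing it
    for v, c, d in placement:
        holders.setdefault((v, c), []).append(d)
    total = 0.0
    for r in range(len(D)):
        for v in range(len(PP)):
            for c, frac in enumerate(PP[v]):
                dcs = holders.get((v, c), [])
                # each region fetches a chunk from its closest holder
                delay = min((D[r][d] for d in dcs), default=origin_delay)
                total += P[r][v] * frac * delay
    return total
```

Delay with and without partial storage can then be compared by evaluating one placement that stores only selected chunks and another that stores whole files under the same capacity.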
12. Take-Away
Benefits of storing partial video files on selected DCs
Future work
Several further details for a complete working system:
Low-overhead collection of (sub-samples of) downloads
Estimate near-future download patterns
Carefully cluster users into a limited number of regions
Solve video placement across multiple nodes
Incremental algorithm; cannot reshuffle everything every night
14. Appendix: Previous Works
Cooperative web caching
Hierarchical, distributed, hybrid
CDN design (various flavors)
Video caching
On a single cache
To optimize for VCR-like functions
Editor's Notes
We have seen a lot of work on how to build and maintain datacentres. Our work is about utilizing datacentres for a large-scale CDN.
There are petabytes of video files to store on the DCs. We can't replicate everything on every DC; we would need to build a whole new DC every year! Data volume is increasing much faster than storage cost is decreasing.
There are interesting opportunities that we can leverage. Previous measurements on YouTube report that ...