Technologies for Building
Content Delivery Networks
Pei Cao
Cisco Systems, Inc.
cao@cisco.com
What are Content Delivery
Networks
• A centrally managed network of devices
that collectively facilitate the delivery of
content to end users
• Solve network bandwidth bottleneck
• Solve server throughput bottleneck
CDN Categories
• Network Infrastructure:
– Single ISP
– Overlay networks
– Enterprise premise
• Content types:
– Static images and texts
– Multimedia content: audio and video streams
– Dynamic HTML and XML pages
• Customers:
– Content providers
– Enterprise
Technology Components
• Content distribution
– Placing the content on the delivery devices
• Request routing
– Steer users to a delivery node that is close
• Content delivery
– Protocol processing, access control, QoS mechanisms
• Resource accounting
– Logging and billing
Content Distribution
• Goal: position content objects into delivery
devices
• Different content types use different
techniques
– Static images and texts: pulled & cached, or
pushed
– Multimedia contents: usually pre-positioned
– Dynamic pages: requires prior setup
Distribution Mechanisms
• HTTP request for pulling
– Example: standard HTTP reverse proxy
• FTP of tar files
– Some equipment vendors use this technique
• Rate limited tree-form replication
– Example: Cisco’s “Soda” algorithm
Distribution Mechanisms using
Multicast
• Application-level reliable multicast
– Example: Inktomi’s Fast-Forward
• Unreliable IP multicast with file-level error
correction
– Example: Digital Fountain, multicast-ftp
• Unreliable IP multicast
– Example: RealNetworks
Content Consistency
Mechanisms
• Expiration times or TTL
• Renaming in the HTML file
• Web Cache Invalidation Protocol (WCIP)
– Nodes receive invalidations when objects
change
– Objects are organized into channels
– Nodes subscribe to a channel to receive
invalidation
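A minimal sketch of channel-based invalidation, in the spirit of the bullets above. It does not implement the WCIP wire protocol; the class names `CacheNode` and `InvalidationChannel`, the channel name, and the URL are all hypothetical and only illustrate the subscribe-then-invalidate idea.

```python
from collections import defaultdict


class CacheNode:
    """A delivery node holding cached objects keyed by URL."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def store(self, url, body):
        self.cache[url] = body

    def invalidate(self, url):
        # Drop the stale copy; the next request would re-fetch from the origin.
        self.cache.pop(url, None)


class InvalidationChannel:
    """Groups objects into named channels and notifies subscribed nodes."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel name -> list of nodes

    def subscribe(self, channel, node):
        self.subscribers[channel].append(node)

    def object_changed(self, channel, url):
        # Only nodes subscribed to this channel receive the invalidation.
        for node in self.subscribers[channel]:
            node.invalidate(url)


hub = InvalidationChannel()
node_a, node_b = CacheNode("a"), CacheNode("b")
hub.subscribe("news", node_a)  # node_b does not subscribe
for node in (node_a, node_b):
    node.store("/news/index.html", "old page")
hub.object_changed("news", "/news/index.html")
```

In this sketch a node that never subscribed keeps serving its old copy, which is why expiration times are still useful as a fallback.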
Request Routing
• Goal: steer the client such that it fetches the
content from a close node
• Methods
– DNS selection
– HTTP redirection
– Transparent interception
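Overview of Request Arrival
Process
How a request for www.xyz.com/index.html arrives at 1.2.3.4:
[Figure: the client, its DNS server, the root NS, the xyz.com NS (IP: 1.2.3.1), and the server (IP: 1.2.3.4) behind a router and a switch]
1. Client to DNS server: what is the IP address of www.xyz.com?
2. DNS server to root NS: where is the name server of xyz.com?
3. Root NS to DNS server: NS record: 1.2.3.1
4. DNS server to xyz.com NS: what is the IP of www.xyz.com?
5. xyz.com NS to DNS server: A record: 1.2.3.4
6. DNS server to client: 1.2.3.4
7. Client to server: GET /index.html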
DNS selection
• Basic idea: xyz.com’s NS returns node close to
client
• How to become xyz.com’s NS?
– Rewrite URLs (aka Akamaizer)
– Take a subdomain cdn.xyz.com and put all content
there
• Accuracy limited to client’s name server
– Only suitable for ISP or overlay networks
– Not suitable for some enterprise or cable networks
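The URL-rewriting option can be sketched as follows. This is an illustration only, not the actual Akamai tool: the list of static suffixes, the regular expression, and the function names are assumptions, while `www.xyz.com` and `cdn.xyz.com` come from the slides. Once embedded objects point at `cdn.xyz.com`, lookups for that name go to the CDN's name server.

```python
import re
from urllib.parse import urlsplit, urlunsplit

ORIGIN_HOST = "www.xyz.com"  # origin site from the slides
CDN_HOST = "cdn.xyz.com"     # subdomain whose NS is run by the CDN
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed list


def rewrite_url(url):
    """Point static objects on the origin host at the CDN subdomain."""
    parts = urlsplit(url)
    if parts.netloc == ORIGIN_HOST and parts.path.lower().endswith(STATIC_SUFFIXES):
        return urlunsplit(parts._replace(netloc=CDN_HOST))
    return url


def rewrite_html(html):
    """Rewrite double-quoted src/href attribute values in an HTML string."""
    pattern = re.compile(r'(src|href)="([^"]+)"')
    return pattern.sub(lambda m: f'{m.group(1)}="{rewrite_url(m.group(2))}"', html)


page = ('<a href="http://www.xyz.com/index.html">'
        '<img src="http://www.xyz.com/img/logo.gif"></a>')
rewritten = rewrite_html(page)
```

Here the HTML page itself stays on the origin server while the image moves to the CDN hostname.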
HTTP Redirection
• Basic idea: web server tells client to go
somewhere else
– Returns “302 redirect … 1.2.4.5/index.html…”
• Mostly used for multimedia objects
– These objects are usually put together in an
index file (.sml or .asx) and clients fetch the
index file via HTTP before streaming
• Accuracy is at individual client level
– More suitable for enterprise and cable networks
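A minimal sketch of redirection at individual-client granularity. The subnet-to-node table and the function name `redirect_for` are hypothetical; the addresses 1.2.3.4 and 1.2.4.5 are the ones used on the slides. Because the web server sees the client's own IP address, it can choose a node per client rather than per name server.

```python
import ipaddress

# Hypothetical mapping from client subnets to delivery nodes.
NODES_BY_SUBNET = {
    ipaddress.ip_network("10.1.0.0/16"): "1.2.4.5",
    ipaddress.ip_network("10.2.0.0/16"): "1.2.5.6",
}
ORIGIN = "1.2.3.4"  # origin server from the slides


def redirect_for(client_ip, path):
    """Return (status, headers) for a request arriving at the origin."""
    addr = ipaddress.ip_address(client_ip)
    node = next((n for net, n in NODES_BY_SUBNET.items() if addr in net), ORIGIN)
    if node == ORIGIN:
        return 200, {}  # no better node known: serve from the origin
    return 302, {"Location": f"http://{node}{path}"}
```

For example, `redirect_for("10.1.7.9", "/index.html")` yields a 302 whose `Location` header points at 1.2.4.5.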
Transparent Interception
• Routers and switches along the request path
can send the request elsewhere
• Mostly used for distributed data centers
front-ended with L7 switches
– Example: Cisco’s CSS11k WebNS
Algorithms for Request Routing
• Map-based
– Create a map of the Internet based on AS
domains, pick the node with the shortest hop
count to client
– Or, set up coverage zones mapping a node to a
collection of subnets
• Racing-based
– Let the delivery nodes all race to the client with
A-records
– Winner is selected by client automatically
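The map-based approach can be sketched as a breadth-first search over an AS-level graph. The graph, the AS names, and the node placement below are made up for illustration; a real map would be built from routing data.

```python
from collections import deque

# Hypothetical AS-level adjacency map.
AS_LINKS = {
    "AS1": ["AS2", "AS3"],
    "AS2": ["AS1", "AS4"],
    "AS3": ["AS1", "AS4"],
    "AS4": ["AS2", "AS3", "AS5"],
    "AS5": ["AS4"],
}
# Hypothetical placement of delivery nodes in ASes.
NODE_AS = {"node-a": "AS1", "node-b": "AS5"}


def hop_counts(source):
    """Breadth-first search giving the AS hop count from source to each AS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in AS_LINKS.get(current, []):
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


def closest_node(client_as):
    """Pick the delivery node with the fewest AS hops to the client's AS."""
    dist = hop_counts(client_as)
    reachable = {node: dist[a] for node, a in NODE_AS.items() if a in dist}
    return min(reachable, key=reachable.get) if reachable else None
```

Hop count is only an approximation of closeness, which is one motivation for the racing-based alternative.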
The Boomerang Algorithm
• Cisco’s research published in WCW’01
– xyz.com’s NS server forwards lookup of
www.xyz.com to all delivery nodes
– Each delivery node sends an “A record” response
with its own IP address to the client
– The one that reaches the client first wins
– NS server times the forwarding so that lookup
message arrives at all nodes around the same
time
– Use “simulated annealing” for scalability
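The timing step can be sketched with simple arithmetic, assuming the NS server already knows its one-way latency to each node. The function names and all latency values are hypothetical, and the sketch does not cover the scalability technique mentioned above. Each forward is delayed by the difference between the slowest path and that node's path, so the race then depends only on the node-to-client latency.

```python
def forwarding_delays(ns_to_node_ms):
    """Delay each forward so all lookups reach the nodes at about the same time."""
    slowest = max(ns_to_node_ms.values())
    return {node: slowest - latency for node, latency in ns_to_node_ms.items()}


def race_winner(ns_to_node_ms, node_to_client_ms):
    """Simulate the race: the first response to reach the client wins."""
    delays = forwarding_delays(ns_to_node_ms)
    arrival = {
        node: delays[node] + ns_to_node_ms[node] + node_to_client_ms[node]
        for node in ns_to_node_ms
    }
    return min(arrival, key=arrival.get)


# Made-up latencies in milliseconds.
ns_to_node = {"a": 10, "b": 40, "c": 25}
node_to_client = {"a": 30, "b": 5, "c": 20}
winner = race_winner(ns_to_node, node_to_client)
```

Without the delays, node "a" would win in this example simply because it is nearest to the NS server, not because it is nearest to the client.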
Interaction between Content
Distribution and Request Routing
• Don’t route request to a node that doesn’t
have the content!
• Particularly important for large streaming
contents
– Such content is usually pre-positioned to
ensure high-bandwidth playback
• Each node needs to report its content acquisition
status to the “request router”
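A minimal sketch of the coordination described above: the request router records which nodes have reported a complete copy and considers only those. The class `RequestRouter`, its methods, and the node and file names are hypothetical.

```python
class RequestRouter:
    """Tracks which delivery nodes have finished acquiring which content."""

    def __init__(self):
        self.holdings = {}  # node name -> set of fully acquired content IDs

    def report_acquired(self, node, content_id):
        # Called when a node reports that its copy is complete.
        self.holdings.setdefault(node, set()).add(content_id)

    def route(self, content_id, preference):
        """preference: node names ordered closest-first for this client."""
        for node in preference:
            if content_id in self.holdings.get(node, set()):
                return node
        return None  # no node has the content yet


router = RequestRouter()
router.report_acquired("node-far", "movie.rm")
router.report_acquired("node-near", "clip.rm")
choice = router.route("movie.rm", ["node-near", "node-far"])
```

The closest node is skipped here because it has not reported the requested object.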
Content Delivery
• Goal: serve content to each client at desired
quality of service
• Supported protocols
– HTTP
– Microsoft MMS
– Open standard RTP/RTSP
– RealNetworks RTP/RTSP
• Usually part of the larger CDN system
Content Access Control
• Content object attributes
– “Publication date” and “Expiration date”
– ACL based on user/group/IP
• User authentication
– HTTP basic
– Microsoft NTLM for enterprise environment
– Other schemes
• Media Rights Management
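The content object attributes can be sketched as a small access check. The field names, the example object, and the rule that either a user match or a network match grants access are assumptions made for illustration; group membership and authentication itself are left out.

```python
import ipaddress
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ContentObject:
    url: str
    publication: datetime  # not served before this time
    expiration: datetime   # not served at or after this time
    allowed_users: set = field(default_factory=set)
    allowed_networks: list = field(default_factory=list)  # CIDR strings


def may_serve(obj, user, client_ip, now):
    """Check the publication window first, then the user/IP ACL."""
    if not (obj.publication <= now < obj.expiration):
        return False
    if user in obj.allowed_users:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in obj.allowed_networks)


video = ContentObject(
    url="/video/launch.asf",
    publication=datetime(2001, 6, 1),
    expiration=datetime(2001, 7, 1),
    allowed_users={"alice"},
    allowed_networks=["10.0.0.0/8"],
)
```

With this object, user "alice" is served from any address during June 2001, and any user is served from the 10.0.0.0/8 network in the same window.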
QoS of Content Delivery
• Server QoS
– Server needs to make sure it has enough CPU and
disk to service the stream at specified bit rate
• Network QoS
– Interoperate with routers via DiffServ bits
• Coordination with request router
– Delivery devices should communicate load
information to the “Request Router”
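Server QoS can be sketched as bit-rate admission control. This sketch collapses CPU and disk into a single capacity figure in kbps, which is a simplifying assumption; the class `StreamAdmission` and the numbers are hypothetical. The `load()` value is the kind of figure a delivery device could report to the request router.

```python
class StreamAdmission:
    """Admit a stream only if its bit rate fits in the remaining capacity."""

    def __init__(self, capacity_kbps):
        self.capacity_kbps = capacity_kbps
        self.committed_kbps = 0

    def admit(self, bitrate_kbps):
        if self.committed_kbps + bitrate_kbps > self.capacity_kbps:
            return False  # serving it would break existing guarantees
        self.committed_kbps += bitrate_kbps
        return True

    def release(self, bitrate_kbps):
        # Called when a stream ends; never go below zero.
        self.committed_kbps = max(0, self.committed_kbps - bitrate_kbps)

    def load(self):
        """Fraction of capacity in use."""
        return self.committed_kbps / self.capacity_kbps


server = StreamAdmission(capacity_kbps=1000)
first = server.admit(700)
second = server.admit(400)  # rejected: 700 + 400 exceeds 1000
```

A rejected request would be steered to another node rather than served at a degraded rate.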
Resource Accounting
• Mining the log files
– Log file aggregation: all devices send their log
files to a central location
– Local mining: analyzing the log file at each
delivery device
• Real-time statistics
– Real-time statistics on throughput/latency based
on domain, content type or any HTTP header
– Example: Cisco CSS switch billing MIB
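Log file aggregation followed by mining can be sketched as below. The five-field log format (timestamp, domain, content type, bytes, latency in ms), the device names, and the sample entries are all assumptions; real device logs will differ.

```python
from collections import defaultdict


def summarize(log_lines):
    """Aggregate request count, bytes, and mean latency per domain."""
    totals = defaultdict(lambda: [0, 0, 0.0])  # requests, bytes, latency sum
    for line in log_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip malformed lines
        _timestamp, domain, _content_type, size, latency_ms = fields
        entry = totals[domain]
        entry[0] += 1
        entry[1] += int(size)
        entry[2] += float(latency_ms)
    return {
        domain: {
            "requests": count,
            "bytes": total_bytes,
            "mean_latency_ms": latency_sum / count,
        }
        for domain, (count, total_bytes, latency_sum) in totals.items()
    }


# Hypothetical per-device logs.
device_logs = {
    "ce-1": ["1000 www.xyz.com image/gif 2048 12.0",
             "1001 www.xyz.com text/html 1024 20.0"],
    "ce-2": ["1002 media.xyz.com video/mpeg 500000 150.0", "garbage line"],
}
# Log file aggregation: gather every device's log at a central location.
central = [line for lines in device_logs.values() for line in lines]
report = summarize(central)
```

Local mining would instead call `summarize` on each device's own log and send only the summaries upstream.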
Cisco’s CDN Products
• Content Distribution Manager (CDM)
• Content Router (CR)
• Content Engine (CE)
• CSS switch
Summary
• Main components of building a CDN:
– Content distribution
– Request routing
– Content Delivery
– Resource accounting
• A CDN system requires the four components
to work in concert with each other!
• Cisco is the only vendor that provides the full
solution!


Editor's Notes

  • #4 Before we talk about how each of the technology components works and how they should work together, we need to understand that there are many kinds of CDNs, and each kind requires a different mix of the technologies. We list the categories here and explain their implications for the technology components. One can attempt to map CDN service providers onto the above categories. Akamai and Digital Island so far have content providers as customers, use overlay networks, and mostly focus on static images, though they are trying to branch out to other media. RealNetworks is trying to build a CDN for content providers that also uses an overlay network to some extent and focuses on multimedia content. I know of a number of ISPs that focus on enterprises, use enterprise premise networks, and focus on multimedia content. Of course, most companies follow the money and cross over in terms of customers and content types. The reason for categorizing CDNs along these axes, however, is that different kinds of CDNs require the technologies to coordinate in different ways.
  • #5 Content distribution mechanisms range from simple ones, like FTP, to complicated ones, like rate-limited multicast. Request routing is a research problem; most places use approximations only. There are two papers on this topic in WCW. Content delivery nodes are the traditional web caches, used in reverse proxy mode, which is why caching and CDNs are quite tightly related. Resource accounting is easier if it doesn’t have to be done in real time at high throughput; mining the log files would do. However, if information is needed in real time in a high-throughput environment, it is harder.
  • #10 The picture here shows the client request process. Before we explain the details, let’s take a look at how a request arrives at a server.
  • #19 Server QoS involves guaranteeing the bit rate of the delivery. This typically involves appropriate load control,