Caching and Content Distribution
Networks
Web Caching
As an example, we use the web to illustrate
caching and other related issues
[Figure: two request paths. With a proxy: browser -> web proxy cache -> web server, with a request and a response on each hop. Without a proxy: browser -> web server, one request and one response.]
Web Browser Caching
Web browsers have their own caches. When a
page is downloaded from a site the web page is put
into the browser cache.
This is especially useful in those cases when the
back button is pressed.
If a new copy is needed then a “refresh” can be
done.
No page stays permanently in the cache. There is
limited room.
A replacement algorithm is needed to determine which
cached page should be purged.
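The slides do not name a replacement algorithm. A minimal sketch, assuming least-recently-used (LRU) replacement, which is one common choice; the class name `LRUPageCache` and the capacity of 2 are illustrative only:

```python
from collections import OrderedDict


class LRUPageCache:
    """Tiny browser-style page cache that purges the least recently used page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # url -> page body, least recently used first

    def get(self, url):
        if url not in self.pages:
            return None
        self.pages.move_to_end(url)  # mark as most recently used
        return self.pages[url]

    def put(self, url, body):
        self.pages[url] = body
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # purge the least recently used page


cache = LRUPageCache(capacity=2)
cache.put("/a", "page A")
cache.put("/b", "page B")
cache.get("/a")            # touch /a, so /b becomes the eviction candidate
cache.put("/c", "page C")  # the cache is full, so /b is purged
```

Other policies (FIFO, least-frequently-used, size-aware) fit the same interface; only the eviction line changes.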
Why Web Server Caching
Latency
Reduce latency
Request does not require going to the server
Request is served from the client side which
means that network communication is avoided
Reduce traffic
Consistency
What if the page changes after it has been
saved in the cache?
This means that the cached copy is out of date
The copy and the original are not consistent
There are different strategies for dealing
with this
Web Browser Caching
Client pull
The server provides the content with instructions on
when the client should ask for a refreshed copy of the
content or if the content should be cached.
Server push
The server transmits page information to the screen.
The browser application displays the information and
leaves the connection to the server open.
With an open connection, the server can continue to push
updated pages for your screen to display on an ongoing
basis. You can close the connection by closing the page.
The server is in control
Browser caches are different from proxy caches (discussed
next).
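Client pull can be sketched as a freshness check on the client. The slides do not say how the server's instructions are carried; this sketch assumes the HTTP `Cache-Control` header with its `max-age` and `no-store` directives. The function names are illustrative.

```python
def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(cache_control, age_seconds):
    """A cached copy is fresh while its age is below max-age."""
    if "no-store" in cache_control.lower():
        return False  # the server asked that the content not be cached
    limit = max_age_seconds(cache_control)
    return limit is not None and age_seconds < limit
```

Once `is_fresh` returns False, the client pulls a new copy on the next request.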
Web Caching
Proxy caches (also called proxy servers)
Intercept HTTP requests from clients
• Serves the object if it is in its cache and the date is still valid
• If not, goes to the object’s home server
– On behalf of the user, gets the object and possibly deposits it in
its cache before returning it to the user
• Usually deployed at edges of a network
– Wide area bandwidth savings, improved response time
and increased availability of static web-based objects
A browser may have to be configured to point to
the proxy server.
Usually a proxy cache is purchased and installed by
an organization
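The proxy behaviour above can be sketched as follows. This is an illustration under assumptions, not a real proxy: `fetch_from_origin` is a hypothetical callable standing in for the HTTP request to the home server, and a fixed `ttl_seconds` stands in for "the date is still valid".

```python
import time


class ProxyCache:
    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.time):
        self.fetch_from_origin = fetch_from_origin  # hypothetical: url -> body
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.store = {}  # url -> (body, expiry timestamp)

    def handle(self, url):
        entry = self.store.get(url)
        if entry is not None and self.clock() < entry[1]:
            return entry[0], "HIT"  # in cache and still valid
        body = self.fetch_from_origin(url)  # go to the home server on the user's behalf
        self.store[url] = (body, self.clock() + self.ttl_seconds)
        return body, "MISS"


now = [0.0]          # fake clock so the example is deterministic
origin_calls = []


def fake_origin(url):
    origin_calls.append(url)
    return "body of " + url


proxy = ProxyCache(fake_origin, ttl_seconds=10, clock=lambda: now[0])
first = proxy.handle("/index.html")   # MISS: fetched from the origin
second = proxy.handle("/index.html")  # HIT: served from the cache
now[0] = 11.0
third = proxy.handle("/index.html")   # MISS again: the entry has expired
```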
Web Caching
Not all web pages can be cached
If the Last-Modified tag is present then the page
can be cached
Refresh is often done when
There is a request; and
Expiry time has passed
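A sketch of the refresh rule: nothing is refreshed until a request arrives after the expiry time, and then the cache revalidates using the stored Last-Modified value. The origin is simulated in-process, timestamps are plain integers rather than HTTP dates, and the status codes 200 and 304 mirror HTTP's "OK" and "Not Modified". All names are illustrative.

```python
def origin_respond(resource, if_modified_since=None):
    """Simulated origin: 304 if unchanged since the given timestamp, else 200."""
    if if_modified_since is not None and resource["last_modified"] <= if_modified_since:
        return 304, None
    return 200, resource["body"]


def serve(cached, resource, now):
    """Serve from the cache; contact the origin only when a request arrives after expiry."""
    if now < cached["expires"]:
        return cached["body"], "fresh hit"
    status, body = origin_respond(resource, cached["last_modified"])
    cached["expires"] = now + cached["ttl"]
    if status == 304:
        return cached["body"], "revalidated"  # the cached copy is still good
    cached["body"] = body
    cached["last_modified"] = resource["last_modified"]
    return body, "refetched"


resource = {"body": "v1", "last_modified": 100}
cached = {"body": "v1", "last_modified": 100, "expires": 160, "ttl": 60}
r1 = serve(cached, resource, now=150)  # before expiry
r2 = serve(cached, resource, now=170)  # expired, but unchanged at the origin
resource.update(body="v2", last_modified=200)
r3 = serve(cached, resource, now=240)  # expired and changed at the origin
```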
Cooperative Caching
Caching infrastructure can have multiple
web proxies
Proxies can be arranged in a hierarchy or other
structures
Proxies can cooperate with one another
• Answer client requests
• Propagate server notifications
Uses a combination of HTTP and ICP (Internet
Cache Protocol).
• ICP can be used by one cache to quickly ask another
cache if it has an object.
• HTTP is used to actually retrieve the object.
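The division of labour between ICP and HTTP can be sketched with an in-process simulation. Real ICP exchanges small messages over the network; here `icp_query` and `http_get` are ordinary method calls, and all names are illustrative.

```python
class CacheNode:
    def __init__(self, name, objects=None):
        self.name = name
        self.objects = dict(objects or {})

    def icp_query(self, url):
        """Lightweight 'do you have it?' probe; answers only hit or miss."""
        return url in self.objects

    def http_get(self, url):
        """Full retrieval of the object body."""
        return self.objects[url]


def cooperative_fetch(local, siblings, origin, url):
    if local.icp_query(url):
        return local.http_get(url), local.name
    for sibling in siblings:
        if sibling.icp_query(url):        # cheap probe first
            body = sibling.http_get(url)  # then the actual transfer
            local.objects[url] = body
            return body, sibling.name
    body = origin.http_get(url)           # nobody has it: go to the origin
    local.objects[url] = body
    return body, origin.name


local = CacheNode("local")
sibling = CacheNode("sibling", {"/logo.png": "PNG"})
origin = CacheNode("origin", {"/logo.png": "PNG", "/news.html": "NEWS"})
r1 = cooperative_fetch(local, [sibling], origin, "/logo.png")
r2 = cooperative_fetch(local, [sibling], origin, "/news.html")
r3 = cooperative_fetch(local, [sibling], origin, "/logo.png")
```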
Problems
Caching proxies do not serve all Internet
users
Content providers (say, Web servers)
cannot rely on existence and correct
implementation of caching proxies.
Accounting issues with caching proxies:
Example: www.cnn.com needs to know the
number of hits to the advertisements displayed
on the web page.
Content Distribution Networks
(CDN)
Business Model: A content provider such as
www.cnn.com or Yahoo pays a CDN
company (such as Akamai) to get its
content to the requesting users with short
delays.
A CDN provides a mechanism for
Replicating content on multiple servers in
the Internet
Providing clients with a means to
determine the servers that can deliver
the content fastest.
Terminology
Content: Any publicly accessible combination
of text, images, applets, frames, MP3, video,
flash, virtual reality objects, etc.
Content Provider: Any individual, organization,
or company that has content that it wishes to
make available to users.
Origin Server: Content provider’s server ,
where the content is first uploaded.
Surrogate Server (sometimes called edge
server): Content distributor’s server, where
the replicated content is kept.
Players
Content Provider: Yahoo, MSNBC, CNN, CBC
H/W and S/W Vendor: Cisco, Oracle-Sun
Content Distributor, Hosting Provider: Akamai, Bell
[Figure: arrows between the players labelled "Sells servers", "Send content", and "Install servers"]
CDN Distribution
Content providers are CDN
customers
Content replication
CDN company installs
thousands of servers
throughout Internet
In large datacenters
Or, close to users
CDN replicates customers’
content
When provider updates
content, CDN updates servers
[Figure: origin server in North America feeding a CDN distribution node, which feeds CDN servers in S. America, Europe, and Asia]
CDN: Functional Components
Distribution Service
Redirection Service
Accounting and Billing system
CDN: Distribution Service
The content provider determines which of
its objects it wants the CDN to distribute.
The content provider tags and then pushes
this content to a CDN node, which in turn
replicates and pushes the content to all its
CDN servers.
CDN: Redirection
When a browser in a user’s host is
instructed to retrieve a specific object
(specified using a URL), how does the
browser determine whether it should
retrieve the object from the origin server
or from one of the CDN servers?
As an example, suppose the hostname of
the content provider is www.cnn.com
How Akamai Works
[Figure: end-user, cnn.com (content provider), DNS root server, Akamai global DNS server, Akamai regional DNS server, a nearby Akamai cluster, and a second Akamai cluster. Steps 1-2 (HTTP): GET index.html between the end-user and cnn.com; the figure shows the URL http://a.73.g.akamai.net/7/23/cnn.com/af/cnn.com/foo.jpg]
CDN: Redirection
Users get an html document from
www.cnn.com; this could be index.html
The file index.html uses a modified URL for
content that has been replicated.
Example: If the jpeg files are what has been
replicated, then
<img src="http://cnn.com/af/foo.jpg">
may be modified as follows:
<img src="http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg">
The browser needs to resolve a73.g.akamai.net
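The rewriting step can be sketched as a small function. The CDN host `a73.g.akamai.net` and the control fields `/7/23` come from the example on this slide; how a real CDN chooses them is not covered here, so they are simply passed in as defaults. The function name `akamaize` is illustrative.

```python
from urllib.parse import urlsplit


def akamaize(url, cdn_host="a73.g.akamai.net", control="/7/23"):
    """Rewrite an origin URL so that it points at the CDN host instead."""
    parts = urlsplit(url)
    # Keep the original host and path after the CDN's control fields.
    return f"{parts.scheme}://{cdn_host}{control}/{parts.netloc}{parts.path}"


rewritten = akamaize("http://cnn.com/af/foo.jpg")
```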
CDN: Redirection
What does this mean?
<img src="http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg">
host part: a73.g.akamai.net
Akamai control part: /7/23
Content URL: /af/foo.jpg
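The decomposition above can be checked by splitting the URL programmatically. This sketch assumes exactly two control fields followed by the provider's hostname, as in the slide's example; other layouts are not handled.

```python
from urllib.parse import urlsplit


def split_akamai_url(url):
    """Split a rewritten URL into host part, control part, provider, and content URL."""
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    return {
        "host": parts.netloc,
        "control": "/" + "/".join(segments[:2]),   # two control fields, as in the example
        "provider": segments[2],
        "content": "/" + "/".join(segments[3:]),
    }


fields = split_akamai_url("http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg")
```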
CDN: Redirection
DNS is configured so that all queries about
g.akamai.net that arrive at a DNS server are sent
to an authoritative DNS server for g.akamai.net.
This is referred to as an Akamai DNS server
(authoritative DNS server)
How Akamai Works
[Figure: same participants as the first "How Akamai Works" slide. Steps 3-4 are labelled "DNS lookup cache.cnn.com" and "ALIAS: g.akamai.net"]
CDN: Redirection
When the Akamai DNS server receives the query,
it extracts the IP address of the requester.
The query usually arrives through the user's local
DNS server, so that address stands in for the
requesting browser.
How Akamai Works
[Figure: same participants as the first "How Akamai Works" slide. Steps 5-6 are labelled "DNS lookup g.akamai.net" and "ALIAS a73.g.akamai.net"]
CDN: Redirection
Using the requester's IP address and the information it
has about the Internet (called a map), the Akamai DNS
server returns the IP address of an Akamai regional
server to the requesting browser.
The choice follows a policy,
e.g., select the server that is the fewest hops away.
The regional server may choose a surrogate server
for content retrieval
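The "fewest hops" policy can be sketched as a lookup in a map. The map below is entirely hypothetical: the prefixes, server addresses, and hop counts are made up for illustration, and the real map holds far more than hop counts.

```python
import ipaddress

# Hypothetical map: client prefix -> hop count to each regional server.
NETWORK_MAP = {
    "192.0.2.0/24": {"10.0.0.1": 3, "10.0.1.1": 7},
    "198.51.100.0/24": {"10.0.0.1": 9, "10.0.1.1": 2},
}


def pick_regional_server(client_ip, network_map=NETWORK_MAP):
    """Return the regional server with the fewest hops to the requester."""
    address = ipaddress.ip_address(client_ip)
    for prefix, hops in network_map.items():
        if address in ipaddress.ip_network(prefix):
            return min(hops, key=hops.get)  # policy: fewest hops
    return None  # requester is not covered by the map
```

Swapping the policy (for example, lowest measured latency) only changes the `min` key.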
How Akamai Works
[Figure: same participants as the first "How Akamai Works" slide, steps 1-8. Steps 7-8: DNS lookup for a73.g.akamai.net returns address 1.2.3.4]
How Akamai Works
[Figure: same participants as the first "How Akamai Works" slide, steps 1-9. Step 9 (HTTP): GET /foo.jpg with Host: cache.cnn.com]
How Akamai Works
[Figure: same participants as the first "How Akamai Works" slide, steps 1-12. Step 9 (HTTP): GET /foo.jpg with Host: cache.cnn.com. Steps 11-12 are labelled "GET foo.jpg"]
CDN Redirection
The Akamai DNS server IP address is now
in the cache of the local DNS server.
This implies that it is not always
necessary to go to the root DNS server.
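The TTL associated with the IP address of
an Akamai server (surrogate) is relatively
small.
This is done for performance reasons: clients
re-resolve often, so they can be steered to a
different server as load and network conditions change.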
Akamai content distribution servers are
caches
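The effect of the two TTLs can be sketched with a toy local DNS cache. The TTL values (3600 s and 20 s) and the DNS server address 192.0.2.53 are assumptions for illustration; 1.2.3.4 is the surrogate address shown in the earlier figure.

```python
class DnsCache:
    def __init__(self):
        self.records = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl, now):
        self.records[name] = (address, now + ttl)

    def lookup(self, name, now):
        record = self.records.get(name)
        if record is None or now >= record[1]:
            return None  # unknown or expired: the resolver must ask again
        return record[0]


local_dns = DnsCache()
# Long-lived entry for the Akamai DNS server: no need to revisit the root.
local_dns.put("g.akamai.net", "192.0.2.53", ttl=3600, now=0)
# Short-lived entry for the surrogate chosen for this client.
local_dns.put("a73.g.akamai.net", "1.2.3.4", ttl=20, now=0)

dns_server_later = local_dns.lookup("g.akamai.net", now=30)
surrogate_later = local_dns.lookup("a73.g.akamai.net", now=30)
```

After 30 seconds the surrogate entry has expired, so the next request goes back to the Akamai DNS server, whose address is still cached.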
CDN Redirection
What if content is not there?
If the requested content is not found then the
surrogate will ask other surrogates within a
specified region for it.
If requested information is still not found or is
stale, then a request is made to the original
web site.
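The fallback order can be sketched with plain dictionaries standing in for the surrogate, its regional peers, and the original web site. The 60-second lifetime given to refetched content is an assumption, as are all names.

```python
def resolve_content(url, surrogate, regional_peers, origin, now):
    """Try the surrogate, then regional peers, then the original web site."""
    for cache in [surrogate, *regional_peers]:
        entry = cache.get(url)
        if entry is not None and now < entry["expires"]:
            return entry["body"], "cdn"  # found and not stale
    body = origin[url]                   # still not found, or stale
    surrogate[url] = {"body": body, "expires": now + 60}
    return body, "origin"


surrogate = {}
peer = {"/foo.jpg": {"body": "old", "expires": 5}}  # stale by the time it is asked
origin = {"/foo.jpg": "new"}
r1 = resolve_content("/foo.jpg", surrogate, [peer], origin, now=10)
r2 = resolve_content("/foo.jpg", surrogate, [peer], origin, now=20)
```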
CDN Selection
The tricky issue is selecting which local
content server to use for a particular
request
Want to spread load evenly
Want minimal impact if server is added or
removed.
In Akamai, each surrogate server sends
measurement results to the Network
Operations Command Center (NOCC).
Measurement results include number of active
TCP connections, HTTP request arrival rate,
bandwidth availability, etc.
This information is used by the Akamai DNS
server.
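The slides do not name a technique for these two goals. One well-known approach that addresses both is consistent hashing; the sketch below assumes it purely for illustration and does not model the load measurements sent to the NOCC. Server names and the number of virtual points per server are arbitrary.

```python
import bisect
import hashlib


def _hash(key):
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual points per server, to even out load
        self.ring = []            # sorted list of (point, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self.ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [entry for entry in self.ring if entry[1] != server]

    def lookup(self, key):
        """Return the server owning the first ring point at or after the key's hash."""
        index = bisect.bisect(self.ring, (_hash(key), ""))
        return self.ring[index % len(self.ring)][1]


ring = HashRing(["s1", "s2", "s3"])
keys = [f"/img/{i}.jpg" for i in range(200)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("s2")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only requests that were served by the removed server are reassigned; everything else keeps its server, which is the "minimal impact" property.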
Accounting Mechanism
Accounting mechanisms collect and track
information related to request routing,
distribution and delivery.
Information is gathered in real time and
put into log files for each CDN component.
This gets sent to the Network Operations
Command Center (NOCC).
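A sketch of what aggregating such logs could look like. The log line format (`<surrogate> <customer> <path> <bytes>`) and the surrogate names are hypothetical; the slides do not describe the actual format.

```python
from collections import defaultdict


def summarize(log_lines):
    """Aggregate hits and bytes delivered per customer."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for line in log_lines:
        _surrogate, customer, _path, size = line.split()
        totals[customer]["hits"] += 1
        totals[customer]["bytes"] += int(size)
    return dict(totals)


report = summarize([
    "edge-eu-1 cnn.com /af/foo.jpg 52000",
    "edge-as-2 cnn.com /af/foo.jpg 52000",
    "edge-eu-1 example.org /logo.png 8000",
])
```

Per-customer totals like these are what let a content provider count hits even though the requests never reach its own servers.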
Full Site Delivery vs. Partial
Site Delivery
Full Site Delivery: All the contents are
delivered by the CDN (including HTML,
images, and other objects).
Partial Site Delivery: Only images,
streaming media and other bandwidth-
intensive objects are delivered by the CDN.
Current Akamai Customers
Summary
We have examined replication and issues
related to the design and implementation
of a replicated system.
Many choices and tradeoffs to consider


Editor's Notes

  • #35 Alright, these slides are pretty self-explanatory. As you can see here, this slide can leave you either overwhelmed or impressed. To date, more than 2,800 customers use our content delivery and streaming services.