Evaluating the Impact of Content Delivery
Networks on the Performance and Scalability
of Content-Rich and Highly Transactional Java
EE-based Websites
Witold Rzepnicki
Submitted to the Department of Electrical Engineering and Computer
Science and the Faculty of the Graduate School of the University of
Kansas in partial fulfillment of the requirements for the degree of
Master of Science
Committee Members: Hossein Saiedian, Ph.D.
Professor and Thesis Adviser
Arvin Agah, Ph.D.
Associate Professor
Prasad Kulkarni, Ph.D.
Associate Professor
Date Defended: February ??, 2008
Acceptance
Abstract
N-tier e-commerce environments present unique performance and scalability chal-
lenges under highly variable transactional loads. This thesis investigates Content De-
livery Networks (CDNs) as a tactic to manage transactional peaks and valleys in a
“bursty” environment that has suffered from content delivery-related performance,
scalability and, ultimately, availability problems during key selling seasons. We estab-
lished a monitoring infrastructure that consists of client (browser) and server-side per-
formance monitors as well as server and network resource utilization measurements
in order to gauge the full impact of our chosen CDN on response times, bandwidth
utilization and CPU utilization.
We ran a series of controlled experiments to establish performance and scalabil-
ity gains at the Java EE component level and HTML object level of granularity. Our
results suggest a 30 percent improvement in page response times, a four-fold improve-
ment in bandwidth utilization and a reduction in Web server CPU utilization of almost
90 percent. At the same time, we were able to sustain twice the amount of Web traffic
with zero outages and the exact same hardware footprint year-over-year. Our conclu-
sions are applicable to many other n-tier e-commerce environments that suffer from
content delivery issues.
Acknowledgements
Contents
Acceptance
Abstract
Acknowledgements
1 Introduction
1.1 Problem statement
1.2 Justification
1.3 Significance and expected contributions
1.4 Methodology
1.5 Evaluation criteria
1.6 Thesis organization
2 Background and Previous Work
2.1 Architectural overview of subject site
2.2 Website content
2.3 Performance considerations and tactics
2.4 Previous work related to WWW environments
3 Proposed Solution
3.1 CDN selection
3.2 Solution configuration
3.3 Monitoring infrastructure configuration
3.3.1 Performance measurement tools
3.3.2 Scalability assessment framework and tools
4 Quality of Service Evaluation
4.1 Performance experiments
4.1.1 Transaction types
4.1.2 Server-side transaction capture
4.1.3 Bandwidth utilization monitoring
5 Analysis
5.1 Scalability and availability observations
5.1.1 CPU utilization
5.1.2 Memory utilization
5.1.3 Bandwidth
5.2 Performance
5.2.1 Consumer Perspective
5.2.2 Server-side Perspective
6 Conclusions and Further Research
6.1 Summary of Contributions
6.2 Future Research
Bibliography
List of Figures
1.1 Queue network for a typical n-tier e-commerce system
2.1 Sample customer behavior model graph
2.2 Subject site page views by month, 2004-2007
2.3 Logical view of subject site
2.4 Infrastructure diagram of system under study
2.5 Websphere Commerce interaction diagram
2.6 Typical approach for content management and publishing
2.7 Location of Internet bottlenecks
3.1 How DNS works
3.2 Caching in DNS
3.3 URL rewrite process
3.4 How Akamai works
3.5 Application monitoring appliance network location
3.6 Application monitoring appliance measurement timeline
3.7 Content delivery measurement combined timeline
3.8 Sample CPU utilization report
3.9 Memory page-in report
3.10 Hourly page views for 2/11-2/15, 2005-2007
4.1 Sample server-side performance report
5.1 CPU utilization, February of 2006 and 2007
5.2 Operating system run queue: February of 2006 and 2007
5.3 Memory page-ins, February of 2006 and 2007
5.4 Memory page-outs, February of 2006 and 2007
5.5 Memory scan rate, February of 2006 and 2007
5.6 Peak bandwidth comparison, February of 2006 and 2007
5.7 Akamai bandwidth offload, February of 2007
5.8 RTUM shop transaction response times in seconds
5.9 RTUM browse transaction response times in seconds
List of Tables
1.1 Subject site’s content characteristics, 1998-2007
1.2 Annual e-commerce customer satisfaction survey results
4.1 Performance test nodes by ISP and location
4.2 RTUM transaction characteristics
5.1 RTUM browse transaction performance
5.2 RTUM shop transaction performance
5.3 Server-side performance in milliseconds
5.4 TCP packet counts at hosting environment
Chapter 1
Introduction
Electronic commerce (e-commerce) dates back to the late 1970s, when the first electronic
data interchange (EDI) systems were introduced to facilitate the exchange of data be-
tween companies and institutions. The initial types of data exchange were limited
to invoices and purchase orders and required significant financial investment in the
computer networks and hardware. The first EDI-based e-commerce networks were
later augmented with other forms of e-commerce such as automated teller machines
(ATM) and telephone banking in the 1980s and subsequently enterprise resource plan-
ning, data mining and data warehouse systems in the early 1990s.
In the late 1990s and early 2000s, the term e-commerce expanded to include consumer-
based shopping transactions over the World Wide Web (WWW). These transactions
typically combine browsing of a website’s catalog and its related content in the form
of HTML, image files and marketing videos with shopping cart and checkout
transactions supported by secure payment authorization, generally over the
HTTPS protocol. The United States Census Bureau [62]
defines several important concepts related to this topic. For example, it defines e-
commerce as “any transaction completed over a computer-mediated network that in-
volves the transfer of ownership or rights to use goods or services.” This definition
does not differentiate between monetary and non-monetary transactions, nor between
zero-price and paid transactions. The concept of e-business infrastructure is also defined as
“the share of total economic infrastructure to support electronic business and conduct
electronic commerce”. Examples of such infrastructure include computers,
routers, satellites, network channels, system and application software, website hosting
and programmers.
1.1 Problem statement
The goal of this thesis is to evaluate the impact of a Content Delivery Network on
scalability and performance of a content-rich and highly transactional website that
utilizes the Java Enterprise Edition (EE) platform. Since its inception in 1996, the web-
Table 1.1: Subject site’s content characteristics, 1998-2007
Content Type     Size 1998    Size 2004    Size 2007
Documents        0 KB         28 KB        34 KB
Images           36 KB        42 KB        96 KB
Scripts          0 KB         2 KB         20 KB
Style sheets     0 KB         0 KB         62 KB
Total            36 KB        72 KB        181 KB
site that comprises the experimental test bed of this thesis has dealt with significant
scalability and performance challenges around major holidays. These challenges can
be classified as transactional and content delivery-related. Over the last several years the
subject site has been able to tune its system infrastructure to achieve acceptable trans-
actional performance in the application server and database server tiers. However, it
has struggled with Web server content delivery. As a matter of fact, it has not been able
to service all of the requests during its most content-intensive peak week in February
for the last several years. This is mostly due to the popularity of the free customiz-
able Flash videos available on the website. Web servers responsible for delivering the
Flash videos frequently get overloaded with long-running connections that result in
high CPU utilization in the Web tier and a domino effect on the processing queues in
the application server tier. We have decided to investigate the tactic of implementing
a Content Delivery Network to help address the issues with content delivery not just
for this particular peak week, but for the website in general.
1.2 Justification
Text-only and informational websites from mid to late 1990s have transformed into
rich and highly interactive environments that offer not only physical goods, but also
services and digital assets such as MP3 files (e.g., itunes.com) for download. Ad-
ditional features such as wish lists, consumer product reviews and message boards
have also been introduced in order to aid consumers in objective product selection.
Some sites even offer interactive chat-based product selection advice. This conver-
gence of services and products as well as content presentation and interaction has re-
sulted in much larger Web pages and increased diversity of e-commerce transactions.
We used the web.archive.org website as a source for a comparison of the subject web-
site’s home page characteristics for the years 1998, 2004 and 2007, shown in Table 1.1.
The number of downloadable page elements has increased from 15 in 1998 to 28
in 2007 and the overall size of the home page has increased from 36 KB to 181 KB.
Another intriguing aspect of the subject site’s growth is increased reliance on scripts
Table 1.2: Annual e-commerce customer satisfaction survey results
Satisfaction Level       2005    2004    2003    2002    2002 vs 2005 change
Very Satisfied           40%     37%     40%     37%     3%
Somewhat Satisfied       24%     24%     23%     22%     2%
Neutral                  31%     32%     30%     33%     -3%
Somewhat Dissatisfied    4%      5%      5%      5%      -1%
Very Dissatisfied        2%      3%      3%      3%      -1%
that run within browsers to enable a rich consumer experience. The number of objects
on HTML pages is the key indicator of the number of connections that will need to be
opened between the client (browser) and the Web server.
While e-commerce still represents only a minor fraction of total retail sales, it is
quickly becoming a very important component of any major retailer’s presence in the
marketplace. One study of European consumers shows an increase in online participa-
tion from 42 percent to 56 percent between 2003 and 2006. Forrester Research also re-
ports that in 2006, on average, 80 percent of consumers reported researching products
online with approximately 55 percent reporting either an online or offline purchase in
the preceding three months. Nielsen Research [50] confirms this growth trend, but its
research also points out that consumer satisfaction from online purchases has experi-
enced only slight growth, as shown in Table 1.2.
The current level of consumer satisfaction underscores the need to continuously im-
prove customer experience on the Web, especially with regards to website perfor-
mance. More than half of online shoppers cite response time as a key influence on
their shopping behavior. Consumers have come to expect a fast, uninterrupted and
relevant shopping experience as well as scalability during major holidays and contin-
uous availability of services. Content delivery has become a critical component of the
online shopping experience and Content Delivery Networks (CDNs) were established
with the primary purpose of addressing issues with performance and scalability.
1.3 Significance and expected contributions
The current model of content delivery on the Internet can be classified as centralized in
the sense that a majority of websites use the origin servers at their hosting environ-
ments to serve content. User requests have to travel through several networks and
encounter a myriad of bottlenecks including Peering, First Mile, Last Mile and Backbone
- all of which are inherent to the way the Internet works [2, 3, 5]. CDNs have emerged as
collections of servers located in close proximity to the Internet backbone, and thus to
the end user, with the purpose of reducing the number of “hops” between network
nodes. This overlay network approach allows shorter routing paths and more efficient
DNS routing that helps alleviate all but the Last Mile bottleneck. CDN companies
claim that by running thousands of servers in hundreds of networks in geograph-
ically dispersed locations they can deliver a significant performance and scalability
boost to e-commerce websites. We will review the merits of this claim in our study
of the subject site. We will also review the impact of the potential network benefits
realized from implementation of a CDN on all processing tiers of a Java EE-based e-
commerce environment. Additionally, we will evaluate tradeoffs that may result from
deploying this solution.
Based on our analysis of the subject site’s page profile, we concluded that the average
number of objects per page (images, cascading style sheets, JavaScript and Flash files)
is approximately 40, with sizes ranging from 1 KB for a simple image to 400 KB for
a Flash movie. While a load of 100 million page views per month probably would
not pose a problem for the Web servers for 1 KB items, the story is quite different for
the 400 KB items, since they hold connections open much longer. Also, each
hosting facility imposes limits on the amount of network bandwidth available
to each customer. It is not unusual for the subject site to reach this limit during
the key holidays.
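To put these numbers in perspective, a rough estimate of average bandwidth demand (our own back-of-the-envelope calculation, assuming for simplicity that every page view transfers the full 181 KB home-page weight from Table 1.1 and that traffic is spread evenly over a 30-day month) is

\[
\bar{B} \approx \frac{10^{8} \times 181\ \mathrm{KB} \times 8\ \mathrm{bits/byte}}{30 \times 86400\ \mathrm{s}} \approx 56\ \mathrm{Mbps}
\]

With the roughly 15-fold bandwidth bursts the subject site experiences during critical business periods (see Chapter 2), peak demand approaches 840 Mbps, uncomfortably close to the 1 Gbps facility limit discussed in Section 1.5.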
1.4 Methodology
Many studies to-date have focused on evaluating the network effects of implementing
a CDN on the Internet Service Providers in terms of bandwidth efficiency and network
traffic patterns. In these cases, the terms performance and scalability refer to the ability
of the network to sustain “bursty” traffic patterns within reasonable response times
during peaks with minimal changes in the hardware footprint. The methodologies
used to measure the impact have revolved around implementing a network monitor
that captured traffic patterns and characteristics of objects passing through the net-
work.
To our knowledge, there has been no research to-date that established a methodol-
ogy to measure the correlations between performance improvements in content deliv-
ery and individual tiers of n-tier e-commerce environments. We assert that in order to
identify meaningful conclusions in such research the methodologies must encompass
considerations of all tiers. Our approach is to conduct a more thorough case study
of one particular highly transactional e-commerce environment with the purpose of
identifying conclusions that could be applicable to a variety of n-tier platforms.
In order to improve validity of our conclusions, we decided to analyze and char-
acterize the workload for the subject site for one year prior to implementing the pro-
posed solution and one year after. We will use a combination of network and ap-
plication monitoring technologies for measurements. Our methodology will also en-
compass software-level component measurements that will aid in analysis of relation-
ships between content delivery and server-side Java EE components of the site’s ar-
chitecture. This granularity is necessary due to inherent inter-dependencies between
processing tiers when queuing networks are involved.
Figure 1.1: Queue network for a typical n-tier e-commerce system
A typical performance model of an e-commerce site can be represented by the
queue network in Figure 1.1 [44]. According to this model, the bottle-
necks in the database (DB) server tier could result in requests queuing up at the Web
server tier and vice versa. Also, performance improvements in any of the tiers will
most likely result in either a positive or negative impact on other tiers.
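To illustrate these inter-tier dependencies, the standard open queueing network approximation (a textbook result, not a measurement from this study) expresses mean end-to-end residence time as a sum of per-tier terms, each of which grows sharply as its tier approaches saturation:

\[
R = \sum_{i \in \{\mathrm{web},\,\mathrm{app},\,\mathrm{db}\}} \frac{D_i}{1 - U_i}, \qquad U_i = \lambda D_i
\]

where D_i is the total service demand a request places on tier i, U_i is that tier's utilization and λ is the request arrival rate. Offloading content delivery to a CDN reduces D_web and therefore U_web, shrinking the Web tier's term and relieving the downstream processing queues described in Section 1.1.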
1.5 Evaluation criteria
Our evaluation framework will include consumer-centric and system-centric metrics.
Consumer-centric metrics focus on response times as observed by the users of the site.
We will implement an appliance-based monitoring solution that captures HTTP and
HTTPS traffic at the URI stem level and provides a high-level view of application re-
sponse times. It will give us benchmarks for key function points of our e-commerce
solution, including browsing of Category and Product Detail pages as well as Shop-
ping Cart, Billing and Order Confirmation components. The tool will provide us with
measurements in terms of network, client and server time. We will capture measure-
ments from before and after the CDN implementation.
The existing website traffic monitoring solution will be utilized to capture traffic
characteristics in terms of typical e-commerce measures such as page views, visits
and orders. This data will enhance our application monitoring with aggregated load
metrics. System metrics, especially the correlation between system resource utilization
(CPU, memory and I/O) and content delivery offloading, will also need to be studied
in order to measure benefits from implementing a CDN in terms of hardware footprint
reduction and bandwidth cost efficiency. We will use a set of typical UNIX server
utilities to measure CPU, memory and network interface utilization for servers in all
three tiers.
We expect that making our content available via a network of geographically dis-
persed servers will have a significant impact on bandwidth utilization. Most CDNs
provide a monitoring console that allows customers to measure the origin server of-
fload effect using several metrics. We will use the console to measure the size and
types of website objects that most benefit from this solution and to quantify savings
in terms of bandwidth and the number of connections to the Web servers that were
avoided. Our expectation is that the CDN will offload a significant amount of band-
width from the network of our hosting environment. The current limit imposed on
our subject site by the hosting facility is 1 Gbps (one gigabit per second).
The evaluation would not be complete without a tradeoff analysis of our tactic. Cur-
rent research points to manageability and the speed of content propagation as key
concerns when implementing a CDN. In this case, manageability refers to the abil-
ity to deploy configuration and content changes into the CDN. We will measure the
amount of time necessary to have content changes reflected in the overlay network
from the origin server, as well as the amount of time needed to deploy configuration
changes.
1.6 Thesis organization
The thesis will be organized as follows:
• Chapter 1: Introduction - The background and justification for the research
presented in this thesis.
• Chapter 2: Background and Previous Work - The architectural background of
the subject site and related work in measuring and characterization of perfor-
mance and scalability of WWW environments.
• Chapter 3: Proposed Solution - The selection and implementation of a Content
Delivery Network and performance monitoring tools.
• Chapter 4: Quality of Service Evaluation: Performance and Scalability - The
impact of the CDN on the subject site and its processing tiers.
• Chapter 5: Analysis - Validation of quantitative and qualitative results of the
implementation
• Chapter 6: Conclusions and Further Research - The conclusions resulting from
this research and proposed areas for further investigations
Chapter 2
Background and Previous Work
Bass et al. [15] give a generic definition of performance as responsiveness of the system
characterized by the time required to respond to events. The level of performance is
often described by the number of user or system transactions per unit of time. In e-
commerce, this concept encompasses several measures of throughput in terms of Web
page response times and number of page views for a given period of time. It can
also be expressed in terms of responsiveness and throughput capacity of key server-
side Java EE components such as Servlets and Enterprise Java Beans. Each site has a
unique traffic profile, but a generic traffic pattern can be represented by the Customer
Behavior Model (CBM) state transition graph in Figure 2.1, as proposed by Menasce et
al. in [45].
The CBM graph identifies several key states in the browsing and shopping expe-
rience. The numbers for each state transition represent its probability. Hence, there
is a 30 percent chance of a customer selecting the Search page from the Browse page.
Performance of a website can thus be characterized by the response times for each of
these states under a particular level of volume of user site visits. A visit consists of
multiple requests to different components of the browsing and shopping experience.
For the purpose of this study, we will further break down the performance of each
behavior point into three components: network time, client time and server time. The
network time is the amount of time that request and response pairs spend traveling to
and from the load balancers. The client time represents the amount of time it takes the
Web browser to issue an HTTP request and render the response in the form of HTML
and other objects. The server time is the amount of time each request spends on the
Web, application and database servers.
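In symbols, the response time for each behavior point thus decomposes as

\[
T_{response} = T_{network} + T_{client} + T_{server}
\]

a breakdown that the monitoring tools described in Chapter 3 measure directly.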
Scalability represents the ability of a given site to handle its projected traffic vol-
ume as the number of requests for the critical points grows while maintaining its cur-
rent hardware footprint. The latter is particularly important as most websites find it
difficult to significantly change their footprint during a critical business period. For
example, the subject website experiences up to four-fold increases in page views and
a 15-fold increase in bandwidth usage during some of its critical business periods.
Figure 2.2 shows the subject site’s transactional bursts during key shopping holi-
days. Brebner et al. [22] provide several benchmark-based insights into the potential
scalability of J2EE environments.
2.1 Architectural overview of subject site
A typical e-commerce website can be represented in a series of views that depict its
functional components, software modules and hardware infrastructure as proposed
by [20, 42]. For the purpose of this study we will analyze a website that has existed
since 1996 and was re-architected in 2003 using the IBM Websphere Commerce
software components on the Java Enterprise Edition (Java EE) platform. The site’s
business process view is represented in Figure 2.3. The business process view high-
lights several key functional components of the subject website, namely the catalog,
marketing and order management features. The functional view is supported by the
hardware and software components shown in Figure 2.4 and Figure 2.5.
2.2 Website content
In the realm of e-commerce, content refers to any information made available for con-
sumers to download. This includes catalog data, HTML pages, images, audio files,
video files in Flash or other formats and even software downloads. Efficient man-
agement and deployment of content-related assets have thus become mission critical
activities for many organizations that depend on the Internet as a sales channel. E-
commerce websites rely on three types of assets to support the selling function: cat-
alog data assets, content assets and application assets. Catalog assets are used to define
products with characteristics such as price or product name. Content assets represent
editorial content such as marketing information, product reviews and promotional
information. Application assets include software components created by software de-
velopers that are generally managed separately from the other two types of assets.
Content can also be classified as dynamic or static. Dynamic content is generated upon
a consumer’s request and may be unique for each request. It is typically assembled
on the application server with involvement of the DBMS. Static content is represented
by HTML, client-side scripts (JavaScript), images and other multimedia assets that do
not change from request to request. This type of content generally resides on Web
server(s) and is served up from there. Figure 2.6 illustrates a typical approach to
management and publishing of static and dynamic content. It is important to note
that generally the static content will be published to a file system and dynamic con-
tent will be made available via server side components with support of a DBMS. The
fundamental concepts of content management and publishing are covered in [19].
Even if we were to achieve infinite scalability and performance at the hosting fa-
cility we would still have a dependency on the Internet as a whole for delivery of
content to the browsers. The Internet could not function efficiently unless the indi-
vidual networks that make up its backbone can also communicate with one another. The
process of exchanging data between two networks is called peering and it takes place
between two routers also known as peering points located at the edge of each net-
work. The two networks use an exterior gateway protocol called the Border Gateway
Protocol (BGP) to send traffic between them. Interior gateway protocols (such as OSPF
and RIP) route traffic within a network. Hence, the peering points and rout-
ing protocols connect different networks into a unified infrastructure that allows the
Internet to function efficiently. The Internet is made up of many different networks that
communicate with one another using the IP protocol. Individual networks consist of
a variety of routers, switches and fiber that move data packets within the net-
work. The structure of “network of networks” results in several potential scalability
and performance bottlenecks. They are generally classified as: first mile, peering points,
backbone and last mile. Figure 2.7 shows the types and locations of these bottlenecks as
described by [3].
The First Mile bottleneck refers to the limitations in the website’s connectivity to
the Internet via its Internet Service Provider (ISP). In order to reach the desired scalability,
the website needs to continuously expand its connectivity to the ISP. The ISP, in turn, must also
expand its capacity in order to meet its customers’ scalability requirements. Peering
points also represent potential bottlenecks as large networks are not economically mo-
tivated to scale up the number of peering points with the networks of their competi-
tors, especially since a significant portion of the traffic handled by the peering points
is transit traffic with packets originating on other networks. This lack of competitive
and financial motivation over time has resulted in a limited number of peering points
across major networks. The Backbone Problem refers to the fact that the ISPs’ router
capacity has historically not kept up with growth of traffic demands. Finally, the Last
Mile problem reflects the limited capacity of a typical user’s connection to their ISP.
It is important to note that solving just one of the above bottlenecks, such as the Last
Mile, by increasing the reach of broadband connectivity at home will not automat-
ically address the other limitations. These need to be treated as separate problems
that, if addressed, would help solve the problem as a whole.
2.3 Performance considerations and tactics
Most websites measure their performance and scalability using a combination of user
and system-centric metrics. Page response times represent the user-centric metric
while the CPU, memory and disk utilization of the web, application and database
servers illustrate the system dimension of performance. High consumer expectations
have led many highly-transactional websites to implement a variety of technology
strategies to achieve a combination of high performance and scalability [10]. Typical
approaches include scaling both horizontally and vertically. Horizontal scalability en-
tails increasing the number of computing nodes at each layer while vertical scalability
focuses on increasing computing capacity of the existing nodes. These tactics address
the bottlenecks at each computing node and can have significant impact on what we
defined as server time. However, they generally have minimal impact on the network
and client times.
Caching has also been used as a tactic to achieve better performance. It can refer to
caching of static content on the Web servers or caching dynamic content in memory of
the application server. We will refer to this concept as centralized caching. Centralized
refers to caching mechanism(s) that are implemented at the hosting facility and are
not geographically dispersed. The key limitation to the caching tactic is the overall
size of the computing environment and the incremental cost of adding new hardware
to support growth.
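As a minimal sketch of the centralized caching tactic (a generic illustration, not the subject site's actual implementation), an application-server component might hold rendered content fragments in a bounded in-memory LRU cache:

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative fragment cache: retains the most recently used entries
// and evicts the least recently used entry once capacity is exceeded.
public class FragmentCache extends LinkedHashMap<String, String> {
    private final int capacity;

    public FragmentCache(int capacity) {
        super(16, 0.75f, true); // access-order iteration enables LRU eviction
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;
    }
}

Such a cache avoids regenerating popular fragments, but its reach ends at the hosting facility: every hit still consumes origin bandwidth and a Web server connection, which is precisely the limitation a geographically dispersed CDN addresses.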
2.4 Previous work related to WWW environments
Web server behavior has been studied extensively since the early days of the In-
ternet [11, 13, 48]. Network considerations have also been given a lot of attention
with much research in the area of HTTP and TCP - the core protocols of the Inter-
net [12, 41, 51]. Workload characterization should be a key component of any perfor-
mance or scalability study regardless of the tactics used to achieve the two qualities of
service in an e-commerce environment. Early studies [11] in this space attempted to
generalize workload classifications in terms of Web server characteristics. Pitkow [53]
summarizes a group of traffic characterization studies on the World Wide Web until
1998. Subsequently, numerous characterization studies of Web server workloads were
conducted, including [9,10].
Menasce et al. [45] were the first to extend workload modeling studies to include e-
commerce business transactions and not just Web server traffic. They presented a
technique to model workloads on the Web using the Customer Behavior Model Graph
profiles. We will use it to establish a performance baseline and criteria to measure im-
provement in performance that we are expecting from implementing the CDN tactic.
Performance modeling of n-tier systems and the associated queuing networks was
also covered in [32]. Crovella et al. [27] conclude that e-commerce workloads tend
to vary very dynamically and exhibit short-term fluctuations. We agree with this as-
sessment based on several years’ worth of traffic measurements. Urgaonkar et al. [61]
present some of the most recent work in performance modeling of multitier Inter-
net systems, based on a network of queues that represent application tiers.
One of the key conclusions from the workload studies so far is that the popularity
of content elements on the Web follows a Zipf-like distribution [23]. The Zipf distribution
supports the need for caching, considering that 25-40 percent of content items
account for 70 percent of requests. This concentration of access to key resources on the
site, such as Flash videos in the case of the system under study, strongly suggests that
any meaningful level of caching should help performance as long as there exists a
cache replacement policy that minimizes the unavailability of the key resources. Bent
et al. [16] have determined that much of the workload of some of the busiest websites
remains uncacheable.
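Formally, the Zipf-like popularity distribution cited above assigns the i-th most popular of N objects the request probability

\[
P(i) = \frac{\Omega}{i^{\alpha}}, \qquad \Omega = \left(\sum_{k=1}^{N} k^{-\alpha}\right)^{-1}, \quad 0 < \alpha \leq 1
\]

so a cache holding even a small prefix of the popularity ranking absorbs a disproportionate share of requests, consistent with the 25-40 percent versus 70 percent figures above.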
Perhaps the largest shortcoming of work to-date is the lack of analysis of correla-
tions between de-centralized content delivery tactics such as a CDN and performance
characteristics in the web, application server and database tiers. The existing body
of research exhibits a tendency towards evaluating individual tiers in the context of
content delivery instead of studying the relationships and interdependencies between
them. This is not surprising since the first implementations of websites focused on
serving static content. Our work builds on the existing research by augmenting the
queueing models with the new de-centralized content delivery tier and studying the
impact of such implementation across tiers.
We will need to use some sort of a benchmark in order to determine performance
gains from deployment of a CDN. Benchmarking of e-commerce applications has been
addressed by the Transaction Processing Council’s TPC-W (obsolete as of 2005) and
TPC-App specification [57]. Many websites have used TPC-W as a benchmark for
simulating the load on an e-commerce site [43]. We have the advantage of dealing
with a live website, so we do not need to use the TPC specification for load gener-
ation. However, we will refer to its performance metrics and response time capture
requirements in order to assess improvement.
Performance and scalability of Web-based environments have generated a lot of
interest within the research community. These environments usually involve static
requests for documents and dynamic requests for data or content generated by the
combination of application and database servers [36]. Iyengar’s paper also points out
that serving dynamic content can be orders of magnitude more CPU-intensive than
serving static content. Andreolini et al. [8] conclude that there is a need for differ-
ent levels of measurement granularity in order to identify bottlenecks. The levels are
classified into system-level, node level, resource level and component level metrics.
This study further classifies hardware resource utilizations into CPU, disk and net-
work utilization types. We also classify bottlenecks into client, network
and server constraints. Client bottlenecks are related to the ability to efficiently
render downloaded content within the browser. The diversity of personal comput-
ing environments, from Personal Digital Assistants to full-blown personal computers
contributes to a large variance in client-side performance.
Network bottlenecks result primarily from network congestion at the origin net-
work or the Wide Area Network [14]. The server-based bottlenecks result from inter-
tier interactions and associated CPU, I/O and memory utilization costs. Menasce et
al. [44] discuss the system bottlenecks and tactics to address them such as replicating
Web server queues into a cluster of Web servers in order to better balance the load. We
will utilize the existing research around performance bottlenecks to identify metrics
that will help us prove the impact of the CDN tactic. The recurring theme in perfor-
mance and scalability research is the continuous need to offload as much repetitive
processing from websites as possible. For example, Challenger et al. [24] deter-
mined that the cost of generating a cacheable response is 500 times higher from the
CPU utilization standpoint than serving it from a cache.
Caching has been a part of many performance evaluations and recommendations
for improvement of the World Wide Web. Challenger et al. developed an approach
for consistently caching dynamic Web data that became a critical component of the
1998 Olympic Winter Games Website [25]. Performance results from implementing
a Web proxy cache as well as the impact of various cache replacement strategies are
also presented in [9]. Additionally, Arlitt et al. present a number of workload char-
acteristics and their impact on effectiveness of caching. Rodriguez et al. [55] discuss
different types of Web caches including hierarchical and distributed and conclude that
geographic dispersion of content reduces the expected network distance to access a
document as well as band-width usage. Their results confirm the value proposition
of caching in general, but they stop short of recommending a decentralized caching
approach such as a CDN. Nahum’s paper [49] is one of the first efforts to consider
Wide Area Network effects in website performance studies. Wide Area Network’s
implications on performance are also discussed in [8]. A very detailed overview of
proxy-based caching with emphasis on dynamically generated content is also pre-
sented in [1]. That work, however, does not characterize workload types, nor does it compare
response times with and without a CDN, or examine resource utilization on the Web servers
and how a CDN affects it.
The case for geographic distribution of content has been made by the research
community for quite some time [28, 52]. Perhaps the most comprehensive review of
CDN-related concepts is presented in [63]. We found Verma’s summary of the po-
tential applicability of a CDN to be particularly relevant. According to his work, the
application is a good CDN candidate if
• it has a high ratio of reads compared to writes
• client access patterns tend to access some set of objects more frequently
• limited windows of inconsistent data are acceptable
• data updates occur relatively slowly
Our analysis of the subject site shows consistency with the first three items, but
we have to concede that the data updates have to occur quickly in order to support
ever-changing content on the site. Nonetheless, the tradeoffs are well worth further
investigation of CDN’s applicability and the potential performance benefits. What we
found lacking in Verma’s work was a comprehensive analysis of sample websites that
have implemented a CDN and benefited from it. We had to look no further than [17]
to get a first glimpse of potential real-life benefits of CDNs using DNS redirection and
URL rewriting. This study was conducted on the home pages of several hundred of the most
popular websites as of November and December of 2000. One of the key findings was
that the networks performed significantly better than the origin sites even with the
overhead of DNS redirection. Other studies [39] confirm the CDN-related reduction
of download response times even with the DNS redirection tradeoff. Finally, Su et
al. [60] present a set of measurements for the Akamai delivery network along with
a detailed explanation of its architecture. This is the first study we are aware of that
presents data to support the network path selection efficiency gains that result from
implementing a CDN.
The key missing component in the research to date is the impact of reduced down-
load times on e-commerce transactions as a whole from the client, server, network and
consumer performance perspectives. Also, due to the proprietary nature of surrogate net-
works and e-commerce websites, there has been little more than anecdotal evidence
of quality attribute improvements for production environments. Our intent is to close
this gap with data that speaks to the impact of a CDN across the enterprise processing
tiers of a production e-commerce site.
Figure 2.1: Sample customer behavior model graph
Figure 2.2: Subject site page views by month, 2004-2007
Figure 2.3: Logical view of subject site (the diagram depicts the J2EE application server with its e-commerce engine, administration tools, business context engine and business process subsystems; the Websphere Commerce payments server; the content management portal; the site search engine components; and the supporting commerce and content management databases)
Figure 2.4: Infrastructure diagram of system under study
Figure 2.5: Websphere Commerce interaction diagram
Figure 2.6: Typical approach for content management and publishing
Figure 2.7: Location of Internet bottlenecks
Chapter 3
Proposed Solution
Our solution will consist of several components that will need to be deployed, con-
figured and tested at our hosting facility. We will also need to make configuration
changes in our network, namely the DNS server, in order to re-route certain portions
of our website’s traffic to the chosen CDN. The high-level implementation steps can
be outlined as follows:
1. Select a CDN provider
2. Configure solution components
3. Implement network and application monitoring infrastructure
4. Test implementation
Web traffic re-routing almost always involves DNS - one of the core subsystems
of the Internet. We present a short overview of key DNS concepts as described in
[7, 46, 47]. Figure 3.1 depicts a typical series of steps in the resolution of the hostname
www.example.com to an IP address. DNS is essentially a distributed database that follows
the client-server architecture. Adequate performance of DNS is achieved through
replication and caching. The server side portion of a request is handled by programs
called name servers. They contain information about a portion of the global database
and are capable of forwarding requests to other authoritative servers if necessary. The
information is made available to the client-side software components called resolvers.
A typical domain name on the Internet consists of two or more parts separated by
dots such as my.yahoo.com. Top-level domain (TLD) represents the rightmost por-
tion, .com in our case, while the subdomain(s) are represented by the labels to the left
of the top-level domain.
In our example, my.yahoo.com is a subdomain of yahoo.com, which in turn
belongs to the .com top-level domain. Finally, the hostname refers to a domain name
that has one or more IP addresses associated with it. Each domain or subdomain has
an authoritative server associated with it. It contains and publishes information about
Figure 3.1: How DNS works
the domain and any other domains it encapsulates. Root nameservers reside at the top
of the DNS hierarchy and are queried first to resolve the TLD. Caching and time-
to-live (TTL) are very important concepts in DNS and, as we will later discover, in
CDN implementations [38,56]. IP mappings obtained from DNS can be stored in the
local resolver for a period of time as defined in TTL. This greatly reduces the load on
the DNS servers.
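As a small illustration of resolver-side caching (hypothetical code, not part of the subject site), a Java client honors a JVM-level resolver cache whose lifetime is tuned via the networkaddress.cache.ttl security property:

import java.net.InetAddress;
import java.security.Security;

public class DnsLookupDemo {
    public static void main(String[] args) throws Exception {
        // Cache successful lookups for 20 seconds, mirroring the short
        // TTLs that CDNs favor for rapid IP re-mapping (Section 3.1).
        Security.setProperty("networkaddress.cache.ttl", "20");

        // The first call triggers a resolver query; repeated calls within
        // the TTL window are answered from the local cache.
        InetAddress addr = InetAddress.getByName("www.example.com");
        System.out.println(addr.getHostAddress());
    }
}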
Figure 3.2 shows the usage of the local cache in the resolution process. Our DNS config-
uration follows the configuration steps from [35] for the AIX 5.3 environment using
the BIND name server software. We created a set of configuration files that define the
name server mappings and resolvers, and then started the appropriate processes
(daemons) on the server to activate DNS resolution.
Figure 3.2: Caching in DNS
3.1 CDN selection
The main purpose of a CDN is to direct consumer requests for objects to a server at
the optimal Internet location relative to the consumer’s location. The key components
of a CDN architecture are described in [37]. They are defined as: overlay network for-
mation, client request redirection, content routing and last-mile content delivery. The two
most common techniques employed by the networks are DNS redirection and URL
rewriting. Wills et al. [17] provide a good overview of these techniques and their effec-
tiveness. The DNS redirection technique utilizes a series of DNS resolutions based on
several factors such as server availability and network conditions with the purpose
of identifying the most suitable server. This identification step is usually encapsu-
lated into a series of algorithms that are proprietary to each network’s implementa-
tion. The end result is a DNS response with the IP address to the content server. The
response includes a time-to-live value that is usually limited to less than a minute (in
the case of Akamai it is 20 seconds). The TTL has to be set to a relatively low value
because the network conditions and server availability change constantly and quick
IP re-mapping is key.
The DNS redirection technique can facilitate either a full- or partial-site delivery.
With full-site delivery, all requests to the origin server are directed using DNS to a
CDN server. If the CDN server cannot fulfill a request, it simply routes it back to the
origin server. Several networks, including Adero and NetCaching, employ this deliv-
ery model. The main shortcoming of this model is the additional routing overhead
of wasted DNS requests that could have been handled by the origin server to begin
with. With partial-site content delivery, on the other hand, the origin site modifies the
URLs for certain objects or object directory locations to be resolved by the CDN’s DNS
server. This approach seems to be well suited for our website due to its combination of
static digital assets and dynamically generated server-side presentation components.
URL rewriting is another potential solution for server lookups. With this tech-
nique, the origin server continuously rewrites the URL links for dynamically gener-
ated pages in order to redirect them to the appropriate CDN server. The DNS func-
Figure 3.3: URL rewrite process
tionality remains on the origin site with this approach. When a page is requested by
the user it will be served from the origin server. However, before it is served, all of the
embedded links will be rewritten to point to the CDN’s DNS. Figure 3.3 shows a typi-
cal rewrite approach. The main drawback to the URL rewrite approach from the mea-
surement standpoint is the fact that the rewrites usually take place at the Web server
tier. Hence, the rewrite steps would inevitably introduce additional background noise
to our performance measurements. Therefore, we decided to avoid this approach for
the purpose of our study.
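For completeness, URL rewriting is typically realized as a filter in the Web or application tier. The sketch below is hypothetical (the class names and CDN hostname are illustrative, and this is not what we deployed); it also makes the measurement concern concrete, since every generated page incurs an extra buffering and rewriting pass:

import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.*;
import javax.servlet.http.*;

// Hypothetical filter that rewrites static-asset links in generated HTML
// so that they resolve to a CDN hostname instead of the origin server.
public class CdnRewriteFilter implements Filter {
    private static final String CDN_HOST = "http://images.example.edgesuite.net";

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        BufferedResponse wrapper = new BufferedResponse((HttpServletResponse) res);
        chain.doFilter(req, wrapper);
        // Every response body is buffered and scanned here: the per-request
        // overhead that led us to avoid this technique in our experiments.
        String html = wrapper.toString()
                .replace("src=\"/images/", "src=\"" + CDN_HOST + "/images/");
        res.getWriter().write(html);
    }

    public void init(FilterConfig cfg) {}
    public void destroy() {}
}

// Minimal response wrapper that buffers the generated page for rewriting.
class BufferedResponse extends HttpServletResponseWrapper {
    private final CharArrayWriter buffer = new CharArrayWriter();
    private final PrintWriter writer = new PrintWriter(buffer);

    BufferedResponse(HttpServletResponse res) { super(res); }

    @Override public PrintWriter getWriter() { return writer; }
    @Override public String toString() { writer.flush(); return buffer.toString(); }
}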
The latest list of available CDN implementations is maintained in [29]. At the time
of writing of this thesis we counted 18 different networks on Davison’s website. It
is not our primary purpose to evaluate tradeoffs between the various networks and
their implementations. The choice we made does not reflect a belief in superiority of
one network over others - it is merely a reflection of the need to get our experimental
test bed up and running as quickly as possible within boundaries imposed on us by
our existing hosting facility. For our implementation, we settled on a partial-site, DNS
redirection-based CDN implementation using the Akamai delivery network.
3.2 Solution configuration
There exist several public sources that describe the DNS redirection mechanism im-
plemented by Akamai. We relied on [21, 40, 60] for more in-depth view of the inner
workings of this CDN. Akamai’s website and the available white papers [3] provide
the steps for configuring our network environment to redirect partial-site traffic. We
will start by explaining how Akamai works and then move on to our configuration
steps. Figure 3.4 depicts the DNS redirection process employed by this CDN.
In essence, Akamai performs a highly complex translation of a customer’s domain
to the IP address of the most suitable edge server. First, the Web browser requests
an HTML object. In order to accommodate this request, the local DNS resolver has
to translate the domain name into an IP address. The resolver issues a query to the
customer’s DNS server which in turn forwards the request to the Akamai network.
This is enabled via a configuration of a canonical name record (CNAME) in the origin
site’s DNS name server. The CNAME triggers the request redirection to the CDN.
Next, a hierarchy of Akamai servers responds to the request using the requestor’s
IP address, the name of the CDN customer, and the name of the requested content
as seeds for its DNS resolution. The CDN name resolution step is perhaps the most
critical in this sequence of events. Configuration of the Akamai CDN is described
in [4]. The steps for our deployment can be summarized as follows:
1. Create origin hostname
2. Activate Akamai edge hostname
3. Activate content delivery configuration
4. Point website to Akamai network
In our case, this process begins with configuration of a CNAME in our DNS name
server. A CNAME record maps an alias or nickname to the real name which may lie
outside the current zone. Typical format of a CNAME entry is as follows:
name    ttl    class    rr       canonical name
www            IN       CNAME    joe.example.com.
We need to set up an origin server hostname that will resolve to our content server.
This server will be used by Akamai edge servers to retrieve our content, so it can be
made available to all of the nodes in the CDN. The naming convention for the origin
server is:
origin-<website>
where “website” is the hostname for our content that will be delivered
from Akamai. Our website stores all of its static content in the generic images folder,
so we will define the following origin server name:
origin-images.example.com for images.example.com
Next, we will create a DNS record for our origin server hostname on our authoritative
name server. We will use the CNAME record type for this step.
origin-images.example.com IN CNAME loadbalancer.example.com
Finally, we point our website to the Akamai network. An edge hostname will need
to be activated on an Akamai domain for our website using the CDN’s configuration
console. It will resolve to the Akamai network. For example, www.example.com
would have to point to www.example.edgesuite.net, and www.example.edgesuite.net
would in turn resolve to individual servers on the Akamai network, since Akamai owns the
edgesuite.net domain. The remaining configuration steps need to be performed
in Akamai’s configuration console and are covered in depth in [4].
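Putting the pieces together, the entries on our authoritative name server take approximately the following form (hostnames illustrative, following the examples above):

www.example.com.            IN CNAME www.example.edgesuite.net.
origin-images.example.com.  IN CNAME loadbalancer.example.com.

With these records in place, consumer requests for the site resolve into the Akamai network, while Akamai edge servers fetch cache misses from the origin hostname.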
3.3 Monitoring infrastructure configuration
We will utilize three different types of monitoring solutions in the course of our per-
formance experiments. Monitoring infrastructure will help us obtain measurements
from three different perspectives: consumer, application and CDN. We need this full
view in order to identify correlations between the CDN implementation and the key
performance characteristics in terms of system and user response times. The CDN
monitoring will aid the bandwidth scalability improvement assessment.
3.3.1 Performance measurement tools
The consumer perspective will be measured using a real-time user monitoring (RTUM)
service. RTUM services [18] provide details of user interactions with a website by
reporting on user application successes and failures. We will define a series of trans-
actions with sequences of browsing, clicks, object downloads and data input. This is
functionally equivalent to a consumer interacting with our website via a Web browser.
Transactions will be configured with our service provider and deployed into its net-
work of six different geographically dispersed origin locations within the U.S. with ac-
cess speeds ranging from dial-up to broadband. We will configure two different trans-
actions with multiple steps within them. The two types of transactions are necessary
because different paths through our website tend to have different content delivery
characteristics in terms of number of assets delivered and the number of server-side
components triggered. Our service provides very detailed HTML object-level report-
ing on transaction steps which will include going to the home page, catalog browsing,
product selection, adding an item to a shopping cart and checking out. The RTUM
time measurements will break the download time into slices consisting of DNS lookup
time, Web server connect time, content time and SSL time. Following are the definitions
for the key metrics used by our RTUM:
DNS look-up The process of calling a DNS server to look up and convert a hostname
to an IP address. For instance, to convert www.foo.com to 10.0.0.1
Connect time The time it takes to connect to a Web server (or CDN edge server in our
case) across a network from a client browser or an RTUM agent
Secure sockets layer time The time it takes to create an SSL TCP/IP connection with
a website.
First byte time The time between the completion of the TCP connection with the des-
tination server that will provide the displayed page’s HTML, graphic, or other
component and the reception of the first packet (also known as first byte) for that
object.
Content download time The time, in seconds, to deliver content (images, HTML, or
other objects) from the Web server to the browser.
The application perspective will be captured using an appliance based application
monitoring solution. The network location of this appliance is depicted in Figure 3.5.
We will configure “watchpoints” using the appliance’s configuration tool to capture
server-side response times of Java EE components corresponding to the transaction
steps defined in the RTUM service. The appliance uses passive traffic analysis to cap-
ture actual transactions from the RTUM within our hosting environment and mea-
sures performance and availability of our e-commerce application as a whole. The im-
portant difference between this and other approaches is that our appliance does not
generate any traffic and the only performance overhead it introduces is reading the
copy of traffic from the network connection. The data is assembled into requests for
objects, pages and user sessions. Performance metrics include host, SSL and redirect
times. This solution also measures server errors or prematurely terminated connec-
tions due to increases in traffic. Figure 3.6 depicts the measurement timeline for a
sample request that would be captured by our appliance [26].
The appliance solution groups latency into the following six categories and defines
them as follows:
Host time This is the combined time the Web, application, and database servers take
to process a request. Host time is a key measure for assessing the implications
of implementing a CDN on the performance of our Java EE components
(servlets, EJBs, etc.). It can be very short in the case of a static image or long
in cases of long reports and complex server-side transactions such as adding a
list of items to the shopping basket.
Network time This is the time spent traveling across intervening networks. Once the
server has prepared its response, host time is over and network time begins. A
small object might be delivered quickly; a large one might take a long time. This
time is highly dependent on the type of consumer’s connection. Low-bandwidth
connections will result in higher network times and vice versa with broadband
connections. Our monitoring appliance also records additional information on
packet loss, out-of-order delivery, and round-trip time to help with this diagno-
sis.
SSL time The appliance will record the time spent negotiating encryption for secure
transactions. This portion of the SSL time represents the server-side
latency elements of the handshake versus the client-side SSL time captured by
the RTUM.
Redirect time This is the time the site spends sending a request on to other pages. In
some applications, a request for a page results in a redirect to another location.
This delay is recorded as redirect time.
Idle time When a browser is retrieving a page, but there is no activity between ob-
jects on the same page, the HTTP interaction is defined as “idle”. This measure-
ment is key to understanding the amount of time spent processing client-side
scripts such as JavaScript. When there is inactivity in the middle of rendering
the page within the browser, our appliance will measure it as idle time.
End-to-end This is the total time for the object or page, from the moment the first
packet of a request is seen until the browser acknowledges delivery of the last
packet.
In order to quantify the benefits from implementing a CDN we will need to com-
bine server-side measurements taken by our application monitoring appliance with
client-side RTUM service data. The combined response timeline for the two monitoring
components is represented in Figure 3.7. We will capture measurements for all
elements of the above timeline and compare their pre- and post-CDN response times.
The individual latency components can also be summarized using the following two
equations:
T_ee = T_dns + T_network + T_host + T_ssl,client + T_ssl,server + T_idle + T_httpredirect (3.1)
where
T_network = T_connect + T_firstbyte + T_download (3.2)
We will use them to assess the CDN’s impact on the latency as a whole in terms of
end-to-end and network times.
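As a brief worked example with hypothetical slice values: if T_dns = 0.1 s, T_connect = 0.2 s, T_firstbyte = 0.3 s, T_download = 1.4 s, T_host = 0.5 s, T_ssl,client = 0.2 s, T_ssl,server = 0.1 s, T_idle = 0.2 s and T_httpredirect = 0 s, then equation 3.2 yields T_network = 0.2 + 0.3 + 1.4 = 1.9 s and equation 3.1 yields T_ee = 0.1 + 1.9 + 0.5 + 0.2 + 0.1 + 0.2 + 0 = 3.0 s. The decomposition makes explicit which terms - primarily T_dns, T_connect, T_firstbyte and T_download - we expect the CDN to improve.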
3.3.2 Scalability assessment framework and tools
Duboc et al. [30] argue that scalability of a system should be seen in the context of
its requirements. Scalability is often associated with performance and the terms have
at times been used interchangeably. However, performance is just one of the dimen-
sions used to represent scalability. We particularly agree with the authors' assessment
that scalability is implicit in the relationship between cause and effect as observed
by a stakeholder. In our case, we define scalability as the relationship between the
implementation of a CDN and the following key dependent variables for our site and its
infrastructure: page views, CPU utilization, memory utilization, and bandwidth.
Based on the claims made by our CDN solution provider, we expect it to offload
a significant portion of connections from our Web server tier. We also expect a signifi-
cant decrease in CPU, memory and input/output (I/O) utilization of the Web tier that
should result in higher website capacity in terms of page views - the key dependent
variable of our scalability assessment. The third component where we should be able
to observe a significant improvement is the bandwidth usage of our environment at
the hosting facility since a significant portion of it will be served from the edge servers.
We will use a combination of monitoring reports available with our CDN and our in-
ternal server usage measurements to determine these scalability characteristics in the
context of traffic bursts our site experiences during the February peak week.
The internal server measurements will be taken using the vmstat UNIX util-
ity [33]. We will set the appropriate flags in vmstat to capture utilization
statistics for CPU and memory usage in the Production (Prod) environment
during the most traffic-intensive time interval for our website - February 11th through
February 15th. The Prod environment consists of multiple servers in the Web and ap-
plication tiers and one server in the database tier. Typical operational reports for our
website include CPU and memory usage-related charts whose data is gathered via
vmstat. Examples of these reports are shown in Figure 3.8 and Figure 3.9.
The CPU utilization chart simply depicts the usage of all available CPU resources for
a given server. Servers may have multiple CPUs installed, so data is averaged out
across all of them for a given node. The data represents both kernel and any other
processes, so it gives a full picture for this particular resource.
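As an illustration of the capture step, the following minimal Java sketch timestamps and archives raw vmstat samples. It assumes a UNIX host where "vmstat 5 720" emits one sample line every five seconds for an hour; parsing is deliberately deferred because column layouts differ between UNIX flavors, and the log file name is hypothetical.

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class VmstatCollector {
    public static void main(String[] args) throws Exception {
        // one sample every 5 seconds, 720 samples = one hour of data
        Process vmstat = new ProcessBuilder("vmstat", "5", "720").start();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(vmstat.getInputStream()));
        PrintWriter log = new PrintWriter(new FileWriter("vmstat-prod.log", true), true);
        String line;
        while ((line = reader.readLine()) != null) {
            log.println(System.currentTimeMillis() + " " + line); // timestamped raw sample
        }
        log.close();
    }
}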
Physical memory is a finite resource on any system. The UNIX memory handler
manages the memory allocations. The kernel is responsible for freeing up physical
memory of idle processes by saving it to disk until it is needed again. Paging and
swapping are used to accomplish this task. Paging refers to writing portions, termed
pages, of a process’ memory to disk. Swapping refers to writing the entire process, not
just part, to disk. Page-out represents the event of writing pages to disk, while page-in
is defined as retrieving memory data from disk. Page-ins are common and under nor-
mal circumstances are not a cause for concern. However, if page-ins become excessive
the kernel can reach a point where it’s actually spending more time managing paging
activity than running the applications, and system performance suffers. This state is
referred to as thrashing. Our memory measurements have focused on identifying excessive
memory utilization in terms of paging characteristics, as shown in Figure 3.9.
We will compare the pre-Akamai resource utilization for 2006 and post-Akamai
resource utilization for 2007 for this period of time. Figure 3.10 represents the traffic
patterns for the years 2005, 2006, and 2007 and it speaks to the type of traffic burstiness
our site experiences during the holiday. Even though year 2005 is not included in our
resource measurements, we decided to show it on the chart to illustrate the traffic
pattern during the peak week over the last several years. It’s important to note that
sudden drops in traffic for years 2005 and 2006 represent site outages due to Web
servers getting overloaded with incoming requests.
Figure 3.4: How Akamai works
Figure 3.5: Application monitoring appliance network location
Figure 3.6: Application monitoring appliance measurement timeline
Figure 3.7: Content delivery measurement combined timeline
Figure 3.8: Sample CPU utilization report
Figure 3.9: Memory page-in report
Figure 3.10: Hourly page views for 2/11-2/15, 2005-2007
Chapter 4
Quality of Service Evaluation
We have devised two different ways to measure performance and scalability impacts
resulting from implementing a CDN on our website. The need for two separate stud-
ies resulted from the fact that performance observations tend to be impacted by the
amount of load on the system under study, while sufficient load is needed to assess
scalability improvements. These two goals conflict, so we decided to take measurements
in two different runtime environments. In order to have a more controlled environment
for performance measurements, we decided to conduct response time experiments in our
performance evaluation (PerfEval) environment, a scaled-down replica of our production
(Prod) environment based on the rPerf benchmark measurement [31]. We determined
that from the resource capac-
ity perspective our PerfEval environment represents approximately 17 percent of our
Prod capacity. We will discuss the details of the experiments in subsequent sections.
In order to assess scalability, we need to generate significant load on the produc-
tion environment and compare the results before and after we implement the CDN
configuration. A typical symptom of our scalability problems during key holidays is
that our Web servers get overloaded with HTTP connections from the browsers and
the CPU utilization spikes to above 90 percent, at which point the Web servers stop
responding to incoming requests. Another key symptom of our scalability challenges is
related to the amount of bandwidth that has to be provided by our hosting facility.
Therefore, our scalability factors can be represented by three key benchmarks: CPU
and memory utilization, page views, and bandwidth. We decided to capture the measure-
ments during the most load-intensive holiday in February before and after the CDN
implementation in the full-scale Production environment.
4.1 Performance experiments
In order to fully represent the impact of the CDN on response times, we needed to de-
ploy our measurement transactions at geographically dispersed locations. Our RTUM
service allowed us to use seven transaction nodes, as described in Table 4.1. We decided
not to use international testing locations due to additional cost and the fact that
almost 80 percent of our website's traffic originates from within the continental U.S.
Table 4.1: Performance test nodes by ISP and location
ISP City and State
Level3 Los Angeles, CA
Savvis Santa Clara, CA
Verizon Denver, CO
MFN Washington, D.C.
Internap Miami, FL
Level3 Chicago, IL
Sprint New York, NY
4.1.1 Transaction types
We recorded two scripts with distinct sequences of transaction steps:
one for the browse transaction profile and one for the shop transaction profile. The browse
transaction consists of the following steps:
1. Access the Website’s Home Page
2. Go to four different levels of Category pages
3. Access Flash video product detail page
4. Personalize Flash video
5. Sign In to the website
6. Send Flash video to recipient
The shop transaction type consists of the following steps:
1. Access the Website’s Home Page
2. Go to the top Category page for one of our product types
3. Go to Product Detail page for one of the products (it remains constant through-
out the experiment)
4. Select delivery options and go to the Shipping and Billing page
5. Submit payment information and go to the Order review page
Table 4.2: RTUM transaction characteristics
Transaction workload characteristics Browse Shop
Number of transaction steps 9 6
Number of images retrieved 163 94
Number of scripts, HTML, CSS, Flash components 57 39
Number of server-side J2EE components accessed 12 15
Average image size 2.9 KB 2.8 KB
Average size of HTML, script and Flash 4.9 KB 5.8 KB
Total number of bytes retrieved per connection 250 KB 98 KB
Number of web-server connections initiated from the browser 4 5
The key difference between the two transaction types is the set of server-side
Java EE components invoked, as well as the amount of content that gets served by the Web
servers. The browse transaction focuses more on content delivery characteristics of
large Flash video files while the shop transaction emphasizes server-side functionality.
The combination of the two transactions should provide us with adequate coverage of
content delivery and server-side performance. Table 4.2 summarizes the key
characteristics of the two transaction types in terms of workload, types of items retrieved
and their size as well as server-side load generated by the transactions in the form of
Java EE component requests.
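For illustration, each scripted transaction step reduces to a timed, fully drained HTTP retrieval. The following minimal Java sketch uses hypothetical URIs for a few shop steps; a commercial RTUM agent additionally records per-object time slices and handles cookies, redirects, SSL and embedded objects.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ShopTransactionScript {
    // Times the retrieval of one step's page, draining the body so that
    // content download time is included in the measurement.
    static long timeStep(String url) throws Exception {
        long start = System.currentTimeMillis();
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        InputStream in = conn.getInputStream();
        byte[] buffer = new byte[8192];
        while (in.read(buffer) != -1) {
            // drain the response
        }
        in.close();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        String[] steps = {                               // hypothetical step URIs
            "http://www.example.com/",                   // home page
            "http://www.example.com/category/top",       // top category page
            "http://www.example.com/product/12345",      // product detail page
            "http://www.example.com/checkout/shipping"   // shipping and billing page
        };
        for (int i = 0; i < steps.length; i++) {
            System.out.println("Step " + (i + 1) + ": " + timeStep(steps[i]) + " ms");
        }
    }
}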
In order to gather a representative data sample, we decided to schedule the two
transactions to run uninterrupted for 48 hours with the CDN re-routing enabled and
for another 48 hours without the CDN. The RTUM will initiate six transactions for
each transaction type and for each node per hour. This will give us a sample of over a
thousand transactions for each test’s duration.
4.1.2 Server-side transaction capture
We will enable server-side performance monitoring using a series of “watchpoints”
that will be defined in our application monitoring appliance. The watchpoints are
typically configured at the servlet level. We identified 21 unique servlets that will be
accessed from the two transaction sequences. We dedicated one full instance of the ap-
pliance to monitor traffic directed to our PerfEval environment by the RTUM service.
This will enable us to filter out traffic destined for our production environment and get
more accurate server-side measurements. Typical configuration steps for a servlet in-
clude defining the URI stem to be intercepted and any additional filtering parameters.
Since our environment is isolated from other traffic, we can simply configure the URI
stems to be intercepted in the configuration utility.
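Conceptually, a watchpoint times every request whose URI matches a configured stem. The sketch below is a hypothetical in-process analogue written as a standard servlet filter; our appliance itself is passive and adds no code to the application, so this only illustrates what the host-time measurement for a URI stem corresponds to.

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class WatchpointFilter implements Filter {
    private String uriStem;

    public void init(FilterConfig config) throws ServletException {
        // hypothetical init parameter, e.g. a stem such as /shop/ProductDetail
        uriStem = config.getInitParameter("uriStem");
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String uri = ((HttpServletRequest) req).getRequestURI();
        long start = System.currentTimeMillis();
        chain.doFilter(req, res); // the servlet (and everything behind it) runs here
        if (uriStem != null && uri.startsWith(uriStem)) {
            long hostTime = System.currentTimeMillis() - start;
            System.out.println(uri + " host time: " + hostTime + " ms");
        }
    }

    public void destroy() {
    }
}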
Performance results for each servlet will be aggregated by the appliance’s report-
ing components. A typical performance report generated by the appliance is represented
by Figure 4.1. The key highlights of the performance report are the number of
requests to a given watchpoint along with their host, network and end-to-end times.
4.1.3 Bandwidth utilization monitoring
Our pre-CDN bandwidth utilization was captured using a combination of network
measurement tools available at the hosting facility. With the implementation of a
CDN, we also gained the ability to generate reports for bandwidth utilization coming
through the CDN. HTTP request traffic data includes the header size and any protocol
overhead. The HTTP response data includes object bytes plus overhead bytes, along
with origin traffic (ingress), midgress traffic and edge traffic (egress) - all response
traffic from edge servers to end users [3]. Ingress traffic represents all response
traffic from our origin server to the Edge Servers, while midgress includes responses
between Edge Servers. We will monitor several data elements in order to assess band-
width efficiency gains. They include:
Edge traffic, in requests per second The total number of requests for the content provider
(CP) and time period selected. This data includes all hits with all response codes
from the edge server to the end user.
Origin bandwidth, in Megabits per second The total bandwidth usage to our origin
servers for the CP code(s) and time period selected. This occurs when the Aka-
mai Edge Server cache does not have the requested content, and therefore the
edge server had to request it from the origin. This data includes the object bytes
plus overhead bytes, all ingress traffic and all HTTP response codes.
Origin traffic, in hits per second The total number of hits on the origin for the CP
code(s) and time period selected. The peak number of hits per second and the
total origin hits are highlighted in the graph. The breakdown of response codes
is shown with different colors. This data includes all response codes served from
the origin to the edge server.
Origin bandwidth offload, in Megabits per second The origin offload view draws a
picture of the bandwidth reduction Akamai provides for our origin by showing
an aggregate view of site traffic from the edge compared to the traffic going to
our origin.
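One simple way to summarize these data elements is an offload ratio. The sketch below assumes, for illustration only, that edge bandwidth approximates the total response traffic delivered to end users; the sample values are hypothetical.

public class OffloadCalculator {
    // Fraction of response bandwidth served from the edge rather than the origin.
    static double originOffload(double edgeMbps, double originMbps) {
        return 1.0 - (originMbps / edgeMbps);
    }

    public static void main(String[] args) {
        double edgeMbps = 1200.0;  // hypothetical peak edge bandwidth
        double originMbps = 150.0; // hypothetical peak origin bandwidth
        System.out.println("Origin offload: "
                + Math.round(originOffload(edgeMbps, originMbps) * 100) + "%");
    }
}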
Figure 4.1: Sample server-side performance report
Chapter 5
Analysis
This chapter is divided into sections that discuss the respective quality of service (QoS)
attribute impacts resulting from the implementation of the CDN. We undertook two
separate evaluations of performance and scalability qualities of service. The scala-
bility evaluation focused on comparing a variety of resource utilization metrics cap-
tured during the February peak transaction period for years 2006 and 2007. We antic-
ipated increased throughput of transactions as well as page views while maintaining
the same hardware footprint and resource utilization. Better scalability typically re-
sults in better availability, and the two qualities of service are frequently viewed as derivatives of
each other. Therefore, we will include the availability observations in the scalability
section.
In order to assess the performance impact of the CDN, we ran a series of experiments
in pre- and post-CDN environments for the browse and shop transaction types.
With the monitoring infrastructure in place, we were able to gather performance data
at the object level of granularity. An object in our case represents the lowest level of
granularity from the browser perspective in terms of how it renders content and it
corresponds to an image (JPEG, GIF or PNG), a script (JavaScript or otherwise), Flash
SWF files, HTML content, style sheets and any other low-level components that a web
page may consist of. We anticipated improvement in end-to-end download times for
the objects. Assuming that each web page consists of tens or hundreds of them, we
expected the performance improvement of each individual object to add up to a sig-
nificant impact on the overall performance of the RTUM transactions. It was unclear
to us how a CDN may impact server-side performance of JSPs and Servlets, especially
under steady-state, low-volume conditions, so we decided to capture server-side per-
formance metrics as well.
Figure 5.1: CPU utilization, February of 2006 and 2007
5.1 Scalability and availability observations
5.1.1 CPU utilization
We maintained the same number of CPUs and thus the rPerf value for the Production
environment throughout 2006 and 2007. However, the CPU allocation changed in
the application server and Web server tiers. We decreased the available resources
by 50 percent in the web tier and allocated them to the application server. It was
our expectation that the CDN would allow us to change the ratios in favor of the
application tier since we expected a majority of the resource utilization decrease to be
visible in the web tier. We made no changes to the database tier. Figure 5.1 depicts
CPU utilization across all tiers after accounting for resource allocation changes. As we
anticipated, the Web tier experienced the most impact from the CDN - an average of
57 percent utilization decrease during the peak week and close to 60 percent on the
highest traffic day. Meanwhile, the overall traffic increased almost five-fold from the
previous day (see Figure 3.10 for the peak week traffic pattern).
Figure 5.2: Operating system run queue: February of 2006 and 2007
The outages we experienced in previous years in the Web tier prevented a high
percentage of requests from even being forwarded to the application tier. With the
elimination of the Web tier bottleneck, we experienced higher CPU utilization in the
application tier and the database responsible for the shop transactions. This was to be
expected since a larger number of requests usually translates into higher CPU utiliza-
tion. However, when we accounted for the two times the amount of traffic on peak
day of the 14th between 2006 and 2007, we concluded that we still gained significant
scalability in the application tier. The two-percent increase of CPU utilization well
offsets the amount of the additional traffic the tier handled successfully. We suspect
that some of the efficiencies gained in the application tier were caused by the appli-
cation tuning activities that were undertaken prior to the holiday, similar to the steps
described in [58], but based on the results of the load tests prior and after the tuning
activities, we concluded that the contribution was minimal and most, if not all, of the
scalability gains could be attributed to the elimination of the Web tier bottleneck with
the CDN.
Perhaps the most unexpected result was uncovered in the database tier. One of the
databases experienced an 80 percent decrease in average CPU utilization. We decided to
investigate it further and we discovered the main contributor to be the database tun-
ing activities performed by the engineering team prior to the peak week. Therefore,
we deemed the data invalid for analysis purposes and we concluded that additional
efficiencies could not be attributed to the CDN implementation.
Figure 5.2 further supports our conclusions for the Web and application tiers.
Generally speaking, fewer threads waiting for execution in the run
queue result in lower CPU utilization. As in our CPU results, the most significant
impact on the run queues was on the peak traffic day - 2/14. The slight increase in
some cases between 2006 and 2007 is negligible from the CPU perspective as long as
the number oscillates around one or two threads. According to [34] and [6], blocking
I/O contributes to high run queue thread waits. The implications of the high cost of
I/O-related context switching are also discussed in [59]. We conclude that the blocking
I/O was the main culprit behind high CPU utilization prior to the CDN implemen-
tation because requests made directly to the Web servers resulted in significant I/O
waits for long-running large-object connections which, in turn, resulted in high I/O
blocking in the application tier. The offloading of a large number of HTTP connections
to the CDN has had a significant impact on the CPU utilization.
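The mechanism can be illustrated with a hypothetical sketch (not our Web server's actual code): with blocking I/O, a worker thread streaming a large Flash asset stays blocked in write() for as long as the slowest client takes to drain the socket, so long-running downloads pin threads and inflate the run queue.

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class BlockingDownloadWorker implements Runnable {
    private final Socket client;

    BlockingDownloadWorker(Socket client) {
        this.client = client;
    }

    public void run() {
        try {
            // hypothetical large asset path
            InputStream file = new FileInputStream("/content/video/large-asset.swf");
            OutputStream out = client.getOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = file.read(buffer)) != -1) {
                // blocks until the client's TCP window opens; a dial-up client
                // can hold this thread for the entire duration of the download
                out.write(buffer, 0, n);
            }
            file.close();
            client.close();
        } catch (Exception e) {
            // a dropped or prematurely terminated connection surfaces here
        }
    }
}

Offloading exactly these long-running connections to the CDN's edge servers releases the pinned threads, which is consistent with the run queue improvements we observed.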
5.1.2 Memory utilization
We used three different measures to gauge the impact of the CDN on memory utiliza-
tion. The same level of memory resources was maintained throughout 2006 and 2007.
We observed the largest differences in memory paging activity. Retrievals of memory
pages from the paging space instead of RAM are considerably slower and constant
paging activity can cause system performance degradation. The most pronounced
impact on memory paging activity was observed on the peak day with average mem-
ory page-ins per second decreasing from 25 to almost zero and page-outs decreasing
from approximately 60 to one in the Web tier, as shown in Figure 5.3 and Figure 5.4.
Interestingly enough, the paging activity increased on all other days during the peak
week, but the increase was not significant enough to cause performance problems or
outages.
Figure 5.5 represents the memory scanned-to-freed ratios. Overcommitting of
memory results in high scanned-to-freed ratios. It’s difficult to establish the level of
this ratio that will definitely result in memory being constrained to the point of having
an impact on performance. Overall, the ratios were neutral across all tiers and did
not seem to have a visible impact on scalability in either 2006 or 2007. All evidence
is pointing to the excessive paging activity as the main culprit of memory and CPU
utilization spikes in 2006. The impact of the CDN is probably indirect in the case of
memory utilization. The most plausible conclusion is that the lower number of I/O
waits due to offloading of HTTP connections results in a significant decrease of calls
from the kernel scheduler to the paging system which, in turn, alleviates the memory
constraints [54].
Figure 5.3: Memory page-ins, February of 2006 and 2007
5.1.3 Bandwidth
Bandwidth costs and availability are key considerations for any highly transactional
e-commerce environment. Hosting facilities usually impose limits on bandwidth us-
age in order to allow many customers to share the connectivity to the Internet from
the hosting facility. Increasing bandwidth availability by a significant amount entails
significant costs during peaks due to market conditions. There are also significant
costs associated with deployment of network infrastructure to support bandwidth
increases. Therefore, bandwidth offloading represents a significant benefit for CDN
implementations.
The bandwidth impact in our environment between the 2006 and 2007 peak weeks
is depicted in Figure 5.6. We determined that low bandwidth utilization in 2006
was caused by the refusal of a large number of requests in the Web tier and the sub-
sequent outages. However, even if we were to allow all of the requests we would
still encounter the bandwidth limitation imposed on our environment by the hosting
provider.
Figure 5.4: Memory page-outs, February of 2006 and 2007
The impact of the CDN is two-fold. First, it allows us to fully utilize all of our
bandwidth capacity at the hosting facility. Second, it allows us to offload over 1 Gbps
of bandwidth utilization to the CDN network, whose thousands of Web servers
represent a much more resilient and cost effective infrastructure. We conclude that
the scalability impact here is likely in the range of a factor of two to four.
Figure 5.7 provides another view into the impact of the CDN and supports our
scalability assessment from the standpoint of bandwidth usage during the peak day.
It shows that we were able to sustain a five-fold spike in bandwidth usage for several
hours. Our analysis of the number of hits further verifies our scalability claim. A hit is
defined by our CDN as a retrieval of an object from an edge server versus an origin
server. Our peak edge hit rate was 16,000 per second. This number is of interest to us
because we can use it to determine the approximate Web tier scalability in terms of TCP
connections using the following formula:
ScaleFactor_tcp = Hits_CDN / AvailableThreads (5.1)
Figure 5.5: Memory scan rate, February of 2006 and 2007
where Hits_CDN represents the peak number of hits per second offloaded to the CDN and
AvailableThreads represents the maximum possible number of threads for our Web
server cluster without impacting server availability. Our Web servers’ thread capacity
varies depending on the type of hardware currently in use, but we have historically
estimated it at approximately 3,600 for a typical day. Therefore, ScaleFactor_tcp = 3.9
and, during peak, the CDN allows us to quadruple our TCP connection capacity with-
out additional hardware. This number represents the best case scenario as many con-
nections will be kept alive by the browsers. Nonetheless, we expect that it will still
lead to a significant load decrease on the Web servers, especially since many of the
connections constitute long-running Flash video downloads.
Figure 5.6: Peak bandwidth comparison, February of 2006 and 2007
5.2 Performance
We decided to divide performance observations into two perspectives: consumer and
server-side. The consumer perspective corresponds to measurements obtained from
the RTUM transactions executed under steady-state with and without the CDN. The
RTUM allowed us to capture object-level download times in an environment that
closely emulates consumers’ experience. The server-side perspective supplements our
consumer perspective with server-side pre- and post-CDN measurements obtained
from the performance measurement appliance deployed in our hosting environment’s
network.
5.2.1 Consumer Perspective
The RTUM experiments resulted in approximately 1,200 combined shop and browse
transactions for the duration of 48 hours before and after implementing the CDN.
The 1,200 transactions for each time period generated over 8,700 page views and
downloads of over 250,000 objects to the RTUM clients. We eliminated transactions
that were incomplete due to unavailability of the environment. Figure 5.8 and
Figure 5.9 summarize response time measurements for each transaction type. Shop
transaction response times averaged 11 seconds with the CDN and 14 seconds without
it, a 21 percent performance improvement (3/14) at the transaction level.
Figure 5.7: Akamai bandwidth offload, February of 2007
Browse transaction response times averaged 10 seconds with the CDN versus 15 seconds
without it, for an overall improvement of around 30 percent (5/15). We expected the browse
transaction to realize a somewhat larger benefit from the CDN due to its higher number of
content assets and its use of larger content assets.
Subsequently, we decided to analyze performance data at the object level of granu-
larity in order to identify the source of performance improvements and to distinguish
between characteristics of server-side components such as servlets and static content.
Due to very different content characteristics we expected to see significant differences
between the shop and browse transactions in terms of object response times as well.
Table 5.1 and Table 5.2 illustrate end-to-end response time changes by content type.
Positive percentages represent performance improvement while negative percentages
represent performance degradations. Both transaction types show significant perfor-
mance improvements at the object level.
Although the transaction and object level performance realized significant benefits
from the CDN, it was not without a trade-off in terms of servlet performance. Browse
and shop servlets suffered performance degradation of 5 percent and 18 percent,
respectively. We suspect that the primary reason for the degradation is the additional
DNS redirections that take place during edge server resolution.
Figure 5.8: RTUM shop transaction response times in seconds
Overall, however,
the benefits at the object level far outweigh the degradation at the servlet level and the
transaction response times improved. The 30 percent improvement in browse transac-
tions is particularly important to our environment during the peak week in February
when the majority of our transactions are browse transactions.
5.2.2 Server-side Perspective
The analysis of server-side performance centered on measuring servlet response
times. However, during our measurements we uncovered additional network-
related characteristics that helped us better understand the impact of the CDN not
only on the server performance of our hosting environment, but also on its network
infrastructure.
Table 5.3 illustrates the server-side performance changes for the key servlets.
It is worth noting that the measurements are in milliseconds, and while the average
improvement of around 20 percent seems impressive, the majority of it can be
attributed to servlets that already performed at 100 milliseconds or less.
Table 5.1: RTUM browse transaction performance
Content Type Performance Change
JavaScript 69%
CSS 68%
GIF 68%
JPG 65%
PNG 77%
Servlet -5%
Table 5.2: RTUM shop transaction performance
Content Type Performance Change
JavaScript 68%
CSS 68%
GIF 69%
JPG 61%
PNG 72%
Servlet -18%
Table 5.3: Server-side performance in milliseconds
Servlet Name Server time without CDN Server time with CDN Chg
Category 1 64.05 28.06 56%
Category 2 38.89 38.62 1%
Category 3 45.48 39.81 12%
Category 4 56.18 33.08 41%
Category 5 42.78 39.67 7%
Category 6 56.26 46.51 17%
Checkout 236.18 193.54 18%
Ecard Display 31.46 21.75 31%
Ecard Personalize 81.54 69.38 15%
Home Page 50.60 20.14 60%
Login 97.25 58.68 40%
Marketing Spot 13.74 13.48 2%
Order Review 267.64 248.73 7%
Product Detail 473.52 500.06 -6%
Shopping Cart 4052.16 4069.80 0%
Figure 5.9: RTUM browse transaction response times in seconds
Therefore, the improvement has very little or no impact on the overall response
time of the Web pages that correspond to those servlets. On
the other hand, the servlets that performed at above 100 milliseconds suffered either
a slight performance degradation or insignificant improvement. We anticipated the
servlet performance to experience more of an improvement, but the only logical con-
clusion based on the data is that the impact of the CDN on server-side performance in
the application and database tiers is neutral or slightly negative.
We were able to obtain data regarding packet flow through our hosting environ-
ment’s network before and after the CDN implementation. The summary of the results
is presented in Table 5.4. There is a strong correlation between the amount of
content associated with a servlet and the decrease in the number of packets in our
network. This supports the results from the Scalability section of this chapter where
we discuss the bandwidth implications of the CDN. With much of the bandwidth of-
floaded to the edge network we would expect a much smaller number of TCP packets
in our network.
Table 5.4: TCP packet counts at hosting environment
Servlet Name TCP Pkt Cnt without CDN TCP Pkt Cnt with CDN Chg
Category 1 758.46 74.08 90%
Category 2 281.20 84.65 70%
Category 3 159.78 85.36 47%
Category 4 419.39 87.01 79%
Category 5 621.74 95.43 85%
Category 6 482.11 95.96 80%
Checkout 594.68 178.45 70%
Ecard Display 124.92 51.21 59%
Ecard Personalize 178.09 118.57 33%
Home Page 795.08 76.92 90%
Login 531.43 215.71 59%
Marketing Spot 16.24 27.05 -66%
Order Review 350.12 178.77 49%
Product Detail 347.56 221.57 36%
Shopping Cart 267.09 201.32 25%
Chapter 6
Conclusions and Further Research
In this thesis, we evaluated the merits of implementing a Content Delivery Network as
a strategy to improve scalability and performance of a highly transactional e-commerce
website. The key differentiating factor of our research is the evaluation of impact
across all tiers and processing components of our environment. We were also inter-
ested in potential tradeoffs from this architectural decision - especially with regard
to configuration and content management across a geographically dispersed environ-
ment. Research to-date has focused primarily on network efficiencies that result from
the CDN implementations.
By implementing a series of server and client-side monitoring components, we
were able to obtain measurements under steady transactional load and also during
significant transactional bursts. We captured results from the consumer as well as
application and infrastructure perspectives. Our overall conclusion is that CDN of-
floading has a significant impact on overall performance and scalability of n-tier e-
commerce environments such as ours. The results of our experiments lead us to the
following conclusions:
• CDNs have a significant impact on page response times as observed by con-
sumers. We achieved a 30 percent improvement in the most utilized transaction
on our website - the browse transaction.
• Implementing a CDN as a scalability tactic is a much more cost-efficient and
flexible way to achieve the desired results in the Web tier as opposed to hori-
zontal and vertical scalability tactics because it does not require an investment
in additional server and network hardware.
• We conclude that the CDN resolved our content delivery bottlenecks and al-
lowed us to scale to meet the peak demand without sacrificing page response
times. We also achieved 100 percent availability during the peak day, with
much of the success attributable to the offloading solution.
• Server-side performance impact varies throughout the tiers with the Web tier
benefiting from the CDN the most. We observed a slight performance degrada-
tion in the application tier and the impact on the DB tier was neutral. Servlet per-
formance degradation and additional DNS redirections had minimal impact on
consumer experience. This leads us to conclude that CDNs are not a panacea
for all performance-related problems, including poor application and database
design. We observed that CDN content delivery has little or no impact on server-
side application-tier caching strategies because of its primary impact on the Web
tier.
• We observed a five-fold increase in the efficiency of bandwidth utilization with
the CDN in place. This was partly due to the ability to fully utilize our hosting
center’s bandwidth and the additional bandwidth provided by Akamai. In the
past, we were unable to achieve full utilization at our hosting facility due to
Web tier outages. With the CDN in place, we achieved bandwidth scalability
equivalent to adding another hosting facility at a fraction of the cost.
• The CDN allowed us to maintain half of our previous Web server footprint while
increasing the number of page views almost two fold and decreasing CPU uti-
lization by almost 90 percent in the Web tier. We also saw a positive impact on
memory utilization with memory page-ins and page-outs decreasing by almost
95 percent.
• Maintainability should be considered as a part of tradeoff analysis during CDN
selection. We observed that configuration changes can take up to two hours and
content changes can take an additional 7-10 minutes. The majority of our changes are
content-related, so the several-minute delay may have an impact on our ability to
respond to content changes during peaks. Configuration changes do not impact
availability because they do not require system re-starts or a planned outage.
6.1 Summary of Contributions
Our research is the most comprehensive and architecturally significant attempt we are
aware of to correlate content delivery offloading with Quality of Service factors of an
e-commerce environment. We assert that the CDN impact on the network behavior is
just a part of the full range of performance and scalability benefits that an e-commerce
website may realize from it. This work also attempted to quantify its impact on the
computing infrastructure - especially the Web server footprint. We were able to quan-
tify scalability gains in terms of bandwidth and Web tier CPU and memory utilization.
Additionally, we proposed an approach to measuring performance of an e-commerce
website with a combination of server-side and client-side monitors in order to obtain
a full end-to-end picture of performance. This is important because much of the re-
search to-date focuses on either client or server perspective without identifying cor-
relations between the two. The full performance picture is a pre-requisite to under-
standing the locations of the performance bottlenecks in an environment such as ours.
Finally, we were able to verify the claim that CDNs enhance not only performance
but also scalability and availability of e-commerce websites. Much research to-date
focuses on performance alone and treats scalability as an enabler or, at times, a by-
product of performance. We were able to isolate measurements that pertain to scala-
bility alone and it is our assertion that this quality of service deserves as much focus
as performance.
6.2 Future Research
Edge content delivery is already a well-established technology and quite a few solu-
tions exist in the marketplace to enable it. Edge computing has been proposed as a way
to extend this model to geographically dispersed application delivery and there have
been several attempts in the research community to define application replication and
deployment standards related to this concept. This area deserves further research, as
the complexity of multi-platform, geographically dispersed computing environments
will far exceed that of delivering a limited set of static asset types, as is the case
with today's CDNs.
N-tier system quality of service investigations of e-commerce environments have
also been scarce in the research community. The transactional burstiness of these
environments makes them particularly relevant to performance and scalability re-
search. More research into availability, reliability and security of these environments
is needed to establish a full architectural view of tradeoffs and dependencies of the
qualities of service.
We did not evaluate the tradeoffs between the various techniques that can be used
to enable CDN re-routing. We also did not consider differences between HTTP down-
loading and streaming of video content using protocols such as MMS or RTSP. Edge
streaming has gained popularity as an alternative to hosting streaming servers in cen-
tralized e-commerce environments. The above could be areas for further research.
Bibliography
[1] A. Datta, K. Dutta, H. Thomas, D. VanderMeer, and K. Ramamritham. Proxy-based
acceleration of dynamically generated content on the World Wide Web: An
approach and implementation. ACM Transactions on Database Systems, 29(2):403–443, 2004.
[2] V. Aggarwal, A. Feldmann, and C. Scheideler. Can ISPs and P2P users cooperate
for improved performance? SIGCOMM Computer Communication Review, 37(3):29–40, 2007.
[3] Akamai. www.akamai.com, 2007.
[4] Akamai. Akamai HTTP content delivery configuration guide.
https://dl.akamai.com/customers/HTTPCD/HTTPCD_Activation_Guide.pdf, 2007.
[5] A. Akella, S. Seshan, and A. Shaikh. An empirical evaluation of wide-area Internet
bottlenecks. In IMC '03: Proceedings of the 3rd ACM SIGCOMM Conference on
Internet Measurement, pages 101–114, New York, NY, USA, 2003. ACM.
[6] J. Akella and D. Siewiorek. Modeling and measurement of the impact of
input/output on system performance. In ISCA '91: Proceedings of the 18th Annual
International Symposium on Computer Architecture, pages 390–399, New York, NY,
USA, 1991. ACM.
[7] P. Albitz and C. Liu. DNS and BIND. O'Reilly, Sebastopol, CA, 2001.
[8] M. Andreolini, M. Colajanni, R. Lancellotti, and F. Mazzoni. Fine grain performance
evaluation of e-commerce sites. SIGMETRICS Performance Evaluation Review, 32(3):14–23, 2004.
[9] M. Arlitt, L. Cherkasova, J. Dilley, R. Friedrich, and T. Jin. Evaluating content
management techniques for web proxy caches. SIGMETRICS Performance Evaluation Review, 27(4):3–11, 2000.
[10] M. Arlitt, D. Krishnamurthy, and J. Rolia. Characterizing the scalability of a large
web-based shopping system. ACM Transactions on Internet Technology, 1(1):44–69, 2001.
59
Rzepnicki_thesis
Rzepnicki_thesis
Rzepnicki_thesis
Rzepnicki_thesis
Rzepnicki_thesis

More Related Content

What's hot

Service level management using ibm tivoli service level advisor and tivoli bu...
Service level management using ibm tivoli service level advisor and tivoli bu...Service level management using ibm tivoli service level advisor and tivoli bu...
Service level management using ibm tivoli service level advisor and tivoli bu...Banking at Ho Chi Minh city
 
Master_Thesis_2015_by_Sanjeev_Laha_21229267
Master_Thesis_2015_by_Sanjeev_Laha_21229267Master_Thesis_2015_by_Sanjeev_Laha_21229267
Master_Thesis_2015_by_Sanjeev_Laha_21229267Sanjeev Laha
 
Certification study guide ibm tivoli access manager for e business 6.0 sg247202
Certification study guide ibm tivoli access manager for e business 6.0 sg247202Certification study guide ibm tivoli access manager for e business 6.0 sg247202
Certification study guide ibm tivoli access manager for e business 6.0 sg247202Banking at Ho Chi Minh city
 

What's hot (6)

Service level management using ibm tivoli service level advisor and tivoli bu...
Service level management using ibm tivoli service level advisor and tivoli bu...Service level management using ibm tivoli service level advisor and tivoli bu...
Service level management using ibm tivoli service level advisor and tivoli bu...
 
PureFlex pour les MSP
PureFlex pour les MSPPureFlex pour les MSP
PureFlex pour les MSP
 
Master_Thesis_2015_by_Sanjeev_Laha_21229267
Master_Thesis_2015_by_Sanjeev_Laha_21229267Master_Thesis_2015_by_Sanjeev_Laha_21229267
Master_Thesis_2015_by_Sanjeev_Laha_21229267
 
thesis
thesisthesis
thesis
 
Performance tuning for content manager sg246949
Performance tuning for content manager sg246949Performance tuning for content manager sg246949
Performance tuning for content manager sg246949
 
Certification study guide ibm tivoli access manager for e business 6.0 sg247202
Certification study guide ibm tivoli access manager for e business 6.0 sg247202Certification study guide ibm tivoli access manager for e business 6.0 sg247202
Certification study guide ibm tivoli access manager for e business 6.0 sg247202
 

Viewers also liked

Taha_Ragab_Resume-opt
Taha_Ragab_Resume-optTaha_Ragab_Resume-opt
Taha_Ragab_Resume-optTaha Ragab
 
HYPER LOCAL MARKETING in INDIA ( NON METRO )
HYPER LOCAL MARKETING  in INDIA  ( NON METRO ) HYPER LOCAL MARKETING  in INDIA  ( NON METRO )
HYPER LOCAL MARKETING in INDIA ( NON METRO ) PESHWA ACHARYA
 
Gebeurtenis: Slag om Guadalcanal
Gebeurtenis: Slag om GuadalcanalGebeurtenis: Slag om Guadalcanal
Gebeurtenis: Slag om GuadalcanalJorenD
 
Managerial Planning and Goal Setting
Managerial Planning and Goal SettingManagerial Planning and Goal Setting
Managerial Planning and Goal SettingRuhull
 
Srivari mettu Tirumala footpath route oct 17-2011
Srivari mettu Tirumala footpath route oct 17-2011Srivari mettu Tirumala footpath route oct 17-2011
Srivari mettu Tirumala footpath route oct 17-2011Neelaraman Thimma
 
SciTech Quiz 2015 IIT-BHU Finals
SciTech Quiz 2015 IIT-BHU FinalsSciTech Quiz 2015 IIT-BHU Finals
SciTech Quiz 2015 IIT-BHU Finalsfaizan_khan_iit
 
Etika dalam sistem informasi kel 2 ppt
Etika dalam sistem informasi kel 2 pptEtika dalam sistem informasi kel 2 ppt
Etika dalam sistem informasi kel 2 pptLelys x'Trezz
 
PENGENALAN KEPADA HEMODIALISIS
PENGENALAN KEPADA HEMODIALISISPENGENALAN KEPADA HEMODIALISIS
PENGENALAN KEPADA HEMODIALISISMuhammad Nasrullah
 
DAUR ULANG SAMPAH PLASTIK kewirausahaan
DAUR ULANG SAMPAH PLASTIK kewirausahaanDAUR ULANG SAMPAH PLASTIK kewirausahaan
DAUR ULANG SAMPAH PLASTIK kewirausahaanMhd. Abdullah Hamid
 
John Maynard Keynes. Keynesian economics
John Maynard Keynes. Keynesian economicsJohn Maynard Keynes. Keynesian economics
John Maynard Keynes. Keynesian economicsRuhull
 
Debian 8 server_full
Debian 8 server_fullDebian 8 server_full
Debian 8 server_fullronijagarino
 
Forces and their effects
Forces and their effectsForces and their effects
Forces and their effectsheymisterlee
 

Viewers also liked (20)

JERMEL NEW CV
JERMEL NEW CVJERMEL NEW CV
JERMEL NEW CV
 
Qa06 average
Qa06 averageQa06 average
Qa06 average
 
Taha_Ragab_Resume-opt
Taha_Ragab_Resume-optTaha_Ragab_Resume-opt
Taha_Ragab_Resume-opt
 
HYPER LOCAL MARKETING in INDIA ( NON METRO )
HYPER LOCAL MARKETING  in INDIA  ( NON METRO ) HYPER LOCAL MARKETING  in INDIA  ( NON METRO )
HYPER LOCAL MARKETING in INDIA ( NON METRO )
 
Gebeurtenis: Slag om Guadalcanal
Gebeurtenis: Slag om GuadalcanalGebeurtenis: Slag om Guadalcanal
Gebeurtenis: Slag om Guadalcanal
 
Managerial Planning and Goal Setting
Managerial Planning and Goal SettingManagerial Planning and Goal Setting
Managerial Planning and Goal Setting
 
Mesologgi
MesologgiMesologgi
Mesologgi
 
Tesis
TesisTesis
Tesis
 
Muscular system
Muscular systemMuscular system
Muscular system
 
Srivari mettu Tirumala footpath route oct 17-2011
Srivari mettu Tirumala footpath route oct 17-2011Srivari mettu Tirumala footpath route oct 17-2011
Srivari mettu Tirumala footpath route oct 17-2011
 
SciTech Quiz 2015 IIT-BHU Finals
SciTech Quiz 2015 IIT-BHU FinalsSciTech Quiz 2015 IIT-BHU Finals
SciTech Quiz 2015 IIT-BHU Finals
 
Etika dalam sistem informasi kel 2 ppt
Etika dalam sistem informasi kel 2 pptEtika dalam sistem informasi kel 2 ppt
Etika dalam sistem informasi kel 2 ppt
 
PENGENALAN KEPADA HEMODIALISIS
PENGENALAN KEPADA HEMODIALISISPENGENALAN KEPADA HEMODIALISIS
PENGENALAN KEPADA HEMODIALISIS
 
Helen of Troy
Helen of Troy Helen of Troy
Helen of Troy
 
S &C Test_Juston
S &C Test_JustonS &C Test_Juston
S &C Test_Juston
 
DAUR ULANG SAMPAH PLASTIK kewirausahaan
DAUR ULANG SAMPAH PLASTIK kewirausahaanDAUR ULANG SAMPAH PLASTIK kewirausahaan
DAUR ULANG SAMPAH PLASTIK kewirausahaan
 
John Maynard Keynes. Keynesian economics
John Maynard Keynes. Keynesian economicsJohn Maynard Keynes. Keynesian economics
John Maynard Keynes. Keynesian economics
 
Debian 8 server_full
Debian 8 server_fullDebian 8 server_full
Debian 8 server_full
 
Forces and their effects
Forces and their effectsForces and their effects
Forces and their effects
 
It act ppt ( 1111)
It act ppt ( 1111)It act ppt ( 1111)
It act ppt ( 1111)
 

Similar to Rzepnicki_thesis

Patterns: Implementing an SOA using an enterprise service bus (ESB)
Patterns: Implementing an SOA using an enterprise service bus (ESB)Patterns: Implementing an SOA using an enterprise service bus (ESB)
Patterns: Implementing an SOA using an enterprise service bus (ESB)Kunal Ashar
 
Patterns: Implementing an SOA Using an Enterprise Service Bus
Patterns: Implementing an SOA Using an Enterprise Service BusPatterns: Implementing an SOA Using an Enterprise Service Bus
Patterns: Implementing an SOA Using an Enterprise Service BusBlue Atoll Consulting
 
Managing an soa environment with tivoli redp4318
Managing an soa environment with tivoli redp4318Managing an soa environment with tivoli redp4318
Managing an soa environment with tivoli redp4318Banking at Ho Chi Minh city
 
SzaboGeza_disszertacio
SzaboGeza_disszertacioSzaboGeza_disszertacio
SzaboGeza_disszertacioGéza Szabó
 
Mohan_Dissertation (1)
Mohan_Dissertation (1)Mohan_Dissertation (1)
Mohan_Dissertation (1)Mohan Bhargav
 
Deployment guide series ibm total storage productivity center for data sg247140
Deployment guide series ibm total storage productivity center for data sg247140Deployment guide series ibm total storage productivity center for data sg247140
Deployment guide series ibm total storage productivity center for data sg247140Banking at Ho Chi Minh city
 
Agentless Monitoring with AdRem Software's NetCrunch 7
Agentless Monitoring with AdRem Software's NetCrunch 7Agentless Monitoring with AdRem Software's NetCrunch 7
Agentless Monitoring with AdRem Software's NetCrunch 7Hamza Lazaar
 
bonino_thesis_final
bonino_thesis_finalbonino_thesis_final
bonino_thesis_finalDario Bonino
 
Distributed Traffic management framework
Distributed Traffic management frameworkDistributed Traffic management framework
Distributed Traffic management frameworkSaurabh Nambiar
 
bkremer-report-final
bkremer-report-finalbkremer-report-final
bkremer-report-finalBen Kremer
 
phd_thesis_PierreCHATEL_en
phd_thesis_PierreCHATEL_enphd_thesis_PierreCHATEL_en
phd_thesis_PierreCHATEL_enPierre CHATEL
 

Similar to Rzepnicki_thesis (20)

My PhD Thesis
My PhD Thesis My PhD Thesis
My PhD Thesis
 
Patterns: Implementing an SOA using an enterprise service bus (ESB)
Patterns: Implementing an SOA using an enterprise service bus (ESB)Patterns: Implementing an SOA using an enterprise service bus (ESB)
Patterns: Implementing an SOA using an enterprise service bus (ESB)
 
Patterns: Implementing an SOA Using an Enterprise Service Bus
Patterns: Implementing an SOA Using an Enterprise Service BusPatterns: Implementing an SOA Using an Enterprise Service Bus
Patterns: Implementing an SOA Using an Enterprise Service Bus
 
Managing an soa environment with tivoli redp4318
Managing an soa environment with tivoli redp4318Managing an soa environment with tivoli redp4318
Managing an soa environment with tivoli redp4318
 
SzaboGeza_disszertacio
SzaboGeza_disszertacioSzaboGeza_disszertacio
SzaboGeza_disszertacio
 
document
documentdocument
document
 
Mohan_Dissertation (1)
Mohan_Dissertation (1)Mohan_Dissertation (1)
Mohan_Dissertation (1)
 
Deployment guide series ibm total storage productivity center for data sg247140
Deployment guide series ibm total storage productivity center for data sg247140Deployment guide series ibm total storage productivity center for data sg247140
Deployment guide series ibm total storage productivity center for data sg247140
 
Milan_thesis.pdf
Milan_thesis.pdfMilan_thesis.pdf
Milan_thesis.pdf
 
Agentless Monitoring with AdRem Software's NetCrunch 7
Agentless Monitoring with AdRem Software's NetCrunch 7Agentless Monitoring with AdRem Software's NetCrunch 7
Agentless Monitoring with AdRem Software's NetCrunch 7
 
IBM Streams - Redbook
IBM Streams - RedbookIBM Streams - Redbook
IBM Streams - Redbook
 
bonino_thesis_final
bonino_thesis_finalbonino_thesis_final
bonino_thesis_final
 
Thesis_Report
Thesis_ReportThesis_Report
Thesis_Report
 
Master_Thesis
Master_ThesisMaster_Thesis
Master_Thesis
 
AWS Pentesting
AWS PentestingAWS Pentesting
AWS Pentesting
 
Distributed Traffic management framework
Distributed Traffic management frameworkDistributed Traffic management framework
Distributed Traffic management framework
 
Knapp_Masterarbeit
Knapp_MasterarbeitKnapp_Masterarbeit
Knapp_Masterarbeit
 
CS4099Report
CS4099ReportCS4099Report
CS4099Report
 
bkremer-report-final
bkremer-report-finalbkremer-report-final
bkremer-report-final
 
phd_thesis_PierreCHATEL_en
phd_thesis_PierreCHATEL_enphd_thesis_PierreCHATEL_en
phd_thesis_PierreCHATEL_en
 

Rzepnicki_thesis

  • 1. Evaluating the Impact of Content Delivery Networks on the Performance and Scalability of Content-Rich and Highly Transactional Java EE-based Websites Witold Rzepnicki Submitted to the Department of Electrical Engineering and Computer Science and the Faculty of the Graduate School of the University of Kansas in partial fulfillment of the requirements for the degree of Master’s of Science Committee Members: Hossein Saiedian, Ph.D. Professor and Thesis Adviser Arvin Agah, Ph.D., Associate Professor Prasad Kulkarni, Ph.D. Associate Professor Date Defended: February ??, 2008
  • 3. iii Abstract N-tier e-commerce environments present unique performance and scalability chal- lenges under highly variable transactional loads. This thesis investigates Content De- livery Networks (CDNs) as a tactic to manage transactional peaks and valleys in a “bursty” environment that has suffered from content delivery-related performance, scalability and, ultimately, availability problems during key selling seasons. We estab- lished monitoring infrastructure that consists of client (browser) and server-side per- formance monitors as well as server and network resource utilization measurements in order to gauge the full impact of our chosen CDN on response times, bandwidth utilization and CPU utilization. We ran a series of controlled experiments to establish performance and scalabil- ity gains at the Java EE component level and HTML object level of granularity. Our results suggest a 30 percent improvement in page response times, a four-fold improve- ment in bandwidth utilization and Web server CPU utilization reduction of almost 90 percent. At the same time, we were able to sustain twice the amount of Web traffic with zero outages and the exact same hardware footprint year-over-year. Our conclu- sions are applicable to many other n-tier e-commerce environments that suffer from content delivery issues.
  • 5. Contents Acceptance ii Abstract iii Acknowledgements iv 1 Introduction 1 1.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Significance and expected contributions . . . . . . . . . . . . . . . . . . . 3 1.4 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.5 Evaluation criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.6 Thesis organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Background and Previous Work 8 2.1 Architectural overview of subject site . . . . . . . . . . . . . . . . . . . . 9 2.2 Website content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.3 Performance considerations and tactics . . . . . . . . . . . . . . . . . . . 10 2.4 Previous work related to WWW environments . . . . . . . . . . . . . . . 11 3 Proposed Solution 22 3.1 CDN selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.2 Solution configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.3 Monitoring infrastructure configuration . . . . . . . . . . . . . . . . . . . 27 3.3.1 Performance measurement tools . . . . . . . . . . . . . . . . . . . 27 3.3.2 Scalability assessment framework and tools . . . . . . . . . . . . 29 4 Quality of Service Evaluation 38 4.1 Performance experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.1.1 Transaction types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.1.2 Server-side transaction capture . . . . . . . . . . . . . . . . . . . . 40 4.1.3 Bandwidth utilization monitoring . . . . . . . . . . . . . . . . . . 41 v
5 Analysis 43
5.1 Scalability and availability observations . . . . . . . . . . . . . . . 44
5.1.1 CPU utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.2 Memory utilization . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.1.3 Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2.1 Consumer Perspective . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2.2 Server-side Perspective . . . . . . . . . . . . . . . . . . . . . . . 52

6 Conclusions and Further Research 56
6.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Bibliography 59
List of Figures

1.1 Queue network for a typical n-tier e-commerce system . . . . . . . . . 5
2.1 Sample customer behavior model graph . . . . . . . . . . . . . . . . . . 15
2.2 Subject site page views by month, 2004-2007 . . . . . . . . . . . . . . 16
2.3 Logical view of subject site . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Infrastructure diagram of system under study . . . . . . . . . . . . . . 18
2.5 Websphere Commerce interaction diagram . . . . . . . . . . . . . . . . . 19
2.6 Typical approach for content management and publishing . . . . . . . . . 20
2.7 Location of Internet bottlenecks . . . . . . . . . . . . . . . . . . . . 21
3.1 How DNS works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Caching in DNS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 URL rewrite process . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4 How Akamai works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5 Application monitoring appliance network location . . . . . . . . . . . 32
3.6 Application monitoring appliance measurement timeline . . . . . . . . . 33
3.7 Content delivery measurement combined timeline . . . . . . . . . . . . . 34
3.8 Sample CPU utilization report . . . . . . . . . . . . . . . . . . . . . 35
3.9 Memory page-in report . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.10 Hourly page views for 2/11-2/15, 2005-2007 . . . . . . . . . . . . . . 37
4.1 Sample server-side performance report . . . . . . . . . . . . . . . . . 42
5.1 CPU utilization, February of 2006 and 2007 . . . . . . . . . . . . . . . 44
5.2 Operating system run queue: February of 2006 and 2007 . . . . . . . . . 45
5.3 Memory page-ins, February of 2006 and 2007 . . . . . . . . . . . . . . . 47
5.4 Memory page-outs, February of 2006 and 2007 . . . . . . . . . . . . . . 48
5.5 Memory scan rate, February of 2006 and 2007 . . . . . . . . . . . . . . 49
5.6 Peak bandwidth comparison, February of 2006 and 2007 . . . . . . . . . . 50
5.7 Akamai bandwidth offload, February of 2007 . . . . . . . . . . . . . . . 51
5.8 RTUM shop transaction response times in seconds . . . . . . . . . . . . 52
5.9 RTUM browse transaction response times in seconds . . . . . . . . . . . 54
List of Tables

1.1 Subject site's content characteristics, 1998-2007 . . . . . . . . . . . 2
1.2 Annual e-commerce customer satisfaction survey results . . . . . . . . . 3
4.1 Performance test nodes by ISP and location . . . . . . . . . . . . . . . 39
4.2 RTUM transaction characteristics . . . . . . . . . . . . . . . . . . . . 40
5.1 RTUM browse transaction performance . . . . . . . . . . . . . . . . . . 53
5.2 RTUM shop transaction performance . . . . . . . . . . . . . . . . . . . 53
5.3 Server-side performance in milliseconds . . . . . . . . . . . . . . . . 53
5.4 TCP packet counts at hosting environment . . . . . . . . . . . . . . . . 55
Chapter 1

Introduction

Electronic commerce (e-commerce) dates back to the late 1970s, when the first electronic data interchange (EDI) systems were introduced to facilitate the exchange of data between companies and institutions. The initial types of data exchange were limited to invoices and purchase orders and required significant financial investment in computer networks and hardware. The first EDI-based e-commerce networks were later augmented with other forms of e-commerce such as automated teller machines (ATMs) and telephone banking in the 1980s, and subsequently enterprise resource planning, data mining and data warehouse systems in the early 1990s.

In the late 1990s and early 2000s, the term e-commerce came to include consumer-based shopping transactions over the World Wide Web (WWW). These transactions typically combine browsing of a website's catalog and its related content in the form of HTML, image files and marketing videos, followed by the presentation of e-shopping cart and checkout transactions supported by secure payment authorization, generally over the HTTPS protocol. The United States Census Bureau [62] defines several important concepts related to this topic. For example, it defines e-commerce as "any transaction completed over a computer-mediated network that involves the transfer of ownership or rights to use goods or services." This definition does not differentiate between monetary and non-monetary transactions, nor between zero-price and paid transactions. The concept of e-business infrastructure is defined as "the share of total economic infrastructure to support electronic business and conduct electronic commerce". Examples of such infrastructure include computers, routers, satellites, network channels, system and application software, website hosting and programmers.

1.1 Problem statement

The goal of this thesis is to evaluate the impact of a Content Delivery Network on the scalability and performance of a content-rich and highly transactional website that utilizes the Java Enterprise Edition (EE) platform.
Table 1.1: Subject site's content characteristics, 1998-2007

Content Type    Size (1998)   Size (2004)   Size (2007)
Documents       0 KB          28 KB         34 KB
Images          36 KB         42 KB         96 KB
Scripts         0 KB          2 KB          20 KB
StyleSheets     0 KB          0 KB          62 KB
Total           36 KB         72 KB         181 KB

Since its inception in 1996, the website that comprises the experimental test bed of this thesis has dealt with significant scalability and performance challenges around major holidays. These challenges can be classified as transactional and content delivery-related. Over the last several years the subject site has been able to tune its system infrastructure to achieve acceptable transactional performance in the application server and database server tiers. However, it has struggled with Web server content delivery. In fact, it has not been able to service all of its requests during its most content-intensive peak week in February for the last several years. This is mostly due to the popularity of the free customizable Flash videos available on the website. The Web servers responsible for delivering the Flash videos frequently get overloaded with long-running connections, resulting in high CPU utilization in the Web tier and a domino effect on the processing queues in the application server tier. We have decided to investigate the tactic of implementing a Content Delivery Network to help address the issues with content delivery, not just for this particular peak week, but for the website in general.

1.2 Justification

Text-only and informational websites from the mid to late 1990s have transformed into rich and highly interactive environments that offer not only physical goods, but also services and digital assets such as MP3 files (e.g., itunes.com) for download. Additional features such as wish lists, consumer product reviews and message boards have also been introduced in order to aid consumers in objective product selection. Some sites even offer interactive chat-based product selection advice. This convergence of services and products, as well as of content presentation and interaction, has resulted in much larger Web pages and increased diversity of e-commerce transactions. We used the web.archive.org website as a source for a comparison of the subject website's home page characteristics for the years 1998, 2004 and 2007 in Table 1.1. The number of downloadable page elements has increased from 15 in 1998 to 28 in 2007, and the overall size of the home page has increased from 36 KB to 181 KB.
Table 1.2: Annual e-commerce customer satisfaction survey results

Satisfaction Level      2005   2004   2003   2002   2002 vs 2005 change
Very Satisfied          40%    37%    40%    37%     3%
Somewhat Satisfied      24%    24%    23%    22%     2%
Neutral                 31%    32%    30%    33%    -3%
Somewhat Dissatisfied    4%     5%     5%     5%    -1%
Very Dissatisfied        2%     3%     3%     3%    -1%

Another intriguing aspect of the subject site's growth is its increased reliance on scripts that run within browsers to enable a rich consumer experience. The number of objects on HTML pages is the key indicator of the number of connections that will need to be opened between the client (browser) and the Web server.

While e-commerce still represents only a minor fraction of total retail sales, it is quickly becoming a very important component of any major retailer's presence in the marketplace. One study of European consumers shows an increase in online participation from 42 percent to 56 percent between 2003 and 2006. Forrester Research also reports that in 2006, on average, 80 percent of consumers reported researching products online, with approximately 55 percent reporting either an online or offline purchase in the preceding three months. Nielsen Research [50] confirms this growth trend, but its research also points out that consumer satisfaction with online purchases has experienced only slight growth, as shown in Table 1.2.

The current level of consumer satisfaction underscores the need to continuously improve the customer experience on the Web, especially with regard to website performance. More than half of online shoppers cite response time as a key influence on their shopping behavior. Consumers have come to expect a fast, uninterrupted and relevant shopping experience, as well as scalability during major holidays and continuous availability of services. Content delivery has become a critical component of the online shopping experience, and Content Delivery Networks (CDNs) were established with the primary purpose of addressing issues with performance and scalability.

1.3 Significance and expected contributions

The current model of content delivery on the Internet can be classified as centralized, in the sense that a majority of websites use the origin servers at their hosting environments to serve content. User requests have to travel through several networks and encounter a myriad of bottlenecks, including Peering, First Mile, Last Mile and Backbone, all of which are inherent to the way the Internet works [2, 3, 5].
CDNs have emerged as collections of servers located in close proximity to the Internet backbone, and thus to the end user, with the purpose of reducing the number of "hops" between network nodes. This overlay network approach allows shorter routing paths and more efficient DNS routing that helps alleviate all but the Last Mile bottleneck. CDN companies claim that by running thousands of servers in hundreds of networks in geographically dispersed locations they can deliver a significant performance and scalability boost to e-commerce websites. We will review the merits of this claim in our study of the subject site. We will also review the impact of the potential network benefits realized from the implementation of a CDN on all processing tiers of a Java EE-based e-commerce environment. Additionally, we will evaluate tradeoffs that may result from deploying this solution.

Based on our analysis of the subject site's page profile, we concluded that the average number of objects (images, cascading style sheets, JavaScript and Flash files) is approximately 40 items, with sizes ranging from 1 KB for a simple image to 400 KB for a Flash movie. While a load of 100 million page views per month probably would not pose a problem for the Web servers for 1 KB items, the story is quite different for the 400 KB items, since they occupy connections for much longer. Also, each hosting facility imposes limits on the amount of network bandwidth available to each customer. It is not unusual for the subject site to reach this limit during the key holidays.

1.4 Methodology

Many studies to date have focused on evaluating the network effects of implementing a CDN on Internet Service Providers in terms of bandwidth efficiency and network traffic patterns. In these cases, the terms performance and scalability refer to the ability of the network to sustain "bursty" traffic patterns within reasonable response times during peaks with minimal changes in the hardware footprint. The methodologies used to measure the impact have revolved around implementing a network monitor that captured traffic patterns and characteristics of objects passing through the network.

To our knowledge, there has been no research to date that established a methodology to measure the correlations between performance improvements in content delivery and the individual tiers of n-tier e-commerce environments. We assert that in order to identify meaningful conclusions in such research, the methodologies must encompass considerations of all tiers. Our approach is to conduct a thorough case study of one particular highly transactional e-commerce environment with the purpose of identifying conclusions that could be applicable to a variety of n-tier platforms.

In order to improve the validity of our conclusions, we decided to analyze and characterize the workload for the subject site for one year prior to implementing the proposed solution and one year after. We will use a combination of network and application monitoring technologies for measurements. Our methodology will also encompass software-level component measurements that will aid in the analysis of relationships between content delivery and the server-side Java EE components of the site's architecture.
This granularity is necessary due to the inherent inter-dependencies between processing tiers when queuing networks are involved.

Figure 1.1: Queue network for a typical n-tier e-commerce system

A typical performance model of an e-commerce site can be represented by the queue network in Figure 1.1, following [44]. According to this model, bottlenecks in the database (DB) server tier could result in requests queuing up at the Web server tier, and vice versa. Also, performance improvements in any one tier will most likely have either a positive or a negative impact on the other tiers, as the simple model sketched below illustrates.
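The following is our own illustration rather than a model from [44]: each tier is treated as a single M/M/1 queue, with assumed service times and arrival rates chosen only to show how one saturating tier comes to dominate end-to-end response time.

def tier_response_time(service_time_s, arrivals_per_s):
    # Classic M/M/1 response time: R = S / (1 - utilization).
    utilization = arrivals_per_s * service_time_s
    if utilization >= 1.0:
        return float("inf")  # this tier has become the bottleneck
    return service_time_s / (1.0 - utilization)

# Assumed per-request service times (seconds) for the three tiers.
tiers = {"web": 0.005, "app": 0.020, "db": 0.015}
for load in (25, 45, 49):  # requests per second
    total = sum(tier_response_time(s, load) for s in tiers.values())
    print(f"{load} req/s -> end-to-end {total * 1000:.0f} ms")

At 25 requests per second the three tiers add up to tens of milliseconds; at 49 the application tier alone contributes roughly a second, which is why relieving pressure on one tier (here, the Web tier via a CDN) can reshape the behavior of the whole queue network.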
1.5 Evaluation criteria

Our evaluation framework will include consumer-centric and system-centric metrics. Consumer-centric metrics focus on response times as observed by the users of the site. We will implement an appliance-based monitoring solution that captures HTTP and HTTPS traffic at the URI stem level and provides a high-level view of application response times. It will give us benchmarks for key function points of our e-commerce solution, including browsing of the Category and Product Detail pages as well as the Shopping Cart, Billing and Order Confirmation components. The tool will provide us with measurements in terms of network, client and server time. We will capture measurements from before and after the CDN implementation.

The existing website traffic monitoring solution will be utilized to capture traffic characteristics in terms of typical e-commerce measures such as page views, visits and orders. This data will enhance our application monitoring with aggregated load metrics.

System metrics, especially the correlation between system resource utilization (CPU, memory and I/O) and content delivery offloading, will also need to be studied in order to measure the benefits of implementing a CDN in terms of hardware footprint reduction and bandwidth cost efficiency. We will use a set of typical UNIX server utilities to measure CPU, memory and network interface utilization for servers in all three tiers.

We expect that making our content available via a network of geographically dispersed servers will have a significant impact on bandwidth utilization. Most CDNs provide a monitoring console that allows customers to measure the origin server offload effect using several metrics. We will use the console to measure the size and types of website objects that benefit most from this solution and to quantify savings in terms of bandwidth and the number of connections to the Web servers that were avoided. Our expectation is that the CDN will offload a significant amount of bandwidth from the network of our hosting environment. The current limit imposed on our subject site by the hosting facility is 1 Gbps (gigabit per second).

The evaluation would not be complete without a tradeoff analysis of our tactic. Current research points to manageability and the speed of content propagation as key concerns when implementing a CDN. In this case, manageability refers to the ability to deploy configuration and content changes into the CDN. We will measure the amount of time necessary to have content changes reflected in the overlay network from the origin server, as well as the amount of time needed to deploy configuration changes.

1.6 Thesis organization

The thesis is organized as follows:

• Chapter 1: Introduction - The background and justification for the research presented in this thesis.

• Chapter 2: Background and Previous Work - The architectural background of the subject site and related work in measuring and characterizing the performance and scalability of WWW environments.

• Chapter 3: Proposed Solution - The selection and implementation of a Content Delivery Network and performance monitoring tools.

• Chapter 4: Quality of Service Evaluation: Performance and Scalability - The impact of the CDN on the subject site and its processing tiers.
• Chapter 5: Analysis - Validation of the quantitative and qualitative results of the implementation.

• Chapter 6: Conclusions and Further Research - The conclusions resulting from this research and proposed areas for further investigation.
Chapter 2

Background and Previous Work

Bass et al. [15] give a generic definition of performance as the responsiveness of the system, characterized by the time required to respond to events. The level of performance is often described by the number of user or system transactions per unit of time. In e-commerce, this concept encompasses several measures of throughput in terms of Web page response times and the number of page views for a given period of time. It can also be expressed in terms of the responsiveness and throughput capacity of key server-side Java EE components such as Servlets and Enterprise Java Beans. Each site has a unique traffic profile, but a generic traffic pattern can be represented by the Customer Behavior Model (CBM) state transition graph in Figure 2.1, as proposed by Menasce et al. in [45].

The CBM graph identifies several key states in the browsing and shopping experience. The number on each state transition represents its probability. Hence, there is a 30 percent chance of a customer selecting the Search page from the Browse page. The performance of a website can thus be characterized by the response times for each of these states under a particular volume of user site visits. A visit consists of multiple requests to different components of the browsing and shopping experience. For the purpose of this study, we will further break down the performance of each behavior point into three components: network time, client time and server time. The network time is the amount of time that request and response pairs spend traveling to and from the load balancers. The client time represents the amount of time it takes the Web browser to issue an HTTP request and render the response in the form of HTML and other objects. The server time is the amount of time each request spends on the Web, application and database servers.

Scalability represents the ability of a given site to handle its projected traffic volume as the number of requests for the critical points grows, while maintaining its current hardware footprint. The latter is particularly important, as most websites find it difficult to significantly change their footprint during a critical business period. For example, the subject website experiences up to four-fold increases in page views and a 15-fold increase in bandwidth usage during some of its critical business periods.
Figure 2.2 illustrates the subject site's transactional bursts during key shopping holidays. Brebner et al. [22] provide several benchmark-based insights into the potential scalability of J2EE environments.

2.1 Architectural overview of subject site

A typical e-commerce website can be represented by a series of views that depict its functional components, software modules and hardware infrastructure, as proposed by [20, 42]. For the purpose of this study we will analyze a website that has existed since 1996 and was recently (2003) re-architected using the IBM Websphere Commerce software components on the Java Enterprise Edition (Java EE) platform. The site's business process view is represented in Figure 2.3. The business process view highlights several key functional components of the subject website, namely the catalog, marketing and order management features. The functional view is supported by the hardware and software components shown in Figure 2.4 and Figure 2.5.

2.2 Website content

In the realm of e-commerce, content refers to any information made available for consumers to download. This includes catalog data, HTML pages, images, audio files, video files in Flash or other formats and even software downloads. Efficient management and deployment of content-related assets have thus become mission-critical activities for many organizations that depend on the Internet as a sales channel. E-commerce websites rely on three types of assets to support the selling function: catalog data assets, content assets and application assets. Catalog assets are used to define products with characteristics such as price or product name. Content assets represent editorial content such as marketing information, product reviews and promotional information. Application assets include software components created by software developers that are generally managed separately from the other two types of assets.

Content can also be classified as dynamic or static. Dynamic content is generated upon a consumer's request and may be unique for each request. It is typically assembled on the application server with the involvement of the DBMS. Static content is represented by HTML, client-side scripts (JavaScript), images and other multimedia assets that do not change from request to request. This type of content generally resides on the Web server(s) and is served up from there. Figure 2.6 represents a typical approach to the management and publishing of static and dynamic content. It is important to note that generally the static content will be published to a file system, while dynamic content will be made available via server-side components with the support of a DBMS. The fundamental concepts of content management and publishing are covered in [19].
Even if we were to achieve infinite scalability and performance at the hosting facility, we would still have a dependency on the Internet as a whole for delivery of content to the browsers. The Internet is made up of many different networks that communicate with one another using the IP protocol; it could not function efficiently unless the individual networks that make up its backbone could communicate with one another. Individual networks consist of a variety of routers, switches and fiber in order to move data packets within the network. The process of exchanging data between two networks is called peering, and it takes place between two routers, also known as peering points, located at the edge of each network. The two networks use an exterior gateway protocol called the Border Gateway Protocol (BGP) to send traffic between them. An interior gateway protocol (OSPF or RIP) is used to route traffic within a network. Hence, the peering points and routing protocols connect different networks into a unified infrastructure that allows the Internet to function efficiently.

This structure of a "network of networks" results in several potential scalability and performance bottlenecks. They are generally classified as: first mile, peering points, backbone and last mile. Figure 2.7 represents the types and locations of the bottlenecks as described by [3]. The First Mile bottleneck refers to the limitations of the website's connectivity to the Internet via its Internet Service Provider (ISP). In order to reach its desired scalability, the website needs to continuously expand its connectivity to the ISP. The ISP, in turn, must also expand its capacity in order to meet its customers' scalability requirements. Peering points also represent potential bottlenecks, as large networks are not economically motivated to scale up the number of peering points with the networks of their competitors, especially since a significant portion of the traffic handled by the peering points is transit traffic with packets originating on other networks. This lack of competitive and financial motivation has, over time, resulted in a limited number of peering points across major networks. The Backbone Problem refers to the fact that the ISPs' router capacity has historically not kept up with the growth of traffic demands. Finally, the Last Mile problem reflects the limited capacity of a typical user's connection to their ISP. It is important to note that solving just one of the above bottlenecks, such as the Last Mile, by increasing the reach of broadband connectivity at home, will not automatically address the other limitations. These need to be treated as separate problems that, if addressed, would help solve the problem as a whole.

2.3 Performance considerations and tactics

Most websites measure their performance and scalability using a combination of user- and system-centric metrics. Page response times represent the user-centric metric, while the CPU, memory and disk utilization of the Web, application and database servers illustrate the system dimension of performance. High consumer expectations have led many highly transactional websites to implement a variety of technology strategies to achieve a combination of high performance and scalability [10]. Typical approaches include scaling both horizontally and vertically.
Horizontal scalability entails increasing the number of computing nodes at each layer, while vertical scalability focuses on increasing the computing capacity of the existing nodes. These tactics address the bottlenecks at each computing node and can have a significant impact on what we defined as server time. However, they generally have minimal impact on the network and client times. Caching has also been used as a tactic to achieve better performance. It can refer to caching of static content on the Web servers or caching dynamic content in the memory of the application server. We will refer to this concept as centralized caching. Centralized refers to caching mechanism(s) that are implemented at the hosting facility and are not geographically dispersed. The key limitation of the caching tactic is the overall size of the computing environment and the incremental cost of adding new hardware to support growth.

2.4 Previous work related to WWW environments

Web server behavior has been studied extensively since the early days of the Internet [11, 13, 48]. Network considerations have also been given a lot of attention, with much research in the area of HTTP and TCP - the core protocols of the Internet [12, 41, 51]. Workload characterization should be a key component of any performance or scalability study, regardless of the tactics used to achieve the two qualities of service in an e-commerce environment. Early studies [11] in this space attempted to generalize workload classifications in terms of Web server characteristics. Pitkow [53] summarizes a group of traffic characterization studies on the World Wide Web up to 1998. Subsequently, numerous characterization studies of Web server workloads were conducted, including [9, 10].

Menasce et al. [45] were the first to extend workload modeling studies to include e-commerce business transactions and not just Web server traffic. They presented a technique to model workloads on the Web using Customer Behavior Model Graph profiles. We will use it to establish a performance baseline and criteria to measure the improvement in performance that we expect from implementing the CDN tactic. Performance modeling of n-tier systems and the associated queuing networks is also covered in [32]. Crovella et al. [27] conclude that e-commerce workloads tend to vary very dynamically and exhibit short-term fluctuations. We agree with this assessment based on several years' worth of traffic measurements. Urgaonkar et al. [61] represent some of the most recent work in performance modeling of multitier Internet systems based on a network of queues that represent application tiers.

One of the key conclusions from the workload studies so far is that the popularity of content elements on the Web follows a Zipf-like distribution [23]. The Zipf distribution supports the need for caching, considering that 25-40 percent of pieces of content account for 70 percent of requests. This concentration of access to key resources on the site, such as Flash videos in the case of the system under study, strongly suggests that any meaningful level of caching should help performance, as long as there exists a cache replacement policy that minimizes the unavailability of the key resources. The short sketch below makes this coverage effect concrete.
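This is our own illustration with an assumed catalog size and Zipf exponent, not measured values from the subject site:

def zipf_coverage(num_objects=10000, alpha=1.0, cached_fraction=0.3):
    # Weight of the object at popularity rank r under a Zipf-like law: 1 / r^alpha.
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    cached = sum(weights[: int(num_objects * cached_fraction)])
    return cached / sum(weights)

# Caching the most popular 30 percent of objects covers the bulk of all requests,
# in line with the 25-40 percent / 70 percent figures cited above.
print(f"Request coverage: {zipf_coverage():.0%}")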
Bent et al. [16] have determined that much of the workload of some of the busiest websites remains uncacheable.

Perhaps the largest shortcoming of the work to date is the lack of analysis of correlations between de-centralized content delivery tactics such as a CDN and the performance characteristics of the Web, application server and database tiers. The existing body of research exhibits a tendency towards evaluating individual tiers in the context of content delivery instead of studying the relationships and interdependencies between them. This is not surprising, since the first implementations of websites focused on serving static content. Our work builds on the existing research by augmenting the queueing models with the new de-centralized content delivery tier and studying the impact of such an implementation across tiers.

We will need to use some form of benchmark in order to determine the performance gains from deployment of a CDN. Benchmarking of e-commerce applications has been addressed by the Transaction Processing Council's TPC-W (obsolete as of 2005) and TPC-App specifications [57]. Many websites have used TPC-W as a benchmark for simulating the load on an e-commerce site [43]. We have the advantage of dealing with a live website, so we do not need to use the TPC specification for load generation. However, we will refer to its performance metrics and response time capture requirements in order to assess improvement.

Performance and scalability of Web-based environments have generated a lot of interest within the research community. These environments usually involve static requests for documents and dynamic requests for data or content generated by the combination of application and database servers [36]. Iyengar's paper also points out that serving dynamic content can be orders of magnitude more CPU-intensive than serving static content. Andreolini et al. [8] conclude that there is a need for different levels of measurement granularity in order to identify bottlenecks. The levels are classified into system-level, node-level, resource-level and component-level metrics. This study further classifies hardware resource utilization into CPU, disk and network utilization types. We also classify bottlenecks into client, network and server constraints. Client bottlenecks are related to the ability to efficiently render downloaded content within the browser. The diversity of personal computing environments, from Personal Digital Assistants to full-blown personal computers, contributes to a large variance in client-side performance.

Network bottlenecks result primarily from network congestion at the origin network or the Wide Area Network [14]. Server-based bottlenecks result from inter-tier interactions and the associated CPU, I/O and memory utilization costs. Menasce et al. [44] discuss the system bottlenecks and tactics to address them, such as replicating Web server queues into a cluster of Web servers in order to better balance the load. We will utilize the existing research around performance bottlenecks to identify metrics that will help us prove the impact of the CDN tactic.
The recurring theme in performance and scalability research is the continuous need to offload as much repetitive processing from the websites as possible. For example, Challenger et al. [24] determined that the cost of generating a cacheable response is 500 times higher from the CPU utilization standpoint than serving it from a cache.

Caching has been a part of many performance evaluations and recommendations for improvement of the World Wide Web. Challenger et al. developed an approach for consistently caching dynamic Web data that became a critical component of the 1998 Olympic Winter Games website [25]. Performance results from implementing a Web proxy cache, as well as the impact of various cache replacement strategies, are presented in [9]. Additionally, Arlitt et al. present a number of workload characteristics and their impact on the effectiveness of caching. Rodriguez et al. [55] discuss different types of Web caches, including hierarchical and distributed, and conclude that geographic dispersion of content reduces the expected network distance to access a document as well as bandwidth usage. Their results confirm the value proposition of caching in general, but they stop short of recommending a decentralized caching approach such as a CDN. Nahum's paper [49] is one of the first efforts to consider Wide Area Network effects in website performance studies. The Wide Area Network's implications for performance are also discussed in [8]. A very detailed overview of proxy-based caching, with emphasis on dynamically generated content, is presented in [1]. However, these studies do not characterize the types of workload, and they compare neither response times with and without a CDN nor resource utilization on the Web servers and how a CDN impacts it.

The case for geographic distribution of content has been made by the research community for quite some time [28, 52]. Perhaps the most comprehensive review of CDN-related concepts is presented in [63]. We found Verma's summary of the potential applicability of a CDN to be particularly relevant. According to his work, an application is a good CDN candidate if:

• it has a high ratio of reads compared to writes
• client access patterns tend to access some set of objects more frequently
• limited windows of inconsistent data are acceptable
• data updates occur relatively slowly

Our analysis of the subject site shows consistency with the first three items, but we have to concede that the data updates have to occur quickly in order to support ever-changing content on the site. Nonetheless, the tradeoffs are well worth further investigation of a CDN's applicability and the potential performance benefits. What we found lacking in Verma's work was a comprehensive analysis of sample websites that have implemented a CDN and benefited from it. We had to look no further than [17] to get a first glimpse of the potential real-life benefits of CDNs using DNS redirection and URL rewriting.
That study was conducted on the home pages of several hundred of the most popular websites as of November and December of 2000. One of its key findings was that the networks performed significantly better than the origin sites, even with the overhead of DNS redirection. Other studies [39] confirm the CDN-related reduction of download response times even with the DNS redirection tradeoff. Finally, Su et al. [60] present a set of measurements for the Akamai delivery network along with a detailed explanation of its architecture. This is the first study we are aware of that presents data to support the network path selection efficiency gains that result from implementing a CDN.

The key missing component in the research to date is the impact of reduced download times on e-commerce transactions as a whole from the client, server, network and consumer performance perspectives. Also, due to the proprietary nature of surrogate networks and e-commerce websites, there has been little more than anecdotal evidence of quality attribute improvements for production environments. Our intent is to close this gap with data that speaks to the impact of a CDN across the enterprise processing tiers of a production e-commerce site.
Figure 2.1: Sample customer behavior model graph
Figure 2.2: Subject site page views by month, 2004-2007
Figure 2.3: Logical view of subject site
Figure 2.4: Infrastructure diagram of system under study
Figure 2.5: Websphere Commerce interaction diagram
Figure 2.6: Typical approach for content management and publishing
Figure 2.7: Location of Internet bottlenecks
Chapter 3

Proposed Solution

Our solution will consist of several components that will need to be deployed, configured and tested at our hosting facility. We will also need to make configuration changes in our network, namely the DNS server, in order to re-route certain portions of our website's traffic to the chosen CDN. The high-level implementation steps can be outlined as follows:

1. Select a CDN provider
2. Configure solution components
3. Implement network and application monitoring infrastructure
4. Test implementation

Web traffic re-routing almost always involves DNS - one of the core subsystems of the Internet. We present a short overview of key DNS concepts as described in [7, 46, 47]. Figure 3.1 depicts a typical series of steps in the resolution of a URL to an IP address for www.example.com. DNS is essentially a distributed database that follows the client-server architecture. Adequate performance of DNS is achieved through replication and caching. The server-side portion of a request is handled by programs called name servers. They contain information about a portion of the global database and are capable of forwarding requests to other authoritative servers if necessary. The information is made available to client-side software components called resolvers.

A typical domain name on the Internet consists of two or more parts separated by dots, such as my.yahoo.com. The top-level domain (TLD) represents the rightmost portion, .com in our case, while the subdomain(s) are represented by the labels to the left of the top-level domain. In our example, my.yahoo.com is a subdomain of yahoo.com, which in turn belongs to the .com top-level domain. Finally, a hostname refers to a domain name that has one or more IP addresses associated with it. Each domain or subdomain has an authoritative server associated with it, which contains and publishes information about the domain and any other domains it encapsulates.
Figure 3.1: How DNS works

Root nameservers reside at the top of the DNS hierarchy and are queried first to resolve the TLD. Caching and time-to-live (TTL) are very important concepts in DNS and, as we will later discover, in CDN implementations [38, 56]. IP mappings obtained from DNS can be stored in the local resolver for a period of time, as defined by the TTL. This greatly reduces the load on the DNS servers; the sketch below illustrates the mechanism.

Figure 3.2 shows the usage of the local cache in the resolution process. Our DNS configuration follows the configuration steps from [35] for the AIX 5.3 environment using BIND. We have created a set of configuration files that define the name server mappings and resolvers, and we will then start the appropriate processes (daemons) on the server to activate DNS resolution.
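This is our own illustration, not part of the BIND configuration: a toy caching stub resolver that reuses mappings until the TTL expires. The 20-second default anticipates the Akamai TTL discussed in the next section.

import socket
import time

_cache = {}  # hostname -> (ip_address, expiry_timestamp)

def resolve(hostname, ttl_seconds=20):
    # Reuse a cached mapping until its TTL expires; short TTLs let a CDN
    # re-map clients quickly, at the cost of more queries to the name servers.
    ip, expiry = _cache.get(hostname, (None, 0.0))
    if time.time() < expiry:
        return ip  # answered from the local cache, no DNS query issued
    ip = socket.gethostbyname(hostname)  # full resolution via configured servers
    _cache[hostname] = (ip, time.time() + ttl_seconds)
    return ip

print(resolve("www.example.com"))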
Figure 3.2: Caching in DNS

3.1 CDN selection

The main purpose of a CDN is to direct consumer requests for objects to a server at the optimal Internet location relative to the consumer's location. The key components of a CDN architecture are described in [37]. They are defined as: overlay network formation, client request redirection, content routing and last-mile content delivery. The two most common techniques employed by the networks are DNS redirection and URL rewriting. Wills et al. [17] provide a good overview of these techniques and their effectiveness. The DNS redirection technique utilizes a series of DNS resolutions based on several factors, such as server availability and network conditions, with the purpose of identifying the most suitable server. This identification step is usually encapsulated in a series of algorithms that are proprietary to each network's implementation. The end result is a DNS response with the IP address of the content server. The response includes a time-to-live value that is usually limited to less than a minute (in the case of Akamai it is 20 seconds). The TTL has to be set to a relatively low value because network conditions and server availability change constantly, and quick IP re-mapping is key.

The DNS redirection technique can facilitate either full- or partial-site delivery. With full-site delivery, all requests to the origin server are directed using DNS to a CDN server. If the CDN server cannot fulfill a request, it simply routes it back to the origin server. Several networks, including Adero and NetCaching, employ this delivery model. The main shortcoming of this model is the additional routing overhead of wasted DNS requests that could have been handled by the origin server to begin with. With partial-site content delivery, on the other hand, the origin site modifies the URLs for certain objects or object directory locations so that they are resolved by the CDN's DNS server. This approach seems well suited to our website due to its combination of static digital assets and dynamically generated server-side presentation components.

URL rewriting is another potential solution for server lookups. With this technique, the origin server continuously rewrites the URL links on dynamically generated pages in order to redirect them to the appropriate CDN server. The DNS functionality remains on the origin site with this approach. When a page is requested by the user, it is served from the origin server; however, before it is served, all of the embedded links are rewritten to point to the CDN's DNS. Figure 3.3 shows a typical rewrite approach, and the sketch below illustrates the idea in code. The main drawback of the URL rewrite approach from the measurement standpoint is that the rewrites usually take place in the Web server tier. Hence, the rewrite steps would inevitably introduce additional background noise into our performance measurements. Therefore, we decided to avoid this approach for the purpose of our study.

Figure 3.3: URL rewrite process
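This is a minimal sketch with placeholder hostnames; a production implementation would live in a Web server filter rather than in standalone code.

import re

CDN_HOST = "images.example.com.edgesuite.net"  # placeholder edge hostname

def rewrite_links(html):
    # Point embedded static-asset links at the CDN instead of the origin server.
    return re.sub(r'src="https?://images\.example\.com/',
                  f'src="https://{CDN_HOST}/', html)

page = '<img src="http://images.example.com/logo.gif">'
print(rewrite_links(page))  # the image now downloads from the CDN edge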
The latest list of available CDN implementations is maintained in [29]. At the time of writing of this thesis, we counted 18 different networks on Davison's website. It is not our primary purpose to evaluate tradeoffs between the various networks and their implementations. The choice we made does not reflect a belief in the superiority of one network over the others - it is merely a reflection of the need to get our experimental test bed up and running as quickly as possible within the boundaries imposed on us by our existing hosting facility. For our implementation, we settled on a partial-site, DNS redirection-based CDN implementation using the Akamai delivery network.

3.2 Solution configuration

There exist several public sources that describe the DNS redirection mechanism implemented by Akamai. We relied on [21, 40, 60] for a more in-depth view of the inner workings of this CDN. Akamai's website and the available white papers [3] provide the steps for configuring our network environment to redirect partial-site traffic. We will start by explaining how Akamai works and then move on to our configuration steps. Figure 3.4 depicts the DNS redirection process employed by this CDN. In essence, Akamai performs a highly complex translation of a customer's domain to the IP address of the most suitable edge server.
First, the Web browser requests an HTML object. In order to accommodate this request, the local DNS resolver has to translate the domain name into an IP address. The resolver issues a query to the customer's DNS server, which in turn forwards the request to the Akamai network. This is enabled via the configuration of a canonical name record (CNAME) in the origin site's DNS name server. The CNAME triggers the request redirection to the CDN. Next, a hierarchy of Akamai servers responds to the request using the requestor's IP address, the name of the CDN customer and the name of the requested content as seeds for its DNS resolution. The CDN name resolution step is perhaps the most critical in this sequence of events.

Configuration of the Akamai CDN is described in [4]. The steps for our deployment can be summarized as follows:

1. Create origin hostname
2. Activate Akamai edge hostname
3. Activate content delivery configuration
4. Point website to Akamai network

In our case, this process begins with the configuration of a CNAME in our DNS name server. A CNAME record maps an alias or nickname to the real name, which may lie outside the current zone. The typical format of a CNAME entry is as follows:

name ttl class rr canonical name
www IN CNAME joe.example.com.

We need to set up an origin server hostname that will resolve to our content server. This server will be used by the Akamai edge servers to retrieve our content, so it can be made available to all of the nodes in the CDN. The naming convention for the origin server is origin-<website>, where "website" is the hostname for our content that will be delivered from Akamai. Our website stores all of its static content in the generic images folder, so we will define the following origin server name:

origin-images.example.com for images.example.com

Next, we will create a DNS record for our origin server hostname on our authoritative name server. We will use the CNAME record type for this step:

origin-www.example.com IN CNAME loadbalancer.example.com

We are now pointing our website to the Akamai network. An edge hostname will need to be activated on an Akamai domain for our website using the CDN's configuration console. It will resolve to the Akamai network. For example, www.example.com would have to point to www.example.edgesuite.net, and www.example.edgesuite.net would in turn resolve to individual servers on the Akamai network, since Akamai owns the edgesuite.net domain. The remaining configuration steps need to be performed in Akamai's configuration console and are covered in depth in [4]. A resolution check such as the sketch below can confirm the cutover.
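This sketch assumes the third-party dnspython package and reuses the placeholder hostnames from above; after the cutover, the www hostname should alias into the edgesuite.net domain rather than resolving directly to the origin load balancer.

import dns.resolver  # third-party "dnspython" package (an assumption of this sketch)

def cname_target(hostname):
    # Return the canonical name that the hostname aliases to.
    answers = dns.resolver.resolve(hostname, "CNAME")
    return str(answers[0].target)

# Expected to contain "edgesuite.net" once the CNAME cutover is in effect.
print(cname_target("www.example.com"))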
3.3 Monitoring infrastructure configuration

We will utilize three different types of monitoring solutions in the course of our performance experiments. The monitoring infrastructure will help us obtain measurements from three different perspectives: consumer, application and CDN. We need this full view in order to identify correlations between the CDN implementation and the key performance characteristics in terms of system and user response times. The CDN monitoring will aid the bandwidth scalability improvement assessment.

3.3.1 Performance measurement tools

The consumer perspective will be measured using a real-time user monitoring (RTUM) service. RTUM services [18] provide details of user interactions with a website by reporting on user application successes and failures. We will define a series of transactions with sequences of browsing, clicks, object downloads and data input. This is functionally equivalent to a consumer interacting with our website via a Web browser. The transactions will be configured with our service provider and deployed into its network of six different geographically dispersed origin locations within the U.S., with access speeds ranging from dial-up to broadband. We will configure two different transactions with multiple steps within them. The two types of transactions are necessary because different paths through our website tend to have different content delivery characteristics in terms of the number of assets delivered and the number of server-side components triggered. Our service provides very detailed HTML object-level reporting on transaction steps, which will include going to the home page, catalog browsing, product selection, adding an item to a shopping cart and checking out. The RTUM time measurements will break the download time into slices consisting of DNS lookup time, Web server connect time, content time and SSL time. Following are the definitions of the key metrics used by our RTUM:

DNS look-up: The process of calling a DNS server to look up and convert a hostname to an IP address; for instance, to convert www.foo.com to 10.0.0.1.

Connect time: The time it takes to connect to a Web server (or a CDN edge server in our case) across a network from a client browser or an RTUM agent.

Secure sockets layer time: The time it takes to create an SSL TCP/IP connection with a website.

First byte time: The time between the completion of the TCP connection with the destination server that will provide the displayed page's HTML, graphic or other component, and the reception of the first packet (also known as the first byte) for that object.
Content download time: The time in seconds that measures the actual time to deliver content (images, HTML or other objects) from the Web server to the browser.

The application perspective will be captured using an appliance-based application monitoring solution. The network location of this appliance is depicted in Figure 3.5. We will configure "watchpoints" using the appliance's configuration tool to capture server-side response times of the Java EE components corresponding to the transaction steps defined in the RTUM service. The appliance uses passive traffic analysis to capture the actual transactions from the RTUM within our hosting environment and measures the performance and availability of our e-commerce application as a whole. The important difference between this and other approaches is that our appliance does not generate any traffic; the only performance overhead it introduces is reading a copy of the traffic from the network connection. The data is assembled into requests for objects, pages and user sessions. Performance metrics include host, SSL and redirect times. This solution also measures server errors and prematurely terminated connections due to increases in traffic. Figure 3.6 depicts the measurement timeline for a sample request that would be captured by our appliance [26]. The appliance solution groups latency into the following six categories and defines them as follows:

Host time: The combined time the Web, application and database servers take to process a request. Host time is a key measure for assessing the implications of implementing a CDN for the performance of our Java EE components (servlets, EJBs, etc.). It can be very short in the case of a static image, or long in the case of long reports and complex server-side transactions such as adding a list of items to the shopping basket.

Network time: The time spent traveling across intervening networks. Once the server has prepared its response, host time is over and network time begins. A small object might be delivered quickly; a large one might take a long time. This time is highly dependent on the type of the consumer's connection: low-bandwidth connections will result in higher network times, and vice versa for broadband connections. Our monitoring appliance also records additional information on packet loss, out-of-order delivery and round-trip time to help with this diagnosis.

SSL time: The time spent negotiating the encryption of encrypted transactions. This portion of the SSL time represents the server-side latency elements of the handshake, versus the client-side SSL time captured by the RTUM.

Redirect time: The time the site spends sending a request on to other pages. In some applications, a request for a page results in a redirect that usually points elsewhere. This delay is recorded as redirect time.
Idle time: When a browser is retrieving a page but there is no activity between objects on the same page, the HTTP interaction is defined as "idle". This measurement is key to understanding the amount of time spent processing client-side scripts such as JavaScript. When there is inactivity in the middle of rendering the page within the browser, our appliance measures it as idle time.

End-to-end: The total time for the object or page, from the moment the first packet of a request is seen until the browser acknowledges delivery of the last packet.

In order to quantify the benefits of implementing a CDN, we will need to combine the server-side measurements taken by our application monitoring appliance with the client-side RTUM service data. The combined response timeline for the two monitoring components is represented in Figure 3.7. We will capture measurements for all elements of the above timeline and compare their pre- and post-CDN response times. The individual latency components can also be summarized using the following two equations:

T_{ee} = T_{dns} + T_{network} + T_{host} + T_{ssl,client} + T_{ssl,server} + T_{idle} + T_{httpredirect}    (3.1)

where

T_{network} = T_{connect} + T_{firstbyte} + T_{download}    (3.2)

We will use them to assess the CDN's impact on latency as a whole in terms of the end-to-end and network times.
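A minimal sketch of this bookkeeping follows directly from Equations 3.1 and 3.2; the field names and the sample values are our own assumptions, not an export format of either monitoring tool.

def network_time(m):
    # Equation 3.2: connect + first byte + content download.
    return m["connect"] + m["firstbyte"] + m["download"]

def end_to_end(m):
    # Equation 3.1: client-side RTUM slices plus server-side appliance slices.
    return (m["dns"] + network_time(m) + m["host"] + m["ssl_client"]
            + m["ssl_server"] + m["idle"] + m["http_redirect"])

sample = {"dns": 0.05, "connect": 0.04, "firstbyte": 0.20, "download": 0.90,
          "host": 0.35, "ssl_client": 0.10, "ssl_server": 0.08,
          "idle": 0.12, "http_redirect": 0.0}
print(f"End-to-end: {end_to_end(sample):.2f} s")  # 1.84 s for this sample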
3.3.2 Scalability assessment framework and tools

Duboc et al. [30] argue that the scalability of a system should be seen in the context of its requirements. Scalability is often associated with performance, and the two terms have at times been used interchangeably. However, performance is just one of the dimensions used to represent scalability. We particularly agree with the authors' assessment that scalability is implicit in the relationship between cause and effect as observed by a stakeholder. In our case, we define scalability as the relationship between the implementation of a CDN and the following key dependent variables for our site and its infrastructure: page views, CPU utilization, memory utilization, and bandwidth.

Based on the claims made by our CDN solution provider, we expect it to offload a significant portion of connections from our Web server tier. We also expect a significant decrease in CPU, memory, and input/output (I/O) utilization of the Web tier, which should result in higher website capacity in terms of page views - the key dependent variable of our scalability assessment. The third component where we should be able to observe a significant improvement is the bandwidth usage of our environment at the hosting facility, since a significant portion of it will be served from the edge servers. We will use a combination of the monitoring reports available with our CDN and our internal server usage measurements to determine these scalability characteristics in the context of the traffic bursts our site experiences during the February peak week.

The internal server measurements will be taken using the vmstat UNIX utility [33]. We will set the appropriate vmstat flags to capture utilization statistics for CPU and memory usage in the Production (Prod) environment during the most traffic-intensive interval for our website - February 11th through February 15th. The Prod environment consists of multiple servers in the Web and application tiers and one server in the database tier. Typical operational reports for our website include CPU and memory usage charts whose data is gathered via vmstat; examples of these reports are represented in Figure 3.8 and Figure 3.9, and a sketch of how such samples reduce to report averages follows at the end of this section. The CPU utilization chart simply depicts the usage of all available CPU resources for a given server. Servers may have multiple CPUs installed, so data is averaged across all of them for a given node. The data covers the kernel as well as all other processes, so it gives a full picture for this particular resource.

Physical memory is a finite resource on any system. The UNIX memory handler manages memory allocations, and the kernel is responsible for freeing up the physical memory of idle processes by saving it to disk until it is needed again. Paging and swapping are used to accomplish this task. Paging refers to writing portions, termed pages, of a process' memory to disk; swapping refers to writing the entire process, not just part of it, to disk. Page-out is the event of writing pages to disk, while page-in is the retrieval of memory data from disk. Page-ins are common and under normal circumstances are not a cause for concern. However, if page-ins become excessive, the kernel can reach a point where it is actually spending more time managing paging activity than running the applications, and system performance suffers. This state is referred to as thrashing. Our memory measurements focus on identifying excessive memory utilization in terms of the paging characteristics shown in Figure 3.9.

We will compare the pre-Akamai resource utilization for 2006 and the post-Akamai resource utilization for 2007 for this period. Figure 3.10 represents the traffic patterns for the years 2005, 2006, and 2007 and speaks to the type of traffic burstiness our site experiences during the holiday. Even though the year 2005 is not included in our resource measurements, we show it on the chart to illustrate the traffic pattern during the peak week over the last several years. It is important to note that the sudden drops in traffic in 2005 and 2006 represent site outages caused by the Web servers becoming overloaded with incoming requests.
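As a concrete illustration of the reduction mentioned above, the following Java sketch averages CPU utilization and paging rates from interval vmstat output. It is a minimal sketch assuming AIX-style vmstat columns (pi/po for page-ins and page-outs, us/sy for user and system CPU time); that column layout is an assumption on our part and differs on other UNIX variants.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    /**
     * Reduces interval vmstat samples to the averages used in our utilization
     * reports. Column positions assume AIX-style vmstat output
     * (kthr: r b | memory: avm fre | page: re pi po fr sr cy |
     *  faults: in sy cs | cpu: us sy id wa); this layout is an assumption
     * and must be adjusted for other platforms.
     * Usage: java VmstatSummary vmstat.log
     */
    public class VmstatSummary {
        public static void main(String[] args) throws IOException {
            long samples = 0;
            double cpuBusy = 0, pageIns = 0, pageOuts = 0;
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.trim().split("\\s+");
                    // Skip repeating header lines; data rows start with a number.
                    if (f.length < 17 || !f[0].matches("\\d+")) continue;
                    pageIns  += Double.parseDouble(f[5]);  // pi: page-ins per second
                    pageOuts += Double.parseDouble(f[6]);  // po: page-outs per second
                    cpuBusy  += Double.parseDouble(f[13])  // us: user CPU time
                              + Double.parseDouble(f[14]); // sy: system CPU time
                    samples++;
                }
            }
            if (samples == 0) {
                System.out.println("no vmstat samples found");
                return;
            }
            System.out.printf("avg CPU busy: %.1f%%  avg pi/s: %.1f  avg po/s: %.1f%n",
                    cpuBusy / samples, pageIns / samples, pageOuts / samples);
        }
    }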
Figure 3.4: How Akamai works
Figure 3.5: Application monitoring appliance network location
Figure 3.6: Application monitoring appliance measurement timeline
Figure 3.7: Content delivery measurement combined timeline
Figure 3.8: Sample CPU utilization report
Figure 3.9: Memory page-in report
Figure 3.10: Hourly page views for 2/11-2/15, 2005-2007
Chapter 4

Quality of Service Evaluation

We have devised two different ways to measure the performance and scalability impacts of implementing a CDN on our website. The need for two separate studies stems from the fact that performance observations tend to be affected by the amount of load on the system under study, while we need sufficient load to assess scalability improvements. These two goals conflict, so we decided to take measurements in two different runtime environments. To have a more controlled environment for performance measurements, we decided to run the response time experiments in our performance evaluation (PerfEval) environment, a scaled-down replica of our production (Prod) environment sized using the rPerf benchmark measurement [31]. We determined that, from a resource capacity perspective, our PerfEval environment represents approximately 17 percent of our Prod capacity. We discuss the details of the experiments in subsequent sections.

In order to assess scalability, we need to generate significant load on the production environment and compare the results before and after we implement the CDN configuration. A typical symptom of our scalability problems during key holidays is that our Web servers become overloaded with HTTP connections from the browsers and CPU utilization spikes above 90 percent, at which point the Web servers stop responding to incoming requests. Another key symptom of our scalability challenges is the amount of bandwidth that has to be provided by our hosting facility. Therefore, our scalability factors can be represented by three key benchmarks: CPU and memory utilization, page views, and bandwidth. We decided to capture the measurements during the most load-intensive holiday in February, before and after the CDN implementation, in the full-scale Production environment.

4.1 Performance experiments

In order to fully represent the impact of the CDN on response times, we needed to deploy our measurement transactions at geographically dispersed locations. Our RTUM service allowed us to use seven transaction nodes, as described in Table 4.1.
Table 4.1: Performance test nodes by ISP and location

    ISP        City and State
    Level3     Los Angeles, CA
    Savvis     Santa Clara, CA
    Verizon    Denver, CO
    MFN        Washington, D.C.
    Internap   Miami, FL
    Level3     Chicago, IL
    Sprint     New York, NY

We decided not to use international testing locations due to the additional cost and the fact that almost 80 percent of our website's traffic originates from within the continental U.S.

4.1.1 Transaction types

We recorded two scripts with two distinct sequences of transaction steps: one for the browse transaction profile and one for the shop transaction profile. The browse transaction consists of the following steps:

1. Access the website's Home Page
2. Go to four different levels of Category pages
3. Access a Flash video product detail page
4. Personalize the Flash video
5. Sign in to the website
6. Send the Flash video to a recipient

The shop transaction type consists of the following steps:

1. Access the website's Home Page
2. Go to the top Category page for one of our product types
3. Go to the Product Detail page for one of the products (it remains constant throughout the experiment)
4. Select delivery options and go to the Shipping and Billing page
5. Submit payment information and go to the Order Review page

The key differences between the two transaction types are the types of server-side Java EE components accessed and the amount of content served by the Web servers. The browse transaction focuses on the content delivery characteristics of large Flash video files, while the shop transaction emphasizes server-side functionality. The combination of the two should provide adequate coverage of both content delivery and server-side performance. Table 4.2 summarizes the key characteristics of the two transaction types in terms of workload, the types and sizes of items retrieved, and the server-side load generated in the form of Java EE component requests.

Table 4.2: RTUM transaction characteristics

    Transaction workload characteristics                        Browse   Shop
    Number of transaction steps                                 9        6
    Number of images retrieved                                  163      94
    Number of scripts, HTML, CSS, Flash components              57       39
    Number of server-side Java EE components accessed           12       15
    Average image size                                          2.9 KB   2.8 KB
    Average size of HTML, script and Flash                      4.9 KB   5.8 KB
    Total number of bytes retrieved per connection              250 KB   98 KB
    Number of Web server connections initiated from the browser 4        5

In order to gather a representative data sample, we decided to schedule the two transactions to run uninterrupted for 48 hours with CDN re-routing enabled and for another 48 hours without the CDN. The RTUM will initiate six transactions per transaction type, per node, per hour, giving us a sample of over a thousand transactions for each test's duration.

4.1.2 Server-side transaction capture

We will enable server-side performance monitoring using a series of "watchpoints" defined in our application monitoring appliance. Watchpoints are typically configured at the servlet level. We identified 21 unique servlets that will be accessed by the two transaction sequences. We dedicated one full instance of the appliance to monitoring traffic directed to our PerfEval environment by the RTUM service. This allows us to filter out traffic destined for our production environment and obtain more accurate server-side measurements. Typical configuration steps for a servlet include defining the URI stem to be intercepted and any additional filtering parameters. Since our environment is isolated from other traffic, we can simply configure the URI stems to be intercepted in the configuration utility.
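The appliance and its watchpoint mechanism are proprietary, but the essential behavior - attributing each passively captured request to a servlet by its URI stem and accumulating response times - can be sketched as follows. This is our own illustration of the concept, not the vendor's implementation, and the URI stems shown are hypothetical.

    import java.util.LinkedHashMap;
    import java.util.Map;

    /**
     * Illustration of watchpoint-style URI classification (not the appliance's
     * actual implementation). Each watchpoint matches a URI stem and accumulates
     * the request count and total host time so averages can be reported per servlet.
     */
    public class WatchpointRegistry {
        static final class Watchpoint {
            long requests;
            double totalHostTimeMs;
        }

        private final Map<String, Watchpoint> byStem = new LinkedHashMap<>();

        public WatchpointRegistry(String... uriStems) {
            for (String stem : uriStems) byStem.put(stem, new Watchpoint());
        }

        /** Attribute one passively observed request to the first matching stem. */
        public void record(String requestUri, double hostTimeMs) {
            for (Map.Entry<String, Watchpoint> e : byStem.entrySet()) {
                if (requestUri.startsWith(e.getKey())) {
                    e.getValue().requests++;
                    e.getValue().totalHostTimeMs += hostTimeMs;
                    return;
                }
            }
        }

        public static void main(String[] args) {
            // Hypothetical URI stems standing in for two of our 21 monitored servlets.
            WatchpointRegistry wp =
                    new WatchpointRegistry("/shop/ProductDetail", "/shop/OrderReview");
            wp.record("/shop/ProductDetail?id=42", 473.5);
            wp.record("/shop/OrderReview", 267.6);
            for (Map.Entry<String, Watchpoint> e : wp.byStem.entrySet()) {
                Watchpoint w = e.getValue();
                System.out.printf("%s: %d requests, avg host time %.1f ms%n",
                        e.getKey(), w.requests, w.totalHostTimeMs / Math.max(w.requests, 1));
            }
        }
    }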
Performance results for each servlet will be aggregated by the appliance's reporting components. A typical performance report generated by the appliance is represented by Figure 4.1. The key highlights of the report are the number of requests to a given watchpoint along with their host, network, and end-to-end times.

4.1.3 Bandwidth utilization monitoring

Our pre-CDN bandwidth utilization was captured using a combination of network measurement tools available at the hosting facility. With the implementation of a CDN, we also gained the ability to generate reports on bandwidth utilization coming through the CDN. HTTP request traffic data includes the header size and any protocol overhead. HTTP response data includes object bytes plus overhead bytes, broken down into origin traffic (ingress), midgress traffic, and edge traffic (or egress: all response traffic from edge servers to end users) [3]. Ingress traffic represents all response traffic from our origin servers to the edge servers, while midgress traffic includes responses between edge servers. We will monitor several data elements in order to assess bandwidth efficiency gains; a small example combining these views follows the list. They include:

Edge traffic, in requests per second The total number of requests for the content provider (CP) code(s) and time period selected. This data includes all hits, with all response codes, from the edge servers to the end users.

Origin bandwidth, in Megabits per second The total bandwidth usage to our origin servers for the CP code(s) and time period selected. This traffic occurs when an Akamai edge server's cache does not have the requested content and the edge server therefore has to request it from the origin. The data includes object bytes plus overhead bytes, all ingress traffic, and all HTTP response codes.

Origin traffic, in hits per second The total number of hits on the origin for the CP code(s) and time period selected. The peak number of hits per second and the total origin hits are highlighted in the graph, and the breakdown of response codes is shown in different colors. This data includes all response codes served from the origin to the edge servers.

Origin bandwidth offload, in Megabits per second The origin offload view draws a picture of the bandwidth reduction Akamai provides for our origin by showing an aggregate view of site traffic served from the edge compared to the traffic going to our origin.
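To show how these views combine, the small sketch below computes the offload percentage: the share of response bandwidth served from the edge rather than from our origin. The input rates are hypothetical, not figures from our reports.

    /**
     * Back-of-the-envelope origin offload calculation: the share of response
     * bandwidth served from the edge rather than from our origin. The sample
     * rates below are hypothetical.
     */
    public class OriginOffload {
        /** edgeMbps: egress from edge servers; originMbps: ingress from our origin. */
        static double offloadPercent(double edgeMbps, double originMbps) {
            return 100.0 * (edgeMbps - originMbps) / edgeMbps;
        }

        public static void main(String[] args) {
            // e.g., 1,200 Mbps served at the edge while only 150 Mbps reaches the origin
            System.out.printf("origin offload: %.1f%%%n", offloadPercent(1200, 150));
        }
    }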
Figure 4.1: Sample server-side performance report
Chapter 5

Analysis

This chapter is divided into sections that discuss the respective quality of service (QoS) attribute impacts resulting from the implementation of the CDN. We undertook two separate evaluations, one of performance and one of scalability. The scalability evaluation focused on comparing a variety of resource utilization metrics captured during the February peak transaction period for the years 2006 and 2007. We anticipated increased transaction throughput and page views while maintaining the same hardware footprint and resource utilization. Better scalability typically results in better availability, and the two qualities of service are frequently viewed as derivatives of each other; we therefore include the availability observations in the scalability section.

In order to assess the performance impact of the CDN, we ran a series of experiments with pre- and post-CDN environments for the browse and shop transaction types. With the monitoring infrastructure in place, we were able to gather performance data at the object level of granularity. An object in our case represents the lowest level of granularity from the browser perspective in terms of how it renders content; it corresponds to an image (JPEG, GIF, or PNG), a script (JavaScript or otherwise), a Flash SWF file, HTML content, a style sheet, or any other low-level component a web page may consist of. We anticipated improvement in end-to-end download times for the objects. Since each web page consists of tens or hundreds of them, we expected the performance improvement of each individual object to add up to a significant impact on the overall performance of the RTUM transactions. It was unclear to us how a CDN might affect the server-side performance of JSPs and servlets, especially under steady-state, low-volume conditions, so we decided to capture server-side performance metrics as well.
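The intuition that many small per-object savings compound into a visible page-level gain can be made concrete with a deliberately simplified model of our own: if a page's objects are fetched over a fixed number of parallel browser connections, page time is roughly the serialized object download time divided by the connection count. The sketch below ignores object dependencies and connection setup, and its numbers are hypothetical.

    /**
     * Simplified page-time model: total object download time serialized across
     * a fixed number of parallel browser connections. Ignores object
     * dependencies, think time, and connection setup; a rough illustration only.
     */
    public class PageTimeModel {
        static double pageSeconds(double[] objectSeconds, int parallelConnections) {
            double total = 0;
            for (double t : objectSeconds) total += t;
            return total / parallelConnections;
        }

        public static void main(String[] args) {
            // Hypothetical: 160 objects over 4 connections, before and after a CDN.
            double[] preCdn = new double[160];
            double[] postCdn = new double[160];
            java.util.Arrays.fill(preCdn, 0.090);   // ~90 ms per object pre-CDN
            java.util.Arrays.fill(postCdn, 0.030);  // ~66% faster per object post-CDN
            System.out.printf("pre: %.1f s, post: %.1f s%n",
                    pageSeconds(preCdn, 4), pageSeconds(postCdn, 4));
        }
    }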
Figure 5.1: CPU utilization, February of 2006 and 2007

5.1 Scalability and availability observations

5.1.1 CPU utilization

We maintained the same number of CPUs, and thus the same rPerf value, for the Production environment throughout 2006 and 2007. However, the CPU allocation changed in the application server and Web server tiers: we decreased the available resources by 50 percent in the Web tier and allocated them to the application server tier. It was our expectation that the CDN would allow us to change the ratios in favor of the application tier, since we expected the majority of the resource utilization decrease to be visible in the Web tier. We made no changes to the database tier.

Figure 5.1 depicts CPU utilization across all tiers after accounting for the resource allocation changes. As we anticipated, the Web tier experienced the most impact from the CDN - an average utilization decrease of 57 percent during the peak week and close to 60 percent on the highest traffic day. Meanwhile, the overall traffic increased almost five-fold over the previous day (see Figure 3.10 for the peak week traffic pattern).
Figure 5.2: Operating system run queue, February of 2006 and 2007

The outages we experienced in previous years in the Web tier prevented a high percentage of requests from even being forwarded to the application tier. With the elimination of the Web tier bottleneck, we experienced higher CPU utilization in the application tier and in the database responsible for the shop transactions. This was to be expected, since a larger number of requests usually translates into higher CPU utilization. However, when we accounted for the doubling of traffic on the peak day (the 14th) between 2006 and 2007, we concluded that we still gained significant scalability in the application tier: the mere two-percent increase in CPU utilization is small relative to the additional traffic the tier handled successfully. We suspected that some of the efficiencies gained in the application tier were caused by the application tuning activities undertaken prior to the holiday, similar to the steps described in [58], but based on the results of load tests before and after the tuning activities, we concluded that their contribution was minimal and that most, if not all, of the scalability gains could be attributed to the elimination of the Web tier bottleneck by the CDN.

Perhaps the most unexpected result was uncovered in the database tier. One of the databases experienced an 80 percent average decrease in CPU utilization.
We decided to investigate further and discovered that the main contributor was the database tuning performed by the engineering team prior to the peak week. We therefore deemed this data invalid for analysis purposes and concluded that the additional efficiencies could not be attributed to the CDN implementation.

Figure 5.2 further supports our conclusions for the Web and application tiers. Generally speaking, fewer threads waiting for execution in the run queue results in lower CPU utilization. As with our CPU results, the most significant impact on the run queues occurred on the peak traffic day, 2/14. The slight increase in some cases between 2006 and 2007 is negligible from the CPU perspective as long as the number oscillates around one or two threads. According to [34] and [6], blocking I/O contributes to high run queue thread waits; the implications of the high cost of I/O-related context switching are also discussed in [59]. We conclude that blocking I/O was the main culprit behind the high CPU utilization prior to the CDN implementation: requests made directly to the Web servers resulted in significant I/O waits for long-running large-object connections, which in turn resulted in high I/O blocking in the application tier. The offloading of a large number of HTTP connections to the CDN thus had a significant impact on CPU utilization.

5.1.2 Memory utilization

We used three different measures to gauge the impact of the CDN on memory utilization. The same level of memory resources was maintained throughout 2006 and 2007. We observed the largest differences in memory paging activity. Retrievals of memory pages from the paging space instead of RAM are considerably slower, and constant paging activity can cause system performance degradation. The most pronounced impact on memory paging activity was observed on the peak day, with average memory page-ins per second decreasing from 25 to almost zero and page-outs decreasing from approximately 60 to one in the Web tier, as shown in Figure 5.3 and Figure 5.4. Interestingly, paging activity increased on all other days of the peak week, but the increase was not significant enough to cause performance problems or outages.

Figure 5.5 represents the memory scanned-to-freed ratios. Overcommitting memory results in high scanned-to-freed ratios. It is difficult to establish the level of this ratio at which memory definitely becomes constrained to the point of affecting performance. Overall, the ratios were neutral across all tiers and did not seem to have a visible impact on scalability in either 2006 or 2007. All evidence points to excessive paging activity as the main culprit of the memory and CPU utilization spikes in 2006. The impact of the CDN on memory utilization is probably indirect: the most plausible conclusion is that the lower number of I/O waits due to the offloading of HTTP connections results in a significant decrease of calls from the kernel scheduler to the paging system, which in turn alleviates the memory constraints [54].
Figure 5.3: Memory page-ins, February of 2006 and 2007

5.1.3 Bandwidth

Bandwidth costs and availability are key considerations for any highly transactional e-commerce environment. Hosting facilities usually impose limits on bandwidth usage in order to allow many customers to share the facility's connectivity to the Internet. Increasing bandwidth availability by a significant amount entails significant costs during peaks due to market conditions, and there are also significant costs associated with deploying the network infrastructure to support bandwidth increases. Bandwidth offloading therefore represents a significant benefit of CDN implementations.

The bandwidth impact in our environment between the 2006 and 2007 peak weeks is depicted in Figure 5.6. We determined that the low bandwidth utilization in 2006 was caused by the refusal of a large number of requests in the Web tier and the subsequent outages. However, even if we had been able to serve all of the requests, we would still have encountered the bandwidth limitation imposed on our environment by the hosting provider.
Figure 5.4: Memory page-outs, February of 2006 and 2007

The impact of the CDN is two-fold. First, it allows us to fully utilize all of our bandwidth capacity at the hosting facility. Second, it allows us to offload over 1 Gbps of bandwidth utilization to the CDN network, whose thousands of Web servers represent a much more resilient and cost-effective infrastructure. We conclude that the scalability impact here is likely in the range of a factor of two to four.

Figure 5.5: Memory scan rate, February of 2006 and 2007

Figure 5.7 provides another view into the impact of the CDN, and it supports our scalability assessment from the standpoint of bandwidth usage during the peak day: it shows that we were able to sustain a five-fold spike in bandwidth usage for several hours. Our analysis of the number of hits further verifies our scalability claim. A hit is defined by our CDN as a retrieval of an object from an edge server rather than from an origin server. Our peak edge hit rate was 16,000 per second. This number is of interest because we can use it to estimate the approximate Web tier scalability in terms of TCP connections using the following formula:
    ScaleFactor_tcp = Hits_CDN / AvailableThreads        (5.1)

where Hits_CDN represents the peak number of hits per second offloaded to the CDN and AvailableThreads represents the maximum possible number of threads for our Web server cluster without impacting server availability. Our Web servers' thread capacity varies depending on the type of hardware currently in use, but we have historically estimated it at approximately 3,600 for a typical day. Therefore, ScaleFactor_tcp = 3.9 and, during the peak, the CDN allows us to roughly quadruple our TCP connection capacity without additional hardware. This number represents the best-case scenario, as many connections will be kept alive by the browsers. Nonetheless, we expect that it will still lead to a significant load decrease on the Web servers, especially since many of the connections constitute long-running Flash video downloads.
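Equation 5.1 is simple enough to state in code. The sketch below is illustrative only; its inputs are hypothetical because, as noted above, actual thread capacity varies with the hardware in use.

    /**
     * Equation 5.1: ScaleFactor_tcp = Hits_CDN / AvailableThreads.
     * Inputs are illustrative; real Web-server thread capacity varies with
     * the hardware in use.
     */
    public class TcpScaleFactor {
        static double scaleFactor(double peakEdgeHitsPerSecond, double availableThreads) {
            return peakEdgeHitsPerSecond / availableThreads;
        }

        public static void main(String[] args) {
            // Hypothetical peak: 10,000 edge hits/s against 2,500 available threads.
            System.out.printf("ScaleFactor_tcp = %.1f%n", scaleFactor(10_000, 2_500));
        }
    }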
Figure 5.6: Peak bandwidth comparison, February of 2006 and 2007

5.2 Performance

We divided the performance observations into two perspectives: consumer and server-side. The consumer perspective corresponds to measurements obtained from the RTUM transactions executed under steady state with and without the CDN. The RTUM allowed us to capture object-level download times in an environment that closely emulates the consumer experience. The server-side perspective supplements the consumer perspective with pre- and post-CDN measurements obtained from the performance measurement appliance deployed in our hosting environment's network.

5.2.1 Consumer Perspective

The RTUM experiments resulted in approximately 1,200 combined shop and browse transactions for each 48-hour period before and after implementing the CDN. The 1,200 transactions for each time period generated over 8,700 page views and downloads of over 250,000 objects to the RTUM clients. We eliminated transactions that were incomplete due to unavailability of the environment. Figure 5.8 and Figure 5.9 summarize the response time measurements for each transaction type. Shop transaction response times averaged 11 seconds with the CDN and 14 seconds without it, a 21 percent performance improvement at the transaction level.
Figure 5.7: Akamai bandwidth offload, February of 2007

Browse transaction response times averaged 10 seconds with the CDN versus 15 seconds without it, an overall improvement of around 30 percent. We expected the browse transaction to realize a slightly larger benefit from the CDN due to its greater number of content assets and its use of larger assets.

We subsequently analyzed the performance data at the object level of granularity in order to identify the source of the performance improvements and to distinguish the characteristics of server-side components such as servlets from those of static content. Given the very different content characteristics, we also expected to see significant differences between the shop and browse transactions in terms of object response times. Table 5.1 and Table 5.2 illustrate the end-to-end response time changes by content type; positive percentages represent performance improvements, while negative percentages represent performance degradations. Both transaction types show significant performance improvements at the object level.

Although transaction- and object-level performance realized significant benefits from the CDN, it was not without a trade-off in terms of servlet performance. Browse and shop servlets suffered performance degradations of 5 percent and 18 percent, respectively. We suspect that the primary reason for the degradation is the additional DNS redirections that take place during edge server resolution.
Figure 5.8: RTUM shop transaction response times in seconds

Overall, however, the benefits at the object level far outweigh the degradation at the servlet level, and the transaction response times improved. The 30 percent improvement in browse transactions is particularly important to our environment during the peak week in February, when the majority of our transactions are browse transactions.

5.2.2 Server-side Perspective

The analysis of server-side performance centered on measuring servlet response times. During our measurements, however, we uncovered additional network-related characteristics that helped us better understand the impact of the CDN not only on the server performance of our hosting environment but also on its network infrastructure. Table 5.3 illustrates the server-side performance changes for the key servlets.
Table 5.1: RTUM browse transaction performance

    Content Type   Performance Change
    JavaScript     69%
    CSS            68%
    GIF            68%
    JPG            65%
    PNG            77%
    Servlet        -5%

Table 5.2: RTUM shop transaction performance

    Content Type   Performance Change
    JavaScript     68%
    CSS            68%
    GIF            69%
    JPG            61%
    PNG            72%
    Servlet        -18%

Table 5.3: Server-side performance in milliseconds

    Servlet Name        Server time without CDN   Server time with CDN   Change
    Category 1          64.05                     28.06                  56%
    Category 2          38.89                     38.62                  1%
    Category 3          45.48                     39.81                  12%
    Category 4          56.18                     33.08                  41%
    Category 5          42.78                     39.67                  7%
    Category 6          56.26                     46.51                  17%
    Checkout            236.18                    193.54                 18%
    Ecard Display       31.46                     21.75                  31%
    Ecard Personalize   81.54                     69.38                  15%
    Home Page           50.60                     20.14                  60%
    Login               97.25                     58.68                  40%
    Marketing Spot      13.74                     13.48                  2%
    Order Review        267.64                    248.73                 7%
    Product Detail      473.52                    500.06                 -6%
    Shopping Cart       4052.16                   4069.80                0%
Figure 5.9: RTUM browse transaction response times in seconds

It is worth noting that the measurements in Table 5.3 are represented in milliseconds: while the improvement of around 20 percent on average seems very impressive as a percentage, the majority of the improvement can be attributed to servlets that already performed at 100 milliseconds or less. The improvement therefore has very little or no impact on the overall response times of the Web pages that correspond to those servlets. On the other hand, the servlets that performed above 100 milliseconds suffered either a slight performance degradation or an insignificant improvement. We had anticipated more of an improvement in servlet performance, but the only logical conclusion based on the data is that the impact of the CDN on server-side performance in the application and database tiers is neutral or slightly negative.

We were able to obtain data on packet flow through our hosting environment's network before and after the CDN implementation. The results are summarized in Table 5.4. There is a strong correlation between the amount of content associated with a servlet and the decrease in the number of packets in our network. This supports the results from the scalability section of this chapter, where we discuss the bandwidth implications of the CDN: with much of the bandwidth offloaded to the edge network, we would expect a much smaller number of TCP packets in our network.
Table 5.4: TCP packet counts at hosting environment

    Servlet Name        TCP packets without CDN   TCP packets with CDN   Change
    Category 1          758.46                    74.08                  90%
    Category 2          281.20                    84.65                  70%
    Category 3          159.78                    85.36                  47%
    Category 4          419.39                    87.01                  79%
    Category 5          621.74                    95.43                  85%
    Category 6          482.11                    95.96                  80%
    Checkout            594.68                    178.45                 70%
    Ecard Display       124.92                    51.21                  59%
    Ecard Personalize   178.09                    118.57                 33%
    Home Page           795.08                    76.92                  90%
    Login               531.43                    215.71                 59%
    Marketing Spot      16.24                     27.05                  -66%
    Order Review        350.12                    178.77                 49%
    Product Detail      347.56                    221.57                 36%
    Shopping Cart       267.09                    201.32                 25%
Chapter 6

Conclusions and Further Research

In this thesis, we evaluated the merits of implementing a Content Delivery Network as a strategy to improve the scalability and performance of a highly transactional e-commerce website. The key differentiating factor of our research is the evaluation of impact across all tiers and processing components of our environment. We were also interested in the potential tradeoffs of this architectural decision, especially with regard to configuration and content management across a geographically dispersed environment. Research to date has focused primarily on the network efficiencies that result from CDN implementations.

By implementing a series of server- and client-side monitoring components, we were able to obtain measurements under steady transactional load and also during significant transactional bursts. We captured results from the consumer as well as the application and infrastructure perspectives. Our overall conclusion is that CDN offloading has a significant impact on the overall performance and scalability of n-tier e-commerce environments such as ours. The results of our experiments lead us to the following conclusions:

• CDNs have a significant impact on page response times as observed by consumers. We achieved a 30 percent improvement in the most utilized transaction on our website - the browse transaction.

• Implementing a CDN as a scalability tactic is a much more cost-efficient and flexible way to achieve the desired results in the Web tier than horizontal or vertical scaling, because it does not require an investment in additional server and network hardware.

• The CDN resolved our content delivery bottlenecks and allowed us to scale to meet the peak demand without sacrificing page response times. We also achieved 100 percent availability during the peak day, with much of the success attributable to the offloading solution.
• Server-side performance impact varies across the tiers, with the Web tier benefiting from the CDN the most. We observed a slight performance degradation in the application tier, and the impact on the database tier was neutral. Servlet performance degradation and the additional DNS redirections had minimal impact on the consumer experience. This leads us to conclude that CDNs are not a panacea for all performance-related problems, including those caused by poor application and database design. We observed that CDN content delivery has little or no impact on server-side application-tier caching strategies because its primary impact is on the Web tier.

• We observed a five-fold increase in the efficiency of bandwidth utilization with the CDN in place. This was partly due to the ability to fully utilize our hosting center's bandwidth plus the additional bandwidth provided by Akamai. In the past, we were unable to achieve full utilization at our hosting facility due to Web tier outages. With the CDN in place, we achieved bandwidth scalability equivalent to adding another hosting facility at a fraction of the cost.

• The CDN allowed us to maintain half of our previous Web server footprint while increasing the number of page views almost two-fold and decreasing CPU utilization by almost 90 percent in the Web tier. We also saw a positive impact on memory utilization, with memory page-ins and page-outs decreasing by almost 95 percent.

• Maintainability should be considered as part of the tradeoff analysis during CDN selection. We observed that configuration changes can take up to two hours and content changes an additional 7-10 minutes. The majority of our changes are content-related, so the several-minute delay may affect our ability to respond to content changes during peaks. Configuration changes do not impact availability because they do not require system restarts or a planned outage.

6.1 Summary of Contributions

Our research is the most comprehensive and architecturally significant attempt we are aware of to correlate content delivery offloading with the quality of service factors of an e-commerce environment. We assert that the CDN's impact on network behavior is just one part of the full range of performance and scalability benefits that an e-commerce website may realize from it. This work also attempted to quantify the CDN's impact on the computing infrastructure, especially the Web server footprint. We were able to quantify scalability gains in terms of bandwidth and Web tier CPU and memory utilization. Additionally, we proposed an approach to measuring the performance of an e-commerce website with a combination of server-side and client-side monitors in order to obtain
a full end-to-end picture of performance. This is important because much of the research to date focuses on either the client or the server perspective without identifying correlations between the two. The full performance picture is a prerequisite to understanding the locations of the performance bottlenecks in an environment such as ours.

Finally, we were able to verify the claim that CDNs enhance not only the performance but also the scalability and availability of e-commerce websites. Much research to date focuses on performance alone and treats scalability as an enabler or, at times, a by-product of performance. We were able to isolate measurements that pertain to scalability alone, and it is our assertion that this quality of service deserves as much focus as performance.

6.2 Future Research

Edge content delivery is already a well-established technology, and quite a few solutions exist in the marketplace to enable it. Edge computing has been proposed as a way to extend this model to geographically dispersed application delivery, and there have been several attempts in the research community to define application replication and deployment standards related to this concept. This area deserves further research, as the complexity of multi-platform, geographically dispersed computing environments will far outweigh that of delivering a limited set of static asset types, as is the case with CDNs.

N-tier quality of service investigations of e-commerce environments have also been scarce in the research community. The transactional burstiness of these environments makes them particularly relevant to performance and scalability research. More research into the availability, reliability, and security of these environments is needed to establish a full architectural view of the tradeoffs and dependencies among the qualities of service.

We did not evaluate the tradeoffs between the various techniques that can be used to enable CDN re-routing. We also did not consider the differences between HTTP downloading and the streaming of video content using protocols such as MMS or RTSP. Edge streaming has gained popularity as an alternative to hosting streaming servers in centralized e-commerce environments. These could be areas for further research.
Bibliography

[1] A. Datta, K. Dutta, H. Thomas, D. Vandermeer, and A. Ramamritham. Proxy-based acceleration of dynamically generated content on the World Wide Web: An approach and implementation. ACM Transactions on Database Systems, 29(2):403-443, 2004.

[2] V. Aggarwal, A. Feldmann, and C. Scheideler. Can ISPs and P2P users cooperate for improved performance? SIGCOMM Computer Communications Review, 37(3):29-40, 2007.

[3] Akamai. www.akamai.com, 2007.

[4] Akamai. Akamai HTTP content delivery configuration guide. https://dl.akamai.com/customers/HTTPCD/HTTPCD_Activation_Guide.pdf, 2007.

[5] A. Akella, S. Seshan, and A. Shaikh. An empirical evaluation of wide-area Internet bottlenecks. In IMC '03: Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, pages 101-114, New York, NY, USA, 2003. ACM.

[6] J. Akella and D. Siewiorek. Modeling and measurement of the impact of input/output on system performance. In ISCA '91: Proceedings of the 18th Annual International Symposium on Computer Architecture, pages 390-399, New York, NY, USA, 1991. ACM.

[7] P. Albitz and C. Liu. DNS and BIND. O'Reilly, Sebastopol, CA, 2001.

[8] M. Andreolini, M. Colajanni, R. Lancellotti, and F. Mazzoni. Fine grain performance evaluation of e-commerce sites. SIGMETRICS Performance Evaluation Review, 32(3):14-23, 2004.

[9] M. Arlitt, L. Cherkasova, J. Dilley, R. Friedrich, and T. Jin. Evaluating content management techniques for Web proxy caches. SIGMETRICS Performance Evaluation Review, 27(4):3-11, 2000.

[10] M. Arlitt, D. Krishnamurthy, and J. Rolia. Characterizing the scalability of a large Web-based shopping system. ACM Transactions on Internet Technologies, 1(1):44-69, 2001.