WebAccel: Accelerating Web access for low-bandwidth hosts

Tae-Young Chang a,*, Zhenyun Zhuang b, Aravind Velayutham c, Raghupathy Sivakumar a
a School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
b College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, United States
c Asankya Inc., Atlanta, GA 30308, United States
Article info
Article history:
Received 5 December 2007
Accepted 14 February 2008
Available online 18 March 2008
Responsible Editor: J. Neuman de Souza
Keywords:
Web performance
Network services and applications
Abstract
Current popular Web browsers simply fetch the entire Web page from the server in a
greedy fashion. This simple Web-fetching mechanism is inappropriate for
low-bandwidth networks because it unnecessarily increases the response time
experienced by users. In this paper, we first analyze the causes of large response time by
considering several factors, including the properties of typical Web pages and browsers, the
interaction between the HTTP and TCP protocols, and the impact of server-side optimization
techniques. We then propose a new Web-acceleration solution called WebAccel, which
consists of three easy-to-deploy browser-side optimization mechanisms to reduce the user
response time. Through ns2 simulations and a prototype implementation, we compare the
performance of our solution with that of current browsers and show that WebAccel brings
significant performance benefits in terms of user-perceived response time.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
In the past couple of decades, a tremendous amount of
research has been done on improving Web-access performance
over the Internet. Web-optimization techniques such
as cache proxies [1–5], transcoding schemes [6–8], pre-
fetching schemes [9–11], persistent HTTP connections
[12,13], and content distribution networks [14–16] have
found widespread adoption. On the other hand, new para-
digms such as WML [17], WAP [18], and BREW [19] have
also been developed to address the limitation of Web per-
formance on mobile hosts.
In this work, we study the performance of a Web browser
under low-bandwidth network conditions. Specifically,
we analyze the characteristics of browsers with the objective
of identifying why they might suffer in
low-bandwidth conditions. We find that the current fetching
model employed by most commercial Web browsers is
not optimal in bandwidth-challenged environments.
We show that the absence of content prioritization and
intelligent object-fetching mechanisms in current Web
browsers leads to increased response times. Web browsers,
today, do not prioritize useful data that is viewed by a user
over other redundant data in a Web page. As a result, gree-
dy fetching of the entire content of a page wastes precious
bandwidth and in turn increases user-perceived response
time. Furthermore, without an intelligent object-fetching
mechanism, the download process of current Web brows-
ers does not utilize network bandwidth efficiently.
To make this problem even worse, many Web pages
have become larger in both pixel dimensions and byte size, with a large
number of embedded objects. For example, the main page
of cnn.com is approximately three times longer than the
height of a client area (see footnote 1) in pixels and about 300 KB in size,
with hundreds of embedded objects. As a result, even with a 100-Kbps
bandwidth in a cellular data network, the time taken
to fetch the entire page can be longer than 20 s.

1 The client area is defined as the effective area for displaying a Web page in a browser.
* Corresponding author. E-mail address: gtg386c@mail.gatech.edu (T.-Y. Chang).
Computer Networks 52 (2008) 2129–2147, doi:10.1016/j.comnet.2008.02.007, journal homepage: www.elsevier.com/locate/comnet
1389-1286/$ - see front matter © 2008 Elsevier B.V. All rights reserved.

In this paper, we propose a new Web-optimization
solution called WebAccel to address the problem of large
response time with current browsers. Our solution is based
on careful consideration of several factors, including the
content displayed on the screen viewed by the user, ser-
ver-side content distribution networks, and the relation-
ship between the HTTP and TCP protocols. WebAccel
consists of three mechanisms: prioritized fetching, object
reordering, and connection management. One major advantage
of our approach is that it is a purely client-side enhancement;
consequently, it is easy to deploy since it only
requires client-side installation on current Web browsers.
To summarize, our contributions in this paper are:
(i) Identification of the inefficiencies of current brows-
ers by carefully analyzing the interactions of several
factors related to Web fetching.
(ii) Proposal of three mechanisms to reduce user
response time in an easy-to-deploy fashion.
The rest of the paper is organized as follows. Section 2
discusses the problems with current Web-access models
in bandwidth-limited networks. Section 3 presents the de-
tails of our client-side Web-acceleration solution. Section 4
compares the performance of our proposed solution with
that of conventional Web-access models using ns2 simulations.
Section 5 describes a prototype implementation and
shows some performance results of prioritized fetching.
Section 6 discusses related work in the area, and Section
7 concludes the paper.
2. Motivation
In this section, we describe drawbacks in a conventional
Web access model in low-bandwidth environments and
use them as a motivation for designing a new Web-access
scheme.
2.1. Web access model and simulation setup
A traditional network model for Web access is shown
in Fig. 1. In this model, to access a Web page, a user enters
a URL into a Web browser. The browser then asks
a DNS server to translate the host name in the URL into the
corresponding IP address. After obtaining this IP address,
the browser can directly access the main HTML document
located in the Web server. When load balancing is per-
formed among multiple Web servers that create a server
cluster or a server farm, a layer-7 switch rewrites domain
names of embedded objects defined in the HTML docu-
ment to distribute requests of embedded objects to multi-
ple servers. Finally, the Web browser performs additional
DNS resolutions for other unknown Web servers having
those objects and downloads objects from them.
A Web browser generally opens multiple HTTP connec-
tions to a single Web server to increase fetching speed. For
example, Microsoft Internet Explorer [20] and Netscape
Navigator [21] open up to two and six connections to a single
server, respectively [22]. Each open HTTP connection has
a message queue for sending object requests (i.e., HTTP GET
messages) to the server. Since the browser is unaware of object
and network characteristics, the parsing engine in the
browser inserts object requests to all available message
queues associated with the target server in a round-robin
fashion.
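
As a rough illustration of the size-unaware round-robin assignment described above, the following Python sketch distributes object requests over per-connection message queues. The function name `round_robin_assign` and the object sizes are hypothetical, chosen only to show that queues with equal object counts can carry very different byte totals.

```python
from typing import List, Tuple


def round_robin_assign(sizes_kb: List[float], num_conns: int) -> List[List[Tuple[int, float]]]:
    """Assign object i to queue (i mod num_conns), ignoring object size."""
    queues: List[List[Tuple[int, float]]] = [[] for _ in range(num_conns)]
    for idx, size in enumerate(sizes_kb):
        queues[idx % num_conns].append((idx, size))
    return queues


# Hypothetical object sizes in KB (illustrative values, not measured data).
sizes = [30.0, 2.0, 12.0, 2.0, 8.0, 2.0]
queues = round_robin_assign(sizes, num_conns=2)
objects_per_conn = [len(q) for q in queues]
bytes_per_conn = [sum(size for _, size in q) for q in queues]
print(objects_per_conn, bytes_per_conn)
```

Both queues receive three objects, yet one carries 50 KB and the other 6 KB, so the second connection would run out of work much earlier.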
In this paper, we consider comScore’s Top 50 Web Sites
[23] as representatives of typical Web pages and measure
their Web characteristics. Table 1 shows the statistical results
for those Web sites. In the table, a screen refers to a
unit of area that has the same size as the client area
in the browser window. Additional HTML documents embedded
by IFRAME elements are counted as other objects. We used
Microsoft Internet Explorer, whose client-area size is 1006 × 511
pixels under a 1024 × 768 (XGA) screen resolution. The initial screen is
defined as the first part that is shown in the client area
when a Web page is accessed. In the measurement, the
average number of screens in these pages is 1937/
511 = 3.7. Note that since a single server (i.e., a single do-
main) may provide objects of various types, the number
of total domains per page can be smaller than the sum of
the numbers of domains per object type.
To evaluate the performance of conventional Web-ac-
cess models in a low-bandwidth environment, we use
the ns2 simulator [24] with the Reno-FullTCP option that
supports bidirectional transmissions. In the simulations,
the same network topology as shown in Fig. 1 is used with
an assumption that a local DNS server has all the required
domain information and thus does not need additional
communications with root DNS servers. The bottleneck
link is located between the Web client and the backbone
network and is configured to have a 100-Kbps bandwidth
and a 100-ms link delay. The bandwidth and delay from
both the DNS and Web servers to the backbone network
are 1 Mbps and 5 ms, respectively.

Fig. 1. Network topology.

For modeling Web traffic, we use the same Web characteristics
as shown in Table 1. The average processing time
of an object per connection in the browser is assumed to be
200 ms, and the parsing delay is ignored. We assume that
all Web servers support persistent connections as defined in
HTTP/1.1, but pipelining is not considered since it is not
faithfully supported by most commercial Web servers
[25]. The size of an HTTP request message is set to
500 bytes, and the header size of an HTTP response message
is ignored. It is also assumed that the cache function
of the browser is disabled.
We consider initial-screen response time as the primary
metric, which is defined as the difference between the time
when a browser sends a request for a Web page and the
time when the HTML document and all of its embedded objects
required for displaying the initial screen are completely
downloaded.
2.2. Screen-contention problem
When a user views a screen on a display device, objects
for displaying other screens are unnecessary in the sense
that they are not visible to the user at this time. However,
in conventional Web browsers, the process of fetching nec-
essary on-screen objects (i.e., objects displayed on the cur-
rent screen) may be slowed down, as a result of the
competition from the process of fetching unnecessary off-
screen objects (i.e., objects not shown on the current
screen). We refer to the fact that objects from different
screens are competing for bandwidth as screen contention.
The main cause of screen contention is the disparity in
cumulative transfer size among multiple connections. As
mentioned earlier, a parsing engine inserts object requests
to multiple message queues in a round-robin fashion,
which considers only fairness in the number of objects
per connection. As a result, some connections having only
small-sized objects may finish transmissions of on-screen
objects earlier than others and begin to fetch off-screen ob-
jects. Under this scenario, different connections may fetch
objects on different screens simultaneously.
Another possible cause is directionality in a table
structure in HTML. When a multi-cell TABLE element is
used in a Web page, the internal cells of the TABLE may be
significantly taller than the client area. In such
a case, the browser begins to fetch the on-screen objects
located at the beginning of the second cell only after it has
begun to fetch all the objects in the first cell, including the
off-screen objects located at the end of that cell.
Fig. 2 shows an example of an out-of-order Web-object
delivery caused by the directionality problem when Inter-
net Explorer accesses the amazon.com main page, which
has a very long multi-cell table across three screens. In this
example, the page consists of 73 objects (including 1 HTML
document, 5 JavaScript, 2 Flash, and 65 image (IMG) objects),
and the initial screen has 35 on-screen objects. In
this figure, it can be observed that many off-screen objects
are fetched before the initial screen is fully displayed.
In the figure, the last on-screen object (the 71st object overall),
located in the last cell, is fetched after 36 off-screen
objects are downloaded.
Fig. 3 shows the effect of the screen-contention problem
by presenting the simulated object-fetching progress. We
assume that all the objects are from a single server, and
the effect of directionality in a table structure is ignored.

Table 1
Statistics of comScore’s Top 50 Web Sites

Parameter                              Stat   HTML    IMG     Others  Total
Byte size per object [KB]              Mean   31.72   2.46    12.91   225.96
                                       STD    35.51   5.90    9.67    186.04
Number of objects in initial screen    Mean   1       17.31   4.41    22.72
                                       STD    -       15.36   6.22    16.37
Number of objects in all screens       Mean   1       46.80   3.99    51.79
                                       STD    -       28.16   6.22    30.34
Number of servers (domains)            Mean   1       5.16    1.74    5.50
                                       STD    -       2.90    1.20    3.38
Width [pixels]                         Mean   -       -       -       998
                                       STD    -       -       -       46.49
Height [pixels]                        Mean   -       -       -       1937
                                       STD    -       -       -       1119

Fig. 2. An example of an out-of-order fetching sequence at amazon.com. (x-axis: fetching sequence; y-axis: object size [KB]; series: on-screen objects, off-screen objects.)

In the ns2 simulations, the initial screen has 18 on-screen
objects, i.e., objects numbered after 18 are unnecessary
data. As observed from the figure, objects on both initial
and other screens are fetched simultaneously because of
the screen-contention problem. Since fetching the unnec-
essary objects consumes a portion of bandwidth, the
resulting response time for the initial screen is increased
unnecessarily. In the figure, two off-screen IMG objects,
numbered 20 and 22, are fetched in parallel with on-screen
objects located on the initial screen. As a result, the response
time for the initial screen, which was measured after IMG
17 was downloaded, is 18.7 s.
An intuitive solution to screen contention is to prevent
unnecessary object fetching. Fig. 4 shows an ideal case in which
contention is eliminated when an ideal browser opens two
connections to a single server. As seen from the figure,
when the faster connection (i.e., connection 1) completes
the downloading of all the on-screen objects on the initial
screen, it stops fetching and waits for the other connection
(i.e., connection 2) to finish fetching the other on-screen
objects. Thus, the remaining fetching connection can ob-
tain more bandwidth and in turn reduce the response time
for the initial screen by 1.1 s.
Fig. 5 shows how screen contention affects the initial-
screen response time under both single- and multiple-ser-
ver scenarios as the numbers of connections and servers
increase. In the simulations, up to two parallel connections
are allowed to be opened to each server. In the single-server
case shown in Fig. 5a, as the number of connections
increases from one to three, both the conventional and
ideal browsers show significant performance improvement.
However, when the number of connections is larger
than three, the performance is less affected by it.
When Web objects on a single page are distributed to mul-
tiple servers by a load balancing technique, the perfor-
mance is directly affected by the number of servers. In
Fig. 5b, both schemes show the best performance when
the number of servers is three, i.e., the number of total con-
nections is six. However, as the number of servers is in-
creased beyond three, the response performance becomes
degraded because of adverse interactions and connection
overhead [25]. It also can be seen that the performance
in the ideal scheme is less influenced by the number of
servers as a result of its effective prevention of screen
contention.
2.3. Bandwidth under-utilization problem
In HTTP/1.1, a persistent connection allows a series of
request-response transactions. Given this model, [25]
shows that the idle time of a network decreases with an
increase in the number of simultaneous TCP connections.
Fig. 3. Fetching with screen contention in a conventional browser.
Fig. 4. Fetching without screen contention in an ideal browser.

The paper also shows that there exists an optimal number
of simultaneous connections (around six in the paper), at
which the performance is optimal because of minimized
idle time in the network. However, since current Web
browsers do not schedule object transfers in a band-
width-efficient way across multiple TCP connections, they
do not always maintain the optimal number of simulta-
neous connections. In most cases, only a small number of
connections are active at any instant. This results in
under-utilization of the access link, and we refer to this as
bandwidth under-utilization.
The above-described bandwidth under-utilization prob-
lem results in varying levels of performance degradation
depending on the number of Web servers. In the case of
multiple connections, bandwidth efficiency is determined
by how much the ending time of transmissions among par-
allel connections is synchronized. In Fig. 4, the last object
(i.e., IMG 17) is fetched with no other objects, and thus only
a single connection uses network bandwidth toward the
end. In the case of multiple servers, the user performance
is also affected by synchronized ending among connections
to other servers.
The solution to the bandwidth under-utilization prob-
lem is to schedule HTTP GET requests intelligently across
multiple TCP connections such that as many connections
as possible are active during the fetching process. Fig. 6
shows the impact of performing ideal scheduling such that
there always exist one or more pending requests in both
connections. In the figure, when the faster connection
finishes fetching all its on-screen objects and has no more
objects to fetch, it takes over unfulfilled on-screen
object requests from the other connection and helps fetch
them. As a result, both connections can use bandwidth more
efficiently, and the initial-screen response time is reduced
by 1.6 s when compared to that in a conventional Web
browser.
In the scenario of multiple servers, conventional brows-
ers open the same number of connections to each server.
They do not take into account the sizes of objects in sched-
uling object requests, and this invariably leads to the sce-
narios in which all connections to some servers are idle
while the other connections opened to the remaining serv-
ers are active.
The intuitive solution is to schedule the different object
requests across multiple servers such that all the connec-
tions are active. Fig. 7 shows how bandwidth under-utili-
zation affects the response time performance under
various scenarios.

Fig. 5. Impact of screen contention: (a) initial-screen response time [s] versus number of connections; (b) initial-screen response time [s] versus number of servers. Curves: with and without screen contention.
Fig. 6. Fetching with synchronized ending time in an ideal browser.

Note that the ideal browser does not
fetch any off-screen objects and thus screen contention
does not exist in this scenario. In the single-server case
shown in Fig. 7a, both conventional and ideal browsers
show a performance pattern similar to that in Fig. 5a and
improve the performance consistently over the entire range.
However, in Fig. 7b, unlike Fig. 5b, as the number of servers
increases beyond three, both schemes show stable perfor-
mance, and the ideal scheme shows up to a 20% perfor-
mance improvement.
2.4. Summary
In this section, we have identified two issues with con-
ventional Web browsers in bandwidth-limited networks.
First, we observe that contention among objects belonging
to different screens within the same Web page can in-
crease user-perceived response time of the initial screen.
Second, we identify that network bandwidth can be un-
der-utilized because of non-synchronized ending times of
transmissions. We also observe that in most cases the
screen contention and bandwidth under-utilization prob-
lems affect user performance negatively in a conventional
Web model, and show how the ideal browser can over-
come these problems and achieve significant performance
improvement. Based on these observations, in the next
section, we propose a new Web-access scheme.
3. WebAccel architecture
WebAccel consists of three mechanisms: prioritized
fetching, object reordering, and connection management.
A brief summary of the mechanisms is as follows.
(i) Prioritized fetching (PF) addresses the screen con-
tention problem in large-sized Web pages and pro-
vides an optimization solution for fetching objects
with varying priority levels. Basically, it is a What-
You-See-Is-What-You-Fetch mechanism. While giv-
ing higher priority to on-screen objects, it gives
lower priority to off-screen ones to reduce the
user-perceived response time.
(ii) Object reordering (OR) addresses the bandwidth
under-utilization problem caused by fetching
objects from a single server. When load on connec-
tions to a server is unbalanced, it performs load bal-
ancing by rescheduling object requests across the
connections. In addition, it also dynamically reor-
ders the sequence of object requests queued in each
connection based on the current TCP status.
(iii) Connection management (CM) addresses the band-
width under-utilization problem when multiple
servers are involved in a Web page. To maintain an
optimal number of TCP connections at any time, it
estimates the ending time of transfer in each con-
nection and adjusts the number of connections per
server (i.e., per domain).
The three mechanisms complement each other as well
as perform optimization with different levels of granularity
for Web-object fetching on the browser side. Fig. 8 shows
where they are located in the entire data flow in the
browsing operation. One of the advantages of WebAccel
is easy deployment, since it requires only client-side mod-
ification. In fact, the solution can be implemented as noth-
ing more than an add-on to the current Web browsers.
3.1. Prioritized fetching (PF)
Conventional Web browsers fetch and parse objects as
soon as they find definitions of objects while downloading
a main HTML document. This on-the-fly fetching mecha-
nism brings performance improvement in high-speed net-
works, where transfer overhead caused by screen
contention does not affect overall response time much.
However, in bandwidth-limited networks, screen conten-
tion may degrade user performance significantly because
of low download speed and large delay. Thus, the PF mech-
anism differentiates objects from different screens based
on the current screen view and allows for downloading
only on-screen objects that are required to render the cur-
rent screen display. As a result, it reduces the response
time experienced by users. Fig. 9a illustrates the process
flow of the PF mechanism.
The basic operation of PF is as follows: (1) Initial object
prioritization: When a Web access is requested, PF first
obtains the view of the initial screen from the entire
document layout and prioritizes embedded objects according
to their locations in the layout. (2) Selective object fetching:
Then, it performs sequential object fetching according
to their priority levels. Lower-priority objects can be
fetched only after all higher-priority objects are completely
downloaded. (3) Re-prioritization: When a user scrolls to
move to a different view, it performs the above-mentioned
process again for the new screen.

Fig. 7. Impact of bandwidth under-utilization: (a) single-server case; (b) multiple-server case, versus number of servers. Curves: with and without bandwidth under-utilization.

3.1.1. Initial object prioritization
A Web page generally consists of various types of
embedded objects. In particular, text-based objects such
as HTML, JavaScript, and cascading style sheets play an
important role in constructing the overall HTML document
layout. Thus, PF considers those text-based files as the
highest-priority objects. For other object types such as im-
age (IMG) and multimedia objects, it assigns different pri-
ority levels according to their locations in the layout. For
simplicity, we consider IMG objects as representatives of
such objects that do not affect the document layout.
Initially, PF performs location-based prioritization for
image objects. Since HTML documents generally include
pixel-size information of embedded IMG objects, a Web
browser can estimate their location in the layout and construct
the full-page layout without downloading them. In
cases where an HTML document does not specify the pixel
size of image objects and the browser has not fetched those
objects yet, PF uses an average value obtained from the
browsing history or a default value of the browser.
In order to estimate the location of IMG objects, PF first
scans the Document Object Model (DOM) tree [26], which
a browser generates and maintains for the target Web
page. When PF finds an IMG object while scanning the tree,
it visits all of the object's ancestors and accumulates the location
offset of each node relative to its parent in a recursive fashion
until it reaches the top of the tree. Then, the absolute location
in the layout is calculated as the sum of all the relative
offsets. Based on this location information, PF gives high
priority to objects that are located within the current view
in the client area and low priority to others.
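
The location estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class is a hypothetical stand-in for a DOM node (it is not a browser API), offsets are vertical only, and the walk up the tree is written as a loop rather than a recursion. The 511-pixel view height is the client-area height reported in Section 2.1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical DOM node: offset_top is relative to the parent node."""
    offset_top: int
    height: int
    parent: Optional["Node"] = None


def absolute_top(node: Node) -> int:
    """Sum the relative offsets from the node up to the root of the tree."""
    total = 0
    current: Optional[Node] = node
    while current is not None:
        total += current.offset_top
        current = current.parent
    return total


def is_on_screen(node: Node, view_top: int, view_height: int) -> bool:
    """True if any part of the node overlaps the current view vertically."""
    top = absolute_top(node)
    bottom = top + node.height
    return top < view_top + view_height and bottom > view_top


# Illustrative layout (made-up numbers): a tall table holding two images.
root = Node(offset_top=0, height=1937)
table = Node(offset_top=100, height=1500, parent=root)
img_near = Node(offset_top=50, height=80, parent=table)   # absolute top = 150
img_far = Node(offset_top=900, height=80, parent=table)   # absolute top = 1000

print(is_on_screen(img_near, view_top=0, view_height=511))
print(is_on_screen(img_far, view_top=0, view_height=511))
```

With the view at the top of the page, `img_near` is classified as on-screen (high priority) and `img_far` as off-screen (low priority).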
3.1.2. Selective object fetching
To fetch objects of different priority levels simulta-
neously, many existing schemes such as [27] allocate a
smaller portion of bandwidth for low-priority transmis-
sions. However, these schemes cannot be efficiently
exploited in PF due to the following two reasons. First,
HTTP/1.1 defines a persistent connection, which is reusable
for transfer of multiple objects. In PF, a priority-based connection
does not have flexibility to send both high- and
low-priority objects. Second, each TCP connection performs
multiple short transmissions. As mentioned earlier,
IMG objects, which account for about half of the total byte size
of a Web page, generally have small average byte sizes
and can be delivered by a few packets. In these short bursty
on-off transmissions, priority-based connection schemes
cannot assign a desired portion of bandwidth accurately.
Thus, PF uses a delayed-transmission scheme. When
information about new objects is extracted from an HTML
document and they are assigned the highest priority level, PF
inserts the corresponding request messages into the
queues already in use. In this scheme, low-priority objects
are fetched only after all the higher-priority objects have
been downloaded, i.e., after the higher-priority queues
become empty.
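
A minimal sketch of the delayed-transmission idea follows, assuming a single shared request pool and integer priority levels where a smaller number means a higher priority. The class `DelayedFetcher` and its method names are hypothetical and are not taken from the prototype; the point is only that a lower-priority request is held back while any higher-priority object is still queued or downloading.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class DelayedFetcher:
    """Hypothetical request pool; a smaller number means a higher priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = 0                      # keeps insertion order within a level
        self._in_flight: Dict[str, int] = {}   # url -> priority of active downloads

    def add(self, priority: int, url: str) -> None:
        heapq.heappush(self._heap, (priority, self._counter, url))
        self._counter += 1

    def next_request(self) -> Optional[str]:
        """Return the next URL to request, or None if nothing may be sent yet."""
        if not self._heap:
            return None
        priority, _, url = self._heap[0]
        # Hold back lower-priority objects while a higher-priority one is downloading.
        if any(p < priority for p in self._in_flight.values()):
            return None
        heapq.heappop(self._heap)
        self._in_flight[url] = priority
        return url

    def done(self, url: str) -> None:
        """Mark a download as complete."""
        self._in_flight.pop(url, None)


fetcher = DelayedFetcher()
fetcher.add(1, "offscreen.gif")   # illustrative object names
fetcher.add(0, "style.css")
fetcher.add(0, "onscreen.gif")
first = fetcher.next_request()
second = fetcher.next_request()
blocked = fetcher.next_request()  # high-priority downloads still active
fetcher.done(first)
fetcher.done(second)
third = fetcher.next_request()
print(first, second, blocked, third)
```

In this run the two priority-0 objects are requested first, the off-screen image is withheld while they download, and it is released only after both have completed.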
3.1.3. Re-prioritization
When browsing Web pages, a user may scroll to a view
other than the current one before all the on-screen
objects in the current screen are fully downloaded. For
example, a user may perform fast scroll by searching with
a keyword and clicking an internal link to another part in
the same page. In this scenario, the initial prioritization
may not perform efficiently, and thus a proper mechanism
is required.
When the screen focus is moved to a new area, PF removes
all the IMG objects that reside in request queues
and re-prioritizes them for the newly focused area. For the
objects that are currently being downloaded, PF waits for
their completion. The reason for allowing this off-screen
fetching is that most Web browsers, as applications above
the transport layer, do not have mechanisms to manage
disconnections and re-connections. PF thus keeps the currently
incoming transfer and only updates the priority levels of
the queues involved. As mentioned earlier, in this work,
we consider only adjusting the starting time to fetch different
screens using the currently existing transport protocols.

Fig. 8. Overview of three mechanisms.
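
The re-prioritization step can be sketched as follows, under the assumption that each queued IMG object is described by its absolute top position and height in pixels. The function `reprioritize` is hypothetical, and ordering the off-screen objects by position is an illustrative choice rather than something the text specifies. Objects already being downloaded are deliberately not part of the input, mirroring the rule that in-progress transfers are left alone.

```python
from typing import Dict, List, Tuple


def reprioritize(queued: Dict[str, Tuple[int, int]], view_top: int, view_height: int) -> List[str]:
    """Reorder queued (not yet requested) objects for a new view.

    queued maps an object name to (absolute top, height) in pixels.
    On-screen objects come first; the rest follow by position (an assumption).
    """
    view_bottom = view_top + view_height

    def priority(item: Tuple[str, Tuple[int, int]]) -> Tuple[int, int]:
        _, (top, height) = item
        on_screen = top < view_bottom and top + height > view_top
        return (0 if on_screen else 1, top)

    return [name for name, _ in sorted(queued.items(), key=priority)]


# Illustrative queue contents (made-up positions).
queued = {"a.gif": (100, 50), "b.gif": (900, 50), "c.gif": (1600, 50)}
initial_order = reprioritize(queued, view_top=0, view_height=511)
after_scroll = reprioritize(queued, view_top=800, view_height=511)
print(initial_order, after_scroll)
```

After the user scrolls so that the view starts at pixel 800, `b.gif` moves to the front of the queue because it is the only queued object inside the new view.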
3.2. Objects reordering (OR)
For parallel connections to a single server, the OR mechanism
uses balanced ordering of objects to gain benefits in
terms of reduced response time. Its basic operation consists
of the following two steps. First, it performs a
TCP-aware object reordering until all objects are completely
fetched. Second, when one or more connections become
idle after completing their object transmissions, it per-
forms inter-connection load balancing. The detailed opera-
tions are illustrated in Fig. 9b.
3.2.1. TCP-aware objects reordering
When multiple objects are fetched along a single con-
nection, TCP-aware reordering of the fetching sequence
can increase download speed by minimizing the adverse
effect of slow start in TCP.
Let us assume that three objects (obj_1, obj_2, and
obj_3), which have sizes of seven, three, and two packets,
respectively, are waiting to be fetched along a single
connection. If the TCP connection is newly created and its
congestion window size (cwnd) starts at two (see footnote 2), the total
downloading time for an order of obj_1 → obj_2 → obj_3 is
five round-trip times (rtts) as shown in Fig. 10a. However,
if the opposite order of fetching is used, the
total fetching time may be reduced to three rtts as shown in Fig. 10b.
Thus, appropriate object ordering may save response time
on the order of the rtt of the path.
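
The round-trip counts in this example can be reproduced with a small Python sketch. It assumes an idealized slow start in which cwnd doubles every round trip, there are no losses, and there is no pipelining, so the next object is requested only after the previous one has fully arrived and any unused window space in that round is wasted. The function name `slow_start_rtts` is hypothetical.

```python
from typing import Sequence


def slow_start_rtts(sizes_in_packets: Sequence[int], initial_cwnd: int = 2) -> int:
    """Count round trips needed to fetch objects one after another on one connection.

    Assumes cwnd doubles each round trip (no loss, no delayed ACKs) and that an
    object is requested only after the previous object has been fully received.
    """
    cwnd = initial_cwnd
    rtts = 0
    for size in sizes_in_packets:
        remaining = size
        while remaining > 0:
            remaining -= min(cwnd, remaining)  # send at most one window per round trip
            cwnd *= 2
            rtts += 1
    return rtts


large_first = slow_start_rtts([7, 3, 2])   # obj_1, obj_2, obj_3
small_first = slow_start_rtts([2, 3, 7])   # obj_3, obj_2, obj_1
print(large_first, small_first)
```

Under these assumptions the large-first order takes five round trips and the small-first order takes three, matching the figures quoted in the text.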
TCP slow start can kick in at any time during the down-
loading process. However, since upper layers and HTTP are
unaware of each other’s status information, there is no way
to take advantage of them without some other cross-layer
mechanisms. Thus, the TCP-aware objects reordering
scheme is designed to make use of the slow-start phase
in TCP.
Fig. 9. Flow charts of the three mechanisms.
2 More than 95% of the servers do not perform TCP-JumpStart [28] and
start cwnd with two. We do not consider the Delayed ACK Scheme
described in RFC 2581 here for better observation of the change of cwnd.

Fig. 11 shows how the cwnd value varies after fetching
each object in the slow-start phase. Bandwidth under-uti-
lization in the slow-start phase can be estimated as
$$BW_{uu,ss} = \sum_{n=1}^{N} \left( 2^{n} - \left\lceil size_{obj_n} / MSS \right\rceil \right) \cdot MSS \quad \text{[bytes]}, \qquad (1)$$
where N is the total number of objects to be fetched in the
slow-start phase, size_{obj_n} is the byte size of the nth object,
and MSS is the maximum segment size in TCP. We assume
that if an object size is larger than the current cwnd value,
the object is regarded as multiple objects and thus its
transfer uses full cwnd for one or more round trips except
the last trip. However, before the entire HTML document is
completely downloaded, the browser can obtain only a
partial set of the objects. Thus, OR does not perform full
scheduling over all objects, and instead selects and fetches
the best object for the current cwnd one after another.
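
As a worked illustration of Eq. (1), the sketch below evaluates the sum for a list of object sizes. It assumes that the nth entry is sent in the nth slow-start round with a window of 2^n segments and that every entry already fits within that window (larger objects would first be split as described above). The function name and the default MSS of 1460 bytes are assumptions made for the example.

```python
import math
from typing import Sequence


def slow_start_underutilization(sizes_bytes: Sequence[int], mss: int = 1460) -> int:
    """Evaluate Eq. (1): unused window capacity, in bytes, over N slow-start rounds.

    Assumes entry n (1-based) is sent in round n with cwnd = 2**n segments and
    that it fits in that window. The default MSS is an illustrative assumption.
    """
    total = 0
    for n, size in enumerate(sizes_bytes, start=1):
        segments = math.ceil(size / mss)
        total += (2 ** n - segments) * mss
    return total


# Objects of 2, 3, and 7 segments with MSS = 1000 bytes (illustrative values).
unused = slow_start_underutilization([2000, 3000, 7000], mss=1000)
print(unused)
```

For this small-first order the unused capacity is (2 - 2) + (4 - 3) + (8 - 7) = 2 segments, i.e., 2000 bytes.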
The basic operation steps of TCP-aware ordering are as
follows.
(i) Find a set of objects that can be transmitted by a sin-
gle round-trip with the current cwnd value.
(ii) If such objects are not found, increase the number of
round-trips by one and double cwnd. Repeat this
process until objects that can be sent by the current
number of round-trips are found.
(iii) Among the found objects with the smallest number
of round-trips, find the best object that has the min-
imum gap with the current cwnd.
(iv) Remove the best object from the list of the remain-
ing objects, update cwnd, and repeat the entire pro-
cess until all the objects are fetched.
Fig. 12 shows the pseudo code of the above operations.
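
The four steps above can also be sketched in Python. This is one possible reading and not a transcription of Fig. 12: object sizes are given in packets, cwnd is assumed to double on every round trip without an upper bound, and the "gap" of an object is interpreted as the unused capacity of the rounds it occupies. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def rounds_and_capacity(size: int, cwnd: int) -> Tuple[int, int]:
    """Round trips needed for `size` packets, and the packets those rounds can carry,
    when the window starts at `cwnd` and doubles every round trip."""
    rounds, window, capacity = 1, cwnd, cwnd
    while capacity < size:
        window *= 2
        capacity += window
        rounds += 1
    return rounds, capacity


def tcp_aware_order(sizes: Sequence[int], initial_cwnd: int = 2) -> List[int]:
    """Greedily pick the object needing the fewest round trips, then the smallest gap."""
    remaining = list(sizes)
    cwnd = initial_cwnd
    order: List[int] = []
    while remaining:
        def cost(size: int) -> Tuple[int, int]:
            rounds, capacity = rounds_and_capacity(size, cwnd)
            return rounds, capacity - size   # steps (i)-(iii): fewest rounds, then min gap
        best = min(remaining, key=cost)
        rounds, _ = rounds_and_capacity(best, cwnd)
        remaining.remove(best)               # step (iv): remove and update cwnd
        order.append(best)
        cwnd *= 2 ** rounds                  # one doubling per round trip used
    return order


small_window_order = tcp_aware_order([7, 3, 2], initial_cwnd=2)
large_window_order = tcp_aware_order([1, 5, 3], initial_cwnd=64)
print(small_window_order, large_window_order)
```

With a small initial window the sketch fetches the objects in the order 2, 3, 7, as in the three-object example of Section 3.2.1; with a window that already exceeds every object it falls back to largest-first.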
TCP-aware ordering shows a mice-elephants-mice
fetching pattern. Since OR always tries to send an object
that fits the current cwnd, it initially shows a pattern
of exponential increase similar to that seen when
cwnd is fully used in every round-trip. However, once cwnd
becomes large enough, OR cannot find appropriate objects
for the current window size any more and begins to fetch
from bigger to smaller objects. Fig. 13 shows an example
of object-fetching patterns.
This fetching pattern has the following two advantages:
First, it minimizes the required number of round-trips for
fetching multiple objects; Second, it enables rescheduling
small objects in a finer granularity for inter-connection
load balancing, which will be described later.
3.2.2. Inter-connection load balancing
As we identify in Section 2, conventional browsers perform a round-robin assignment to distribute object requests to multiple connections. This size-unaware assignment causes unsynchronized ending times among different connections and, as a result, increases response time. Therefore, OR performs load balancing among connections using a byte-based metric rather than simple round-robin in order to synchronize their ending times.
Since a larger byte size typically translates to a longer downloading time in object fetching, OR synchronizes the ending times of different connections by distributing the same amount of data, in bytes, to every connection. A more accurate way to synchronize the ending time could also consider other parameters such as the number of objects and the per-object processing time. Thus, the expected downloading time, Tdownload, can be given by
\[ T_{download} = \frac{size_{data}}{BW_{available}} + n \times \frac{rtt}{2} + T_{proc}, \qquad (2) \]
Fig. 10. Example of TCP-aware reordering.

where n is the number of objects, Tproc is the processing time, and rtt is the round-trip time between the client and the server. rtt/2 in the second term of the equation describes the overhead caused by request-reply handshaking performed per object in HTTP. OR simplifies the metric by considering only data size, based on the observation that the first term, sizedata/BWavailable, dominates over other terms in low-bandwidth networks.
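As a small worked example of Eq. 2, the following Python sketch evaluates the expected downloading time. The function name, the parameter names, and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
def expected_download_time(size_bytes, bw_bytes_per_s, n_objects, rtt_s, t_proc_s=0.0):
    """Eq. 2: transfer time + per-object request overhead + processing time."""
    transfer = size_bytes / bw_bytes_per_s   # size_data / BW_available
    handshake = n_objects * rtt_s / 2        # n * rtt / 2
    return transfer + handshake + t_proc_s
```

For instance, fetching 100,000 bytes in 10 objects over a 50 Kbps link (6,250 bytes/s) with an rtt of 200 ms gives 16 s for the first term and only 1 s for the second, which is the kind of case in which considering only data size is a reasonable simplification.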
Performing OR requires the byte-size information of objects. Since this information is normally not included in HTML documents, OR estimates it by considering both the object’s pixel size included in HTML documents and the object formats such as gif and jpeg. Based on this estimated data size, OR sends object requests through multiple connections in a balanced way. A timer with an expiration value, u, is used to strike a balance between the amount of objects and the increased response time.
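The byte-based assignment can be sketched as follows. This is a minimal illustration, assuming the estimated object sizes are already available and using a simple greedy rule (each object goes to the connection with the fewest bytes assigned so far); the paper does not specify this exact rule, and the function name is hypothetical.

```python
import heapq


def balance_by_bytes(est_sizes, n_connections):
    """Assign object indices to connections so that the estimated bytes
    per connection stay as even as a greedy pass allows."""
    heap = [(0, conn) for conn in range(n_connections)]  # (assigned bytes, connection)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_connections)]
    for idx, size in enumerate(est_sizes):
        load, conn = heapq.heappop(heap)        # least-loaded connection
        assignment[conn].append(idx)
        heapq.heappush(heap, (load + size, conn))
    return assignment
```

For estimated sizes [8, 1, 1, 1, 1, 4] and two connections, this sketch assigns 8 bytes to each connection, whereas a round-robin assignment would give 10 and 6.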
Fig. 11. Reordering using OR in the slow-start phase.
Fig. 12. Pseudo code for object reordering.
Fig. 13. Object fetching pattern in OR. (Plot: object size [KB] versus fetching sequence, before and after reordering.)

Irrespective of the initially balanced assignment of objects among connections, the ending times of object downloads on different connections may still vary significantly due to the dynamic behavior of connections. If one connection is delayed for some reason while another connection becomes idle, it is possible to reschedule objects from the busy connection to the idle one and thus reduce response time even more. Inter-connection load balancing runs in an on-demand fashion during the fetching process in order to deal with the dynamic nature of the connections.
3.3. Connection management (CM)
CM addresses the bandwidth under-utilization problem
when fetching objects from multiple servers by controlling
the numbers of connections a browser can open to differ-
ent servers. By adjusting the number of connections for
each server, CM can effectively synchronize the ending
time of downloads in the connections. As a result, a larger
number of active connections improves bandwidth utiliza-
tion, and the response time is reduced. CM consists of two
components: per-connection load estimation and dynamic
connection assignment. The detailed operations are illus-
trated in Fig. 9c.
3.3.1. Per-connection load estimation
In order to estimate the ending time of downloading, CM uses the objects’ byte sizes that OR estimated earlier. The intuition of CM is to assign more connections to servers with larger data size, while assigning fewer connections to servers with smaller data size. To maintain friendliness to current browsers and compatibility with the published standards, the total number of connections in our mechanism is kept the same as that in today’s popular browsers. To this end, whenever CM assigns one more connection to some server, it deducts one connection from some other server. Furthermore, CM limits the maximum number of connections per server to four for several reasons, including the observation made in [25] that allocating too many connections does not necessarily lead to better performance.
3.3.2. Dynamic connection assignment
While fetching an HTML document from a server, a browser also parses the document, searches for object definitions in the document, and fetches the objects at the same time. Whenever it detects a new object, it estimates the byte size of this object, and starts a timer with an expiration value d. The setting of this timer requires careful consideration. On one hand, CM needs to collect a sufficient number of object samples to achieve improvements; therefore, d should not be too small. On the other hand, CM should not delay object fetching significantly, to avoid increasing response time adversely, and thus the expiration value should not be very large such as dozens of milliseconds.
After the expiration of this timer, CM performs the initial assignment based on the object information collected so far. While fetching objects, it keeps recording object information on how much data has already been received. This information is used again to adjust the number of connections.
3.3.3. Analytical model
This CM mechanism can be formulated into the following analytical model. Given a set of servers, $S = \{s_i \mid 1 \le i \le N\}$, where $N$ is the total number of servers, let $d_i$ denote the total data size of objects from server $s_i$. Given a connection set, $C = \{c_i \mid 1 \le i \le N\}$, where $c_i$ is the number of connections opened for the server $s_i$, CM finds the $C$ that minimizes the maximum value of $L_i = d_i / c_i$.
C is also subject to three other constraints. First, the to-
tal number of connections should not exceed 2N to main-
tain friendliness to current browsers. Second, ci should
not exceed the number of objects in si. Third, ci should have
a range from one to four.
Let us use $n_i$ to denote the number of objects in server $s_i$. The output $C$ should achieve the purpose described in Eq. 3, and satisfy the constraints denoted in Eq. 4.
\[ L_{max} = \max_{1 \le i \le N} \frac{d_i}{c_i} \ \text{is minimized}, \qquad (3) \]
\[ \sum_{1 \le i \le N} c_i \le 2N \quad \text{and} \quad 1 \le c_i \le \min\{4, n_i\}. \qquad (4) \]
The detailed algorithm is as follows. In the beginning, every server is assigned two connections, and CM computes the largest and smallest values of $L_i = d_i / c_i$. We use $L_{max}$ and $L_{min}$ to denote the largest and smallest values of $L_i$, respectively. Let us assume that the server $s_j$ has the largest value (i.e., $L_{max} = d_j / c_j$), and the server $s_k$ has the smallest value, $L_{min}$. Now, we increase $c_j$ by one (i.e., $c_j = 3$), and decrease $c_k$ by one, then compute the new maximum value, $L'_{max}$. If $L'_{max} < L_{max}$, the new assignment has a smaller maximum value, and the algorithm continues to run. Otherwise, if $L'_{max} > L_{max}$, the algorithm stops and reverts to the previous assignment, since the new assignment results in a larger maximum value.
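A minimal Python sketch of this iterative assignment is given below. Several details are assumptions made for illustration and are not stated in the text: the function and parameter names, the clamping of the initial assignment to the number of objects per server, the choice of the donor as the least-loaded server that still has more than one connection, and stopping when the new maximum is not strictly smaller.

```python
def assign_connections(data_sizes, n_objects, max_per_server=4):
    """Move connections from lightly loaded servers to the most loaded one
    while the maximum per-connection load L_max = max(d_i / c_i) decreases."""
    n = len(data_sizes)
    # Start from two connections per server, clamped to 1 <= c_i <= min(4, n_i).
    conns = [min(2, max_per_server, max(1, n_objects[i])) for i in range(n)]

    def loads(c):
        return [d / k for d, k in zip(data_sizes, c)]

    while True:
        cur = loads(conns)
        l_max = max(cur)
        j = cur.index(l_max)  # most loaded server
        # Assumption: only servers with more than one connection can donate.
        donors = [i for i in range(n) if i != j and conns[i] > 1]
        if not donors or conns[j] >= min(max_per_server, n_objects[j]):
            return conns
        k = min(donors, key=lambda i: cur[i])  # least loaded donor
        trial = list(conns)
        trial[j] += 1
        trial[k] -= 1
        if max(loads(trial)) < l_max:
            conns = trial   # smaller maximum: keep the new assignment
        else:
            return conns    # no improvement: keep the previous assignment
```

For three servers with data sizes [100, 20, 20] and enough objects each, the sketch ends with [4, 1, 1], which keeps the total at 2N = 6 connections and lowers the maximum load from 50 to 25.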
The algorithm runs whenever an object is downloaded. For every run, $d_i$ is set to the remaining data size for the server $s_i$. Because the algorithm is performed whenever such an object-related event happens, the number of connections assigned to servers may fluctuate. To reduce this fluctuation, we introduce a threshold value, s. In CM, the algorithm adjusts the connection set only when the change in the $L'_{max}$ value between two consecutive iterations is larger than s. We suggest 10% of the previous $L'_{max}$ as the s value.
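The threshold test can be sketched as a one-line check. This sketch assumes that the comparison is made on the absolute change of the maximum load between two consecutive runs; the function name and the `ratio` parameter are hypothetical.

```python
def should_reassign(prev_l_max, new_l_max, ratio=0.10):
    """Adjust the connection set only if the maximum load changed by more
    than the threshold s, taken here as `ratio` times the previous value."""
    s = ratio * prev_l_max
    return abs(prev_l_max - new_l_max) > s
```

With the suggested 10% threshold, a change from 100 to 95 leaves the connection set untouched, while a change from 100 to 85 triggers an adjustment.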
4. Simulation results
In this section, we evaluate the performance of the pro-
posed mechanisms in WebAccel, and compare it with that
of conventional Web browsers. In order to evaluate the
performances, we use the ns2 simulator [24]. Unless other-
wise noted, the network configurations as well as the Web
characteristics used in simulations are the same as de-
scribed in Section 2. We use the same network topology
as shown in Fig. 1 with an assumption that the local DNS
server has all the required domain information.
Response time for the initial screen is used as the primary metric for comparing performance. In this section, we compare five different schemes: conventional (Conv), PF only (PF), PF with OR (PF+OR), PF with CM (PF+CM), and all-integrated (All). To explore the impact of individual factors on performance, we vary object parameters such as object sizes and total number, the number of servers involved, the number of connections opened to a single server, network characteristics such as link bandwidth and rtt, and the user’s fast scrolling to different screens.
4.1. Impact of object characteristics
Fig. 14 shows the initial response time of a conventional Web browser and of our proposed solution when the standard deviation of individual object size and the number of objects in the first screen vary. As shown in the figure, when used in combination, the three proposed mechanisms can reduce response time by up to 30% compared to current browsers.
Fig. 14a shows that as the variance of object size increases, both the conventional scheme and our schemes show worse performance. For conventional browsers, the reason is obvious, since larger variance translates into reduced bandwidth utilization as described in Section 2. Our mechanisms can alleviate this problem, and thus reduce the initial response time. However, since the problem still exists and becomes more severe as the variance of object size increases, some performance degradation is still expected.
Fig. 14b shows the performance differences between conventional and proposed browsers when the total number of objects increases. In the figure, two trends can be seen. First, as more objects are included in a Web page, a larger response time is expected. Second, the response time reduced by the proposed solution is larger, since all of the three mechanisms can gain more benefits.
4.2. Impact of number of connections and servers
Fig. 15a shows the impact of number of connections to a
single server, and it can be seen that up to 20% of response
time can be reduced by using the proposed mechanisms. In
the figure, as the number of connections to a single server
increases, both conventional and our solutions have smal-
ler response time. However, as this number exceeds four,
there is no obvious performance improvement with more
connections. This result is consistent with the results pre-
sented in other works [25].
Fig. 15b shows how the initial response time varies as
the number of servers for a Web page increases. Two
observations can be made from the figure. First, increasing
the number of servers does not necessarily always result in
better performance for both conventional and proposed
browsers. Second, with more servers, our solution can
achieve more improvements compared to conventional
browsers.
4.3. Impact of network characteristics
Fig. 16a shows how the initial response time changes under varying bottleneck bandwidth. As shown in the figure, our solution brings more performance improvement for smaller bandwidth. This is because smaller bandwidth makes the screen-contention problem identified in Section 2 more severe, and thus our solution can reduce response time more by alleviating this problem.
Fig. 16b shows how the initial response time is affected by the rtt values. The major effects of rtt come from the request-response behavior of HTTP (i.e., each object is fetched upon a request from the Web client, and thus fetching one object takes at least one rtt). Since our solution alleviates this effect by removing some of the required rtts, it achieves better performance. As shown in the figure, around 20% performance improvement is achieved by our solution under the rtt values considered in the evaluation.
4.4. Impact of fast scroll
Fig. 17 shows the response time performance when a
user performs fast scrolling. The x-axis of the graph shows
the screen to which a user scrolls, and the y-axis is the
response time. We assume that scrolling is performed
when a Web browser completes downloading of the main
HTML document and the entire document layout becomes
available.
Fig. 14. Impact of object characteristics. (Plots: response time [s] versus (a) standard deviation of object size [KB] and (b) number of objects, for the Conv, PF, PF+CM, PF+OR, and All schemes.)

As seen from the figure, the response time increases when a user scrolls farther away from the initial screen in conventional Web browsers. This is because conventional browsers perform greedy fetching without considering the locations of objects on a screen, and thus display of any screen requires downloading of all previous screens. In contrast, our solution has smaller response time as a user scrolls farther away from the initial screen. That is, if a user simply scrolls to the fourth screen, the user can experience even smaller response time than for any preceding screen. Since PF performs non-sequential fetching and fetches the current screen first, the response time does not depend on the screen number, but instead is determined by the data size in the current screen. Consequently, as less data is located in farther screens (as seen in most popular Web pages), the response time for these screens is less than that for preceding screens. Thus, we see a 70% reduction in response time when the user jumps to the fourth screen.
5. Prototype implementation
In this section, we demonstrate the feasibility of our solution using a simple implementation. Because of the inaccessibility of the source code of browsers [20] and the high complexity of message queue structures in browsers [29], we focus only on the simple PF mechanism. In order to prototype the PF mechanism, we designed a plug-in program for Microsoft Windows, which does not require full access to the source code of Internet Explorer but still allows control of delivery and display through interface functions.
Fig. 15. Impact of numbers of connections and servers. (Plots: response time [s] versus (a) number of connections to a single server and (b) number of servers.)
Fig. 16. Impact of network characteristics. (Plots: response time [s] versus (a) bandwidth [Kbps] and (b) round-trip time (rtt) [ms].)
Fig. 17. Impact of fast scroll. (Plot: response time [s] versus screen number.)
5.1. Architecture of Web browsers
Generally, a Web browser consists of five main compo-
nents: graphic user interface (GUI), browser control, pars-
ing engine, rendering engine, and memory space called
context. Fig. 18 provides a high-level overview of a typical
browser.
The GUI part of the browser is located at the top level
and allows a user to interact with the browser by providing
graphical components, including windows, frames, tool-
bars, and so on. It also initiates or closes a browser in-
stance, and manages system messages delivered from a
message queue of the operating system.
The browser control performs the main functions of the browser: navigation, linking, history and favorite sites management, and support for various other document types, such as Word and Acrobat document files. The functions of the browser control can be overridden by implementing an additional control program without modifying the source code, and much plug-in software for browsers is developed in this manner.
The parsing engine interprets HTML documents, fetches embedded objects described in documents, and transforms the original document hierarchy of HTML tags into the display layout structure. It consists of a scanner that reads input streams in real-time, a parser that interprets tags and constructs a DOM tree, and a grammar checker that examines the target (mostly HTML) grammar of Web objects.
The rendering engine or layout engine displays graphi-
cal Web objects in the client area of a browser window by
following formatting information defined by CSS and XSL.
Microsoft Internet Explorer has a Trident engine (also
known as MSHTML) [30] for both parsing and rendering,
which also provides a WYSIWYG HTML editing functional-
ity in a manner similar to text editing in Microsoft Word.
Mozilla-based browsers such as Firefox and Netscape
Navigator use a Gecko rendering engine [31], which is
open-source and cross-platform. Opera [32] and Adobe
Dreamweaver use a Presto rendering engine. It has a
strength in small screen rendering, and many mobile and
low-end devices such as Nokia’s Symbian-loaded smart-
phones and Nintendo DS use this engine. Besides, various
rendering engines, including Ritlabs’ Robin, KDE’s KHTML,
and Apple’s WebCore are used in other popular Web
browsers.
Context is a memory resource that GUI-based operating systems provide to display graphical objects or screens through a display monitor or other devices. Nowadays, many Web browsers follow a dual-context structure, which uses an additional context, called the rendering context, to store partial display information in the browser in order to increase rendering speed as well as to reduce resource usage.
5.2. Prototype design
Our prototype for priority fetching has been developed
as a plug-in program in the WinAPI environment. It can be
installed upon Internet Explorer and interacts with the
browser control of IE without modification of Windows
registry. The prototype performs three basic operations:
on-screen object detection, calculation of object location
in layout, and download control.
The prototype initially disables the fetching option for all external objects, i.e., Show Pictures in IE. When a user requests to access a new Web page, it gets the document dispatch from the browser control through GetDocument. Then, it obtains the IHTMLDocument2 interface pointer, the element pointer IHTMLElement that points to the document body, and the container interface pointer IHTMLTextContainer of the body element, sequentially. From the container pointer, it calculates the scrollable area size of the browser window and the current scroll position through GetHeight/Width() and get_scrollTop/Left().
In addition, it requests an IMG collection from the document interface. After that, it gets the IDispatch interface pointer of each IMG element and the IHTMLImgElement interface from IDispatch. Then, it measures the pixel sizes and the location of the IMG elements through get_width/height() and get_offsetTop/Left(). If the IMG element has a parent element, it calculates the margin from the parent element and moves the element focus to the parent. After repeating this process until there is no parent any more, it sums all the relative locations to calculate the absolute position. Then, it decides whether the element is on- or off-screen.
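The position calculation and the on-/off-screen decision can be illustrated with a small, self-contained Python model. It does not call the MSHTML interfaces used by the prototype; the `Element` class, its fields, and the viewport-intersection test are hypothetical stand-ins for the offsets, sizes, and scroll positions obtained from the browser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    offset_top: int    # offset relative to the parent element, in pixels
    offset_left: int
    width: int = 0
    height: int = 0
    parent: Optional["Element"] = None


def absolute_position(el):
    """Sum the relative offsets along the parent chain."""
    top = left = 0
    node = el
    while node is not None:
        top += node.offset_top
        left += node.offset_left
        node = node.parent
    return top, left


def is_on_screen(el, scroll_top, scroll_left, view_height, view_width):
    """True if the element's rectangle overlaps the visible window area."""
    top, left = absolute_position(el)
    return (top < scroll_top + view_height and top + el.height > scroll_top
            and left < scroll_left + view_width and left + el.width > scroll_left)
```

For example, an image at offset (300, 10) inside a table at offset (500, 20) has the absolute position (800, 30). It is off-screen for a 600-pixel-high window at scroll position 0 and on-screen once the user scrolls down by 400 pixels.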
The current version of the prototype performs periodic detection and fetching through SetTimer() and KillTimer(). If the time interval is too long, the browser control may not be able to follow the user’s scrolling speed and update correctly. If it is set too short (i.e., the refresh rate of the client area is too high), the browser may consume excessive system resources by performing unnecessary detections. Currently, we set this value to 1000 milliseconds.
Fig. 18. Structure of a typical Web browser.

After the initial prioritization, it performs selective de-
layed transmission for the objects; it fetches high-priority
objects immediately and delays low-priority objects until
all high-priority objects are completely downloaded. In
the current prototype, we use a simple prioritization
scheme that gives on-screen objects high-priority and
off-screen ones low-priority. Fig. 19 shows the screenshots
of browsing amazon.com in the prototype.
5.3. Performance results
For the performance evaluation of the prototype, we accessed two popular Web sites: amazon.com and cnn.com. In the experiments, the pixel sizes of the sites were measured as about four and five screens, respectively.
Fig. 20 shows the transfer sizes and the numbers of embedded objects per object type. In browsing amazon.com, about 15% of the objects in number are non-image objects; however, they account for half of the total transfer size in bytes. In contrast, the off-screen objects in the initial screen account for 30% in transfer size but 50% in number. In accessing cnn.com, this imbalance between the transfer size and the number of objects becomes more severe. In Fig. 20b, the number of the off-screen objects amounts to more than 60%, whereas their aggregated transfer size takes only 17%.
Fig. 21 describes the initial and full response time performance for the same two pages in Internet Explorer and our prototype. In the figure, the darker area shows the response time for the initial screen in both browsers, and the lighter area shows the additional delay to fetch all the objects in Internet Explorer after displaying the initial screen completely. When browsing amazon.com, the prototype reduces the initial response time by five seconds, i.e., Internet Explorer wastes five seconds downloading many off-screen objects before the initial screen is fully displayed. In the case of cnn.com, the performance benefit increases up to 40%. The figure also shows that the performance is degraded not only by a large size of off-screen objects but also by a large number of those objects.
6. Related works
To accelerate Web delivery in today’s Internet, espe-
cially for users who access via low-bandwidth links, exten-
sive research has been conducted and various approaches
have been proposed.
Fig. 20. Transfer size and number of objects in prototype. (Bar charts: transfer size [KB] and number of objects for non-image objects, on-screen image objects, and off-screen image objects.)
Fig. 19. Screenshot of browsing amazon.com in the prototype.

6.1. Web characteristics
In order to derive optimization techniques for Web fetching, many studies have examined the characteristics of embedded objects included in Web pages.
Bray [33] and Woodruff et al. [34] perform a quantitative analysis of several million selected Web pages and provide basic statistics on byte size, tags, attributes, object-file types, and links in those pages, using their tools and search engines.
In [35], Breslau et al. investigate the HTTP-request dis-
tribution through Web proxy caches and find that it fol-
lows Zipf’s law; the probability of a request for the ith
most popular page is proportional to 1/i.
Douglis et al. [36] quantified the benefit of a proxy
cache by using traces collected at two large corporations;
AT&T Labs and Digital Equipment Corporation. Through
the trace collection and analysis, they conclude that access
rate, life time, and modification rate of Web objects depend
mainly on content type and domain name.
Shi et al. [37] have proposed a methodology to obtain
characteristics of dynamic Web objects. Through an analy-
sis of six popular Web sites’ content, they conclude that
object sizes and freshness times of dynamic objects follow
an exponential or Weibull distribution.
6.2. Web caching
Web accesses from a large population of users typically follow the Pareto principle; 80% of total Web content is accessed only by 20% of users, i.e., most users access only a small part of the entire Web content on the Internet [38]. As a result, Web servers having popular content become overloaded and underpowered by repeating the process for the same content continuously. In order to reduce this unnecessary bandwidth consumption, many Web cache techniques have been developed.
First, a Web cache can be deployed at the browser level. Most Web browsers, including Microsoft Internet Explorer, Netscape Navigator, and Mozilla Firefox [29], use private or shared caches to keep records of accessed Web content, such as HTML documents, images, and videos. A browser-level cache can be deployed easily and shows the best access-time performance for the cached content data. However, only limited content can be cached and re-accessed by the local users of the browser on the same machine. In addition, users are required to customize and maintain the cache periodically.
Xu et al. [39] propose a cooperative client-cache technique, where all the clients generate a large virtual cache by sharing their browser caches in a P2P fashion. It has high scalability since caching and data lookup operations are distributed across all clients and super-clients. Their simulation results show that the proposed scheme performs better than proxy-based cache solutions.
In order to support a large number of group users in a
large organization like a campus, company, and ISP, forward
proxy caches such as Squid [1], Wingate [2], and Privoxy [4]
have been developed. These proxy caches store local copies
of frequently accessed content, and provide them to clients
in the group. Most forward proxy caches are not transpar-
ent to browsers, and users need to explicitly configure their
browsers to use them. In addition, they may not be able to
perform user authentication for specific content. RFC 3143
[40] discusses problems with these proxy caches.
Mogul et al. [41] investigate the potential benefits of
data compression and differential update in Web caches
and show that the combination of both can yield the best
performance in transfer size and time [42].
Proxy caches, called reverse proxy caches, also can be deployed in front of Web servers by content providers. The reverse caches process HTTP requests on behalf of the main Web servers or pass them to the main servers if necessary. Except for some products such as SpiderCache [43] and CacheFlow [10], most reverse proxies cannot cache dynamic Web content, which is generated based on ASP [44], JSP [45], or Zope [46]. This dynamic content generally requires a significant amount of processing resources.³
To distribute compute-intensive or large load to multiple servers, multiple reverse caches may compose a cooperative Web-delivery system, called a content delivery network (CDN), where all reverse caches serve content behind a single IP address of the origin server. Current CDN providers, such as Digital Island [15] and Akamai [14], generally deploy multiple server groups in multiple geographical locations. This edge-origin deployment increases user-perceived browsing speed by minimizing Web-delivery distance, which is calculated as a number of hops or round-trip time. However, the initial deployment of the infrastructure entails considerable expense, and additional maintenance is required continuously. Thus, Web-content providers generally outsource the service to CDN providers.
Fig. 21. Response time in prototype. (Bar charts: response time [s] for Internet Explorer and the prototype, split into initial screen response time and additional delay for full response.)
³ Generating a dynamic Web page may require up to 60 round trips between Web and database servers, and thus the Web server may become unresponsive frequently [43].
6.3. Prefetching
Even though built-in caches in Web browsers and proxy caches maintain a significant amount of cached objects, objects requested by users may not be available in these caches for many reasons, such as new accesses, limited cache size, and cache expiration. In order to reduce fetching time for these non-cached objects, many researchers have focused on object-prefetching techniques.
Padmanabhan et al. [47] propose a server-based pre-
fetching scheme, where the server makes predictions of
user’s future Web accesses through a graph-based Markov
model and the client decides to prefetch these objects.
Using the simulations, they show that user-perceived la-
tency can be reduced significantly at the cost of a net-
work-traffic increase.
Duchamp [48] proposes a client-initiated prefetching
approach, where the clients send the records of the hyper-
links included in their accessed pages to let the servers dis-
tribute them to all the clients by piggybacking on GET
responses. Based on the access frequency, the clients select
Web pages, which they would prefetch in the caches.
Through a real-life implementation with modification of
Mozilla and httpd, the author proves that the proposed
approach reduces response time by more than 50%.
On the other hand, Chen et al. [49] propose a cooperative prefetching scheme between a server and a proxy, which minimizes communication overhead by adaptively utilizing the reference access information in two different levels. In their scheme, the access information stored in the servers is used only for objects not qualified for proxy-based prefetching. Using trace-driven simulations, they show that the hit ratios are increased by 5% to 88% compared to proxy- and server-based prefetching schemes.
Ibrahim et al. [50] propose a context-oriented prefetching technique, which uses keywords in HREFs to capture user-access patterns and neural networks to predict the user’s future access patterns. Unlike URL-based schemes, this technique does not require the historical references of requested objects but uses its self-learning capability and adaptivity. Through experiments with the MSNBC and CNN Web sites, the paper shows that the proposed scheme achieves up to 60% hit ratios.
Web-acceleration products, such as Google Web Accel-
erator [9], CacheFlow [10], and Network Appliance’s NetC-
ache [11], reduce user response time by using a mix of
prefetching and proxy-based caching in current Web appli-
cations. However, since those products are optimized for
broadband networks and perform prefetching aggressively,
low-bandwidth users may not see any improvement and in-
stead excessive bandwidth consumption can degrade per-
formance of other users or applications significantly [51].
6.4. Transcoding
Over the past few decades, available network bandwidth has increased dramatically; however, many users on dial-up connections or mobile networks still suffer from narrow bandwidth. Since most Web content has been designed without considering users’ various computing environments, these low-bandwidth users often have to wait an inordinately long time to download a Web page. To solve this diversity problem, much research has been performed in the area of content adaptation.
In [52–54], the authors present proxy-based transcod-
ing approaches, where application-specific transcoders
convert HTTP responses into different formats that are bet-
ter suited for the client. Han et al. [55] derive the theoret-
ical conditions of transcoding and present adaptive
transcoding policies for mobile Web browser. These
proxy-based approaches can be easily deployed without
major modifications in current networks, however in most
cases they need to perform lossy compressions that de-
grade the quality of images or sound significantly.
Gilbert et al. [56] propose a new Web delivery scheme
that improves initial-screen response time performance
using progressive JPEG coding. The scheme also allows to
accelerate downloading specific images, to which users
explicitly point with interest. Using a Web proxy and
browser-side Java applets, the authors implement a proto-
type for evaluation and show that delivery time for the first
visible layer can be reduced by up to 80%.
Noble et al. [57] also propose an application-aware
distillation technique that controls the compression ratio of
objects in order to adapt to changing network environments. In
the paper, they construct a prototype, called Odyssey, which
controls the quality of objects in three modified
applications: a video player, a Web browser, and a speech
recognition system.
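The general idea of adapting object quality to the available bandwidth can be sketched as follows. This is a hypothetical illustration and not Odyssey's actual interface or policy: given several versions of an object at increasing fidelity, it picks the highest-fidelity version that can be delivered within a target time at the currently estimated bandwidth. All names and values are assumptions.

```python
def pick_fidelity(sizes_by_level, bandwidth_bytes_per_sec, deadline_seconds):
    """Pick the highest-fidelity version deliverable by the deadline.

    sizes_by_level: list of (label, size_bytes) pairs ordered from lowest
    to highest fidelity, with sizes increasing. Falls back to the lowest
    level when no version fits the byte budget.
    """
    budget = bandwidth_bytes_per_sec * deadline_seconds
    chosen = sizes_by_level[0][0]
    for label, size in sizes_by_level:
        if size <= budget:
            chosen = label
    return chosen


# Hypothetical versions of one image and a 56 kbit/s link (7000 bytes/s).
levels = [("low", 10_000), ("medium", 40_000), ("high", 120_000)]
print(pick_fidelity(levels, 7000, 5))
print(pick_fidelity(levels, 7000, 10))
```

Re-running the selection whenever the bandwidth estimate changes gives a simple form of adaptation to a changing network environment.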
Joshi [58] proposes a combination of end-to-end and
proxy-based approaches as an ideal solution for supporting
mobile hosts. The proxy explicitly requests from servers data
whose resolution matches the present QoS and client
capabilities.
Lara et al. [59] propose Puppeteer, a system for adapting
component-based applications in a mobile environment. Its
advantage is that a proxy performs adaptive transcoding
without requiring modifications to applications. However, it
cannot overcome the quality degradation caused by the
limitations of transcoding in thick-client computing.
Flinn et al. [60] propose Spectra, a remote execution system
that balances performance, energy conservation, and
application quality. Even though it manages resources
effectively in a bandwidth-limited environment, it is
application-dependent: applications must either be newly
structured for Spectra or be modified to use it.
Many prototypes and commercial transcoding products, such as
UC Berkeley’s TranSend [6], Intel’s QuickWeb, IBM’s WebExpress
[7], and Oracle’s Portal-To-Go [8], have been developed to
improve Web-response performance by reducing image quality or
size via lossy compression or image transformation. However,
these solutions are difficult to deploy since they require
support from non-browser entities.
7. Conclusions
In this paper, we first explore the reasons that conventional
Web access models are not appropriate for low-bandwidth hosts.
We identify two major reasons, screen contention and bandwidth
under-utilization, which result in large user-perceived
response time. To address these problems, we propose a new
Web access scheme for low-bandwidth hosts, called WebAccel,
which uses an intelligent mix of prioritized fetching, object
reordering, and connection management. Using ns2 simulations
with Web parameters obtained from the top Web sites, we
evaluate the performance of our scheme and demonstrate its
benefits over conventional access models. We also implement a
prototype of the prioritized fetching mechanism and verify its
performance improvement by browsing popular Web sites.
Acknowledgement
This work was supported in part by the National
Science Foundation under grants CNS-0721296, CNS-
0519733, and CNS-0519841.
References
[1] Squid, Squid web proxy cache. URL <http://www.squid-cache.org>.
[2] Qbik New Zealand Limited, WinGate. URL <http://www.wingate.
com/product-wingate.php>.
[3] IBM Inc., IBM Websphere Edge Server. URL <http://www-306.ibm.
com/software/webservers/edgeserver/>.
[4] Privoxy Developers, Privoxy Web Proxy. URL <http://www.privoxy.
org/>.
[5] Sun Microsystems, Inc., Sun Java System Web Proxy Server. URL
<http://www.sun.com/webserver>.
[6] A. Fox, S. Gribble, Y. Chawathe, E. Brewer, The TranSend service.
URL <http://transend.cs.berkeley.edu/about>.
[7] B.C. Housel, G. Samaras, D.B. Lindquist, Webexpress: a client/
intercept based system for optimizing web browsing in a wireless
environment, Mobile Networks and Applications 3 (4) (1999) 419–
432.
[8] Oracle Inc., Oracle Portal-To-Go: Any Service to Any Device (October
1999).
[9] Google Inc., Google Web Accelerator. URL <http://webaccelerator.
google.com>.
[10] Blue Coat Systems, CacheFlow. URL <http://www.cacheflow.com/>.
[11] Network Appliance, Inc., NetCache. URL <http://www.netapp.com/
products/netcache/>.
[12] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T.
Berners-Lee, RFC 2616 – Hypertext Transfer Protocol – HTTP/1.1.
URL <www.ietf.org/rfc/rfc2616.txt>.
[13] J.C. Mogul, The case for persistent-connection http, Computer
Communication Review 25 (4) (1995) 299–313.
[14] M. Beck, D. Arnold, A. Bassi, F. Berman, H. Casanova, J. Dongarra, T.
Moore, G. Obertelli, J. Plank, M. Swany, S. Vadhiyar, R. Wolski,
Logistical computing and internetworking: Middleware for the use
of storage in communication, in: Proceedings of the 3rd Annual
International Workshop on Active Middleware Services (AMS), 2001,
pp. 12–21.
[15] Digital Island. URL <http://www.digitalisland.co.nz/>.
[16] A. Ninan, P. Kulkarni, P. Shenoy, K. Ramamritham, R. Tewari,
Cooperative leases: scalable consistency maintenance in content
distribution networks, in: Proceedings of the 11th international
conference on World Wide Web, ACM Press, New York, NY, USA,
2002, pp. 1–12.
[17] Wireless Application Protocol Forum, WML 2.0 DTDs. URL <http://
www.wapforum.org/DTD/wml20.dtd>.
[18] Open Mobile Alliance Inc., WAP FORUM. URL <http://www.
wapforum.org>.
[19] Qualcomm Inc., Binary Runtime Environment for Wireless (BREW).
URL <http://brew.qualcomm.com>.
[20] Microsoft Corporation, Microsoft Internet Explorer. URL <http://
www.microsoft.com/windows/products/winfamily/ie/>.
[21] Netscape Communications, Netscape Navigator Web Browser. URL
<http://browser.netscape.com/>.
[22] Z. Wang, P. Cao, Persistent connection behavior of popular browsers,
<http://www.cs.wisc.edu/cao/papers/persistent-connection.html>
(December 1998).
[23] comScore Networks Inc., comScore Media Metrix Top 50 Online
Property Ranking. URL <http://www.comscore.com/press/release.
asp?press=547>.
[24] LBL and Xerox PARC and UCB and USC/ISI, The Network Simulator –
ns-2. URL <http://www.isi.edu/nsnam/ns>.
[25] R. Chakravorty, S. Banerjee, P. Rodriguez, J. Chesterfield, I. Pratt,
Performance optimizations for wireless wide-area networks:
comparative study and experimental evaluation, in: Proceedings of
the ACM Mobicom 2004, 2004, pp. 159–173.
[26] World Wide Web Consortium, W3C Document Object Model. URL <http://www.
w3.org/DOM/>.
[27] A. Kuzmanovic, E.W. Knightly, Tcp-lp: A distributed algorithm for
low priority data transfer, in: Proceedings of the IEEE INFOCOM
2003, 2003, pp. 1691–1701.
[28] J. Padhye, S. Floyd, On inferring TCP behavior, in: Proceedings of the
ACM SIGCOMM 2001, 2001, pp. 287–298.
[29] Mozilla Foundation, Firefox – Rediscover the Web. URL <www.
mozilla.com/firefox/>.
[30] Microsoft Corporation, About MSHTML. URL <http://msdn2.
microsoft.com/en-us/library/bb508515.aspx>.
[31] Mozilla Foundation, Mozilla Layout Engine: Gecko Engine. URL
<http://www.mozilla.org/newlayout/>.
[32] Opera Software, Opera Browser. URL <http://www.opera.com/>.
[33] T. Bray, Measuring the web, in: Proceedings of the Fifth International
World Wide Web Conference on Computer Networks and ISDN
Systems, 1996, pp. 993–1005.
[34] A. Woodruff, P.M. Aoki, E. Brewer, P. Gauthier, L.A. Rowe, An
investigation of documents from the world wide web, in:
Proceedings of the Fifth International World Wide Web Conference
on Computer Networks and ISDN Systems, 1996, pp. 963–980.
[35] L. Breslau, P. Cao, L. Fan, G. Phillips, S. Shenker, Web caching and
zipf-like distributions: Evidence and implications, in: Proceedings of
the IEEE INFOCOM 1999, 1999, pp. 126–134.
[36] F. Douglis, A. Feldmann, B. Krishnamurthy, Rate of change and other
metrics: a live study of the World Wide Web, in: Proceedings of the
USENIX Symposium on Internet Technologies and Systems, 1997, pp.
147–158.
[37] W. Shi, E. Collins, V. Karamcheti, Modeling object characteristics of
dynamic web content, Elsevier Journal of Parallel and Distributed
Computing 63 (10) (2003) 963–980.
[38] C. Rusay, User-centered design for large government portals
(January 2003). URL <http://www.digital-web.com/articles/user_
centered_design_for_large_government_portals/>.
[39] Z. Xu, Y. Hu, L. Bhuyan, Exploiting client cache: a scalable and
efficient approach to build large Web cache, in: Proceedings of the
18th International Parallel and Distributed Processing Symposium
(IPDPS), 2004, pp. 55–64.
[40] I. Cooper, J. Dilley, RFC 3143 – Known HTTP Proxy/Caching Problems
(June 2001). URL <www.ietf.org/rfc/rfc3143.txt>.
[41] J.C. Mogul, F. Douglis, A. Feldmann, B. Krishnamurthy, Potential
benefits of delta encoding and data compression for http, in:
Proceedings of the ACM conference on Applications, technologies,
architectures, and protocols for computer communication
(SIGCOMM), 1997, pp. 181–194.
[42] J. Mogul, B. Krishnamurthy, F. Douglis, A. Feldmann, Y. Goland, A. van
Hoff, D. Hellerstein, RFC 3229 – Delta Encoding in HTTP (Jan. 2002).
URL <http://www.ietf.org/rfc/rfc3229.txt>.
[43] Warp Solutions Inc., SpiderCache. URL <http://www.warpsolutions.
com/Products/ProductsSpiderCache.php>.
[44] Microsoft Corporation, The Official Microsoft ASP.NET 2.0 Site. URL
<http://www.asp.net/>.
[45] Sun Microsystems, Inc., JavaServer Pages (JSP). URL <http://java.sun.
com/products/jsp/>.

[46] Zope Corporation, Zope. URL <http://www.zope.org/>.
[47] V.N. Padmanabhan, J.C. Mogul, Using predictive prefetching to
improve World-Wide Web latency, ACM SIGCOMM Computer
Communication Review (1996) 22–36.
[48] D. Duchamp, Prefetching hyperlinks, in: Proceedings of the 2nd
USENIX Symposium on Internet Technologies and Systems, 1999, pp.
12–23.
[49] X. Chen, X. Zhang, Coordinated data prefetching by utilizing
reference information at both proxy and Web servers, in:
Proceedings of the ACM SIGMETRICS Performance Evaluation
Review 29 (2), 2001, pp. 32–38.
[50] T.I. Ibrahim, C.-Z. Xu, Neural Nets based Predictive Prefetching to
Tolerate WWW Latency, in: Proceedings of the 20th International
Conference on Distributed Computing Systems (ICDCS), 2000, pp.
636–643.
[51] I. Alshanetsky, Google Web Accelerator and the dangers of
prefetching (May 2005). URL <http://ilia.ws/archives/46-Google-
Web-Accelerator-and-the-dangers-of-prefetching.html>.
[52] C. Brooks, M.S. Mazer, S. Meeks, J. Miller, Application-specific proxy
servers as HTTP stream transducers, in: Proceedings of the 4th
International World Wide Web Conference on Computer Networks
and ISDN Systems, 1995, pp. 539–548.
[53] A. Fox, E.A. Brewer, Reducing WWW latency and bandwidth
requirements by real-time distillation, in: Computer Networks and
ISDN Systems, 1996, pp. 1445–1456.
[54] J. Smith, R. Mohan, C. Li, Transcoding internet content for
heterogeneous client devices, in: Proceedings of the International
Symposium on Circuits and Systems, 1998, pp. 599–602.
[55] R. Han, P. Bhagwat, R. LaMaire, T. Mummert, V. Perret, J. Rubas,
Dynamic adaptation in an image transcoding proxy for mobile Web
browsing, IEEE Personal Communications 5 (6) (1998) 8–17.
[56] J. Gilbert, R. Brodersen, Globally progressive interactive web
delivery, in: Proceedings of the IEEE Infocom 1999, 1999, pp.
1291–1299.
[57] B.D. Noble, M. Satyanarayanan, Experience with adaptive mobile
applications in Odyssey, Mobile Networks and Applications 4 (4)
(1999) 245–254.
[58] A. Joshi, On proxy agents, mobility, and web access, Mobile
Networks and Applications 5 (4) (2000) 233–241.
[59] E. Lara, D.S. Wallach, W. Zwaenepoel, Puppeteer: Component-based
adaptation for mobile computing, in: Proceedings of the 3rd USENIX
Symposium on Internet Technologies and Systems (USITS), 2001.
[60] J. Flinn, S. Park, M. Satyanarayanan, Balancing performance, energy,
and quality in pervasive computing, in: Proceedings of the 22nd
International Conference on Distributed Computing Systems, 2002.
Tae-Young Chang received his BS degree in
Electronic Engineering in 1999 and MS degree
in Telecommunication System Technology in
2001 from Korea University. He is at present a
PhD degree student in School of Electrical and
Computer Engineering at Georgia Institute of
Technology. He works at GNAN Research
Group, and his research interests are wireless
networks, mobile computing, and application
acceleration.
Zhenyun Zhuang received his BE degree in
Information Engineering from Beijing Uni-
versity of Posts and Telecommunications and
MS degree in Computer Science from Tsing-
hua University in China. He is a PhD student in
College of Computing at Georgia Institute of
Technology. He works at GNAN Research
Group, and his research interests are wireless
networking, distributed systems, Web-based
systems, and application acceleration
techniques.
Aravind Velayutham received his BE degree
in Computer Science and Engineering from
Anna University, India in 2002 and MS degree
in Electrical and Computer Engineering from
Georgia Institute of Technology in 2005. Cur-
rently, he is a director of development at
Asankya Inc., and his research interests are
wireless networks and mobile computing.
Raghupathy Sivakumar received his PhD and
MS degrees in Computer Science from Uni-
versity of Illinois at Urbana-Champaign in
2000 and 1998, respectively, and his BE
degree in Computer Science from Anna Uni-
versity, India in 1996. Currently, he is an
associate professor in School of Electrical and
Computer Engineering at Georgia Institute of
Technology. He leads GNAN Research Group,
where he and his students do research in the
areas of wireless networking, mobile com-
puting, and computer networks.
