After this tutorial, hopefully we will all know that many of these lessons are applicable to any networked server, e.g., file, mail, news, DNS, and LDAP servers.
Acknowledgements: Many people contributed comments and suggestions to this tutorial, including Abhishek Chandra, Mark Crovella, Suresh Chari, Peter Druschel, Jim Kurose, Balachander Krishnamurthy, Vivek Pai, Jennifer Rexford, Anees Shaikh, and Srinivasan Seshan. Errors are all mine, of course.
Chapter 1: Introduction to the World-Wide Web (WWW)
Introduction to the WWW
HTTP: Hypertext Transfer Protocol
Communication protocol between clients and servers
Application layer protocol for WWW
Client: browser that requests, receives, and displays objects
Server: receives requests and responds to them
Proxy: intermediary that aggregates requests, responses
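As a concrete illustration, here is a minimal HTTP/1.1 exchange between a client and a server; the host name, content length, and body are made up for the example, and real browsers and servers send many more headers:

    GET /index.html HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Length: 1234

    <html> ... </html>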
As we will see, a great deal of locality exists in web requests and web traffic.
Much of the work described above doesn't really need to be performed each time.
Optimizations fall into two categories: caching and custom OS primitives.
Cache HTTP header info on a per-URL basis, rather than regenerating it over and over.
    fileDescriptor = lookInFDCache(fileName);
    metaInfo       = lookInMetaInfoCache(fileName);
    headerBuffer   = lookInHTTPHeaderCache(fileName);
Idea is to exploit locality in client requests: many files are requested over and over (e.g., index.html).
Why open and close files over and over again? Instead, cache open file FDs and manage them LRU (a sketch follows below).
Why stat them again and again? Cache path name and access characteristics.
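A minimal sketch of such an LRU file-descriptor cache, assuming a fixed-size table; the names (fd_cache_lookup, FD_CACHE_SIZE, etc.) are illustrative, not taken from any particular server:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #define FD_CACHE_SIZE 128

    struct fd_cache_entry {
        char name[256];    /* pathname of the cached file        */
        int  fd;           /* open file descriptor, -1 if unused */
        long last_used;    /* logical clock for LRU replacement  */
    };

    static struct fd_cache_entry cache[FD_CACHE_SIZE];
    static long clock_ticks;

    /* Call once at server startup. */
    void fd_cache_init(void)
    {
        int i;
        for (i = 0; i < FD_CACHE_SIZE; i++)
            cache[i].fd = -1;
    }

    /* Return a cached descriptor for 'name', opening the file (and
     * evicting the least-recently-used entry) on a miss. */
    int fd_cache_lookup(const char *name)
    {
        int i, victim = 0;

        for (i = 0; i < FD_CACHE_SIZE; i++) {
            if (cache[i].fd != -1 && strcmp(cache[i].name, name) == 0) {
                cache[i].last_used = ++clock_ticks;   /* hit: refresh LRU stamp */
                return cache[i].fd;
            }
            if (cache[i].fd == -1 || cache[i].last_used < cache[victim].last_used)
                victim = i;   /* remember free or least-recently-used slot */
        }

        /* Miss: evict the victim (if occupied) and open the file. */
        if (cache[victim].fd != -1)
            close(cache[victim].fd);
        cache[victim].fd = open(name, O_RDONLY);
        if (cache[victim].fd == -1)
            return -1;                                /* open failed */
        strncpy(cache[victim].name, name, sizeof(cache[victim].name) - 1);
        cache[victim].name[sizeof(cache[victim].name) - 1] = '\0';
        cache[victim].last_used = ++clock_ticks;
        return cache[victim].fd;
    }

A real server would keep the stat() results and the pre-built HTTP response headers alongside the descriptor, as in the per-URL caches above.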
Optimizations: Caching (cont)
Instead of reading and writing file data on every request, cache the data, as well as the metadata, in user space
Not every OS has full asynchronous I/O, so the server can still block on a file read. Flash uses helper processes to deal with this (the AMPED architecture); a sketch follows below.
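A rough sketch of the helper-process idea, assuming a single pre-forked helper and a pair of pipes back to the main event loop; the pathname is illustrative, and the real AMPED design in Flash is more elaborate (a pool of helpers, mapped-file data cache, etc.):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Helper process: it is allowed to block on disk I/O so the main
     * event loop never has to. Pathnames arrive on req_fd; file
     * contents go back on resp_fd. */
    static void helper_loop(int req_fd, int resp_fd)
    {
        char path[256], buf[4096];
        ssize_t n;

        while ((n = read(req_fd, path, sizeof(path) - 1)) > 0) {
            path[n] = '\0';
            FILE *f = fopen(path, "r");      /* may block on disk: OK here */
            if (!f)
                continue;
            while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                write(resp_fd, buf, n);      /* main loop reads this pipe  */
            fclose(f);
        }
    }

    int main(void)
    {
        int req_pipe[2], resp_pipe[2];

        pipe(req_pipe);
        pipe(resp_pipe);

        if (fork() == 0) {                   /* child = I/O helper */
            helper_loop(req_pipe[0], resp_pipe[1]);
            _exit(0);
        }

        /* Parent = event-driven main loop (not shown): it selects on
         * client sockets *and* resp_pipe[0], writes pathnames to
         * req_pipe[1], and thus never blocks on a file read itself. */
        write(req_pipe[1], "/var/www/index.html", 19);
        /* ... event loop would go here ... */
        return 0;
    }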
In-Kernel Model (Ex: Tux)
Dedicated kernel thread for HTTP requests:
One option: put whole server in kernel.
More likely, just deal with static GET requests in kernel to capture majority of requests.
Punt dynamic requests to full-scale server in user space, such as Apache.
Figure shows the two models side by side: in the user-space server, HTTP sits above the socket layer (SOCK/TCP/IP/ETH) and requests cross the user/kernel boundary; in the kernel-space server, HTTP sits inside the kernel alongside TCP/IP/ETH, below the user/kernel boundary.
In-Kernel Model: Pros and Cons
In-kernel event model:
Avoids transitions to user space, copies across u-k boundary, etc.
Leverages already existing asynchronous primitives in the kernel (kernel doesn't block on a file read, etc.)
Extremely fast. Tight integration with kernel.
Small component without full server optimizes common case.
Less robust. Bugs can crash whole machine, not just server.
Harder to debug and extend, since kernel programming is required, which is not as widely known as sockets programming.
Similarly, harder to deploy. APIs are OS-specific (Linux, BSD, NT), whereas sockets & threads are (mostly) standardized.
HTTP evolving over time, have to modify kernel code in response.
So What’s the Performance?
Graph shows server throughput for Tux, Flash, and Apache.
Experiments were done on a 400 MHz Pentium II with gigabit Ethernet, Linux 2.4.9-ac10, 8 client machines, and the WaspClient workload generator.
Tux is fastest, but Flash is close behind.
Summary: Server Architectures
Many ways to code up a server
Tradeoffs in speed, safety, robustness, ease of programming and extensibility, etc.
Multiple servers exist for each kind of model
Not clear that a consensus exists.
There is a better case for in-kernel servers as devices,
e.g., a reverse-proxy accelerator or an Akamai CDN node.
User-space servers have a role:
OS should provide proper primitives for efficiency
Leave HTTP-protocol related actions in user-space
In this case, event-driven model is attractive
Key pieces to a fast event-driven server:
Efficient event notification mechanism
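For example, a minimal event-driven accept loop on a modern Linux using epoll, one such efficient notification mechanism; error handling and the HTTP parsing itself are omitted, and the port number is arbitrary:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    int main(void)
    {
        struct sockaddr_in addr;
        struct epoll_event ev, events[MAX_EVENTS];
        int listen_fd, epfd, n, i;

        /* Listening socket on port 8080 (arbitrary). */
        listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(listen_fd, 128);

        /* Register the listening socket with epoll. */
        epfd = epoll_create1(0);
        ev.events = EPOLLIN;
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            /* One system call returns all ready descriptors at once. */
            n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            for (i = 0; i < n; i++) {
                if (events[i].data.fd == listen_fd) {
                    /* New connection: accept it and watch it for requests. */
                    int conn = accept(listen_fd, NULL, NULL);
                    ev.events = EPOLLIN;
                    ev.data.fd = conn;
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);
                } else {
                    /* Data from a client: read the request, send the
                     * response (parsing/serving omitted in this sketch). */
                    char buf[4096];
                    ssize_t r = read(events[i].data.fd, buf, sizeof(buf));
                    if (r <= 0) {
                        close(events[i].data.fd);
                        continue;
                    }
                    /* ... parse HTTP request, write response ... */
                }
            }
        }
    }

The single epoll_wait() call is what makes this scale: the server learns about all ready connections at once instead of polling each one.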
Chapter 5: Workload Characterization
Why Characterize Workloads?
Gives an idea about traffic behavior
("Which documents are users interested in?")
Aids in capacity planning
("Is the number of clients increasing over time?")
Aids in implementation
("Does caching help?")
How do we capture these workloads?
Through server logs (typically enabled)
Through packet traces (harder to obtain and to process)
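Server logs are typically in the Common Log Format; as a rough sketch, each line's method, URL, status, and size can be pulled out with sscanf (the field widths and the "access.log" file name are illustrative):

    #include <stdio.h>

    /* Parse Common Log Format lines such as:
     *   host - - [date] "GET /index.html HTTP/1.0" 200 3185
     * and print the method, URL, status code, and response size. */
    int main(void)
    {
        char line[4096], host[256], method[16], url[2048], proto[16];
        int status;
        long bytes;
        FILE *log = fopen("access.log", "r");   /* illustrative file name */

        if (!log)
            return 1;
        while (fgets(line, sizeof(line), log)) {
            if (sscanf(line, "%255s %*s %*s %*[^]]] \"%15s %2047s %15[^\"]\" %d %ld",
                       host, method, url, proto, &status, &bytes) == 6)
                printf("%s %s -> %d (%ld bytes)\n", method, url, status, bytes);
        }
        fclose(log);
        return 0;
    }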
Factors to Consider
Where do I get logs from?
Client logs give us an idea, but are not necessarily the same as the server's workload
Same for proxy logs
What we care about is the workload at the server
Is trace representative?
Corporate POP vs. News vs. Shopping site
What kind of time resolution?
e.g., second, millisecond, microsecond
Does trace/log capture all the traffic?
e.g., incoming link only, or one node out of a cluster
client? proxy? server?
Lots of variability in workloads
Use probability distributions to express
Want to consider many factors
Mean: average of the samples
Median: half the samples are bigger, half are smaller
Percentiles: the p-th percentile is the value below which p% of the samples fall
(the median is the 50th percentile)
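A small sketch of computing these summary statistics from a set of samples (e.g., response sizes); the sample values are made up, and the percentile uses the simple nearest-rank method:

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* p-th percentile of a sorted array (nearest-rank method). */
    static double percentile(const double *sorted, int n, double p)
    {
        int idx = (int)(p / 100.0 * (n - 1) + 0.5);
        return sorted[idx];
    }

    int main(void)
    {
        double samples[] = { 512, 1024, 80, 4096, 300, 128, 65536, 2048 };
        int n = sizeof(samples) / sizeof(samples[0]);
        double sum = 0.0;
        int i;

        for (i = 0; i < n; i++)
            sum += samples[i];
        qsort(samples, n, sizeof(double), cmp_double);

        printf("mean   = %.1f\n", sum / n);
        printf("median = %.1f\n", percentile(samples, n, 50));  /* 50th percentile */
        printf("95th   = %.1f\n", percentile(samples, n, 95));
        return 0;
    }

Note that with one large sample (65536) the mean is far above the median, which is exactly the skew seen in web transfer sizes.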
Some Frequently-Seen Distributions:
Normal (parameterized by its mean and variance)
Lognormal (x >= 0; sigma > 0)
Exponential (x >= 0)
Pareto, i.e., heavy-tailed (x >= k; shape a, scale k)
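For reference, a sketch of the standard density functions these parameters suggest, assuming the four distributions are the normal, lognormal, exponential, and Pareto:

\[
\begin{aligned}
\text{Normal:} \quad & f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2} \\
\text{Lognormal:} \quad & f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x-\mu)^2/2\sigma^2}, \quad x \ge 0,\ \sigma > 0 \\
\text{Exponential:} \quad & f(x) = \frac{1}{\mu}\, e^{-x/\mu}, \quad x \ge 0 \\
\text{Pareto:} \quad & f(x) = a\,k^a\, x^{-(a+1)}, \quad x \ge k
\end{aligned}
\]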
Graph shows 3 distributions with average = 2.
Note that average ≠ median in some cases!
Different distributions have different “weight” in tail.
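To see this concretely, a small sketch that draws samples from an exponential and a Pareto distribution with the same mean (2) via inverse-transform sampling and compares their medians and tails; the Pareto parameters a = 1.5 and k = 2/3 are chosen so its mean, a*k/(a-1), is also 2:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 100000

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        static double expo[N], pareto[N];
        double a = 1.5, k = 2.0 / 3.0;    /* Pareto mean = a*k/(a-1) = 2 */
        double mean = 2.0;                /* exponential mean = 2        */
        int i;

        srand(42);
        for (i = 0; i < N; i++) {
            double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* u in (0,1) */
            expo[i]   = -mean * log(u);        /* inverse CDF of exponential */
            pareto[i] = k * pow(u, -1.0 / a);  /* inverse CDF of Pareto      */
        }

        qsort(expo, N, sizeof(double), cmp_double);
        qsort(pareto, N, sizeof(double), cmp_double);

        /* Same average, very different medians and tails. */
        printf("exponential: median %.2f  99th pct %.2f\n",
               expo[N / 2], expo[(int)(0.99 * N)]);
        printf("pareto:      median %.2f  99th pct %.2f\n",
               pareto[N / 2], pareto[(int)(0.99 * N)]);
        return 0;
    }

The Pareto samples have the smaller median but the much larger 99th percentile, which is what "more weight in the tail" means in practice.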
What Info is Useful?
GET, POST, HEAD, etc.
success, failure, not-modified, etc.
Size of requested files
Size of transferred objects
Popularity of requested files
Numbers of embedded objects
Inter-arrival time between requests
Protocol support (1.0 vs. 1.1)
Sample Logs for Illustration
We'll use statistics generated from these logs as examples.

Name:         IBM 2001             IBM 1998             Olympics 1998                     Chess 1997
Description:  Corporate Presence   Corporate Presence   Nagano 1998 Olympics Event Site   Kasparov-Deep Blue Event Site
Period:       1 day in Feb 2001    1 day in June 1998   2 days in Feb 1998                2 weeks in May 1997
Clients:      319,698              860,211              80,921                            256,382
URLs:         42,874               15,788               30,465                            2,293
Hits:         12,445,739           11,485,600           5,800,000                         1,586,667
Bytes:        28,804,852           54,697,108           10,515,507                        14,171,711
KR01: "overwhelming majority" are GETs, few POSTs
The IBM 2001 trace starts seeing a few HTTP/1.1 methods (CONNECT, OPTIONS, LINK), but still in very small numbers (on the order of 10^-5 percent)
          IBM 2001   IBM 1998   Olympics 1998   Chess 1997
GET       97%        99.3%      99.6%           96%
HEAD      2%         0.08%      0.3%            4%
POST      0.2%       0.02%      0.04%           0.007%
Others    noise      noise      noise           noise