SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
by Welsh, Culler, and Brewer
UC San Diego
CSE 294
Winter Quarter 2007
Barry Demchak
2
The Objective
 Services that support millions of users
 Responsive
 Robust
 Highly available
 Support large swings in load over time (100x)
 General purpose mechanisms
3
A Web Application Example: DNS
 Verisign’s root servers
 Two 10-high stacked 1U IBM eServers
running Solaris and Red Hat Linux
 26 billion requests/day (normal traffic) or
300K-500K requests per second
 In 2000, there were 1 billion requests/day
 By 2010, 200 billion requests/day are predicted
-- InfoWorld, February 16, 2007, "DNS Attack
Puts Web Security in Perspective," Roger Grimes
4
SEDA
 Problem Background
 Survey of existing alternatives
 SEDA design
 Implementation results
 Applicability to ESBs (e.g., Mule)
 Applicability to THE picture
5
The Environment
 Static content becoming dynamic content
 Rapidly changing service logic
 Hosting on general purpose platforms
6
Non-Solutions
 Replication
 Traditional operating systems
 Traditional concurrency
7
Solution
 SEDA = Staged Event-Driven Architecture
 Applications decompose to stages
 Stages have incoming event queues
 Dynamic resource throttling
8
Definitions
 Well-conditioned
 Behaves like a simple pipeline: output rate
scales to input rate
 Excessive demand does not degrade pipeline
throughput
 Graceful degradation
 Under load, response time degrades linearly
with length of queue
 Degraded response time constant for all
clients (subject to service policies)
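The "linear in queue length" claim can be grounded in basic queueing reasoning; a sketch (the notation here is illustrative, not the paper's): with service rate μ and N events already queued, a newly arrived event waits behind all of them:

```latex
% Little's law: N = \lambda W relates queue length N, arrival rate \lambda,
% and waiting time W. In a saturated but well-conditioned pipeline the
% service rate stays pinned at \mu, so a new arrival waits roughly
W \approx \frac{N}{\mu}
% i.e., response time grows linearly with queue length while throughput
% holds steady, rather than collapsing under overload.
```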
9
The Existing Choices
 Thread-based Concurrency
 Bounded Thread Pools
 Event-driven Concurrency
 Structured Event Queues
10
Thread-based Concurrency
 Thread-per-request model (RPC, Java RMI,
DCOM)
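The thread-per-request model can be sketched in a few lines of Java; this is a minimal illustration (names and counts are made up), not any server's actual code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Thread-per-request: every incoming request gets its own thread.
// Simple to program, but each thread costs stack memory and scheduler
// overhead -- the scalability wall that motivates SEDA.
public class ThreadPerRequest {
    static final AtomicInteger handled = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[100]; // 100 simulated requests
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> handled.incrementAndGet());
            workers[i].start(); // one thread spawned per request
        }
        for (Thread t : workers) t.join();
        System.out.println("handled " + handled.get() + " requests");
    }
}
```

With hundreds of requests this works; with tens of thousands, context switching and per-thread memory dominate, which is what the throughput graph on the next slide shows.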
11
Thread-based Performance
12
Bounded Thread Pools
 Initial thread pool subject to expansion
 Fixed thread pools cause unfairness to clients
because of long waits when threads are
blocked
 Policies of reordering/prioritizing threads
based on expense of request are difficult
13
Event-driven Concurrency
 Queue-based scheduling of threads
 Threads execute finite state machines
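The two bullets above can be made concrete with a toy event loop in Java: one thread dispatches events, and each connection is a finite state machine. Everything here (states, event shape) is an illustrative assumption:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Event-driven concurrency: a single thread pulls events from a queue
// and advances per-connection finite state machines, instead of
// dedicating one thread to each request.
public class EventLoop {
    enum State { READ_REQUEST, WRITE_RESPONSE, DONE }

    record Event(int connectionId) {}

    public static void main(String[] args) {
        Queue<Event> events = new ArrayDeque<>();
        Map<Integer, State> fsm = new HashMap<>();

        // Three simulated connections, each needing two events to finish.
        for (int id = 0; id < 3; id++) {
            fsm.put(id, State.READ_REQUEST);
            events.add(new Event(id));
            events.add(new Event(id));
        }

        while (!events.isEmpty()) {
            Event e = events.poll();
            switch (fsm.get(e.connectionId())) {     // advance this connection's FSM
                case READ_REQUEST   -> fsm.put(e.connectionId(), State.WRITE_RESPONSE);
                case WRITE_RESPONSE -> fsm.put(e.connectionId(), State.DONE);
                case DONE           -> {}
            }
        }
        long done = fsm.values().stream().filter(s -> s == State.DONE).count();
        System.out.println(done + " connections completed");
    }
}
```

The cost, as the notes observe, is a complex hand-written scheduler and FSMs that are hard to modularize.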
14
Event-driven Performance
15
Structured Event Queues
 Sets of event queues
16
SEDA Goals
 Support massive concurrency
 Simplify construction of well-conditioned
services
 Enable introspection
 Support self-tuning resource management
17
SEDA Terms
 Stage = fundamental processing unit
 Event handler
 Incoming event queue
 Thread pool
 Controller = scheduler for a stage
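These pieces fit together as in the minimal Java sketch below; class and method names (`Stage`, `handleEvent`) are illustrative, not Sandstorm's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// A minimal SEDA-style stage: a bounded incoming event queue drained by
// a small, stage-private thread pool running the stage's event handler.
public class Stage {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
    private final List<Thread> pool = new ArrayList<>();

    Stage(int threads) {
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(this::drain);
            t.setDaemon(true);
            t.start();
            pool.add(t);
        }
    }

    // enqueue() can fail when the queue is full -- the caller must then
    // block (backpressure) or drop the event (load shedding).
    boolean enqueue(String event) { return queue.offer(event); }

    private void drain() {
        try {
            while (true) {
                String event = queue.poll(1, TimeUnit.SECONDS);
                if (event != null) handleEvent(event);
            }
        } catch (InterruptedException ignored) { }
    }

    void handleEvent(String event) {
        // stage-specific logic; may enqueue onto downstream stages
        System.out.println("handled " + event);
    }

    public static void main(String[] args) throws InterruptedException {
        Stage stage = new Stage(2);
        stage.enqueue("request-1");
        stage.enqueue("request-2");
        Thread.sleep(500); // give the daemon pool time to drain the queue
    }
}
```

The controller (not shown here) would resize the pool and tune batching at runtime based on observed queue length.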
18
A SEDA Application – Web server
19
Dynamic Resource Controllers
 Shields programmers from performance
tuning
 Observes stage’s runtime characteristics, and
adjusts allocation and scheduling parameters
to meet performance targets
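A thread-pool controller can be reduced to a periodic sampling step that grows or shrinks a stage's pool. The thresholds below (add a thread above 100 queued entries, cap at 20 threads, shrink when idle) follow the Haboob configuration described later in the notes, but the code itself is a sketch, not Sandstorm's:

```java
// Thread pool controller sketch: periodically sample a stage's queue
// length and adjust its thread count toward the performance target.
public class ThreadPoolController {
    static final int QUEUE_THRESHOLD = 100; // add a thread above this depth
    static final int MAX_THREADS = 20;
    static int threads = 1;

    // One sampling step; in the paper's setup this runs every ~2 seconds.
    static void sample(int queueLength, boolean poolIdle) {
        if (queueLength > QUEUE_THRESHOLD && threads < MAX_THREADS) {
            threads++;   // queue is backing up: add capacity
        } else if (poolIdle && threads > 1) {
            threads--;   // threads idle too long (>5 s): reclaim capacity
        }
    }

    public static void main(String[] args) {
        sample(150, false);
        sample(150, false);
        System.out.println("threads after bursts: " + threads); // grew from 1 to 3
        sample(0, true);
        System.out.println("threads after idle: " + threads);
    }
}
```

Keeping the policy in a separate controller, rather than hard-coding pool sizes, is exactly the "shield the programmer from tuning" point above.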
20
Dynamic Resource Controllers
21
Thread Pool Controller Performance
22
Batching Controller Performance
23
Implementation: Sandstorm
 Internet-based service platform
 Each application implements handleEvents()
 APIs for:
 Naming, creating, destroying stages
 Performing queue operations
 Controlling queue thresholds
 Profiling and debugging
 Asynchronous I/O package
 7900 lines of Java
24
Asynchronous I/O
25
Asynchronous I/O Performance
26
Application and Evaluation
 Haboob: a high performance HTTP server
 Benchmarks
 Architecture
 Adaptive load shedding
 Gnutella packet router
 Architecture
27
Haboob Throughput
28
Haboob Response Time
29
SEDA Summary
 Establishes principles toward Internet-style
operating environments
 Stages ease concurrency complexity and
encourage modularity
 Dynamic controllers enable novel scheduling
and resource management strategies
 Challenges: detecting cause of overload
conditions, and control strategy to cure
overload
30
Mule
 From “Implementing an ESB using Mule” by
Ross Mason for JavaZone 2005
http://mule.mulesource.org/wiki/download/attachments/223/javazone-2005-mule-real-world-old.ppt?version=1
 ESB
 Loosely coupled components
 Event driven
 Highly distributed
 Intelligent Routing
 Data transformation
 Multiprotocol message bus …
31 (© 2005)
Mule Topologies
Enterprise Service Bus
Client/Server and Hub n' Spoke
Peer Network
Pipeline
Enterprise Service Network
32
Messaging – JMS Style
 JMS = Java Message Service
 Connection-oriented API exposes point-to-
point and publish/subscribe communications
 Queue-oriented
33 (© 2005)
Loan Broker Design
34 (© 2005)
Design With Mule


Editor's Notes

  • #3 Internet hits translate to several I/O and network requests -> enormous load on underlying resources As of 2001: Yahoo gets 1.2 billion hits per day, AOL gets over 10 billion page views per day
  • #5 SEDA = Staged Event-Driven Architecture
  • #6 Static -> dynamic: Extensive computation and I/O Changing service logic: increasing engineering complexity and deployment General purpose platforms: Not specially engineered platforms
  • #7 Replication – cannot scale to orders of magnitude Traditional OS and concurrency – brittle under large loads Traditional OSes focus on providing transparency through virtual machines. Thread switching has high overhead, threads have large memory overhead. Internet applications need massive concurrency and better control over resource usage … better control makes a big difference at the margin of excessive load.
  • #8 Stages are robust building blocks subject to thresholding and filtering according to load Allows informed scheduling and resource-management decisions, including request reordering, filtering, and aggregation. Dynamic resource throttling allows control over resource allocation and scheduling of components Demonstration: High performance HTTP server (… remember CSE222a?? … multiple threads vs Select statement?)
  • #9 Well-conditioned: output latency is determined by the length of the queue/pipeline This property holds regardless of the number of stages in the pipeline, subject to queueing/dequeueing times In a non-pipelined design, clients wait for an entire operation to complete before moving on to the next one.
  • #12 Overheads: cache and TLB misses, scheduling overhead, and lock contention Threads = multiprogramming … virtualization hides global resource management SPIN, Exokernel, Nemesis = examples of OS attempts to solve this
  • #13 Apache, IIS, Netscape Enterprise Server, BEA Weblogic, IBM WebSphere Bad case: cached static pages (cheap) vs large pages not in cache (expensive)
  • #14 Flash, thttpd, Zeus, JAWS Complex scheduler … hard to maintain … complex FSM maintenance, too … modularity difficult to achieve Needs helper threads that do blocking I/O Has well-conditioned, graceful degradation
  • #16 Sets of event queues … promote modularity Seems to be a generic class of solutions
  • #17 Support massive concurrency = event-driven execution where possible Simplify construction of well-conditioned services = shields the application from details of scheduling and resource management … supports modular construction, support for debugging and profiling Enable introspection = applications analyze the stream to adapt behavior to load … prioritize and filter services to support degraded service under load Support self-tuning resource management = tune resource parameters to load … e.g., allocate threads to a stage based on load instead of hard-coding it a priori
  • #18 Threads pull events from the queue, schedule events on downstream queues, and wait for more Controller adjusts resource allocation and scheduling dynamically Note that threads are stage resources, not task resources. Thread count can be dynamically allocated based on load. Stages can run in serial or parallel, or both. Event handler can implement its own scheduling policy irrespective of how the system filled the queue
  • #19 Set of stages separated by queues Private thread pool per stage Each stage can be independently managed Stages can be run in serial or parallel Each stage can be independently load conditioned … more threads for heavy loads (thresholding) Important point: queues can be finite, which means that a stage can fail to queue an event … meaning the next stage is very busy. Stage can make a decision to block (backpressure), or drop the event (load shedding) and take some remedial action Question: should two modules communicate via method call or queue? A queue system promotes modularity, isolation, and load management … cost: latency Important point: debugging, billing, memory usage, and queue profiling can occur by attaching processors between stages and queues
  • #21 Thread pool controller Adjusts the number of threads executing within each stage Periodically examines queue and adds threads if queue exceeds some threshold … or removes them if they’re idle for some amount of time Batching controller Adjusts the number of events processed by each invocation of the event handler (i.e., a batching factor) This increases throughput due to cache locality Tries to strike a balance … large batching factors can degrade *overall* performance … so try to select a small batching factor … kind of like a PLL *** Controllers are a great way of enforcing performance policy ***
  • #22 This is for the Haboob web server Thread pool adjusted based on length of corresponding queue Queue length sampled every 2 seconds, thread added to pool if queue exceeded 100 entries … max 20 threads Threads idle for more than 5 seconds are removed from the pool AsyncFile used a threshold of 10 queue entries to exaggerate the behavior
  • #23 Shows a single stage generating events at an oscillating rate When the output rate increases, the controller decreases the batching factor … and vice versa *** The point: controllers allow the application to adapt to changing conditions regardless of the particular algorithms used by the operating system … or the application ***
  • #25 They’re eating their own dog food. Completed I/O goes back into the next stage’s queue
  • #26 Clients issue bursts of 8KB packets Server responds with an ACK for every 1000 packets Assume Gigabit Ethernet and Linux on all boxes Slight degradation for SEDA is due to non-scalability in the Linux network stack Threaded implementation stops at 512 connections due to Linux thread limitations SEDA implementation had 120 threads to handle socket writes.
  • #29 Apache and Flash often perform faster, but the tails are fierce. Haboob gives consistent response time under load
  • #31 Highly Scalable!
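The batching controller described in notes #21 and #23 can be sketched as a hill-climbing step on the batching factor: shrink it while measured throughput keeps rising, grow it back when throughput falls. The halving and additive step sizes below are assumptions for illustration, not Sandstorm's actual tuning:

```java
// Batching controller sketch: adjust how many events the handler
// processes per invocation, trading cache locality against fairness.
public class BatchingController {
    static int batchingFactor = 64;  // illustrative starting point
    static double lastThroughput = 0;

    static void observe(double throughput) {
        if (throughput > lastThroughput) {
            batchingFactor = Math.max(1, batchingFactor / 2);   // output rising: try smaller batches
        } else {
            batchingFactor = Math.min(256, batchingFactor + 8); // output falling: batch more
        }
        lastThroughput = throughput;
    }

    public static void main(String[] args) {
        observe(100);
        observe(120);
        System.out.println("factor after rising load: " + batchingFactor);
        observe(90);
        System.out.println("factor after falling load: " + batchingFactor);
    }
}
```

Like a phase-locked loop (the note's analogy), it oscillates toward the smallest batching factor that sustains throughput.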