Scalability & Availability
Paul Greenfield
Building Real Systems
- Scalable
  - Handle expected load with acceptable levels of performance
  - Grow easily when load grows
- Available
  - Available 'enough' of the time
- Performance and availability cost
  - Aim for 'enough' of each, but not more
- Have to be 'architected' in… not added afterwards
Scalable
- Scale-up
  - Use bigger and faster systems
- Scale-out
  - Systems working together to handle load
  - Server farms, clusters
- Implications for application design
  - Especially state management
  - And for availability as well
Available
- Goal is 100% availability
  - 24x7 operation, including time for maintenance
- Redundancy is the key to availability
  - No single points of failure
  - Spare everything: disks, disk channels, processors, power supplies, fans, memory, …
  - Applications, databases, …
  - Hot standby, quick changeover on failure
Performance
- How fast is this system?
  - Not the same as scalability, but related
  - Measured by response time and throughput
- How scalable is this system?
  - Scalability is concerned with the upper limits to performance
  - How big can it grow? How does it grow? (evenly? lumpily?)
Performance Measures
- Response time
  - What delay does the user see?
  - Instantaneous is good; 95% under 2 seconds is acceptable?
  - Consistency is important psychologically
- Response time varies with the 'heaviness' of transactions
  - Fast read-only transactions
  - Slower update transactions
  - Effects of resource/database contention
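A target like '95% under 2 seconds' is checked by taking the 95th percentile of measured response times. Below is a minimal Java sketch using the nearest-rank method; the sample values are made up for illustration.

```java
import java.util.Arrays;

// Nearest-rank percentile over measured response times.
// The sample values below are hypothetical, for illustration only.
public class Percentile {
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length) - 1; // nearest-rank index
        return sorted[Math.max(rank, 0)];
    }

    public static void main(String[] args) {
        double[] ms = {120, 340, 95, 1800, 410, 230, 2600, 150, 505, 380};
        System.out.printf("95th percentile: %.0f ms%n", percentile(ms, 95));
        System.out.printf("mean: %.0f ms%n", Arrays.stream(ms).average().orElse(0));
    }
}
```

Note how the mean (663 ms here) hides the slow outliers that the 95th percentile exposes, which is one reason percentile targets are preferred over averages.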
Response Time
- Each transaction takes…
- Processor time
  - Application, system services, database, …
  - Shared amongst competing processes
- I/O time
  - Largely disk reads/writes
  - Large DB caches reduce the number of I/Os (2TB in IBM's top TPC-C entry)
- Wait time for shared resources
  - Locks, shared structures, …
Response Times
[three graph slides; images not included in the transcript]
Throughput
- How many transactions can be handled in some period of time?
  - Transactions per second, or tpm, tph, tpd
- A measure of overall capacity
  - Roughly the inverse of response time (for a single stream of work)
- Transaction Processing Performance Council (TPC)
  - Standard benchmarks for TP systems: www.tpc.org
  - TPC-C models a typical transaction system
  - Current record is 4,092,799 tpmC (HP)
  - TPC-E approved as TPC-C replacement (2/07)
Throughput
- Increases until resource saturation
  - Start waiting for resources: processor, disk & network bandwidth
  - Increasing response time with load
- Slowly decreases with contention
  - Overheads of sharing, interference
- Some resources share/overload badly
  - Contention for shared locks
  - Ethernet network performance degrades
  - Disk degrades with sharing
Throughput
[graph slide; image not included in the transcript]
System Capacity?
- How many clients can you support?
  - Name an acceptable response time
  - Average? 95% under 2 secs is common; and what is 'average'?
- Plot response time vs number of clients (a rough estimate appears below)
  - Great if you can run benchmarks
  - A reason for prototyping and proving proposed architectures before leaping into full-scale implementation
System Capacity
[graph slide; image not included in the transcript]
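When a benchmark is not yet possible, Little's Law gives a back-of-the-envelope capacity estimate: clients ≈ throughput × (response time + think time). A small Java sketch; every number here is hypothetical:

```java
// Rough capacity estimate from Little's Law:
//   clients = throughput * (responseTime + thinkTime)
// All figures below are hypothetical, for illustration only.
public class CapacityEstimate {
    public static void main(String[] args) {
        double throughputTps = 200.0; // transactions/second the system can sustain
        double responseSec   = 2.0;   // acceptable response time
        double thinkTimeSec  = 8.0;   // average user pause between requests

        double clients = throughputTps * (responseSec + thinkTimeSec);
        System.out.printf("Supportable clients: about %.0f%n", clients); // ~2000
    }
}
```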
Scaling Out
- More boxes at every level
  - Web servers (handling user interface)
  - App servers (running business logic)
  - Database servers (perhaps… a bit tricky?)
- Just add more boxes to handle more load
  - Spread load out across boxes
  - Load balancing at every level
  - Partitioning or replication for the database?
- Impact on application design? Impact on system management?
  - All have impacts on architecture & operations
Scaling Out
[diagram slide; image not included in the transcript]
'Load Balancing'
- A few different but related meanings
- Distributing client bindings across servers or processes
  - Needed for stateful systems
  - Static allocation of a client to a server
- Balancing requests across server systems or processes
  - Dynamically allocating requests to servers
  - Normally only done for stateless systems
Static Load Balancing
[diagram: three clients, a name server, and two server processes; load balancing across application process instances within a server. Steps: advertise service; request server reference; return server reference; get server object reference; call server object's methods]
Load Balancing in CORBA
- Client calls on the name server to find the location of a suitable server
  - 'Name server' is the CORBA terminology for an object directory
- The name server can spread client objects across multiple servers
  - Often 'round robin'
- Client is bound to a server and stays bound forever
  - Can lead to performance problems if server loads become unbalanced
Name Servers
- Server processes call the name server as part of their initialisation
  - Advertising their services/objects
- Clients call the name server to find the location of a server process/object
  - Up to the name server to match clients to servers
- The client then directly calls the server process to create or link to objects
  - Client-object binding is usually static (a sketch follows below)
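A toy Java sketch of the matching a name server does: servers advertise themselves at start-up, clients request a reference, and references are handed out round-robin. This is illustrative only, not the real CORBA Naming Service API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Toy name server: round-robin static load balancing.
// Illustrative only; not the CORBA Naming Service API.
public class ToyNameServer {
    private final Map<String, List<String>> servers = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> next = new ConcurrentHashMap<>();

    // Called by a server process during its initialisation.
    public void advertise(String service, String serverRef) {
        servers.computeIfAbsent(service, s -> new CopyOnWriteArrayList<>()).add(serverRef);
        next.putIfAbsent(service, new AtomicInteger(0));
    }

    // Called by a client; the binding returned here is then static.
    public String lookup(String service) {
        List<String> refs = servers.get(service);
        if (refs == null || refs.isEmpty()) throw new IllegalStateException("no servers advertised");
        int i = Math.floorMod(next.get(service).getAndIncrement(), refs.size());
        return refs.get(i);
    }

    public static void main(String[] args) {
        ToyNameServer ns = new ToyNameServer();
        ns.advertise("Accounts", "server-A"); // hypothetical server names
        ns.advertise("Accounts", "server-B");
        System.out.println(ns.lookup("Accounts")); // server-A
        System.out.println(ns.lookup("Accounts")); // server-B
        System.out.println(ns.lookup("Accounts")); // server-A again
    }
}
```

Because each client keeps the reference it is given, loads can still drift out of balance over time, as the CORBA slide notes.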
Dynamic Stateful?
- Dynamic load balancing with stateful servers/objects?
- Clients can throw away server objects and get new ones every now and again
  - In application code or middleware
  - Have to save & restore state (see the sketch below)
- Or object replication in middleware
  - Identical copies of objects on all servers
  - Replication of changes between servers
  - Clients have references to all copies
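The 'save & restore state' option can be as simple as serialising the client-visible state before dropping the old server object and rebinding. A Java sketch; the ShoppingCart class is hypothetical:

```java
import java.io.*;

// Sketch of save & restore: serialise session state so a client can drop
// one server object and carry the state to a newly bound one.
// ShoppingCart is a hypothetical example class.
public class StatefulRebind {
    static class ShoppingCart implements Serializable {
        java.util.List<String> items = new java.util.ArrayList<>();
    }

    static byte[] save(ShoppingCart cart) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(cart);
        }
        return buf.toByteArray();
    }

    static ShoppingCart restore(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (ShoppingCart) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ShoppingCart cart = new ShoppingCart();
        cart.items.add("book");
        byte[] saved = save(cart);              // before discarding the old server object
        ShoppingCart restored = restore(saved); // after binding to a new server
        System.out.println(restored.items);     // [book]
    }
}
```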
BEA WLS Load Balancing
[diagram: clients calling an EJB cluster of EJB server instances on machines A and B, sharing one DBMS; heartbeat via a multicast backbone]
Threaded Servers
- No need for load balancing within a single system
- Multithreaded server process
  - Thread pool servicing requests
  - All objects live in a single process space
  - Any request can be picked up by any thread
- Used by modern app servers
Threaded Servers
[diagram: clients calling a COM+ process containing a thread pool, a shared object space, and application code (App DLL); COM+ uses thread pools rather than load balancing within a single system]
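A minimal Java sketch of the thread-pool pattern: a fixed pool of worker threads in one process, where any free thread picks up any request. The requests here are simple stand-ins:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Thread-pooled server sketch: a fixed pool of workers sharing one
// process space; any free thread services any incoming request.
public class ThreadedServer {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        for (int i = 1; i <= 10; i++) {
            final int requestId = i; // stand-in for a real incoming request
            pool.submit(() -> System.out.printf("request %d handled by %s%n",
                    requestId, Thread.currentThread().getName()));
        }

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```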
Dynamic Load Balancing
- Dynamically balance load across servers
  - Requests from a client can go to any server
  - Requests are dynamically routed
- Often used for Web server farms
  - IP sprayer (Cisco etc), Network Load Balancer, etc
- Routing decision has to be fast & reliable
  - Routing sits in the main processing path
- Applications normally stateless
Web Server Farms
- Web servers are highly scalable
- Web applications are normally stateless
  - The next request can go to any Web server
  - State comes from the client or the database
- Just need to spread incoming requests
  - IP sprayers (hardware, software)
  - Or more than one Web server looking at the same IP address, with some coordination
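One dynamic routing policy a sprayer can use is 'least connections': each request goes to the server currently handling the fewest. A simplified Java sketch with hypothetical server names (the pick-and-count step is deliberately not made atomic, to keep the idea visible):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Least-connections routing sketch for a stateless web farm.
// Server names are hypothetical; pick-and-increment is simplified.
public class LeastConnectionsRouter {
    private final Map<String, AtomicInteger> active = new ConcurrentHashMap<>();

    LeastConnectionsRouter(String... servers) {
        for (String s : servers) active.put(s, new AtomicInteger(0));
    }

    String route() {
        // The decision must be fast: it sits in the main processing path.
        Map.Entry<String, AtomicInteger> best = active.entrySet().stream()
                .min(Map.Entry.comparingByValue((a, b) -> a.get() - b.get()))
                .orElseThrow();
        best.getValue().incrementAndGet();
        return best.getKey();
    }

    void done(String server) { active.get(server).decrementAndGet(); }

    public static void main(String[] args) {
        LeastConnectionsRouter lb = new LeastConnectionsRouter("web1", "web2", "web3");
        String a = lb.route();
        String b = lb.route(); // goes to a different, less-loaded server
        System.out.println(a + ", " + b);
        lb.done(a);
    }
}
```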
Clusters
- A group of independent computers acting like a single system
  - Shared disks
  - Single IP address
  - Single set of services
- Fail-over to other members of the cluster
- Load sharing within the cluster
- DEC, IBM, MS, …
Clusters
[diagram: client PCs connected to servers A and B; each server attached to disk cabinets A and B; a heartbeat link between the servers for cluster management]
Clusters
- Address scalability
  - Add more boxes to the cluster
  - Replication or shared storage
- Address availability
  - Fail-over (sketched below)
  - Add & remove boxes from the cluster for upgrades and maintenance
- Can be used as one element of a highly-available system
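Fail-over rests on the heartbeat: the standby watches for it and takes over when it stops. A toy Java illustration with made-up timings and a print in place of the real takeover:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy fail-over: the active node heartbeats for a while and then
// 'fails'; the standby notices the silence and takes over.
// Timings and the takeover action are invented for the example.
public class HeartbeatFailover {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong lastBeat = new AtomicLong(System.currentTimeMillis());
        final long timeoutMs = 500;

        Thread active = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                lastBeat.set(System.currentTimeMillis()); // heartbeat
                sleep(100);
            }
            // node crashes here: no more heartbeats
        });

        Thread standby = new Thread(() -> {
            while (System.currentTimeMillis() - lastBeat.get() <= timeoutMs) {
                sleep(50);
            }
            System.out.println("standby: heartbeat lost, taking over services");
        });

        active.start();
        standby.start();
        active.join();
        standby.join();
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```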
Scaling State Stores?
- Scaling stateless logic is easy… but how are state stores scaled?
- A bigger, faster box (if this helps at all)
  - Could hit lock contention or I/O limits
- Replication
  - Multiple copies of shared data
  - Apps access their own state stores
  - Change anywhere & send the change to everyone
Scaling State Stores
- Partitioning
  - Multiple servers, each looking after a part of the state store
  - Separate customers A-M & N-Z
  - Split customers according to state
- Preferably transparent to apps
  - e.g. SQL Server partitioned views
- Or a combination of these approaches (a routing sketch follows)
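Transparent partitioning needs only a routing rule between the app and the stores. A Java sketch of the A-M / N-Z split; the server names are hypothetical:

```java
// Route each customer to the partition owning their key range.
// Server names are hypothetical.
public class PartitionRouter {
    static String storeFor(String customerName) {
        char c = Character.toUpperCase(customerName.charAt(0));
        return (c >= 'A' && c <= 'M') ? "db-server-1 (A-M)" : "db-server-2 (N-Z)";
    }

    public static void main(String[] args) {
        System.out.println("Greenfield -> " + storeFor("Greenfield"));
        System.out.println("Smith      -> " + storeFor("Smith"));
    }
}
```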
Scaling Out Summary
[diagram: UI tier: Web server farm (Network Load Balancing); business tier: application farm (Component Load Balancing); data tier: database servers (Cluster Services and partitioning), split into districts 1-10 and districts 11-20]
Scale-up
- No need for load balancing: just use a bigger box
  - Add processors, memory, …
  - SMP (symmetric multiprocessing)
- May not fix the problem!
  - Runs into limits eventually
- Could be less available
  - What happens on failures? Redundancy?
- Could be easier to manage
Scale-up
- eBay example
  - Server farm of Windows boxes (scale-out)
  - Single database server (scale-up): a 64-processor Sun box (the maximum available at the time)
- More capacity needed?
  - Easily add more boxes to the Web farm
  - Faster DB box? (not available)
  - More processors? (not possible)
  - Split the DB load across multiple DB servers?
- See the eBay presentation…
Available System
[diagram: Web clients calling a Web server farm load balanced using WLB; an app server farm using COM+ load balancing; the database installed on a cluster for high availability]
Availability
- How much?
  - 99%: 87.6 hours of downtime a year
  - 99.9%: 8.76 hours a year
  - 99.99%: 0.876 hours a year
- Need to consider operations as well
  - Not just faults and recovery time
  - Maintenance, software upgrades, backups, application changes
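Those downtime figures follow directly from (1 - availability) × hours in a year, as this quick Java check shows:

```java
// Yearly downtime budget = (1 - availability) * hours in a year.
public class DowntimeBudget {
    public static void main(String[] args) {
        double hoursPerYear = 365 * 24; // 8760
        for (double a : new double[]{0.99, 0.999, 0.9999}) {
            System.out.printf("%.2f%% -> %.3f hours/year%n", a * 100, (1 - a) * hoursPerYear);
        }
    }
}
```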
Availability
- Often a question of application design
  - Stateful vs stateless: what happens if a server fails? Can requests go to any server?
  - Synchronous method calls or asynchronous messaging?
  - Reduce dependencies between components
  - Failure-tolerant designs
- And manageability decisions to consider
Redundancy = Availability
- Passive or active standby systems
  - Re-route requests on failure
  - Continuous service (almost)
  - Recover the failed system while the alternative handles the workload
- May be some hand-over time (db recovery?)
  - Active standby & log shipping reduce this, at the expense of 2x the system cost…
- What happens to in-flight work?
  - State recovers by aborting in-flight operations & doing db recovery, but…
Transaction Recovery
- Could be handled by middleware
  - Persistent queues of accepted requests
  - Still a failure window, though
- Large role for client apps/users
  - Did the request get lost in the failure? Retry on error?
- Large role for server apps
  - What to do with duplicate requests?
  - Try for idempotency (repeated txns are OK)
  - Or track and reject duplicates (see the sketch below)
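A sketch of the 'track and reject duplicates' option: requests carry a client-generated id, and the server remembers the ids it has already processed. In a real system the id set would have to survive failures and be bounded; both are omitted here:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Duplicate-rejection sketch: remember processed request ids so a
// retry after failure doesn't repeat non-idempotent work.
// A real implementation would persist and expire these ids.
public class DuplicateRejection {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    // Returns true if executed, false if rejected as a duplicate.
    boolean handle(String requestId) {
        if (!processed.add(requestId)) {
            return false; // seen before: reject
        }
        // ... perform the real (non-idempotent) transaction here ...
        return true;
    }

    public static void main(String[] args) {
        DuplicateRejection server = new DuplicateRejection();
        System.out.println(server.handle("txn-42")); // true: executed
        System.out.println(server.handle("txn-42")); // false: duplicate retry
    }
}
```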
Fragility
- Large, distributed, synchronous systems are not robust
  - Many independent systems & links…
  - Everything always has to be working
- Rationale for asynchronous messaging
  - Loosen 'coupling' between components
  - Rely on guaranteed delivery instead
- May just defer error handling, though
  - Could be much harder to handle later
- To be discussed next time…
Example
[image slide; content not included in the transcript]