Things you should know about Scalability!
Delivering architecture@internet-scale has several challenges to be solved to be ready for extreme scalable architectures. This session is about the art of scale, scalability, and scaling of web architectures. It will give an overview of challenges, good practices and solutions to achieve high scalability for web-based systems.

Presentation Transcript

  • Things you should know about Scalability!
    WJAX 2011, 08.11.2011, Munich
    Robert Mederer
    Copyright © 2011 Accenture. All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture.
  • Abstract
    Things you should know about Scalability!
    Delivering architecture@internet-scale has several challenges to be solved to be ready for extreme scalable architectures. This session is about the art of scale, scalability, and scaling of web architectures. It will give an overview of challenges, good practices and solutions to achieve high scalability for web-based systems.
  • Who am I?
    Robert Mederer
    Lead Architecture & Execution
    Anni-Albers-Straße 11, 80807 München
    Mobile: +49-175-57-68012
    robert.mederer@accenture.com
    Experience
    • 2000 - 2005: Technology Architect and Software Engineer in several projects
    • 2006: Technical Architecture Lead, Integration and Execution Architecture for Location-Based Service Provider
    • 2009: Technical Architecture Lead, Frontend and Execution Architecture for a Government Agency
    • 2009/2010: Technical Architecture and front-office integration build lead, Integration and Execution Architecture Financial Services Agency
    • 2011: Architect and QA for Location Based Services Platform
  • Accenture: High performance achieved
    Company Profile
    • Global management consulting, technology services and outsourcing company
    • 236,000 employees
    • Rank 47 among the “Best Global Brands 2008”
    • Top 100 Employer
    • 28 of the DAX-30-Companies
    • 96 of the Fortune-Global-100
    • More than three-quarters of the Fortune-Global-500
    • 87 of our Top 100-clients have been with us for 10 or more years
    Worldwide Revenues: $25.5 billion (in US$ billion, as of August 31, 2011)
    [Figure: revenue chart with segments Communications & High Tech, Resources, Public Service, Financial Services, Products]
  • Local Accenture … ???
    Geographic unit
    • Austria
    • Switzerland
    • Germany
    Employees
    • >6000
    • We are hiring!
    Exciting Technology work
    • Large scale projects (100+ people / multiple years)
    • Most challenging requirements
      – Stock Exchange / Banking / Trading Systems
      – AEMS Mobility Platform
      – Large Scale Web Applications (> 1M page views / day)
      – Batch Architectures
    [Figure: map of locations: Berlin, Düsseldorf, Frankfurt, Erlangen/Nürnberg, Munich, Vienna, Zurich]
  • Agenda
    • Introduction
    • Case Study
    • Solution and Good Practice
    • Further Topics
    • Conclusion
  • Introduction
    High Scalability / Overload
    Source: Ezprezzo
  • Introduction | Question
    Audience?
    Who are you?
    – Developers
    – Architects
    – IT Managers
    How large is your application (QPS)?
    – 10-100?
    – 100-1000?
    – 1000-10000?
    – 10000+?
    How large is your total database?
    – < 10 GB?
    – 10 GB-100 GB?
    – 100 GB-1 TB?
    – 1 TB-10 TB?
    – 10 TB+?
  • Introduction | What is Performance?
    How do I know if I have a performance problem?
    If your system is slow for a single user
  • Introduction | What is Scalability?
    How do I know if I have a scalability problem?
    If your system is fast for an individual user but slow under high load
  • Introduction | What is Performance?
    Non-Functional Testing: Performance Testing of Web based systems
    Definition
    • Performance testing is defined as the technical investigation to determine or validate the speed, scalability, and/or stability characteristics of the web based system under test.
    • Performance-related activities, such as testing and tuning, are concerned with achieving response times, throughput, and resource-utilization levels that meet the performance objectives (SLA) for the application under test.
    Key Types of Performance Testing:
    • Performance Testing: "Will it be fast enough?"
    • Load Testing: "Will it support all of my clients?"
    • Stress Testing: "What happens if something goes wrong?"
    • Capacity Testing: "What do I need to plan for when I get more customers?"
    Source: Thomas Werft, Performance Engineer at Accenture
  • Introduction | What is Performance?
    Non-Functional Testing: Performance Testing of Web based systems
    Key performance indicators (Criteria / KPI / Description):
    • Response Time (first / last byte in ms)
      – Average: a value found by adding all of the numbers in a set together and then dividing them by the quantity of numbers in the set.
      – Percentile (Target 98%): a measure that tells us what percent of the total frequency scored at or below that measure.
      – Median: the middle value in a data set when sequenced from lowest to highest.
    • Throughput (QPS)
      – Requests per Second; Transactions per Second: throughput is the number of units of work that can be handled per unit of time; for instance, requests per second, calls per day, hits per second, reports per year, etc.
    • Resource Utilization
      – Processor; Memory; Disk I/O; Network I/O: resource utilization is the cost of the project in terms of system resources. Utilization is the percentage of time that a resource is busy servicing user requests. The remaining percentage of time is considered idle time.
    Results are used for Performance Engineering and Performance Tuning.
    Source: Thomas Werft, Performance Engineer at Accenture
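The response-time and throughput KPIs above can be computed from raw samples in a few lines. The following is a minimal sketch, not the tooling used in the talk: the sample values, the two-second measurement window, and the nearest-rank percentile definition are all assumptions chosen for illustration.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (illustrative, not measured data).
response_times_ms = [42, 45, 47, 51, 53, 55, 58, 60, 64, 250]
window_seconds = 2.0  # assumed length of the measurement window

average_ms = statistics.mean(response_times_ms)
median_ms = statistics.median(response_times_ms)
p98_ms = percentile(response_times_ms, 98)
throughput_qps = len(response_times_ms) / window_seconds

print(f"average={average_ms} ms, median={median_ms} ms, p98={p98_ms} ms, throughput={throughput_qps} QPS")
```

With these samples a single slow request pulls the average well above the median, which is one reason a percentile target is tracked next to the average.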
  • Introduction | What is Scalability?
    Scalability
    Definition: A system’s capacity to uphold the same performance under heavier volumes.
    Source: Patterns for Performance and Operability: Building and Testing Enterprise Software, Chris Ford et al., 2008
  • Introduction | What is Scalability?
    Vertical Scalability
    Is achieved by increasing the capacity of a single node
    • CPU,
    • Memory,
    • Bandwidth, …
    Simple process
    • Application is generally not affected by those changes
    Classical examples are supercomputers like
    • HP Integrity Superdome
    • IBM Mainframe
    Source: Hewlett-Packard
  • Introduction | What is Scalability?
    Horizontal Scalability
    • Application is spread on a cluster with several nodes
    • Nodes can be added to scale out
    • Produces overhead
      – Keep cluster consistent
      – Node error detection and handling
      – Communication between nodes
    • May be used to increase reliability and availability
    • Distributed systems and programs like
      – SETI@Home
      – World Wide Web
      – Domain Name Service
    Source: Space Sciences Laboratory, U.C. Berkeley
  • Introduction | Scalability Trade-Offs | Availability vs. Consistency
    CAP Theorem (Brewer's Theorem)
    • Consistency: all clients see the same data at the same time
    • Availability: all clients can find all data even in the presence of failure
    • Partition Tolerance: the system works even when one node has failed
    [Figure: triangle of Consistency, Availability and Partition Tolerance; the combination of all three is marked "Impossible"]
    Source: PODC keynote, Towards Robust Distributed Systems, Dr. Eric A. Brewer, 2000
  • Introduction | Scalability Trade-Offs | Availability vs. Consistency
    CAP Theorem
    Normally, only two of these properties can be achieved for any shared-data system.
    • Consistency + Availability
      – High data integrity
      – Single site, cluster database, LDAP, etc.
      – 2-phase commit, data replication, etc.
    • Consistency + Partition Tolerance
      – Distributed database, distributed locking, etc.
      – Pessimistic locking, etc.
    • Availability + Partition Tolerance
      – High scalability
      – Distributed cache, DNS, etc.
      – Optimistic locking, expiration/leases (timeout), etc.
    Source: “Architecting Cloudy Applications”, David Chou
  • Introduction | Scalability Trade-Offs | Availability vs. Consistency
    Data and Scalability
    Distributed non-relational data store solutions must relax guarantees around consistency, partition tolerance and availability, resulting in systems optimized for different combinations of properties.
    • Consistent & Available: RDBMSs (MySQL, Postgres, etc.), Greenplum, Vertica
    • Available & Partition Tolerant: Cassandra, SimpleDB, CouchDB, Riak, Dynamo, Voldemort, Tokyo Cabinet, KAI
    • Consistent & Partition Tolerant: BigTable, HyperTable, HBase, MongoDB, Terrastore, Scalaris, BerkeleyDB, MemcacheDB, Redis
    Data Models Key: Relational (comparison), Key-Value, Column-Oriented, Document-Oriented
    Source: Visual Guide to NoSQL Systems, http://blog.nahurst.com/tag/cap
  • Introduction | Scalability Trade-Offs | Availability vs. Consistency
    Data and Scalability
    Analysis and Classification
  • Introduction | Scalability Trade-Offs | Availability vs. Consistency
    Data and Scalability
    ACID - Do I really need it?
    Relational databases were originally designed for transactional data processing (reliably processing and maintaining data integrity) on different HW architectures. In order to guarantee transactional integrity, the traditional relational database management system (RDBMS) was architected to guarantee four core properties: Atomicity, Consistency, Isolation and Durability (ACID).
    • Atomicity: A database is said to be atomic if, when one part of the transaction fails, the entire transaction fails and the database state is left unchanged.
    • Consistency: A database is said to be consistent if it remains in a consistent state after any transaction. Therefore, if a transaction violates the consistency of the database (e.g. the value is not the right type), then the transaction should be rolled back.
    • Isolation: A database is said to be isolated if transactions can’t have access to data currently being modified by another transaction.
    • Durability: A database is said to be durable if it recovers all of the committed transactions in the system even after system failure.
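Atomicity and consistency as described above can be observed with Python's built-in `sqlite3` module. This is a small sketch under assumptions: the `account` table, the `CHECK` constraint and the `transfer` helper are hypothetical and only serve to show a transaction being rolled back as a whole when one of its statements violates a constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint is the consistency rule (no negative balances).
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict for easy inspection."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

A transfer of 30 succeeds and changes both rows; a transfer of 500 fails on the second update, and the credit already applied to the destination row is rolled back together with it.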
  • Introduction | Scalability Trade-Offs | Availability vs. Consistency
    BASE
    Modern Internet systems: focused on BASE
    • Basically Available
    • Soft-state (or scalable)
    • Eventually consistent
    Example: the Amazon outage in April 2011 brought thousands of customers down, including Pfizer, Netflix, Quora, Foursquare, Reddit, …
    • The Amazon.com 2010 Shareholder Letter Focuses on Technology: http://www.allthingsdistributed.com/2011/04/the_amazoncom_2010_shareholder.html
    • http://broadcast.oreilly.com/2011/04/the-aws-outage-the-clouds-shining-moment.html
    • http://www.nytimes.com/2011/04/23/technology/23cloud.html
    • http://www.allthingsdistributed.com/2007/12/eventually_consistent.html (Dec. 2007)
  • Introduction | Scalability Trade-Offs | Availability vs. Consistency
    ACID vs. BASE
    ACID
    • Strong consistency for transactions highest priority
    • Availability less important
    • Pessimistic
    • Complex mechanisms
    BASE
    • Availability and scaling highest priorities
    • Weak consistency
    • Optimistic
    • Simple and fast
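The "optimistic" column can be illustrated with version-checked writes: nothing is locked while a client works, and a write is rejected only if someone else changed the record in the meantime. The sketch below is an assumption-laden toy (the `VersionedStore` class and `VersionConflict` exception are hypothetical, single-process and in-memory), not the mechanism of any specific product.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the record."""


class VersionedStore:
    """In-memory key-value store with optimistic concurrency control."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            # Someone else wrote first: the caller must re-read and retry.
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1
```

Two clients that both read version 0 can both attempt a write; the first succeeds, the second gets a `VersionConflict` and retries against the new version. A pessimistic design would instead hold a lock from read to write.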
  • Introduction | Scalability Trade-Offs | Latency vs. Throughput
    Network Latency vs. Throughput
    Network protocols such as TCP have an inherent throughput bottleneck that becomes more severe with increased packet loss and latency.
    Source: http://www.asperasoft.com/en/technology/shortcomings_of_TCP_2/the_shortcomings_of_TCP_file_transfer_2
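The effect of latency and loss on TCP throughput can be estimated with two well-known rules of thumb that are not part of the slide: a single connection cannot exceed window size divided by round-trip time, and under random loss a commonly cited approximation (Mathis et al.) puts throughput near MSS/RTT · C/√p. The sketch below assumes C ≈ 1.22 and uses illustrative window, MSS, RTT and loss values.

```python
import math


def window_limited_mbps(window_bytes, rtt_seconds):
    """Upper bound for one TCP connection: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def mathis_throughput_mbps(mss_bytes, rtt_seconds, loss_rate, c=1.22):
    """Approximate steady-state TCP throughput under random packet loss.

    The constant c is an assumption taken from the commonly quoted form of the
    approximation; real stacks and congestion-control variants behave differently.
    """
    return (mss_bytes * 8 / rtt_seconds) * (c / math.sqrt(loss_rate)) / 1_000_000


# Illustrative values: a 65,535-byte window, 1,460-byte segments.
for rtt in (0.01, 0.05, 0.2):
    print(f"RTT {rtt * 1000:.0f} ms: window bound {window_limited_mbps(65535, rtt):.2f} Mbit/s")
for loss in (0.0001, 0.01, 0.04):
    print(f"loss {loss:.2%}: approx. {mathis_throughput_mbps(1460, 0.1, loss):.2f} Mbit/s")
```

Both estimates fall as the round-trip time grows, and the second falls with the square root of the loss rate, which matches the qualitative statement on the slide.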
  • Introduction | Scalability and Edge Computing
    Edge Computing
    Transferring data or services from a centralized point to the edge of the network
    • Processing load is distributed
    • Closer to the user
    • Decreases latency
    • Lower cost of hardware
    • Increases service levels
    • Greater flexibility in responding to service requests
    • Seasonal spikes in demand can be off-loaded to other edge servers
  • Introduction | Caching
    Caching and Types of Caches
    Object cache
    • Stores objects for the application to be reused
    • Caches data from the database or generated by the application
    • E.g. ehCache, memcached, etc.
    Application Cache
    • Speeds up performance or minimizes resources used
    • Proxy caching / Reverse proxy caching
    • E.g. Squid, Varnish, etc.
    Content Delivery Network (CDN)
    • Faster response time and fewer requests on the origin servers
    • Pushes content closer to the end user
    • E.g. Akamai, Savvis, Mirror Image Internet, Netscaler, Amazon CloudFront, etc.
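The object cache idea can be reduced to a few lines: keep the result of an expensive lookup for a limited time and reuse it for later requests. The sketch below is a hypothetical in-process `TTLCache`, not the API of ehCache or memcached; the `loader` callback stands in for a database query, and the injectable `clock` is an assumption made so the behaviour can be checked without waiting.

```python
import time


class TTLCache:
    """Minimal object cache: entries expire ttl_seconds after they were loaded."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the backend is not touched
        value = loader(key)  # cache miss or expired entry: ask the backend
        self._entries[key] = (now + self._ttl, value)
        return value
```

Expiration by timeout keeps the cache simple at the price of serving data that may be up to `ttl_seconds` old, which is the availability-over-consistency trade-off discussed earlier in the talk.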
  • Introduction | Caching
    CDN
    Abstract architecture of a Content Delivery Network (CDN)
    Source: Content Delivery Network (CDN) Research Directory, http://ww2.cs.mu.oz.au/~apathan/CDNs.html
  • Introduction | Caching
    CDN
    Basic interaction flows in a CDN environment
    Source: Basic interaction flows in a CDN environment, http://ww2.cs.mu.oz.au/~apathan/CDNs.html
  • Introduction | Basics
    Load Balancing
    Definition:
    • Methodology to distribute workload across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources, to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload
    • Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by dedicated software or hardware, such as a multilayer switch or a Domain Name System server.
  • Introduction
    Load Balancing (Major) Usage
    • Server LB
      – Distributing the load across multiple servers
      – Target is to scale beyond the capacity of one server, and to tolerate a server failure.
    • Global Server LB
      – Directing users to different data center sites consisting of server farms
      – Target is to provide users with fast response time and to tolerate a complete data center failure (availability, business continuity, disaster recovery, geographic routing)
    • Firewall LB
      – Distribute the load across multiple firewalls
      – Target is to scale beyond the capacity of one firewall, and tolerate a firewall failure.
    • Transparent Cache Switching
      – Transparently directs traffic to caches to accelerate the response time for clients
      – Or improve the performance of web servers by offloading the static content to caches.
    Source: Load Balancing Servers, Firewalls, and Caches by Chandra Kopparapu; John Wiley & Sons © 2002
  • Introduction | Basics
    Load Balancing Algorithms
    Random Allocation
    • Pros: Simple to implement.
    • Cons: Can overload one server while others are under-utilized.
    Round-Robin Allocation
    • Pros: Better than random allocation because the requests are equally divided among the available servers in an orderly fashion.
    • Cons: Does not account for the processing overhead of individual requests, and is not sufficient if the servers in the group do not have identical specifications.
    Weighted Round-Robin Allocation
    • Pros: Takes care of the capacity of the servers in the group.
    • Cons: Does not consider advanced load balancing requirements such as processing times for each individual request.
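The three allocation strategies above are small enough to sketch directly. This is an illustrative toy, assuming a static server list and integer weights; the server names are hypothetical, and production balancers typically interleave weighted picks more smoothly and add health checks.

```python
import itertools
import random


def random_allocation(servers, rng=random):
    """Pick any server with equal probability; short-term load can be uneven."""
    return rng.choice(servers)


def round_robin(servers):
    """Yield servers in a fixed rotation, one request each."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in rotation, repeating each one according to its weight.

    `weights` maps server name -> integer weight (relative capacity).
    """
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


rr = round_robin(["app1", "app2", "app3"])
wrr = weighted_round_robin({"big": 3, "small": 1})
print([next(rr) for _ in range(6)])
print([next(wrr) for _ in range(8)])
```

None of the three looks at how busy a server currently is, which is the limitation named in the "Cons" of weighted round-robin.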
  • Introduction | Basics
    Server Load Balancing
    • Hardware
      – Barracuda Networks
      – Cisco Systems
      – Citrix Systems
      – F5 Networks (BIG-IP)
      – Etc.
    • Software
      – HAProxy
      – Apache HTTP Server with mod_proxy for Tomcat
      – …
    Simple load balancing over DNS (list of IPs with round robin): does that work?
    Problem:
    • No real load balancing due to TTL of DNS
    • No health check for service availability
  • Introduction | Load Balancing
    Global Server Load Balancing
    • Functionality
      – DNS based routing
      – Based on IP GEO database (geographic routing)
      – Assumption: local DNS for client
    • Provider
      – F5 Networks (Global Load Balancing Solutions)
      – UltraDNS (Traffic Controller Service)
      – Level3 (Traffic Manager, BCDR Solution)
  • Introduction | Load Balancing
    Global Server Load Balancing
    Characteristics / Usage
    • Increase application availability in the event of an entire site failure or overload (Business Continuity, Disaster Recovery)
    • Scale application performance by load balancing traffic across multiple sites (Edge Computing, together with CDN)
    • Need for more granularity and control in directing Web traffic
    • More flexibility in building and managing Internet infrastructures
      – E.g. site-based downtime management during release upgrade
    • Cons: Does not always work, because it assumes a local DNS. With public DNS or DNS over VPN, the client may not be routed to the nearest server location.
      – (see: http://www.royans.net/arch/fixing-gslb-global-server-load-balancing/)
    • Fix: Google proposed a DNS enhancement that uses the client / end-user IP rather than the DNS resolver IP (see: http://googlecode.blogspot.com/2010/01/proposal-to-extend-dns-protocol.html)
  • Case Study – Internet Scale Web Services
    Case Study – Non-Functional Requirements
    • USA: 50 Mil.
    • EU: 30 Mil.
    • ASIA: 15 Mil.
    • AU: 2 Mil.
    User groups:
    • Web Browser users
    • Mobile users
    Availability: 99.99 %
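An availability target such as 99.99 % translates directly into a downtime budget. The short calculation below is only arithmetic on the stated target; the choice of a 365-day year and a 30-day month as reference periods is an assumption.

```python
def allowed_downtime_minutes(availability_percent, period_days=365):
    """Minutes of downtime permitted in the period for a given availability target."""
    return (1 - availability_percent / 100) * period_days * 24 * 60


per_year = allowed_downtime_minutes(99.99)
per_month = allowed_downtime_minutes(99.99, period_days=30)
print(f"99.99 %: {per_year:.2f} min/year, {per_month:.2f} min per 30 days")
```

That is roughly 52.6 minutes per year, or a little over 4 minutes per 30 days, covering planned and unplanned outages alike.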
  • Case Study – Internet Scale Web Services
    Case Study – Non-Functional Requirements
    • USA: 2 data centers (New York, San Francisco); peak: 20,000 QPS
    • EU: 2 data centers (Frankfurt, London); peak: 10,000 QPS
    • ASIA: 1 data center (Singapore); peak: 5,000 QPS
    • AU: 1 data center (Sydney); peak: 3,000 QPS
  • Case Study – Internet Scale Web Services
    Case Study – Non-Functional Requirements
    Performance in Case of Failure
    • USA: failover New York ↔ San Francisco: 40,000 QPS
    • EU: failover Frankfurt ↔ London: 20,000 QPS
    • AU / ASIA: failover Singapore ↔ Sydney: 8,000 QPS
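The failover figures are consistent with one reading of the peak numbers: each peak applies per data center, and the surviving site of a pair must absorb both peaks. That per-site interpretation is an assumption, not something the slides state; under it, the requirement can be derived as follows.

```python
# Assumed per-data-center peak load in QPS (interpretation of the regional peaks).
peak_qps = {
    "New York": 20_000,
    "San Francisco": 20_000,
    "Frankfurt": 10_000,
    "London": 10_000,
    "Singapore": 5_000,
    "Sydney": 3_000,
}

failover_pairs = [
    ("New York", "San Francisco"),
    ("Frankfurt", "London"),
    ("Singapore", "Sydney"),
]


def failover_capacity(peaks, pairs):
    """Capacity each site of a pair needs so that it can carry its partner's peak too."""
    return {pair: peaks[pair[0]] + peaks[pair[1]] for pair in pairs}


for pair, qps in failover_capacity(peak_qps, failover_pairs).items():
    print(f"{pair[0]} <-> {pair[1]}: {qps} QPS")
```

Sizing every site for the combined peak means that in normal operation each one runs well below its capacity; that spare headroom is the cost of tolerating a complete data center failure.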
  • Case Study – Internet Scale Web Services
    Case Study – Non-Functional Requirements
    Response Times
    RESTful Web Services:
    • Calculate service: 100 ms (50 ms latency)
    • Binary service: 60 ms (50 ms latency)
    • Search service: 50 ms (50 ms latency)
  • Case Study – Internet Scale Web Services
    Case Study – Non-Functional Requirements
    Data
    100 TByte in each geography
    – Binary (video, image, …)
    – Index data
  • Case Study – Solution
  • Further Topics
    • Organization
      – People, Process and Tools
      – Governance (Lifecycle management)
    • Where do I find the truth in a highly scaled and distributed architecture?
      – Logging
      – Log Analytics (e.g. Scribe (not really), Splunk)
      – End-to-end data visualization
  • Conclusion
    Content Caching
    Reverse proxy caching
    • Fast and scales well
    • Dealing with invalidation is tricky
    • Direct cache invalidation scales badly
    • Instead, change URLs of modified resources
    • Old ones will drop out of cache naturally
    CDN – Content Delivery Network
    • Faster response time and fewer requests on the origin servers
    • No 100% control of caching. Based on internal statistics (Akamai).
    • Operated by 3rd parties. Already in place. Not for free.
    • Once something is cached on CDN, assume that it never changes
    • Sometimes does load balancing as well
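"Change URLs of modified resources" is often implemented by putting a fingerprint of the content into the file name, so a changed file automatically gets a new URL and the old cache entry is simply never requested again. The sketch below shows one such scheme; the `versioned_url` helper, the SHA-256 digest and the eight-character prefix are illustrative assumptions, not something prescribed by the talk.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_url(path, content, digest_chars=8):
    """Return `path` with a short content hash inserted before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = versioned_url("/static/app.css", b"body { color: black; }")
new_url = versioned_url("/static/app.css", b"body { color: navy; }")
print(old_url)
print(new_url)
```

Because the URL changes whenever the content does, such resources can be served with very long cache lifetimes from both the reverse proxy and the CDN, and no explicit invalidation call is needed.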
  • Conclusion
    Common Concepts of Scalable Architecture
    Concepts shown around “7 Habits of Good Distributed Systems”:
    • parallelization
    • asynchronous
    • idempotent operations
    • partitioned data
    • fault-tolerance
    • optimistic concurrency
    • shared nothing
    • loosely coupled
    Source: "Architecting Cloudy Applications", David Chou
    Source: highscalability.com
  • Conclusion
    Questionnaire
    • Is there a need to scale my application?
      – Vertical scaling is easier to achieve (Cost)
      – Use horizontal scaling only when required (Complexity)
    • Is there a plan to prove your designed solution?
      – Plan to do a lot of realistic Proof-of-Concepts
    • Is there a one-size-fits-all solution?
      – NO!
    • How important is ACID?
      – Is BASE enough?
      – Can a NoSQL solution be used?
  • References
    • The Art of Scalability: Scalable Web Architecture, Processes and Organizations for the Modern Enterprise; Michael T. Fisher, Martin L. Abbott; Addison-Wesley Professional; 1 edition
    • Scalability Rules: 50 Principles for Scaling Web Sites; Martin L. Abbott, Michael T. Fisher; Addison-Wesley Professional; 1 edition (May 15, 2011)
    • Scalable Internet Architectures; Theo Schlossnagle; Sams; 1 edition (July 31, 2006)
    • Building Scalable Web Sites; Henderson; O'Reilly
    Websites: HighScalability.com, infoQ.com, Qcon.com, …
  • Thank You!
    Contribution and Review: Bukowski, Markus; Conradt, Steffen; Jacobs, Mareike; Krogemann, Markus; Peuker, Jan; Van Isacker, Pieter; Wagenknecht, Dominik; Wagner, Hubert; Werft, Thomas; Zakotnik, Jure