The Art of Scalability - Managing growth

The ability to grow (and shrink) according to the needs and the available resources is an essential part of designing applications. In this talk we'll cover the fundamental elements of scalability, including aspects involving people, processes and technology. With sound and proven principles and some advice on how to shape your organisation, set the right processes and design your application, this session is a must-see for developers and technical leads alike.

Published in: Technology, Business

    1. The Art of Scalability: Managing Growth. Lorenzo Alberton, Amsterdam, 11th June 2010
    2. Scalability: "Scalability is a desirable property of a system, a network, a business or a process, which indicates its ability to handle growing amounts of work" (http://en.wikipedia.org/wiki/Scalability)
    3. Scalable ≠ Fast: "A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added." (http://www.julianbrowne.com/article/viewer/scalability) "Increasing performance in general means serving more units of work, but it can also be to handle larger units of work, such as when data sets grow." (http://highscalability.com/amazon-architecture)
    4. Scalability Is About... People, Processes, Technology
    5. People: Staffing, Roles, Leadership, Management
    8. Roles And Responsibilities: Role-clarity. Overlapping responsibilities and areas missing responsibilities lead to wasted effort, value-destroying conflicts and failed scale initiatives. Key scale-related responsibilities: set measurable goals; staff the team with the appropriate skills; define and implement a scalable architecture; test, monitor, develop future demand projections; define future changes based on the analysis.
    13. Leadership: inspire people, set the right vision and goals, create the right culture, create the right tools (together: accelerator for growth). Vision = where we are going; mission = general direction on how to get there; goals = milestones along the path. SMART goals: Specific, Measurable, Achievable (but Aggressive), Realistic, Time-bound. People (Chip & Dan Heath, “Switch: How To Change Things When Change Is Hard”): direct the rider, motivate the elephant, shape the path.
    15. Management. Project Management: Goals, Projects, Tasks, Individuals; Measurement, Communication, Resolution. People Management: Hiring, Firing, Growth.
    16. Organisational Structure And Team Size. Too small: micromanaging managers, overworked team members. Too big: poor communication, low morale, low productivity.
    18. Team Structure: functional vs. matrix. [Org chart: CTO; PMs; Designers, Developers and Testers as functional teams; in the matrix layout, Proj 1 to Proj 4 each have a PM, Designer, Developer and Tester.]
    19. Building Processes For Scale
    24. Why Are Processes Critical? Augment management of teams and employees; standardise actions in repetitive tasks; reduce mundane decisions to focus on grander ideas; allow the team to react quickly to crisis; determine system capacity and scalability needs. Challenge: right amount, right process, right time.
    28. Determining Headroom For Apps. [Chart: Capacity vs. Current Load.] Why? Planning annual budget; hiring plan; prioritisation.
    33. Headroom Process: (1) identify major components; (2) identify responsible team; (3) determine usage and capacity (figures shown: 315 queries/sec, 20MB/min); (4) determine growth rate. $\text{Headroom} = (\text{ideal usage percentage} \times \text{max capacity}) - \text{current usage} - \sum_{t=1}^{12} \bigl(\text{growth}(t) - \text{optimisation projects}(t)\bigr)$ (M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley)
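
The headroom formula above can be sketched in a few lines of Python. This is only an illustration: the function name, the 50% ideal usage, the 1000 queries/sec capacity and the monthly growth and optimisation figures are assumptions made up for the example; only the 315 queries/sec usage figure appears on the slide.

```python
def headroom(ideal_usage_pct, max_capacity, current_usage, growth, optimisations):
    """Headroom left at the end of the planning horizon, in capacity units.

    growth and optimisations hold one value per period t (12 in the formula).
    """
    if len(growth) != len(optimisations):
        raise ValueError("growth and optimisations must cover the same periods")
    usable_capacity = ideal_usage_pct * max_capacity
    # Sum over t of (growth(t) - optimisation projects(t))
    net_growth = sum(g - o for g, o in zip(growth, optimisations))
    return usable_capacity - current_usage - net_growth


# Hypothetical inputs, in queries/sec (assumptions, not from the talk).
monthly_growth = [10.0] * 12
monthly_optimisations = [2.0] * 12
remaining = headroom(0.5, 1000.0, 315.0, monthly_growth, monthly_optimisations)
print(remaining)  # 89.0
```

A negative result would mean the component runs out of usable capacity before the end of the twelve periods.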
    38. Joint Architecture Design + Architecture Review Board. Participants: Engineering, Architecture, Operations. Meeting: state goal; review alternative designs; Q&A session; deliberation; vote; conclusion. (M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley)
    41. Controlling Change in Production Environment. Change Management Process: proposal, approval, scheduling, logging, review. Change Identification Process: date & time of the change; system undergoing the change; expected results; contact information; rollback procedure.
    42. Determining Risk #1: Gut Feeling (http://dilbert.com/strips/comic/2008-05-08/)
    44. Determining Risk #2: Traffic Lights. Feature 1, Feature 2, Feature 3 = Overall Release
    45. Determining Risk #3: FMEA (Failure Mode and Effect Analysis)
        Feature     | Failure Mode                 | Effect                        | Likelihood of Failure | Severity If Failure Occurs | Ability to Detect | Total Risk Score | Remediation Actions  | Revised Risk Score
        Sign Up     | User data not saved          | User not registered           | 3                     | 3                          | 3                 | 27               | - do this, - do that | 3
        Sign Up     | Users given wrong privileges | Users can access other’s data | 1                     | 9                          | 3                 | 27               | - do sth             | 9
        Credit Card | CC number not encrypted      | CC theft risk                 | 1                     | 9                          | 1                 | 9                | N/A                  | 9
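
The total risk scores in the FMEA table are consistent with multiplying the three ratings (3 x 3 x 3 = 27, 1 x 9 x 3 = 27, 1 x 9 x 1 = 9). A minimal sketch of that calculation, assuming this multiplication rule; the class and field names are hypothetical, the ratings are the ones on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    feature: str
    failure_mode: str
    likelihood: int      # likelihood of failure
    severity: int        # severity if the failure occurs
    detectability: int   # ability to detect

    def risk_score(self) -> int:
        # Assumed rule: total risk is the product of the three ratings.
        return self.likelihood * self.severity * self.detectability


failure_modes = [
    FailureMode("Sign Up", "User data not saved", 3, 3, 3),
    FailureMode("Sign Up", "Users given wrong privileges", 1, 9, 3),
    FailureMode("Credit Card", "CC number not encrypted", 1, 9, 1),
]

scores = [fm.risk_score() for fm in failure_modes]
release_risk = sum(scores)
print(scores, release_risk)  # [27, 27, 9] 63
```

Summing the per-failure scores into one release score is also an assumption; it is one simple way to compare a release against a points limit.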
    46. Managing Risk. Rules and risk levels: new feature release < 150 pts; bug fix release < 50 pts; peak-usage-time release < 10 pts; off-peak release < 200 pts. (Numbers are just indicative figures.)
    47. Managing Risk (Human Factor). Rules and risk tolerance levels: 6-hour period < 150 pts; 12-hour period < 250 pts; 24-hour period < 350 pts; 72-hour period < 500 pts. (Numbers are just indicative figures.)
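
The two sets of rules above could be enforced with a small check like the following sketch. It uses the indicative limits from the slides and assumes that the period limits apply to the sum of the risk scores of all changes released within that period; the function and dictionary names are hypothetical.

```python
# Indicative limits from the slides, in risk points.
RELEASE_LIMITS = {"new_feature": 150, "bug_fix": 50, "peak_time": 10, "off_peak": 200}
PERIOD_LIMITS = {6: 150, 12: 250, 24: 350, 72: 500}  # hours -> points


def release_within_limit(kind, score):
    """True if a single release stays below the limit for its kind."""
    return score < RELEASE_LIMITS[kind]


def periods_within_tolerance(changes, now_hours):
    """changes is a list of (time_in_hours, risk_score) tuples.

    Assumption: each period limit caps the total risk released in that window.
    """
    for window, limit in PERIOD_LIMITS.items():
        total = sum(score for t, score in changes
                    if now_hours - window < t <= now_hours)
        if total >= limit:
            return False
    return True


print(release_within_limit("bug_fix", 27))                 # True
print(release_within_limit("peak_time", 27))               # False
print(periods_within_tolerance([(0, 100), (5, 60)], 5))    # False
print(periods_within_tolerance([(0, 100), (10, 60)], 10))  # True
```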
    49. Managing Incidents And Problems. Detect, Report, Investigate, Escalate, Resolve approach (M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley). Restore services in a timely and cost-effective manner; contain chaos: each person has a place; determine root cause and correct problems; review issues regularly. Post-mortem process: cross-functional brainstorming meeting.
    56. Performance (Load) Testing: (1) establish success criteria (✓ 1.5k users/sec, ✓ RT < 150ms); (2) establish the test environment (TEST ≅ LIVE); (3) define the tests, for different things (Pareto rule: 20% - 80%); (4) identify what needs to be monitored and what data needs to be collected (CPU, memory; TTL, RT, services); (5) run, analyse, report to engineers (CPU: 90%, RT: 180ms, 2K SimUsers/sec); (6) repeat tests and analysis (rinse and repeat).
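
Steps 1 and 5 above amount to comparing measured results against agreed criteria. A minimal sketch using the figures on the slide (at least 1.5k users/sec, response time below 150ms; a run measuring 2K simulated users/sec, 180ms and 90% CPU); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    min_users_per_sec: int
    max_response_ms: int  # response time must stay below this value


@dataclass(frozen=True)
class TestRun:
    users_per_sec: int
    response_ms: int
    cpu_pct: int


def failed_criteria(criteria, run):
    """Return the names of the criteria that the run did not meet."""
    failures = []
    if run.users_per_sec < criteria.min_users_per_sec:
        failures.append("throughput")
    if run.response_ms >= criteria.max_response_ms:
        failures.append("response_time")
    return failures


criteria = SuccessCriteria(min_users_per_sec=1500, max_response_ms=150)
run = TestRun(users_per_sec=2000, response_ms=180, cpu_pct=90)
print(failed_criteria(criteria, run))  # ['response_time']
```

With these figures the run meets the throughput target but misses the response-time target, which is the kind of result that gets reported to engineers before the tests are repeated.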
    60. Stress Testing. Tools: JMeter, LoadRunner, The Grinder, Avalanche (http://www.opensourcetesting.org/performance.php)
    61. Barrier Conditions: architecture review board; code reviews; manual and automated QA processes; performance testing; Dev, Test, Stage and Live environments; production monitoring and measurement.
    62. Technology: Architecting scalable solutions
    65. Designing For Any Technology. Specific products (Dell, WatchGuard, Cisco CSS 11501, HP ProLiant DL, HP Media Cache Server Appliance) replaced by generic roles: Firewall, Load Balancer, Application Servers, DB Server, Media / Cache.
    75. Architectural Principles: N + 1 design; design for rollback; design to be disabled; design to be monitored; design for multiple live sites; use mature technology; asynchronous design; stateless systems; buy when non core.
    76. Focus On Core Competencies: Build vs. Buy
    78. Asynchronous Design
    82. Stateless Systems. State is often useful, but has a significant cost (replication between data centres, synchronous calls...). Avoidance: no sessions / sticky sessions. Decentralisation: data in the cookie / cookie with hash. Centralisation: store cookies in the db or in memcached.
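
The "cookie with hash" option keeps state on the client and lets any server verify that it was not tampered with. One common way to do this is an HMAC signature, sketched below; the talk does not say which scheme it has in mind, so the HMAC-SHA256 choice, the key, the separator and the function names are all assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key shared by all servers


def sign_cookie(value: str) -> str:
    """Append an HMAC-SHA256 signature to the cookie value."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}|{digest}"


def verify_cookie(cookie: str):
    """Return the original value if the signature matches, else None."""
    value, sep, digest = cookie.rpartition("|")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking the signature via timing.
    if hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None


cookie = sign_cookie("user_id=42")
print(verify_cookie(cookie))                           # user_id=42
print(verify_cookie(cookie.replace("42", "43", 1)))    # None
```

Because every server holds the same key, no session lookup or replication is needed to trust the cookie. The data is signed, not encrypted, so it should not contain secrets.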
    87. Creating Fault Isolative Structures: increase availability; limit impact of failures; easier debugging. First: functions causing repetitive problems; natural layout or topology of the site.
    91. Scale Directions. x: cloning of entities or data, unbiased distribution of work. y: separation of work by activity or data. z: separation of work by person for whom the work is done. (M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley)
    95. Splitting Applications For Scale. x (mirroring): + scale transactions; - scale data. y (split by service): + fault isolation; + scale function data; - scale customer data. z (split by need / location / value): + fault isolation; + scale customer data; - scale function data.
    99. Splitting Databases For Scale. x (data cloning: replication / clustering): + easy to implement; + scale transaction volume; - scale data size and growth. y (split by service / resource / data affinity): + fault isolation; + reduce query time; - more difficult; - data migration. z (split by modulus / hash-based lookups): + balanced demand; + fault isolation; + scale data and trans.; - more costly.
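
The z-axis split ("split by modulus / hash-based lookups") can be sketched as a routing function that maps a customer to one of several database shards. The shard names and function names below are hypothetical, and SHA-256 is just one possible stable hash.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard names


def shard_by_modulus(customer_id: int) -> str:
    """Numeric keys: the remainder picks the shard."""
    return SHARDS[customer_id % len(SHARDS)]


def shard_by_hash(key: str) -> str:
    """Non-numeric keys: hash first, then take the remainder.

    A stable hash is used because Python's built-in hash() for strings
    changes between processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


print(shard_by_modulus(10))  # db2
print(shard_by_hash("alice@example.com") in SHARDS)  # True
```

With a plain modulus, changing the number of shards remaps most keys, which is part of why this axis is listed as more costly.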
    103. Caching For Performance & Scale. Object caches: usually serialized (marshalling / unmarshalling); get() / set() / replace(); APC, Memcached. Application caches: proxy caches; reverse proxy caches; HTTP headers; ISP/Uni proxies; Squid, Varnish, mod_cache. CDNs: multiple locations / backbones; CNAME entries; Akamai, Coral, Limelight...
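
The object-cache column above (serialized values, get() / set() / replace()) can be illustrated with an in-memory stand-in. This is not the Memcached or APC client API: the class is a hypothetical sketch that mimics the three operations, with pickle playing the role of marshalling, and `load_user` shows the usual read-through pattern.

```python
import pickle


class ObjectCache:
    """In-memory stand-in for an object cache; values are stored serialized."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        raw = self._store.get(key)
        return None if raw is None else pickle.loads(raw)  # unmarshal

    def set(self, key, value):
        self._store[key] = pickle.dumps(value)  # marshal

    def replace(self, key, value):
        """Store only if the key already exists; report whether it did."""
        if key not in self._store:
            return False
        self._store[key] = pickle.dumps(value)
        return True


def load_user(cache, user_id, fetch):
    """Try the cache first; on a miss, fetch from the source and cache it."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = fetch(user_id)
        cache.set(key, user)
    return user


db_calls = []


def fetch_from_db(user_id):
    # Hypothetical slow lookup; records each call so misses can be counted.
    db_calls.append(user_id)
    return {"id": user_id, "name": "demo"}


cache = ObjectCache()
load_user(cache, 7, fetch_from_db)
load_user(cache, 7, fetch_from_db)
print(len(db_calls))  # 1
```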
    104. Solving Other Issues ...and challenges
    108. Too Much Data. The more storage, the more storage management: storage costs; people and software; power and space; processing power; backup time and costs. Evaluate data retention policy; consider multi-tiered storage; distribute work (MapReduce).
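
As an illustration of "distribute work (MapReduce)", here is a toy word count split into map, shuffle and reduce phases. Everything runs sequentially in one process; in a real deployment each chunk would be mapped on a different worker. The function names and sample data are made up for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of data."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


chunks = ["scale out not up", "scale data and scale transactions"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))
print(counts["scale"])  # 3
```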
    109. Clouds And Grids: cheap, on-demand storage and compute capacity. Clouds, pros: cost (pay for what you use); speed (procurement, provisioning, deployment); flexibility (change / reconfigure environment). Clouds, cons: security, portability, control; limitations of virtualisation; performance. Grids, pros: high computation rates; shared infrastructure (with proper scheduling); unused capacity (SETI@H). Grids, cons: not shared simultaneously; monolithic applications; complexity (debugging, OS).
    113. Monitoring. (1) Is there a problem? User experience / business metrics monitors. (2) Where is the problem? System monitors (threshold - variance). (3) What is the problem? Application monitors. Keep Signal vs. Noise ratio high.
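
The "threshold - variance" note for system monitors suggests two kinds of check: a fixed limit, and a deviation from recent behaviour. A minimal sketch of both, assuming a three-standard-deviation rule for the variance check; the function names, the sample history and the limits are hypothetical.

```python
from statistics import mean, pstdev


def threshold_alert(value, limit):
    """Fixed-limit monitor: alert when the value exceeds the limit."""
    return value > limit


def variance_alert(history, value, max_sigmas=3.0):
    """Variance monitor: alert when the value strays too far from the recent mean."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > max_sigmas * sigma


recent_response_ms = [100, 102, 98, 101, 99]
print(threshold_alert(90, 85))                  # True
print(variance_alert(recent_response_ms, 110))  # True
print(variance_alert(recent_response_ms, 101))  # False
```

Tuning `limit` and `max_sigmas` is one way to keep the signal-to-noise ratio high: limits that are too tight produce alerts that people learn to ignore.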
    114. Questions?
    115. Links & sources: http://www.slideshare.net/postwait/scalable-internet-architecture ; http://highscalability.com/blog/2009/4/2/art-of-scalability-1-scalability-principles.html ; http://agile.dzone.com/news/approaches-organizational ; M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley ; http://theartofscalability.com/
    117. Image Credits: http://www.sxc.hu/photo/1217386 ; http://michaelscomments.files.wordpress.com/2009/10/onion-centurion.jpg ; http://www.travelsd.com/_images/gallery/hires/000189.jpg ; http://www.socketmanufacturers.com/miniature-circuit-breaker/DZ47-63-3P-Miniature-Circuit-Breaker.jpg ; http://blogs.microsoft.co.il/blogs/shair/archive/2008/06/19/load-testing-features-of-visual-studio-team-system.aspx ; http://www.alibaba.com/member/de100430205.html/viewimg/photo/103590047/Boxing_Ring_Competition_AIBA_Ring.jpg.html ; http://brandonsmarathon.com/wp-content/uploads/2009/08/Olympics+Day+3+Swimming+43rPmSVmwHql.jpg ; http://en.wikipedia.org/wiki/File:Synchronized_swimming_-_Russian_team.jpg ; http://www.flickr.com/photos/bugeaters/3025911233/ ; http://www.flickr.com/photos/cote/2763677698/ ; http://www.iconfinder.com
    118. Thank you! Contact details: Lorenzo Alberton, lorenzo@ibuildings.com, http://www.alberton.info/talks, http://joind.in/talk/view/1539
