The ability to grow (and shrink) according to the needs and the available resources is an essential part of designing applications. In this talk we'll cover the fundamental elements of scalability, including aspects involving people, processes and technology. With sound and proven principles and some advice on how to shape your organisation, set the right processes and design your application, this session is a must-see for developers and technical leads alike.
Scalability
Scalability is a desirable property of a system, a network, a business or a process, which indicates its ability to handle growing amounts of work. (http://en.wikipedia.org/wiki/Scalability)

Scalable ≠ Fast
A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added. (http://www.julianbrowne.com/article/viewer/scalability)
Increasing performance in general means serving more units of work, but it can also be to handle larger units of work, such as when data sets grow. (http://highscalability.com/amazon-architecture)

Roles And Responsibilities
Role clarity
- Overlapping responsibilities: wasted effort, value-destroying conflicts
- Areas missing responsibilities: failed scale initiatives
Key scale-related responsibilities
- Set measurable goals
- Staff the team with the appropriate skills
- Define and implement a scalable architecture
- Test, monitor, develop future demand projections
- Define future changes based on the analysis

Leadership
- Inspire people
- Set the right vision and goals
- Accelerator for growth
- Create the right culture
- Create the right tools
vision = where we are going
mission = general direction on how to get there
goals = milestones along the path
Goals should be SMART
- S: Specific
- M: Measurable
- A: Achievable (but Aggressive)
- R: Realistic
- T: Time-bound
People (Chip & Dan Heath, “Switch: How To Change Things When Change Is Hard”)
- Direct the rider
- Motivate the elephant
- Shape the path

Organisational Structure And Team Size
Too small
- Micromanaging managers
- Overworked team members
Too big
- Poor communication
- Low morale
- Low productivity

Why Are Processes Critical?
- Augment management of teams and employees
- Standardise actions in repetitive tasks
- Reduce mundane decisions to focus on grander ideas
- Allow the team to react quickly to crisis
- Determine system capacity and scalability needs
Challenge: right amount, right process, right time

Headroom Process
1. Identify major components
2. Identify responsible team
3. Determine usage and capacity
4. Determine growth rate
Example measurements: 315 queries/sec, 20MB/min

\[
\text{Headroom} = (\text{ideal usage percentage}) \times (\text{max capacity}) - (\text{current usage}) - \sum_{t=1}^{12} \bigl(\text{growth}(t) - \text{optimisation projects}(t)\bigr)
\]
(M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley)

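The headroom formula above can be evaluated directly. The following is a minimal sketch, not the authors' implementation: the function name and every figure except the 315 queries/sec current usage (taken from the slide) are hypothetical, and all quantities are assumed to share one unit (queries/sec here).

```python
from typing import Sequence


def headroom(ideal_usage_pct: float,
             max_capacity: float,
             current_usage: float,
             growth: Sequence[float],
             optimisation: Sequence[float]) -> float:
    """Remaining headroom after the planning horizon.

    growth[t] and optimisation[t] are the expected demand growth and the
    capacity reclaimed by optimisation projects in period t.
    """
    if len(growth) != len(optimisation):
        raise ValueError("growth and optimisation must cover the same periods")
    net_growth = sum(g - o for g, o in zip(growth, optimisation))
    return ideal_usage_pct * max_capacity - current_usage - net_growth


# Hypothetical component: 1000 queries/sec maximum, run at no more than 50%,
# growing by 10 queries/sec per month with 2 queries/sec reclaimed per month.
example = headroom(
    ideal_usage_pct=0.5,
    max_capacity=1000.0,
    current_usage=315.0,   # figure shown on the slide
    growth=[10.0] * 12,
    optimisation=[2.0] * 12,
)
print(example)  # 89.0
```

A negative result would mean the component is expected to run out of headroom within the twelve periods.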
Joint Architecture Design + Review Board
Participants: Engineering, Architecture, Operations
Architecture Review Board meeting
- State goal
- Review alternative designs
- Q&A session
- Deliberation
- Vote
- Conclusion
(M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley)

Controlling Change in Production Environment
Change Management Process
- Proposal
- Approval
- Scheduling
- Logging
- Review
Change Identification Process
- Date & time of the change
- System undergoing the change
- Expected results
- Contact information
- Rollback procedure

Determining Risk #3: FMEA
Failure Mode and Effect Analysis

| Feature | Failure Mode | Effect | Likelihood of Failure | Severity If Failure Occurs | Ability to Detect | Total Risk Score | Remediation Actions | Revised Risk Score |
| Sign Up | User data not saved | User not registered | 3 | 3 | 3 | 27 | do this; do that | 3 |
| Sign Up | Users given wrong privileges | Users can access other’s data | 1 | 9 | 3 | 27 | do sth | 9 |
| Credit Card | CC number not encrypted | CC theft risk | 1 | 9 | 1 | 9 | N/A | 9 |

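The total risk scores in the FMEA table are consistent with multiplying the three ratings (3 × 3 × 3 = 27, 1 × 9 × 3 = 27, 1 × 9 × 1 = 9). A minimal sketch under that assumption follows; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    feature: str
    failure_mode: str
    likelihood: int     # likelihood of failure
    severity: int       # severity if the failure occurs
    detectability: int  # ability to detect (higher = harder to detect)

    def risk_score(self) -> int:
        # Assumption: total risk is the product of the three ratings,
        # which reproduces the scores shown in the table.
        return self.likelihood * self.severity * self.detectability


rows = [
    FailureMode("Sign Up", "User data not saved", 3, 3, 3),
    FailureMode("Sign Up", "Users given wrong privileges", 1, 9, 3),
    FailureMode("Credit Card", "CC number not encrypted", 1, 9, 1),
]
scores = [row.risk_score() for row in rows]
print(scores)  # [27, 27, 9]
```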
Managing Risk (Human Factor)
Rules: Risk Tolerance Level
- 6-hour period < 150 pts *
- 12-hour period < 250 pts *
- 24-hour period < 350 pts *
- 72-hour period < 500 pts *
* Numbers are just indicative figures

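The tolerance rules above can be checked before a change is scheduled. This is a simplified sketch: it uses the indicative limits from the slide, assumes each change carries a risk score in points, and only inspects the trailing window ending at the new change (a stricter check would test every window that contains it). The function and variable names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (window length in hours, risk points the window must stay below);
# indicative figures from the slide.
RISK_LIMITS: List[Tuple[float, int]] = [(6, 150), (12, 250), (24, 350), (72, 500)]


def change_allowed(scheduled: Iterable[Tuple[float, int]],
                   new_time: float,
                   new_points: int) -> bool:
    """Return True if the new change keeps every trailing window under its limit.

    scheduled holds (time in hours, risk points) for changes already planned.
    """
    planned = list(scheduled)
    for window, limit in RISK_LIMITS:
        in_window = sum(points for t, points in planned
                        if new_time - window < t <= new_time)
        if in_window + new_points >= limit:
            return False
    return True


planned_changes = [(0.0, 60), (2.0, 50)]
print(change_allowed(planned_changes, 4.0, 30))  # True: 140 pts in 6 hours
print(change_allowed(planned_changes, 4.0, 45))  # False: 155 pts in 6 hours
```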
Managing Incidents And Problems
Detect, Report, Investigate, Escalate, Resolve approach (M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley)
- Restore services in a timely and cost-effective manner
- Contain chaos: each person has a place
- Determine root cause and correct problems
- Review issues regularly
Post-mortem process: cross-functional brainstorming meeting

Performance (Load) Testing
1. Establish success criteria (e.g. ✓ 1.5k users/sec, ✓ RT < 150ms)
2. Establish the test environment (TEST ≅ LIVE)
3. Define the tests (for different things) (Pareto rule: 20% - 80%)
4. Identify what needs to be monitored (CPU, memory) and what data needs to be collected (TTL, RT, services)
5. Run, analyse, report to engineers (e.g. CPU: 90%, RT: 180ms, 2K SimUsers/sec)
6. Repeat tests and analysis (rinse and repeat)

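Steps 1 and 5 above amount to comparing a measured run against the success criteria. The sketch below uses the slide's example figures and assumes the 2K SimUsers/sec and 180ms numbers belong to the same run; the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Criteria:
    min_throughput: float   # users/sec the system must sustain
    max_response_ms: float  # response time must stay below this


@dataclass(frozen=True)
class RunResult:
    throughput: float       # simulated users/sec reached in the run
    response_ms: float      # response time observed at that load


def evaluate(criteria: Criteria, result: RunResult) -> Dict[str, bool]:
    """Report, per criterion, whether the run met it."""
    return {
        "throughput": result.throughput >= criteria.min_throughput,
        "response_time": result.response_ms < criteria.max_response_ms,
    }


criteria = Criteria(min_throughput=1500, max_response_ms=150)
run = RunResult(throughput=2000, response_ms=180)
report = evaluate(criteria, run)
print(report)  # {'throughput': True, 'response_time': False}
```

Under that reading, the run handles the target load but misses the response-time goal, which is the kind of finding step 5 reports to engineers before the tests are repeated.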
Barrier Conditions
- Architecture review board
- Code reviews
- Manual and automated QA processes
- Performance testing
- Dev, Test, Stage and Live environments
- Production monitoring and measurement

Designing For Any Technology
Vendor-specific labels: Dell, WatchGuard, Cisco CSS 11501, HP ProLiant DL, HP Media Cache Server Appliance
Technology-agnostic labels: Firewall, Load Balancer, Application Servers, DB Server, Media / Cache

Architectural Principles
- N + 1 design
- Design for rollback
- Design to be disabled
- Design to be monitored
- Design for multiple live sites
- Use mature technology
- Asynchronous design
- Stateless systems
- Buy when non-core

Stateless Systems
State is often useful, but has a significant cost (replication between data centres, synchronous calls...)
- Avoidance: no sessions / sticky sessions
- Decentralisation: data in the cookie / cookie with hash
- Centralisation: store cookies in the db or in memcached

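The "cookie with hash" option keeps state on the client while letting any server detect tampering. The sketch below is one common way to do this, using an HMAC over the payload; it is not necessarily what the speaker had in mind, and the key, separator and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; shared by all servers


def sign_cookie(payload: str, key: bytes = SECRET_KEY) -> str:
    """Append an HMAC-SHA256 digest so the payload cannot be altered unnoticed."""
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}|{digest}"


def verify_cookie(cookie: str, key: bytes = SECRET_KEY) -> Optional[str]:
    """Return the payload if the digest matches, otherwise None."""
    payload, separator, digest = cookie.rpartition("|")
    if not separator:
        return None
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids timing leaks.
    if hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
        return payload
    return None


cookie = sign_cookie("user_id=42")
print(verify_cookie(cookie))                           # user_id=42
print(verify_cookie(cookie.replace("42", "43", 1)))    # None
```

Because verification needs only the shared key, no server has to look up session state, so requests can be routed to any node.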
Creating Fault Isolative Structures
- Increase availability
- Limit impact of failures
- Easier debugging
What to isolate first
- Functions causing repetitive problems
- Natural layout or topology of the site

Scale Directions
- x: cloning of entities or data, unbiased distribution of work
- y: separation of work by activity or data
- z: separation of work by person for whom the work is done
(M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley)

Splitting Applications For Scale
x: mirroring
+ scale transactions
- scale data
y: split by service
+ fault isolation
+ scale function data
- scale customer data
z: split by need / location / value
+ fault isolation
+ scale customer data
- scale function data

Splitting Databases For Scale
x: data cloning (replication / clustering)
+ easy to implement
+ scale transaction volume
- scale data size and growth
y: split by service / resource / data affinity
+ fault isolation
+ reduce query time
- more difficult
- data migration
z: split by modulus / hash-based lookups
+ balanced demand
+ fault isolation
+ scale data and trans.
- more costly

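A z-axis split by modulus or hash-based lookup maps each record's key to one of a fixed number of shards. The following is a minimal sketch; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def shard_by_modulus(customer_id: int, shard_count: int) -> int:
    """Pick a shard for a numeric key with a plain modulus."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return customer_id % shard_count


def shard_by_hash(key: str, shard_count: int) -> int:
    """Pick a shard for an arbitrary string key via a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then take the modulus.
    return int.from_bytes(digest[:8], "big") % shard_count


print(shard_by_modulus(1234, 4))  # 2
shard = shard_by_hash("alice@example.com", 4)
print(0 <= shard < 4)             # True
```

A stable hash is used instead of Python's built-in `hash()`, which is randomised per process for strings. Changing `shard_count` remaps most keys, which is one reason this kind of split is costly to change later.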
Too Much Data
The more storage, the more
- storage management
- storage costs
- people and software
- power and space
- processing power
- backup time and costs
What to do
- Evaluate data retention policy
- Consider multi-tiered storage
- Distribute work (MapReduce)

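To illustrate the "distribute work (MapReduce)" idea, here is a single-process sketch of the three phases: map, shuffle (group by key) and reduce. In a real deployment the map and reduce calls would run on separate workers; the log format and all names below are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str],
              mapper: Callable[[str], List[Pair]]) -> List[Pair]:
    """Apply the mapper to every record and collect the emitted pairs."""
    pairs: List[Pair] = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Collapse each key's values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def status_mapper(line: str) -> List[Pair]:
    # Emit (HTTP status, 1) for a line such as "GET /a 200".
    return [(line.split()[-1], 1)]


log_lines = ["GET /a 200", "GET /b 404", "GET /c 200"]
counts = reduce_phase(shuffle(map_phase(log_lines, status_mapper)),
                      lambda key, values: sum(values))
print(counts)  # {'200': 2, '404': 1}
```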
Clouds And Grids
Cheap, on-demand storage and compute capacity
Clouds
+ Cost (pay for what you use)
+ Speed (procurement, provisioning, deployment)
+ Flexibility (change / reconfigure environment)
- Security, portability, control
- Limitations of virtualisation
- Performance
Grids
+ High computation rates
+ Shared infrastructure (with proper scheduling)
+ Unused capacity (SETI@home)
- Not shared simultaneously
- Monolithic applications
- Complexity (debugging, OS)

Monitoring
1. Is there a problem? User experience / business metrics monitors
2. Where is the problem? System monitors (threshold - variance)
3. What is the problem? Application monitors
Keep Signal vs. Noise ratio high

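The "threshold - variance" note on system monitors can be read as two kinds of check: a fixed limit, and a deviation from recent behaviour. The sketch below follows that reading, which is an assumption; the function names and the three-sigma default are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def breaches_threshold(value: float, limit: float) -> bool:
    """Fixed-limit monitor: alert when the metric exceeds an absolute limit."""
    return value > limit


def breaches_variance(value: float,
                      history: Sequence[float],
                      max_sigma: float = 3.0) -> bool:
    """Variance monitor: alert when the metric strays too far from its baseline."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) > max_sigma * spread


recent_response_ms = [100, 102, 98, 101, 99]
print(breaches_variance(103, recent_response_ms))  # False: within 3 sigma
print(breaches_variance(120, recent_response_ms))  # True: far outside baseline
print(breaches_threshold(95, 90))                  # True: above the fixed limit
```

Alerting only when one of these checks fires, rather than on every sample, is one way to keep the signal-to-noise ratio high.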
Links & sources
- http://www.slideshare.net/postwait/scalable-internet-architecture
- http://highscalability.com/blog/2009/4/2/art-of-scalability-1-scalability-principles.html
- http://agile.dzone.com/news/approaches-organizational
- M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley
- http://theartofscalability.com/