The Art of Scalability

             Managing Growth




      Lorenzo Alberton
     Amsterdam, 11th June 2010
Scalability

   Scalability is a desirable property of a system, a network, a business or a
   process, which indicates its ability to handle growing amounts of work.
   http://en.wikipedia.org/wiki/Scalability

Scalable ≠ Fast

   A service is said to be scalable if, when we increase the resources in a
   system, it results in increased performance in a manner proportional to
   resources added.
   http://www.julianbrowne.com/article/viewer/scalability

   Increasing performance in general means serving more units of work, but it
   can also be to handle larger units of work, such as when data sets grow.
   http://highscalability.com/amazon-architecture

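The proportionality in the first definition can be expressed as a scaling efficiency: the throughput gain divided by the resource gain. This is a minimal sketch, assuming identical servers and throughput measured in requests/sec; the function name and the figures are hypothetical, not taken from the talk.

```python
def scaling_efficiency(base_servers: int, base_rps: float,
                       new_servers: int, new_rps: float) -> float:
    """Return the throughput gain divided by the resource gain.

    1.0 means perfectly proportional scaling; values below 1.0 mean each
    added server contributes less than the existing ones did.
    """
    if base_servers <= 0 or base_rps <= 0:
        raise ValueError("baseline must be positive")
    resource_ratio = new_servers / base_servers
    throughput_ratio = new_rps / base_rps
    return throughput_ratio / resource_ratio


# Hypothetical measurements: going from 4 to 8 servers raised
# throughput from 1000 to 1800 requests/sec.
print(round(scaling_efficiency(4, 1000, 8, 1800), 2))  # 0.9
```

A result of 0.9 means the system is scalable but not perfectly so: doubling resources gave 1.8x the throughput.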
Scalability Is About...

   People
   Processes
   Technology

People
Staffing, Roles, Leadership, Management

Roles And Responsibilities: Role-clarity

   Lack of role clarity leads to overlapping responsibilities and to areas with
   missing responsibilities, which result in wasted effort, value-destroying
   conflicts and failed scale initiatives.

   Key scale-related responsibilities
   - Set measurable goals
   - Staff the team with the appropriate skills
   - Define and implement a scalable architecture
   - Test, monitor and develop future demand projections
   - Define future changes based on the analysis

Leadership

   - Inspire people
   - Set the right vision and goals
   - Create the right culture
   - Create the right tools
   Together, these are an accelerator for growth.

   vision  = where we are going
   mission = general direction on how to get there
   goals   = milestones along the path

   Goals should be SMART:
     S   Specific
     M   Measurable
     A   Achievable (but Aggressive)
     R   Realistic
     T   Time-bound

   Chip & Dan Heath, “Switch: How To Change Things When Change Is Hard”
   People:
   - Direct the rider
   - Motivate the elephant
   - Shape the path

Management

   Project Management
      Goals, Projects, Tasks, Individuals
      Measurement, Communication, Resolution

   People Management
      Hiring, Firing, Growth

Organisational Structure And Team Size

   Too small: micromanaging managers, overworked team members
   Too big: poor communication, low morale, low productivity

Team Structure

   Functional: under the CTO, staff are organised by function into
   Designers, Developers and Testers.

   Matrix: staff still belong to those functions (Designers, Developers,
   Testers), but are also grouped per project (Proj 1 to Proj 4), each project
   team having its own PM, Designer, Developer and Tester.

Building Processes For Scale

Why Are Processes Critical?

   - Augment management of teams and employees
   - Standardise actions in repetitive tasks
   - Reduce mundane decisions to focus on grander ideas
   - Allow the team to react quickly to a crisis
   - Determine system capacity and scalability needs

   Challenge: the right amount of process, the right process, at the right time.

Determining Headroom For Apps

   Headroom is the gap between the capacity of a component and its current load.

   Why?
   - Planning the annual budget
   - Hiring plan
   - Prioritisation

Headroom Process

   1. Identify major components
   2. Identify responsible team
   3. Determine usage and capacity (e.g. 315 queries/sec, 20MB/min)
   4. Determine growth rate

   $$\text{Headroom} = (\text{ideal usage percentage}) \times (\text{max capacity}) - (\text{current usage}) - \sum_{t=1}^{12} \big(\text{growth}(t) - \text{optimisation projects}(t)\big)$$

   M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley

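The headroom formula translates directly into a small calculation. This is a minimal sketch, assuming all quantities use the same unit (queries/sec here) and the sum runs over 12 periods. The 1000 queries/sec maximum, the 50% ideal usage, and the growth and optimisation figures are hypothetical; only the 315 queries/sec comes from the slide.

```python
from typing import Sequence


def headroom(ideal_usage_pct: float, max_capacity: float, current_usage: float,
             growth: Sequence[float], optimisation: Sequence[float]) -> float:
    """Headroom = ideal% * max capacity - current usage
                  - sum over the periods of (growth(t) - optimisation(t)).

    growth[t] is the expected extra load in period t and optimisation[t]
    the load expected to be removed by optimisation projects in period t.
    """
    if len(growth) != len(optimisation):
        raise ValueError("growth and optimisation need the same number of periods")
    net_growth = sum(g - o for g, o in zip(growth, optimisation))
    return ideal_usage_pct * max_capacity - current_usage - net_growth


# Hypothetical component: 315 queries/sec today, 1000 queries/sec max,
# keep usage at or below 50%, +10 q/s per period, and an optimisation
# project saving 20 q/s in period 6.
growth = [10.0] * 12
optimisation = [0.0] * 12
optimisation[5] = 20.0
print(headroom(0.5, 1000, 315, growth, optimisation))  # 85.0
```

A negative result would mean the component runs out of headroom within the planning horizon, which is what feeds the budget, hiring plan and prioritisation.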
Joint Architecture Design + Review Board

   Engineering, Architecture and Operations design together; the resulting
   design is presented to the Architecture Review Board.

   Architecture Review Board meeting:
   1. State goal
   2. Review alternative designs
   3. Q&A session
   4. Deliberation
   5. Vote
   6. Conclusion

   M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley

Controlling Change in Production Environment

   Change Management Process
      Proposal → Approval → Scheduling → Logging → Review

   Change Identification Process: for every change, record
   - Date & time of the change
   - System undergoing the change
   - Expected results
   - Contact information
   - Rollback procedure

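The change identification fields map naturally onto a record that can be checked before a change is approved. This sketch assumes that every text field must be filled in. The class, method and host names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChangeRecord:
    """One entry of the change identification log (fields from the slide)."""
    when: datetime              # date & time of the change
    system: str                 # system undergoing the change
    expected_results: str       # what should be observed if the change works
    contact: str                # who to contact about the change
    rollback_procedure: str     # how to undo the change

    def missing_fields(self) -> list:
        """Names of text fields left empty; such a change should not be approved."""
        return [name for name in ("system", "expected_results", "contact",
                                  "rollback_procedure")
                if not getattr(self, name).strip()]


change = ChangeRecord(
    when=datetime(2010, 6, 11, 2, 0),
    system="search-db-02",                  # hypothetical host name
    expected_results="query latency unchanged, new index in use",
    contact="ops on-call",
    rollback_procedure="",                  # rollback plan forgotten
)
print(change.missing_fields())  # ['rollback_procedure']
```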
Determining Risk #1: Gut Feeling

   http://dilbert.com/strips/comic/2008-05-08/

Determining Risk #2: Traffic Lights

   Each feature is rated with a traffic-light colour (green, yellow or red):
   Feature 1 + Feature 2 + Feature 3 = Overall Release rating

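The slide does not state how feature colours combine into the overall release colour. This sketch assumes one common reading: a release is as risky as its riskiest feature. The names are illustrative.

```python
from enum import IntEnum


class Light(IntEnum):
    # Ordered so that a larger value means a riskier change.
    GREEN = 0
    YELLOW = 1
    RED = 2


def overall_release(feature_lights: dict) -> Light:
    """Assumed rule: the release takes the colour of its riskiest feature."""
    if not feature_lights:
        return Light.GREEN
    return max(feature_lights.values())


release = {"Feature 1": Light.GREEN, "Feature 2": Light.YELLOW,
           "Feature 3": Light.GREEN}
print(overall_release(release).name)  # YELLOW
```

A weighted average of the colours would be an equally valid rule; the "worst colour wins" choice is simply the most conservative.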
Determining Risk #3: FMEA
   Failure Mode and Effect Analysis

   Feature     | Failure Mode                 | Effect                        | Likelihood of Failure | Severity If Failure Occurs | Ability to Detect | Total Risk Score | Remediation Actions | Revised Risk Score
   Sign Up     | User data not saved          | User not registered           | 3                     | 3                          | 3                 | 27               | do this; do that    | 3
   Sign Up     | Users given wrong privileges | Users can access others' data | 1                     | 9                          | 3                 | 27               | do something        | 9
   Credit Card | CC number not encrypted      | CC theft risk                 | 1                     | 9                          | 1                 | 9                | N/A                 | 9

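The Total Risk Score column is consistent with multiplying the three ratings (3 × 3 × 3 = 27, 1 × 9 × 3 = 27, 1 × 9 × 1 = 9). This minimal sketch recomputes the table and ranks the failure modes. The class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    feature: str
    mode: str
    likelihood: int     # likelihood of failure
    severity: int       # severity if failure occurs
    detection: int      # ability-to-detect rating, as scored in the table

    @property
    def risk_score(self) -> int:
        # Total risk score = likelihood x severity x detection rating.
        return self.likelihood * self.severity * self.detection


rows = [
    FailureMode("Sign Up", "User data not saved", 3, 3, 3),
    FailureMode("Sign Up", "Users given wrong privileges", 1, 9, 3),
    FailureMode("Credit Card", "CC number not encrypted", 1, 9, 1),
]

# Highest-risk failure modes first: these are the candidates for remediation.
for row in sorted(rows, key=lambda r: r.risk_score, reverse=True):
    print(f"{row.risk_score:>3}  {row.feature}: {row.mode}")
print("release total:", sum(r.risk_score for r in rows))  # release total: 63
```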
Managing Risk

                 Rules                  Risk Level

     New Feature Release                < 150 pts *

         Bug Fix Release                < 50 pts *

   Peak-usage-time release              < 10 pts *

        Off-peak release                < 200 pts *

 * Numbers are just indicative figures
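The slide gives only the limits. Assuming that a release's risk is the sum of the Total Risk Scores of its features, the rules become a simple gate. The dictionary keys are hypothetical names for the four release types.

```python
# Indicative limits from the slide, in risk points (hypothetical key names).
RISK_LIMITS = {
    "new_feature": 150,
    "bug_fix": 50,
    "peak_usage_time": 10,
    "off_peak": 200,
}


def release_allowed(release_type: str, feature_scores: list) -> bool:
    """Assumes the release score is the sum of its features' FMEA scores."""
    return sum(feature_scores) < RISK_LIMITS[release_type]


scores = [27, 27, 9]                            # Total Risk Scores from the FMEA table (63 pts)
print(release_allowed("new_feature", scores))  # True  (63 < 150)
print(release_allowed("bug_fix", scores))      # False (63 is not < 50)
```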

Managing Risk (Human Factor)

                 Rules                  Risk Tolerance Level

          6-hour period                     < 150 pts *

         12-hour period                     < 250 pts *

         24-hour period                     < 350 pts *

         72-hour period                     < 500 pts *

 * Numbers are just indicative figures
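This sketch assumes one reading of these tolerance levels: the risk points of all releases deployed within any window of the given length must stay below the limit, so the team is not swamped with changes to watch. The function name and the release history are hypothetical.

```python
from datetime import datetime, timedelta

# Indicative limits from the slide: window length in hours -> max risk points.
WINDOW_LIMITS = {6: 150, 12: 250, 24: 350, 72: 500}


def within_tolerance(history: list, new_time: datetime, new_points: int) -> bool:
    """history: (release_time, risk_points) pairs already deployed.

    For each window ending at new_time, the points released in the
    preceding N hours plus the new release must stay below the limit.
    """
    for hours, limit in WINDOW_LIMITS.items():
        start = new_time - timedelta(hours=hours)
        points = sum(p for t, p in history if start < t <= new_time)
        if points + new_points >= limit:
            return False
    return True


t0 = datetime(2010, 6, 11, 9, 0)
history = [(t0, 100), (t0 + timedelta(hours=3), 40)]
print(within_tolerance(history, t0 + timedelta(hours=5), 20))  # False (160 in 6 h)
print(within_tolerance(history, t0 + timedelta(hours=7), 20))  # True
```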

Managing Incidents And Problems

   Detect, Report, Investigate, Escalate, Resolve approach
   M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley

   - Restore services in a timely and cost-effective manner
   - Contain chaos: each person has a place
   - Determine root cause and correct problems
   - Review issues regularly

   Post-mortem Process
   - Cross-functional brainstorming meeting

Performance (Load) Testing

   1. Establish success criteria                   ✓ 1.5k users/sec, ✓ RT < 150ms
   2. Establish the test environment               TEST ≅ LIVE
   3. Define the tests (for different things)      Pareto rule: 20% - 80%
   4. Identify what needs to be monitored          CPU, memory
      and what data needs to be collected          TTL, RT, services
   5. Run, analyse, report to engineers            e.g. CPU: 90%, RT: 180ms, 2K SimUsers/sec
   6. Repeat tests and analysis                    rinse and repeat

   (RT = response time)

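Step 5 compares the run against the criteria from step 1. This sketch assumes simulated users/sec are compared directly with the users/sec target. With the slide's figures, the throughput target is met but the response-time target is not. Class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    min_users_per_sec: float
    max_response_ms: float


@dataclass
class RunResult:
    users_per_sec: float
    response_ms: float
    cpu_pct: float          # reported to engineers, not a pass/fail criterion here


def evaluate(criteria: Criteria, run: RunResult) -> list:
    """Return the failed criteria; an empty list means the run passed."""
    failures = []
    if run.users_per_sec < criteria.min_users_per_sec:
        failures.append(f"throughput {run.users_per_sec:g}/s below "
                        f"{criteria.min_users_per_sec:g}/s")
    if run.response_ms >= criteria.max_response_ms:
        failures.append(f"RT {run.response_ms:g}ms not below "
                        f"{criteria.max_response_ms:g}ms")
    return failures


criteria = Criteria(min_users_per_sec=1500, max_response_ms=150)
run = RunResult(users_per_sec=2000, response_ms=180, cpu_pct=90)
print(evaluate(criteria, run))  # ['RT 180ms not below 150ms']
```

A non-empty list is what goes back to the engineers before step 6 repeats the test.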
Stress Testing

   Tools: JMeter, LoadRunner, The Grinder, Avalanche
   http://www.opensourcetesting.org/performance.php

Barrier Conditions

   - Architecture review board
   - Code reviews
   - Manual and automated QA processes
   - Performance testing
   - Dev, Test, Stage and Live environments
   - Production monitoring and measurement

Technology
Architecting scalable solutions

Designing For Any Technology

   Describe components by role rather than by product:
   Dell WatchGuard                    →  Firewall
   Cisco CSS 11501                    →  Load Balancer
   HP ProLiant DL                     →  Application Servers
   HP Media Cache Server Appliance    →  Media / Cache
                                         DB Server

Architectural Principles

   - N + 1 design
   - Design for rollback
   - Design to be disabled
   - Design to be monitored
   - Design for multiple live sites
   - Use mature technology
   - Asynchronous design
   - Stateless systems
   - Buy when non core

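N + 1 design means provisioning one more unit than peak demand requires, so that a single failure does not push capacity below demand. This is a minimal sizing sketch. The load and per-server capacity figures are hypothetical.

```python
import math


def servers_needed(peak_load: float, capacity_per_server: float,
                   spares: int = 1) -> int:
    """N + 1: enough servers to carry peak load (N) plus spare(s)."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    n = math.ceil(peak_load / capacity_per_server)
    return n + spares


# Hypothetical: 2000 req/s at peak, 450 req/s per server -> N = 5, plus 1.
print(servers_needed(peak_load=2000, capacity_per_server=450))  # 6
```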
Focus On Core Competencies

   Build vs. Buy

Asynchronous Design

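As a sketch of the general idea of asynchronous design, assume a queue sits between a request handler and a background worker. Here `asyncio.Queue` stands in for a message broker. The request returns without waiting for the slow downstream work. The handler, worker and job names are hypothetical.

```python
import asyncio


async def worker(queue: asyncio.Queue, processed: list) -> None:
    # Consumes jobs in the background, independently of the request path.
    while True:
        job = await queue.get()
        await asyncio.sleep(0.01)          # stand-in for slow work (e.g. sending email)
        processed.append(job)
        queue.task_done()


async def handle_signup(queue: asyncio.Queue, user: str) -> str:
    # Only enqueue the slow work and answer at once, instead of
    # calling the downstream system synchronously.
    queue.put_nowait(f"welcome-email:{user}")
    return f"registered {user}"


async def main() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    consumer = asyncio.create_task(worker(queue, processed))
    replies = [await handle_signup(queue, u) for u in ("alice", "bob")]
    await queue.join()                     # wait for the backlog (demo only)
    consumer.cancel()
    return replies, processed


replies, processed = asyncio.run(main())
print(replies)    # ['registered alice', 'registered bob']
print(processed)  # ['welcome-email:alice', 'welcome-email:bob']
```

If the email system is slow or down, signups still succeed; the jobs simply wait in the queue.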
Stateless Systems

   State is often useful, but has a significant cost
   (replication between data centres, synchronous calls...)

   Options for handling session state:
   - Avoidance: no sessions / sticky sessions
   - Decentralisation: data in the cookie / cookie with hash
   - Centralisation: store session data (keyed by cookie) in the db or in memcached

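The "cookie with hash" option keeps state on the client while letting any server detect tampering. This is a minimal sketch using an HMAC from the Python standard library. The secret and the cookie layout (`value|digest`) are assumptions. The data is signed, not encrypted, so the client can still read it.

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"change-me"   # hypothetical secret shared by all web servers


def _digest(value: str) -> str:
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()


def sign(value: str) -> str:
    """Build the cookie content: the data plus its keyed hash."""
    return f"{value}|{_digest(value)}"


def verify(cookie: str) -> Optional[str]:
    """Return the data if its hash matches, otherwise None (tampered or malformed)."""
    value, sep, mac = cookie.rpartition("|")
    if not sep:
        return None
    # compare_digest avoids leaking timing information about the comparison.
    if hmac.compare_digest(mac.encode(), _digest(value).encode()):
        return value
    return None


cookie = sign("user_id=42;cart=3")
print(verify(cookie))                              # user_id=42;cart=3
print(verify(cookie.replace("cart=3", "cart=9")))  # None
```

Because verification needs only the shared secret, no server has to hold or replicate session state.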
Creating Fault Isolative Structures

   Benefits: increase availability, limit impact of failures, easier debugging

   Where to isolate first:
   - Functions causing repetitive problems
   - Natural layout or topology of the site

Scale Directions

   x axis: cloning of entities or data (unbiased distribution of work)
   y axis: separation of work by activity or data
   z axis: separation of work by the person for whom the work is done

   M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley

Splitting Applications For Scale

   x axis, mirroring:
      + scale transactions
      - scale data
   y axis, split by service:
      + fault isolation
      + scale function data
      - scale customer data
   z axis, split by need / location / value:
      + fault isolation
      + scale customer data
      - scale function data

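Here is a rough sketch of how a front-end router might combine two of the directions. It assumes the first URL path segment names the service (y axis) and that each service has a pool of identical clones served round-robin (x axis). The pool and host names are hypothetical.

```python
import itertools
from urllib.parse import urlsplit

# Hypothetical pools: one pool per service (y axis), identical clones inside (x axis).
POOLS = {
    "search": ["search-1", "search-2"],
    "checkout": ["checkout-1", "checkout-2", "checkout-3"],
}
_next_clone = {name: itertools.cycle(hosts) for name, hosts in POOLS.items()}


def route(url: str) -> str:
    """Pick the service pool from the first path segment (y axis),
    then the next clone in that pool, round-robin (x axis)."""
    service = urlsplit(url).path.strip("/").split("/")[0]
    if service not in POOLS:
        raise KeyError(f"no pool for service {service!r}")
    return next(_next_clone[service])


print([route("/search?q=x"), route("/search/page/2"), route("/search")])
# ['search-1', 'search-2', 'search-1']
```

A failure in the checkout pool leaves search untouched, which is the fault isolation benefit listed for the y axis.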
Splitting Databases For Scale

   x axis, data cloning (replication / clustering):
      + easy to implement
      + scale transaction volume
      - scale data size and growth
   y axis, split by service / resource / data affinity:
      + fault isolation
      + reduce query time
      - more difficult
      - data migration
   z axis, split by modulus / hash-based lookups:
      + balanced demand
      + fault isolation
      + scale data and transactions
      - more costly

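This is a minimal z-axis sketch of "split by modulus": hash a customer identifier and take the result modulo the number of shards. The shard names are hypothetical. With a plain modulus, changing the number of shards remaps most customers. That is one reason for the hash-based lookup (directory) alternative on the slide.

```python
import zlib
from collections import Counter

SHARDS = ["db-a", "db-b", "db-c", "db-d"]   # hypothetical shard names


def shard_for(customer_id: str) -> str:
    """Map a customer to a shard by hash modulo the number of shards.

    zlib.crc32 is used because Python's built-in hash() of a str is salted
    per process, so different servers would disagree on the shard.
    """
    return SHARDS[zlib.crc32(customer_id.encode()) % len(SHARDS)]


# The same customer always lands on the same shard...
assert shard_for("cust-1001") == shard_for("cust-1001")
# ...and many customers spread across the shards.
print(Counter(shard_for(f"cust-{i}") for i in range(1000)))
```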
Caching For Performance & Scale

   Object Caches
      Usually serialized (marshalling / unmarshalling)
      get() / set() / replace()
      e.g. APC, Memcached
   Application Caches
      Proxy caches, reverse proxy caches
      HTTP headers
      ISP/Uni proxies
      e.g. Squid, Varnish, mod_cache
   CDNs
      Multiple locations / backbones
      CNAME entries
      e.g. Akamai, Coral, Limelight...

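Here is an in-process sketch of the object-cache pattern. Values are serialized on `set()` and deserialized on `get()`, and the application reads through the cache before going to the database (cache-aside). The class is a stand-in assumed for illustration, not the Memcached or APC client API; its `replace()` only mirrors the idea of updating an existing key.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ObjectCache:
    """In-process stand-in for an object cache (hypothetical, not a real client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def set(self, key: str, value: Any, ttl: float = 60.0) -> None:
        # Marshal the object to a string, as a networked cache would.
        self._store[key] = (json.dumps(value), self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        item = self._store.get(key)
        if item is None:
            return None
        data, expires = item
        if self._clock() >= expires:       # expired: behave like a miss
            del self._store[key]
            return None
        return json.loads(data)            # unmarshal back into an object

    def replace(self, key: str, value: Any, ttl: float = 60.0) -> bool:
        # Only overwrite a key that is currently cached.
        if self.get(key) is None:
            return False
        self.set(key, value, ttl)
        return True


db_reads = 0


def load_user_from_db(user_id: int) -> dict:
    global db_reads
    db_reads += 1                          # count trips to the (fake) database
    return {"id": user_id, "name": f"user{user_id}"}


def get_user(cache: ObjectCache, user_id: int) -> dict:
    # Cache-aside: try the cache, fall back to the DB, then populate the cache.
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = load_user_from_db(user_id)
        cache.set(key, user, ttl=300)
    return user


cache = ObjectCache()
get_user(cache, 7)
get_user(cache, 7)
print(db_reads)  # 1
```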
Solving Other Issues
...and challenges

Too Much Data

    The more storage, the more:
        storage management
        storage costs
        people and software
        power and space
        processing power
        backup time and costs

    Evaluate data retention policy
    Consider multi-tiered storage
    Distribute work (MapReduce)
                                              38
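Distributing work with MapReduce splits a job into three stages. Independent map tasks run over input splits, a shuffle groups the intermediate results by key, and independent reduce tasks run once per key. Below is a minimal single-process Python sketch of that flow, using word counting as a hypothetical workload. A real framework such as Hadoop also handles partitioning, scheduling, and fault tolerance, all of which are omitted here:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def map_reduce(documents: list[str]) -> dict[str, int]:
    # Each map_phase call only sees its own split, so in a cluster
    # these calls could run on different machines in parallel.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    # Each key is reduced independently of the others, so reducers parallelise too.
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For instance, `map_reduce(["the more storage", "the more costs"])` returns `{"the": 2, "more": 2, "storage": 1, "costs": 1}`. The design point is that neither the map step nor the reduce step shares state across inputs, which is what lets the work spread across many machines as data grows.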
Clouds And Grids
    Cheap, on-demand storage and compute capacity

 Clouds                              Grids
 + Cost (pay for what you use)       + High computation rates
 + Speed (procurement,               + Shared infrastructure (with
   provisioning, deployment)           proper scheduling)
 + Flexibility (change /             + Unused capacity (SETI@home)
   reconfigure environment)

 - Security, portability, control    - Not shared simultaneously
 - Limitations of virtualisation     - Monolithic applications
 - Performance                       - Complexity (debugging, OS)
                                                             39
Monitoring

1. Is there a problem?     User experience / business metrics monitors
2. Where is the problem?   System monitors (threshold / variance)
3. What is the problem?    Application monitors

               Keep the signal-to-noise ratio high
                                                                     40
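Step 2 pairs static thresholds with variance-based checks. Below is a minimal Python sketch of a system monitor that applies both rules to one metric. The `SystemMonitor` class, the rolling window, and the three-standard-deviation cut-off are illustrative assumptions, not values from the talk. Waiting for a minimum history before the variance rule fires is one way to keep the signal-to-noise ratio high, because statistics computed over a nearly empty window would flag ordinary fluctuations:

```python
import statistics
from collections import deque


class SystemMonitor:
    """Hypothetical system monitor combining two alerting rules.

    - threshold: alert when a sample exceeds a fixed limit
    - variance: alert when a sample deviates from the rolling mean
      by more than k population standard deviations
    """

    def __init__(self, threshold: float, window: int = 30, k: float = 3.0,
                 min_samples: int = 5) -> None:
        self.threshold = threshold
        self.k = k
        self.min_samples = min_samples
        self.history: deque[float] = deque(maxlen=window)  # recent samples only

    def check(self, value: float) -> list[str]:
        """Return the names of the rules this sample triggers, then record it."""
        alerts = []
        if value > self.threshold:
            alerts.append("threshold")
        if len(self.history) >= self.min_samples:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            # A perfectly flat history has zero spread; skip the rule then
            # rather than alerting on every tiny change.
            if stdev > 0 and abs(value - mean) > self.k * stdev:
                alerts.append("variance")
        self.history.append(value)
        return alerts
```

The threshold rule catches absolute limits such as a disk filling up. The variance rule catches unusual behaviour that stays under the limit, for example a sudden jump in a metric that normally sits well below its threshold.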
Questions?




              41
Links & sources

http://www.slideshare.net/postwait/scalable-internet-architecture
http://highscalability.com/blog/2009/4/2/art-of-scalability-1-scalability-principles.html
http://agile.dzone.com/news/approaches-organizational

M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison-Wesley
http://theartofscalability.com/

                                                 42
Image Credits
http://www.sxc.hu/photo/1217386
http://michaelscomments.files.wordpress.com/2009/10/onion-centurion.jpg
http://www.travelsd.com/_images/gallery/hires/000189.jpg
http://www.socketmanufacturers.com/miniature-circuit-breaker/DZ47-63-3P-Miniature-Circuit-Breaker.jpg
http://blogs.microsoft.co.il/blogs/shair/archive/2008/06/19/load-testing-features-of-visual-studio-team-system.aspx
http://www.alibaba.com/member/de100430205.html/viewimg/photo/103590047/Boxing_Ring_Competition_AIBA_Ring.jpg.html
http://brandonsmarathon.com/wp-content/uploads/2009/08/Olympics+Day+3+Swimming+43rPmSVmwHql.jpg
http://en.wikipedia.org/wiki/File:Synchronized_swimming_-_Russian_team.jpg
http://www.flickr.com/photos/bugeaters/3025911233/
http://www.flickr.com/photos/cote/2763677698/
http://www.iconfinder.com

                                                                    44
Thank you!
                  Contact details:

         Lorenzo Alberton
       lorenzo@ibuildings.com
http://www.alberton.info/talks
http://joind.in/talk/view/1539
The Art of Scalability - Managing growth

  • 1. The Art of Scalabiliity Managing Growth Lorenzo Alberton Amsterdam, 11th June 2010
  • 2. Scalability Scalability is a desirable property of a system, a network, a business or a process, which indicates its ability to handle growing amounts of work http://en.wikipedia.org/wiki/Scalability 2
  • 3. Scalable ≠ Fast A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added. http://www.julianbrowne.com/article/viewer/scalability Increasing performance in general means serving more units of work, but it can also be to handle larger units of work, such as when data sets grow. http://highscalability.com/amazon-architecture 3
  • 4. Scalability Is About... People Processes Technology 4
  • 6. Roles And Responsibilities Role-clarity 6
  • 7. Roles And Responsibilities Role-clarity overlapping areas missing wasted effort, responsibilities responsibilities value-destroying conflicts, failed scale initiatives 6
  • 8. Roles And Responsibilities Role-clarity overlapping areas missing wasted effort, responsibilities responsibilities value-destroying conflicts, failed scale initiatives Key scale-related responsibilities Set measurable goals Staff the team with the appropriate skills Define and implement a scalable architecture Test, monitor, develop future demand projections Define future changes based on the analysis 6
  • 9. Leadership Inspire people Set the right vision and goals Create the right culture Create the right tools 7
  • 10. Leadership } Inspire people Set the right vision and goals Accelerator for growth Create the right culture Create the right tools 7
  • 11. Leadership } Inspire people Set the right vision and goals Accelerator for growth Create the right culture Create the right tools vision = where we are going mission = general direction on how to get there goals = milestones along the path 7
  • 12. Leadership } Inspire people Set the right vision and goals Accelerator for growth Create the right culture Create the right tools vision = where we are going mission = general direction on how to get there goals = milestones along the path S Specific M Measurable A Achievable (but Aggressive) R Realistic T Time-bound 7
  • 13. Leadership } Inspire people Set the right vision and goals Accelerator for growth Create the right culture Create the right tools vision = where we are going mission = general direction on how to get there goals = milestones along the path S Specific Chip & Dan Heat, “Switch: How To Change Things When Change Is Hard” M Measurable A Achievable (but Aggressive) People R Realistic - Direct the rider T Time-bound - Motivate the elephant - Shape the path 7
  • 14. Management Project Management Goals Projects Tasks Individuals Measurement Communication Resolution 8
  • 15. Management Project Management Goals Projects Tasks Individuals Measurement Communication Resolution People Management Hiring Firing Growth 8
  • 16. Organisational Structure And Team size Too small Too big Micromanaging Poor communication managers Low morale Overworked team Low productivity members 9
  • 17. Team Structure functional CTO PM PM PM Designer Developer Tester Designer Developer Tester Designer Developer Tester Designer Developer Tester Designers Developers Testers 10
  • 18. Team Structure functional matrix CTO PM PM PM Proj 1 PM Designer Developer Tester Proj 2 PM Designer Developer Tester Proj 3 PM Designer Developer Tester Proj 4 PM Designer Developer Tester Designers Developers Testers 10
  • 20. Why Are Processes Critical? Augment management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs 12
  • 21. Why Are Processes Critical? Augment management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge 12
  • 22. Why Are Processes Critical? Augment management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount 12
  • 23. Why Are Processes Critical? Augment management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount right process 12
  • 24. Why Are Processes Critical? Augment management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount right process right time 12
  • 25. Determining Headroom For Apps Capacity Current Load 13
  • 26. Determining Headroom For Apps Capacity Current Load 13
  • 27. Determining Headroom For Apps Capacity Current Load 13
  • 28. Determining Headroom For Apps Why? Capacity Planning annual budget Hiring plan Current Load Prioritisation 13
  • 29. Headroom Process 1. Identify major components 14
  • 30. Headroom Process 1. Identify major components 2. Identify responsible team 14
  • 31. Headroom Process 1. Identify major components 2. Identify responsible team 315 queries/sec 20MB/min 3. Determine usage and capacity 14
  • 32. Headroom Process 1. Identify major components 2. Identify responsible team 315 queries/sec 20MB/min 3. Determine usage and capacity 4. Determine growth rate 14
  • 33. Headroom Process (ideal usage percentage) x (max capacity) - (current usage) - 1. Identify major components 12 2. Identify responsible team ∑ (growth(t) - (optimisation projects(t))) = ____________________________________ t=1 Headroom 315 queries/sec 20MB/min L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley M. 3. Determine usage and capacity 4. Determine growth rate 14
  • 34. Joint Architecture Design + Review Board Engineering Architecture Operations M. L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley 15
  • 35. Joint Architecture Design + Review Board Engineering Architecture Operations M. L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley 15
  • 36. Joint Architecture Design + Review Board Engineering Architecture Architecture Review Board Operations M. L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley 15
  • 37. Joint Architecture Design + Review Board Meeting Engineering State goal Review alternative designs Architecture Q&A session Deliberation Architecture Review Board Vote Conclusion Operations M. L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley 15
  • 38. Joint Architecture Design + Review Board Meeting Engineering State goal Review alternative designs Architecture Q&A session Deliberation Architecture Review Board Vote Conclusion Operations M. L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley 15
  • 39. Controlling Change in Production Environment 16
  • 40. Controlling Change in Production Environment Change Management Process Proposal Approval Scheduling Logging Review 16
  • 41. Controlling Change in Production Environment Change Management Process Proposal Approval Scheduling Logging Review Change Identification Process Date & time System undergoing Expected of the change the change results Contact information Rollback procedure 16
  • 42. Determining Risk #1: Gut Feeling http://dilbert.com/strips/comic/2008-05-08/ 17
  • 43. Determining Risk #2: Traffic Lights Feature 1 Feature 2 Feature 3 18
  • 44. Determining Risk #2: Traffic Lights Feature 1 Feature 2 = Overall Release Feature 3 18
  • 45. Determining Risk #3: FMEA Failure Mode and Effect Analysis Likelihood Severity Ability Total Remed- Revised Failure Feature Effect of If Failure to Risk iation Risk Mode Failure Occurs Detect Score Actions Score User User not - do this data not registered 3 3 3 27 3 - do that saved Sign Up Users Users can given access 1 9 3 27 - do sth 9 wrong other’s privileges data CC Credit number CC theft not 1 9 1 9 N/A 9 Card risk encrypted 19
  • 46. Managing Risk Rules Risk Level New Feature Release < 150 pts * Bug Fix Release < 50 pts * Peak-usage-time release < 10 pts * Off-peak release < 200 pts * * Numbers are just indicative figures 20
  • 47. Managing Risk (Human Factor) Rules Risk Tolerance Level 6-hour period < 150 pts * 12-hour period < 250 pts * 24-hour period < 350 pts * 72-hour period < 500 pts * * Numbers are just indicative figures 21
  • 48. Managing Incidents And Problems Detect, Report, Investigate, Escalate, Resolve approach M. L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley Restore services in a timely and cost-effective manner Contain chaos: each person has a place Determine root cause and correct problems Review issues regularly 22
  • 49. Managing Incidents And Problems Detect, Report, Investigate, Escalate, Resolve approach M. L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley Restore services in a timely and cost-effective manner Contain chaos: each person has a place Determine root cause and correct problems Review issues regularly Post-mortem Process Cross-functional brainstorming meeting 22
  • 51. Performance (Load) Testing ✓1.5k users/sec 1. Establish success criteria ✓RT < 150ms 23
  • 52. Performance (Load) Testing ✓1.5k users/sec 1. Establish success criteria ✓RT < 150ms 2. Establish the test environment TEST ≅ LIVE 23
  • 53. Performance (Load) Testing ✓1.5k users/sec 1. Establish success criteria ✓RT < 150ms 2. Establish the test environment TEST ≅ LIVE Pareto rule 3. Define the tests (for different things) 20% - 80% 23
  • 54. Performance (Load) Testing ✓1.5k users/sec 1. Establish success criteria ✓RT < 150ms 2. Establish the test environment TEST ≅ LIVE Pareto rule 3. Define the tests (for different things) 20% - 80% 4. Identify what needs to be monitored CPU - Memory What data needs to be collected TTL, RT, Services 23
  • 55. Performance (Load) Testing ✓1.5k users/sec 1. Establish success criteria ✓RT < 150ms 2. Establish the test environment TEST ≅ LIVE Pareto rule 3. Define the tests (for different things) 20% - 80% 4. Identify what needs to be monitored CPU - Memory What data needs to be collected TTL, RT, Services CPU: 90% 5. Run, analyse, report to engineers RT: 180ms 2K SimUsers/sec 23
  • 56. Performance (Load) Testing ✓1.5k users/sec 1. Establish success criteria ✓RT < 150ms 2. Establish the test environment TEST ≅ LIVE Pareto rule 3. Define the tests (for different things) 20% - 80% 4. Identify what needs to be monitored CPU - Memory What data needs to be collected TTL, RT, Services CPU: 90% 5. Run, analyse, report to engineers RT: 180ms 2K SimUsers/sec 6. Repeat tests and analysis Rinse and repeat 23
  • 60. Stress Testing JMeter Load Runner The Grinder Avalanche http://www.opensourcetesting.org/performance.php 24
  • 61. Barrier Conditions Architecture review board Code reviews Manual and automated QA processes Performance testing Dev, Test, Stage and Live environments Production monitoring and measurement 25
  • 63. Designing For Any Technology Dell WatchGuard Cisco CSS 11501 HP ProLiant DL HP Media Cache Server Appliance 27
  • 64. Designing For Any Technology Dell WatchGuard Cisco CSS 11501 HP ProLiant DL HP Media Cache Server Appliance 27
  • 65. Designing For Any Technology Dell WatchGuard Firewall Load Balancer Cisco CSS 11501 HP ProLiant DL Application Servers HP Media Cache Server Appliance DB Server Media / Cache 27
  • 67. Architectural Principles +1 N + 1 design 28
  • 68. Architectural Principles +1 N + 1 design for rollback 28
  • 69. Architectural Principles +1 N + 1 design for rollback to be disabled 28
  • 70. Architectural Principles +1 N + 1 design for rollback to be disabled to be monitored 28
  • 71. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple monitored live sites 28
  • 72. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology 28
  • 73. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous design 28
  • 74. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous stateless design systems 28
  • 75. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous stateless buy when design systems non core 28
  • 76. Focus On Core Competencies vs. Build Buy 29
  • 79. Stateless Systems State is often useful, but has a significant cost (replication between data centres, synchronous calls...) 31
  • 80. Stateless Systems State is often useful, but has a significant cost (replication between data centres, synchronous calls...) A B ? Avoidance No sessions / Sticky sessions 31
  • 81. Stateless Systems State is often useful, but has a significant cost (replication between data centres, synchronous calls...) A B ? Avoidance Decentralisation No sessions / Data in the cookie / Sticky sessions Cookie with hash 31
  • 82. Stateless Systems State is often useful, but has a significant cost (replication between data centres, synchronous calls...) A B ? Avoidance Decentralisation Centralisation No sessions / Data in the cookie / Store cookies in the Sticky sessions Cookie with hash db or in memcached 31
  • 87. Creating Fault Isolative Structures: increase availability, limit the impact of failures, easier debugging. Isolate first: functions causing repetitive problems; the natural layout or topology of the site.
  • 91. Scale Directions (M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison-Wesley)
      x: cloning of entities or data, i.e. unbiased distribution of work
      y: separation of work by activity or data
      z: separation of work by the person for whom the work is done
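The three directions differ in what decides where a request goes. A minimal routing sketch, assuming hypothetical pool names: on x any clone will do, on y the kind of work picks the pool, on z the customer picks the pod.

```python
import hashlib
import itertools

# Hypothetical server pools; all names are illustrative.
CLONES = ["web-1", "web-2", "web-3"]                  # x: identical copies of one service
SERVICE_POOLS = {                                     # y: one pool per activity
    "search": ["search-1", "search-2"],
    "checkout": ["checkout-1", "checkout-2"],
}
CUSTOMER_PODS = ["pod-a", "pod-b", "pod-c", "pod-d"]  # z: one pod per customer group

_next_clone = itertools.cycle(CLONES)


def route_x() -> str:
    """x-axis: any clone can serve any request, so simply rotate through them."""
    return next(_next_clone)


def route_y(activity: str) -> list:
    """y-axis: the kind of work decides which pool handles it."""
    return SERVICE_POOLS[activity]


def route_z(customer_id: str) -> str:
    """z-axis: the customer decides the pod; a stable hash keeps them on the same one."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return CUSTOMER_PODS[int.from_bytes(digest[:8], "big") % len(CUSTOMER_PODS)]


print([route_x() for _ in range(4)])  # ['web-1', 'web-2', 'web-3', 'web-1']
print(route_y("search"))              # ['search-1', 'search-2']
```

The z-axis uses `hashlib` rather than Python's built-in `hash()`, because `hash()` of a string is randomized per process and would send the same customer to different pods on different servers.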
  • 95. Splitting Applications For Scale
      x, mirroring: + scale transactions; - scale data
      y, split by service: + fault isolation, + scale function data; - scale customer data
      z, split by need / location / value: + fault isolation, + scale customer data; - scale function data
  • 99. Splitting Databases For Scale
      x, data cloning (replication / clustering): + easy to implement, + scale transaction volume; - scale data size and growth
      y, split by service / resource / data affinity: + fault isolation, + reduce query time; - more difficult, - data migration
      z, split by modulus / hash-based lookups: + balanced demand, + fault isolation, + scale data and transactions; - more costly
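The z-axis row names two ways of assigning customers to database shards. The sketch below contrasts a plain modulus with an explicit lookup table; `ShardDirectory` and its least-loaded placement rule are assumptions for illustration, and the actual copying of rows during a move is not shown.

```python
def shard_by_modulus(customer_id: int, n_shards: int) -> int:
    """Modulus split: stateless, but every mapping depends on n_shards."""
    return customer_id % n_shards


class ShardDirectory:
    """Lookup-based split: an explicit customer -> shard table (hypothetical helper)."""

    def __init__(self, n_shards: int) -> None:
        self.table = {}
        self.load = [0] * n_shards

    def shard_for(self, customer_id: int) -> int:
        if customer_id not in self.table:
            # New customers go to the least-loaded shard to keep demand balanced.
            shard = self.load.index(min(self.load))
            self.table[customer_id] = shard
            self.load[shard] += 1
        return self.table[customer_id]

    def move(self, customer_id: int, new_shard: int) -> None:
        """Rebalance one customer; copying their rows is not shown here."""
        old = self.shard_for(customer_id)
        self.load[old] -= 1
        self.load[new_shard] += 1
        self.table[customer_id] = new_shard

    def add_shard(self) -> int:
        self.load.append(0)
        return len(self.load) - 1


# Going from 4 to 5 shards with a plain modulus remaps most existing customers.
moved = sum(shard_by_modulus(i, 4) != shard_by_modulus(i, 5) for i in range(1000))
print(moved)  # 800
```

The modulus needs no extra state but turns adding a shard into a large data migration; the directory costs an extra lookup (and must itself be stored and cached) but lets customers be moved one at a time.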
  • 103. Caching For Performance & Scale
      Object caches: usually serialized (marshalling / unmarshalling); get() / set() / replace(); e.g. APC, Memcached
      Application caches: proxy caches, reverse proxy caches, HTTP headers, ISP/Uni proxies; e.g. Squid, Varnish, mod_cache
      CDNs: multiple locations / backbones, CNAME entries; e.g. Akamai, Coral, Limelight...
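To show what the object-cache column means in practice, here is an in-process stand-in with the `get()` / `set()` / `replace()` operations listed on the slide, plus a cache-aside read. `ObjectCache` and `get_or_load` are hypothetical names; the `replace` behaviour follows Memcached's (store only if the key already exists), and pickling stands in for the marshalling step.

```python
import pickle
import time
from typing import Any, Callable, Optional


class ObjectCache:
    """In-process stand-in for an object cache such as Memcached (hypothetical class).

    Values are pickled on the way in and unpickled on the way out, mirroring the
    marshalling / unmarshalling step of a networked cache.
    """

    def __init__(self) -> None:
        self._store = {}  # key -> (serialized value, expiry time or None)

    def _live(self, key: str) -> Optional[bytes]:
        entry = self._store.get(key)
        if entry is None:
            return None
        blob, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return blob

    def get(self, key: str) -> Any:
        blob = self._live(key)
        return None if blob is None else pickle.loads(blob)

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._store[key] = (pickle.dumps(value), expires_at)

    def replace(self, key: str, value: Any, ttl: Optional[float] = None) -> bool:
        # Like Memcached's replace: store only if the key is already cached.
        if self._live(key) is None:
            return False
        self.set(key, value, ttl)
        return True


def get_or_load(cache: ObjectCache, key: str, loader: Callable[[], Any], ttl: float = 60.0) -> Any:
    """Cache-aside read: try the cache, fall back to the loader, then populate."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl)
    return value
```

Because values are serialized, callers get a fresh copy on every `get()`: mutating it does not change the cached entry, just as with a networked cache.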
  • 104. Solving Other Issues ...and challenges
  • 108. Too Much Data: the more storage, the more storage management. Storage costs: people and software, power and space, processing power, backup time and costs. Remedies: evaluate the data retention policy; consider multi-tiered storage; distribute work (MapReduce).
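The slide suggests distributing work with MapReduce. A minimal single-process sketch of its three phases using the usual word-count example; the function names are hypothetical, and threads only stand in for the separate worker machines a framework such as Hadoop would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    return word, sum(counts)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Map tasks are independent, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, documents))
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


print(word_count(["to be or not", "to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Each map call only sees its own input split, and each reduce call only sees one key, which is what lets both phases spread across many machines as the data grows.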
  • 109. Clouds And Grids: cheap, on-demand storage and compute capacity
      Clouds: + cost (pay for what you use), + speed (procurement, provisioning, deployment), + flexibility (change / reconfigure environment); - security, portability, control, - limitations of virtualisation, - performance
      Grids: + high computation rates, + shared infrastructure (with proper scheduling), + unused capacity (SETI@H); - not shared simultaneously, - monolithic applications, - complexity (debugging, OS)
  • 113. Monitoring
      1. Is there a problem? User experience / business metrics monitors
      2. Where is the problem? System monitors (threshold / variance)
      3. What is the problem? Application monitors
      Keep the signal-to-noise ratio high.
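System monitors can alert on a fixed threshold or on variance from normal behaviour. A minimal sketch of both checks; the function names, the sample data and the default of three standard deviations (`k=3.0`) are assumptions, not values from the talk.

```python
from statistics import mean, stdev
from typing import Sequence


def threshold_alert(value: float, limit: float) -> bool:
    """Threshold monitor: fire when a metric crosses a fixed limit."""
    return value > limit


def variance_alert(history: Sequence[float], value: float, k: float = 3.0) -> bool:
    """Variance monitor: fire when a metric strays more than k standard deviations
    from its recent mean, even if it is still below any fixed limit."""
    if len(history) < 2:
        return False  # not enough data to estimate normal behaviour
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


cpu_history = [40.0, 42.0, 38.0, 41.0, 39.0]  # hypothetical CPU % samples
print(threshold_alert(60.0, limit=90.0))      # False: well below the limit
print(variance_alert(cpu_history, 60.0))      # True: far outside normal variation
print(variance_alert(cpu_history, 43.0))      # False
```

Tuning `k` and the limits is where the signal-to-noise trade-off lives: set them too tight and operators learn to ignore the alerts.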
  • 114. Questions?
  • 117. Image Credits:
      http://www.sxc.hu/photo/1217386
      http://michaelscomments.files.wordpress.com/2009/10/onion-centurion.jpg
      http://www.travelsd.com/_images/gallery/hires/000189.jpg
      http://www.socketmanufacturers.com/miniature-circuit-breaker/DZ47-63-3P-Miniature-Circuit-Breaker.jpg
      http://blogs.microsoft.co.il/blogs/shair/archive/2008/06/19/load-testing-features-of-visual-studio-team-system.aspx
      http://www.alibaba.com/member/de100430205.html/viewimg/photo/103590047/Boxing_Ring_Competition_AIBA_Ring.jpg.html
      http://brandonsmarathon.com/wp-content/uploads/2009/08/Olympics+Day+3+Swimming+43rPmSVmwHql.jpg
      http://en.wikipedia.org/wiki/File:Synchronized_swimming_-_Russian_team.jpg
      http://www.flickr.com/photos/bugeaters/3025911233/
      http://www.flickr.com/photos/cote/2763677698/
      http://www.iconfinder.com
  • 118. Thank you! Contact details: Lorenzo Alberton lorenzo@ibuildings.com http://www.alberton.info/talks http://joind.in/talk/view/1539

Editor's Notes