HIGH SCALABILITY AND
           RELIABILITY IN THE
           CLOUD
           GREG THOMPSON
           HEAD OF ARCHITECTURE, APPS ENABLEMENT
           ALCATEL-LUCENT

@gmthomp   greg.thompson@alcatel-lucent.com
About This Session
   Target audience is backend application
    developers deploying infrastructure into a
    cloud environment
   Will cover concepts for scalability and
    reliability with the goal of helping application
    developers understand some key
    considerations when designing and building
    the backend.
Design Time Decisions
   When first building your application backend,
    consider a few important questions
     How quickly must the application recover if a
      failure occurs?
     How much downtime is acceptable?
     Is the application maintaining stateful data?
     What kind of information needs to be shared across
      multiple instances?
Scalability
What is Scalability?
   Scalability describes
    how well an application
    handles increased
    traffic volume
Scalability – Factors to Consider
   Horizontal vs. Vertical
   Stateless vs. Stateful
   Understanding Limitations
   Connection Management
   Segmentation of traffic
   Segmentation of responsibility (distributed arch)
   Clustering
   Messaging
What Type of Scalability?
Vertical vs. Horizontal
Vertical
   Scaling up a single node
     Physical limitations: instances are very powerful but still have finite limits
     Resources such as number of sockets can only go so high
Horizontal
   Scaling out across multiple nodes
     Ability to distribute traffic over a number of nodes
     Allows for more flexibility over time
Will the App Maintain State?
Stateless Applications
   Application does not persist information about transactions
   Each transaction is independent and atomic
   [Diagram: a Request enters the Application and a Response leaves it]
Will the App Maintain State?
Stateful Applications
   Application needs to maintain data about transactions in progress
   Requires storage
   [Diagram: a First Request and a Subsequent Request reach the Application, which is connected to a DB]
   Persistence may also
    be required
    depending the
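The contrast between the two slides can be sketched in a few lines. This is a minimal illustration, not the presenter's code: `SessionStore` is a hypothetical in-memory stand-in for the DB in the diagram, and the request fields (`items`, `session_id`) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def stateless_handler(request: dict) -> dict:
    """Each request carries everything it needs; nothing is kept afterwards."""
    return {"total": sum(request["items"])}


@dataclass
class SessionStore:
    """Hypothetical stand-in for the shared DB shown on the slide."""
    _data: Dict[str, List[int]] = field(default_factory=dict)

    def load(self, session_id: str) -> List[int]:
        return list(self._data.get(session_id, []))

    def save(self, session_id: str, items: List[int]) -> None:
        self._data[session_id] = list(items)


def stateful_handler(request: dict, store: SessionStore) -> dict:
    """State for the transaction in progress lives in the store, not in the handler."""
    items = store.load(request["session_id"])
    items.extend(request["items"])
    store.save(request["session_id"], items)
    return {"total": sum(items)}
```

Because the stateful handler keeps nothing in its own memory, any instance that can reach the store could serve the subsequent request, which is what makes scaling out possible for stateful applications.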
Understanding Limitations
   Thorough testing is
    key to understanding
    bottlenecks
   Test real-world
    scenarios, including
    latency
   Push the system to
    the max to
    understand how it
Connection Management
Mobile Device Connections
   Mobile devices don’t always
    behave like you expect
       Connectivity is often very
        dynamic
       Devices move between 4G, 3G, 2G,
        no signal, and Wi-Fi
       Not all TCP events will get
        reported and sockets can remain
        open
   If not handled correctly, these
    factors can be a time bomb, no
    matter how far you scale a
    component vertically
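One common way to defuse that time bomb is to stop trusting TCP events and expire connections that have been silent too long. The sketch below assumes an application-level heartbeat; the `ConnectionRegistry` class, its method names, and the injectable `clock` parameter are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, List


class ConnectionRegistry:
    """Tracks when each connection was last heard from."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_timeout = idle_timeout
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen: Dict[str, float] = {}

    def touch(self, conn_id: str) -> None:
        """Record activity: data received or an application-level heartbeat."""
        self._last_seen[conn_id] = self._clock()

    def reap_idle(self) -> List[str]:
        """Forget connections silent for longer than idle_timeout; return their IDs."""
        now = self._clock()
        stale = [conn_id for conn_id, seen in self._last_seen.items()
                 if now - seen > self.idle_timeout]
        for conn_id in stale:
            del self._last_seen[conn_id]
        return stale

    def __len__(self) -> int:
        return len(self._last_seen)
```

In a real server the caller would also close the underlying socket for every ID returned by `reap_idle()`, so that half-open connections from devices that silently dropped off the network do not accumulate.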
Segmenting Traffic
   Once the application is
    able to be scaled out,
    traffic can be
    segmented in different
    ways
       Location (e.g. east coast
        vs. west coast)
       Pre-assigned criteria -
        User ID, IP, or other
        dynamic criteria
       Load Balanced
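Segmenting by a pre-assigned criterion such as User ID can be sketched as a stable hash over the ID. The function name `pick_node` and the node names in the usage note are hypothetical; the hashing scheme is one simple option, not something the slides prescribe.

```python
import hashlib
from typing import Sequence


def pick_node(user_id: str, nodes: Sequence[str]) -> str:
    """Map a user ID to one node, the same node on every call."""
    if not nodes:
        raise ValueError("at least one node is required")
    # A cryptographic digest is used only for its even spread; Python's built-in
    # hash() is randomized per process for strings, so it is not stable across instances.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

For example, `pick_node("user-42", ["east-1", "east-2", "west-1"])` always returns the same entry for that user. With plain modulo hashing, changing the number of nodes remaps most users, which matters if the nodes hold per-user state.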
Segmenting Responsibility
   Segmenting
    responsibility allows for
    a distributed
    architecture
       Each component can be
        scaled independently
       Allows for more flexibility
        in scaling
       Adds more complexity
        and potential messaging
        overhead
Clustering
   Clustering is the concept of having a group of nodes working together to provide the same capability
       Nodes typically co-located
       Common data shared as needed across the cluster
       Communication may be needed between nodes
   [Diagram: four App Nodes connected to Shared Data]
Messaging
   Once a clustered and/or distributed architecture is used, messaging will be needed between various components and/or nodes
   Types of Messaging
       JMS
       Open Source MQ packages
       Custom Designed
       Use of APIs
Example of Scaled Architecture
   [Diagram: two sites, Site 1 and Site 2, each showing a Load Balancer, a Web Server, Component 1, Component 2, and a Database]
Reliability/Availability
What is Reliability/Availability?
   Availability is typically
    measured by the amount of
    downtime your application
    has in a given year
       Unplanned downtime and
        planned downtime are both
        considered
   Reliability is described by the
    likelihood of failure based on
    actual measurements
   We’ll focus more on
    Availability
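The downtime-per-year measure is simple arithmetic. The sketch below assumes a 365-day year and treats planned and unplanned downtime alike, as the slide does; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(downtime_hours: float) -> float:
    """Fraction of the year the application was up, given total downtime."""
    return 1.0 - downtime_hours / HOURS_PER_YEAR


def allowed_downtime_hours(target: float) -> float:
    """Downtime budget per year for a target availability such as 0.999."""
    return (1.0 - target) * HOURS_PER_YEAR
```

Under these assumptions a 99.9% target leaves about 8.76 hours of downtime per year, and 99.99% leaves roughly 53 minutes.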
Reliability/Availability
Factors to Consider
   Cost vs. Need
   Problem detection
   Automation for recovery
   Active/standby, active/active, hot standby vs. cold
    standby
   Local and Geo-redundancy
   Multi-zone, multi-cloud
   Test Until You Break the System
Reliability Requirements
Cost Considerations
   Number of instances
   Bandwidth requirements between sites
   Complexity of software
   Monitoring
Need
   User Experience
   Customer requirements
   Negative Publicity
Problem Detection
   Effective monitoring of
    the application is key to
    minimizing downtime
       Event reporting in the
        software
       External monitoring –
        test for successful
        behavior
       Auto detection and
        alerting to minimize cost
        of operations personnel
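External monitoring with automatic alerting can be sketched as a probe that is run repeatedly, with an alert raised once it has failed several times in a row. `HealthMonitor`, the `probe` and `alert` callables, and the default threshold of 3 are all assumptions made for this illustration.

```python
from typing import Callable


class HealthMonitor:
    """Runs an external probe and alerts after repeated consecutive failures."""

    def __init__(self, probe: Callable[[], bool],
                 alert: Callable[[str], None], threshold: int = 3):
        self.probe = probe          # returns True when the application behaves correctly
        self.alert = alert          # e.g. page the on-call engineer or post to a channel
        self.threshold = threshold
        self.failures = 0

    def check_once(self) -> bool:
        try:
            healthy = bool(self.probe())
        except Exception:
            healthy = False         # a probe that raises counts as a failed check
        if healthy:
            self.failures = 0
        else:
            self.failures += 1
            # Alert exactly once, when the threshold is first reached.
            if self.failures == self.threshold:
                self.alert(f"probe failed {self.failures} times in a row")
        return healthy
```

Requiring several consecutive failures trades a little detection time for fewer false alarms from a single dropped request; a scheduler or loop would call `check_once()` at a fixed interval.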
Automation for Recovery
   The more quickly a failed
    component recovers, the
    better the reliability
     Automatic detection
      and automatic
      recovery
     Automated installation
      is key for minimizing
      setup time during
      recovery
Availability Models
   N = number of nodes required for normal processing
   N+1 = one additional node to provide redundancy in case of failure
   N+K = K nodes provide additional redundancy
   [Diagram: two N nodes; two N nodes plus one +1 node; two N nodes plus two K nodes]
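The N, N+1, and N+K models reduce to a capacity check. In the sketch below, `peak_load` and `per_node_capacity` are hypothetical inputs in the same unit (for example, requests per second), and all nodes are assumed to be identical.

```python
import math


def nodes_required(peak_load: float, per_node_capacity: float) -> int:
    """N: nodes needed to carry the peak load when nothing has failed."""
    return math.ceil(peak_load / per_node_capacity)


def survives(total_nodes: int, failed_nodes: int,
             peak_load: float, per_node_capacity: float) -> bool:
    """True if the remaining nodes can still carry the peak load."""
    remaining = total_nodes - failed_nodes
    return remaining >= nodes_required(peak_load, per_node_capacity)
```

With a peak of 900 requests per second and 250 per node, N is 4; a 5-node (N+1) deployment survives one failure but not two, while a 6-node (N+2) deployment survives both.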
Redundancy Models
   Active/Cold Standby
       Backup site is booted up when needed
   Active/Hot Standby
       Backup site is running and ready to take over
   Active/Active
       Both sites active and processing traffic
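The routing difference between a standby model and active/active can be sketched as follows. The `Site` type and both function names are hypothetical, and the sketch ignores the boot time that separates a cold standby from a hot one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    healthy: bool


def active_standby(sites: List[Site]) -> Optional[str]:
    """Send all traffic to the first healthy site in priority order."""
    for site in sites:
        if site.healthy:
            return site.name
    return None


def active_active(sites: List[Site], request_number: int) -> Optional[str]:
    """Spread traffic round-robin across every healthy site."""
    healthy = [site for site in sites if site.healthy]
    if not healthy:
        return None
    return healthy[request_number % len(healthy)].name
```

In the standby case the second site receives traffic only after the first is marked unhealthy; in the active/active case both carry load all the time, so a failure removes capacity rather than triggering a switch.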
Local and Geo-Redundancy
   Local
     Backup instances are available within the same location
     Use of availability zones within a region is very similar
   Geographic
     Backup instances are available in another geographic location
     Typically in a separate region to account for events such as natural disasters
Availability to the Max
   Multi-Zone/Multi-Region
     Multi-zone typically provides instances running in different physical locations, but in the same region
     Multi-region provides different geographic regions of availability
   Multi-Cloud
     If your application requires the maximum possible availability
     Run in different cloud providers in different regions
Test Until You Break the System
   Push the system to
    the max and observe
    the breaking points
   Fix the problem,
    repeat
   The best way to find
    problems and prevent
    unplanned downtime
    is to test thoroughly
    with a mindset to
    break the system
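The push, observe, fix, repeat loop can be sketched as a stepped load ramp. `run_at_load` is a hypothetical callable supplied by the tester that drives the system at a given load and reports whether it stayed within acceptable limits; the step size and limit are likewise assumptions.

```python
from typing import Callable, Optional


def find_breaking_point(run_at_load: Callable[[int], bool],
                        start: int, step: int, limit: int) -> Optional[int]:
    """Raise the load in steps and return the first level at which the system fails.

    Returns None if the system holds up through the limit.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    load = start
    while load <= limit:
        if not run_at_load(load):
            return load
        load += step
    return None
```

After each fix, running the ramp again shows whether the breaking point moved and where the next bottleneck appears.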
Q&A
THANK YOU!
Greg Thompson
@gmthomps
greg.thompson@alcatel-lucent.com

More Related Content

What's hot

Cloud Computing - Benefits and Challenges
Cloud Computing - Benefits and ChallengesCloud Computing - Benefits and Challenges
Cloud Computing - Benefits and Challenges
ThoughtWorks Studios
 
Cloud security Presentation
Cloud security PresentationCloud security Presentation
Cloud security Presentation
Ajay p
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computingHaslina
 
cloud computing:Types of virtualization
cloud computing:Types of virtualizationcloud computing:Types of virtualization
cloud computing:Types of virtualization
Dr.Neeraj Kumar Pandey
 
What is Virtualization and its types & Techniques.What is hypervisor and its ...
What is Virtualization and its types & Techniques.What is hypervisor and its ...What is Virtualization and its types & Techniques.What is hypervisor and its ...
What is Virtualization and its types & Techniques.What is hypervisor and its ...
Shashi soni
 
Overview of computing paradigm
Overview of computing paradigmOverview of computing paradigm
Overview of computing paradigm
Ripal Ranpara
 
Lecture5 virtualization
Lecture5 virtualizationLecture5 virtualization
Lecture5 virtualization
hktripathy
 
Virtualization.ppt
Virtualization.pptVirtualization.ppt
Virtualization.ppt
vishal choudhary
 
Cloud Computing- components, working, pros and cons
Cloud Computing- components, working, pros and consCloud Computing- components, working, pros and cons
Cloud Computing- components, working, pros and cons
Amritpal Singh Bedi
 
Cloud architecture
Cloud architectureCloud architecture
Cloud architectureAdeel Javaid
 
Unit5 Cloud Federation,
Unit5 Cloud Federation,Unit5 Cloud Federation,
Unit5 Cloud Federation,
Integral university, India
 
SLA Agreement, types and Life Cycle
SLA Agreement, types and Life Cycle SLA Agreement, types and Life Cycle
SLA Agreement, types and Life Cycle
Dr Neelesh Jain
 
Unit 2 -Cloud Computing Architecture
Unit 2 -Cloud Computing ArchitectureUnit 2 -Cloud Computing Architecture
Unit 2 -Cloud Computing Architecture
MonishaNehkal
 
Distributed computing
Distributed computingDistributed computing
Distributed computingshivli0769
 
Cloud computing ppt
Cloud computing pptCloud computing ppt
Cloud computing pptJagriti Rai
 
Multi Tenancy In The Cloud
Multi Tenancy In The CloudMulti Tenancy In The Cloud
Multi Tenancy In The Cloud
rohit_ainapure
 
Cloud computing system models for distributed and cloud computing
Cloud computing system models for distributed and cloud computingCloud computing system models for distributed and cloud computing
Cloud computing system models for distributed and cloud computing
hrmalik20
 
Cloud security
Cloud securityCloud security
Cloud security
Niharika Varshney
 
On demand provisioning
On demand provisioningOn demand provisioning
Cloud computing
Cloud computingCloud computing
Cloud computing
Karthik Sathyanarayanan
 

What's hot (20)

Cloud Computing - Benefits and Challenges
Cloud Computing - Benefits and ChallengesCloud Computing - Benefits and Challenges
Cloud Computing - Benefits and Challenges
 
Cloud security Presentation
Cloud security PresentationCloud security Presentation
Cloud security Presentation
 
Slides cloud computing
Slides cloud computingSlides cloud computing
Slides cloud computing
 
cloud computing:Types of virtualization
cloud computing:Types of virtualizationcloud computing:Types of virtualization
cloud computing:Types of virtualization
 
What is Virtualization and its types & Techniques.What is hypervisor and its ...
What is Virtualization and its types & Techniques.What is hypervisor and its ...What is Virtualization and its types & Techniques.What is hypervisor and its ...
What is Virtualization and its types & Techniques.What is hypervisor and its ...
 
Overview of computing paradigm
Overview of computing paradigmOverview of computing paradigm
Overview of computing paradigm
 
Lecture5 virtualization
Lecture5 virtualizationLecture5 virtualization
Lecture5 virtualization
 
Virtualization.ppt
Virtualization.pptVirtualization.ppt
Virtualization.ppt
 
Cloud Computing- components, working, pros and cons
Cloud Computing- components, working, pros and consCloud Computing- components, working, pros and cons
Cloud Computing- components, working, pros and cons
 
Cloud architecture
Cloud architectureCloud architecture
Cloud architecture
 
Unit5 Cloud Federation,
Unit5 Cloud Federation,Unit5 Cloud Federation,
Unit5 Cloud Federation,
 
SLA Agreement, types and Life Cycle
SLA Agreement, types and Life Cycle SLA Agreement, types and Life Cycle
SLA Agreement, types and Life Cycle
 
Unit 2 -Cloud Computing Architecture
Unit 2 -Cloud Computing ArchitectureUnit 2 -Cloud Computing Architecture
Unit 2 -Cloud Computing Architecture
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
Cloud computing ppt
Cloud computing pptCloud computing ppt
Cloud computing ppt
 
Multi Tenancy In The Cloud
Multi Tenancy In The CloudMulti Tenancy In The Cloud
Multi Tenancy In The Cloud
 
Cloud computing system models for distributed and cloud computing
Cloud computing system models for distributed and cloud computingCloud computing system models for distributed and cloud computing
Cloud computing system models for distributed and cloud computing
 
Cloud security
Cloud securityCloud security
Cloud security
 
On demand provisioning
On demand provisioningOn demand provisioning
On demand provisioning
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 

Viewers also liked

Scalability and fault tolerance
Scalability and fault toleranceScalability and fault tolerance
Scalability and fault tolerance
gaurav jain
 
Scalability Design Principles - Internal Session
Scalability Design Principles - Internal SessionScalability Design Principles - Internal Session
Scalability Design Principles - Internal Session
Sachin Sancheti - Microsoft Azure Architect
 
The Analysis of green university resource planning on cloud computing.
The Analysis of green university resource planning on cloud computing.The Analysis of green university resource planning on cloud computing.
The Analysis of green university resource planning on cloud computing.
Prachyanun Nilsook
 
Cloud computing availability
Cloud computing availabilityCloud computing availability
Cloud computing availabilitys2page
 
API Reliability Guide
API Reliability GuideAPI Reliability Guide
API Reliability Guide
Nick DeNardis
 
Cloud Computing - Availability Issues and Controls
Cloud Computing - Availability Issues and ControlsCloud Computing - Availability Issues and Controls
Cloud Computing - Availability Issues and Controls
lylcheng88
 
Resource Management in Cloud Computing
Resource Management in Cloud ComputingResource Management in Cloud Computing
Resource Management in Cloud Computing
Cristian Klein
 
Redis memcached pdf
Redis memcached pdfRedis memcached pdf
Redis memcached pdf
Erin O'Neill
 
fault tolerance management in cloud computing
fault tolerance management in cloud computingfault tolerance management in cloud computing
fault tolerance management in cloud computingKruthikka Palraj
 
Scalable Reliable Secure REST
Scalable Reliable Secure RESTScalable Reliable Secure REST
Scalable Reliable Secure RESTguestb2ed5f
 
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons Learned
Building Scalable, Highly Concurrent & Fault Tolerant Systems -  Lessons LearnedBuilding Scalable, Highly Concurrent & Fault Tolerant Systems -  Lessons Learned
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons Learned
Jonas Bonér
 
Cloud level scalability - Nuxeo Tour 2014
Cloud level scalability - Nuxeo Tour 2014Cloud level scalability - Nuxeo Tour 2014
Cloud level scalability - Nuxeo Tour 2014
Nuxeo
 
Developing High Performance and Scalable ColdFusion Applications Using Terrac...
Developing High Performance and Scalable ColdFusion Applications Using Terrac...Developing High Performance and Scalable ColdFusion Applications Using Terrac...
Developing High Performance and Scalable ColdFusion Applications Using Terrac...
Shailendra Prasad
 
Buffer management --database buffering
Buffer management --database buffering Buffer management --database buffering
Buffer management --database buffering
julia121214
 
Reliable, cheaper, and modular new scada 1
Reliable, cheaper, and modular new scada 1Reliable, cheaper, and modular new scada 1
Reliable, cheaper, and modular new scada 1Mohamed Zahran
 
Research and technology explosion in scale-out storage
Research and technology explosion in scale-out storageResearch and technology explosion in scale-out storage
Research and technology explosion in scale-out storage
Jeff Spencer
 
Fundamental cloud computing
Fundamental cloud computingFundamental cloud computing
Fundamental cloud computing
Asmaa Ibrahim
 
Database , 12 Reliability
Database , 12 ReliabilityDatabase , 12 Reliability
Database , 12 ReliabilityAli Usman
 
Cloud computing security and privacy
Cloud computing security and privacyCloud computing security and privacy
Cloud computing security and privacy
Adeel Javaid
 

Viewers also liked (20)

Scalability and fault tolerance
Scalability and fault toleranceScalability and fault tolerance
Scalability and fault tolerance
 
Scalability Design Principles - Internal Session
Scalability Design Principles - Internal SessionScalability Design Principles - Internal Session
Scalability Design Principles - Internal Session
 
The Analysis of green university resource planning on cloud computing.
The Analysis of green university resource planning on cloud computing.The Analysis of green university resource planning on cloud computing.
The Analysis of green university resource planning on cloud computing.
 
Cloud computing availability
Cloud computing availabilityCloud computing availability
Cloud computing availability
 
API Reliability Guide
API Reliability GuideAPI Reliability Guide
API Reliability Guide
 
Cloud Computing - Availability Issues and Controls
Cloud Computing - Availability Issues and ControlsCloud Computing - Availability Issues and Controls
Cloud Computing - Availability Issues and Controls
 
Buffer manager
Buffer managerBuffer manager
Buffer manager
 
Resource Management in Cloud Computing
Resource Management in Cloud ComputingResource Management in Cloud Computing
Resource Management in Cloud Computing
 
Redis memcached pdf
Redis memcached pdfRedis memcached pdf
Redis memcached pdf
 
fault tolerance management in cloud computing
fault tolerance management in cloud computingfault tolerance management in cloud computing
fault tolerance management in cloud computing
 
Scalable Reliable Secure REST
Scalable Reliable Secure RESTScalable Reliable Secure REST
Scalable Reliable Secure REST
 
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons Learned
Building Scalable, Highly Concurrent & Fault Tolerant Systems -  Lessons LearnedBuilding Scalable, Highly Concurrent & Fault Tolerant Systems -  Lessons Learned
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons Learned
 
Cloud level scalability - Nuxeo Tour 2014
Cloud level scalability - Nuxeo Tour 2014Cloud level scalability - Nuxeo Tour 2014
Cloud level scalability - Nuxeo Tour 2014
 
Developing High Performance and Scalable ColdFusion Applications Using Terrac...
Developing High Performance and Scalable ColdFusion Applications Using Terrac...Developing High Performance and Scalable ColdFusion Applications Using Terrac...
Developing High Performance and Scalable ColdFusion Applications Using Terrac...
 
Buffer management --database buffering
Buffer management --database buffering Buffer management --database buffering
Buffer management --database buffering
 
Reliable, cheaper, and modular new scada 1
Reliable, cheaper, and modular new scada 1Reliable, cheaper, and modular new scada 1
Reliable, cheaper, and modular new scada 1
 
Research and technology explosion in scale-out storage
Research and technology explosion in scale-out storageResearch and technology explosion in scale-out storage
Research and technology explosion in scale-out storage
 
Fundamental cloud computing
Fundamental cloud computingFundamental cloud computing
Fundamental cloud computing
 
Database , 12 Reliability
Database , 12 ReliabilityDatabase , 12 Reliability
Database , 12 Reliability
 
Cloud computing security and privacy
Cloud computing security and privacyCloud computing security and privacy
Cloud computing security and privacy
 

Similar to Scalability and Reliability in the Cloud

Orleans: Cloud Computing for Everyone - SOCC 2011
Orleans: Cloud Computing for Everyone - SOCC 2011Orleans: Cloud Computing for Everyone - SOCC 2011
Orleans: Cloud Computing for Everyone - SOCC 2011
Jorgen Thelin
 
Adopting the Cloud
Adopting the CloudAdopting the Cloud
Adopting the Cloud
Tapio Rautonen
 
What does performance mean in the cloud
What does performance mean in the cloudWhat does performance mean in the cloud
What does performance mean in the cloud
Michael Kopp
 
Reactive Architecture
Reactive ArchitectureReactive Architecture
Reactive Architecture
Knoldus Inc.
 
Building Cloud capability for startups
Building Cloud capability for startupsBuilding Cloud capability for startups
Building Cloud capability for startups
Sekhar Mohanty
 
High Availability of Services in Wide-Area Shared Computing Networks
High Availability of Services in Wide-Area Shared Computing NetworksHigh Availability of Services in Wide-Area Shared Computing Networks
High Availability of Services in Wide-Area Shared Computing Networks
Mário Almeida
 
Databarracks & SolidFire - How to run tier 1 applications in the cloud
Databarracks & SolidFire - How to run tier 1 applications in the cloud Databarracks & SolidFire - How to run tier 1 applications in the cloud
Databarracks & SolidFire - How to run tier 1 applications in the cloud
NetApp
 
Atmosphere 2014: Switching from monolithic approach to modular cloud computin...
Atmosphere 2014: Switching from monolithic approach to modular cloud computin...Atmosphere 2014: Switching from monolithic approach to modular cloud computin...
Atmosphere 2014: Switching from monolithic approach to modular cloud computin...
PROIDEA
 
Intro to Cloud Native _ v1.0en (2021/01)
Intro to Cloud Native _ v1.0en (2021/01)Intro to Cloud Native _ v1.0en (2021/01)
Intro to Cloud Native _ v1.0en (2021/01)
Young Suk Ahn Park
 
Crossing the river by feeling the stones from legacy to cloud native applica...
Crossing the river by feeling the stones  from legacy to cloud native applica...Crossing the river by feeling the stones  from legacy to cloud native applica...
Crossing the river by feeling the stones from legacy to cloud native applica...
OPNFV
 
Cloud capability for startups
Cloud capability for startupsCloud capability for startups
Cloud capability for startups
Cloud and analytics Lab
 
Making sense of Cloud Computing
Making sense of Cloud ComputingMaking sense of Cloud Computing
Making sense of Cloud ComputingLawrence Wilkes
 
Nfv open stack-shuo-yang
Nfv open stack-shuo-yangNfv open stack-shuo-yang
Nfv open stack-shuo-yangOW2
 
Gomez Blazing Fast Cloud Best Practices
Gomez Blazing Fast Cloud Best Practices Gomez Blazing Fast Cloud Best Practices
Gomez Blazing Fast Cloud Best Practices
Compuware APM
 
Dr관련 세미나 자료 v2
Dr관련 세미나 자료 v2Dr관련 세미나 자료 v2
Dr관련 세미나 자료 v2종필 김
 
Dr관련 세미나 자료 v2333
Dr관련 세미나 자료 v2333Dr관련 세미나 자료 v2333
Dr관련 세미나 자료 v2333종필 김
 
Sa 006 modifiability
Sa 006 modifiabilitySa 006 modifiability
Sa 006 modifiability
Frank Gielen
 
Move fast and make things with microservices
Move fast and make things with microservicesMove fast and make things with microservices
Move fast and make things with microservices
Mithun Arunan
 
Clearing the air on Cloud Computing
Clearing the air on Cloud ComputingClearing the air on Cloud Computing
Clearing the air on Cloud Computing
Karthik Sankar
 

Similar to Scalability and Reliability in the Cloud (20)

Orleans: Cloud Computing for Everyone - SOCC 2011
Orleans: Cloud Computing for Everyone - SOCC 2011Orleans: Cloud Computing for Everyone - SOCC 2011
Orleans: Cloud Computing for Everyone - SOCC 2011
 
Adopting the Cloud
Adopting the CloudAdopting the Cloud
Adopting the Cloud
 
What does performance mean in the cloud
What does performance mean in the cloudWhat does performance mean in the cloud
What does performance mean in the cloud
 
Reactive Architecture
Reactive ArchitectureReactive Architecture
Reactive Architecture
 
Building Cloud capability for startups
Building Cloud capability for startupsBuilding Cloud capability for startups
Building Cloud capability for startups
 
High Availability of Services in Wide-Area Shared Computing Networks
High Availability of Services in Wide-Area Shared Computing NetworksHigh Availability of Services in Wide-Area Shared Computing Networks
High Availability of Services in Wide-Area Shared Computing Networks
 
Databarracks & SolidFire - How to run tier 1 applications in the cloud
Databarracks & SolidFire - How to run tier 1 applications in the cloud Databarracks & SolidFire - How to run tier 1 applications in the cloud
Databarracks & SolidFire - How to run tier 1 applications in the cloud
 
Atmosphere 2014: Switching from monolithic approach to modular cloud computin...
Atmosphere 2014: Switching from monolithic approach to modular cloud computin...Atmosphere 2014: Switching from monolithic approach to modular cloud computin...
Atmosphere 2014: Switching from monolithic approach to modular cloud computin...
 
Intro to Cloud Native _ v1.0en (2021/01)
Intro to Cloud Native _ v1.0en (2021/01)Intro to Cloud Native _ v1.0en (2021/01)
Intro to Cloud Native _ v1.0en (2021/01)
 
Crossing the river by feeling the stones from legacy to cloud native applica...
Crossing the river by feeling the stones  from legacy to cloud native applica...Crossing the river by feeling the stones  from legacy to cloud native applica...
Crossing the river by feeling the stones from legacy to cloud native applica...
 
Cloud capability for startups
Cloud capability for startupsCloud capability for startups
Cloud capability for startups
 
Making sense of Cloud Computing
Making sense of Cloud ComputingMaking sense of Cloud Computing
Making sense of Cloud Computing
 
Nfv open stack-shuo-yang
Nfv open stack-shuo-yangNfv open stack-shuo-yang
Nfv open stack-shuo-yang
 
Gomez Blazing Fast Cloud Best Practices
Gomez Blazing Fast Cloud Best Practices Gomez Blazing Fast Cloud Best Practices
Gomez Blazing Fast Cloud Best Practices
 
Dr관련 세미나 자료 v2
Dr관련 세미나 자료 v2Dr관련 세미나 자료 v2
Dr관련 세미나 자료 v2
 
Dr관련 세미나 자료 v2333
Dr관련 세미나 자료 v2333Dr관련 세미나 자료 v2333
Dr관련 세미나 자료 v2333
 
Sa 006 modifiability
Sa 006 modifiabilitySa 006 modifiability
Sa 006 modifiability
 
Spo1 w25 spo1-w25
Spo1 w25 spo1-w25Spo1 w25 spo1-w25
Spo1 w25 spo1-w25
 
Move fast and make things with microservices
Move fast and make things with microservicesMove fast and make things with microservices
Move fast and make things with microservices
 
Clearing the air on Cloud Computing
Clearing the air on Cloud ComputingClearing the air on Cloud Computing
Clearing the air on Cloud Computing
 

Recently uploaded

Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
GraphRAG is All You need? LLM & Knowledge Graph
Scalability and Reliability in the Cloud

  • 1. High Scalability and Reliability in the Cloud. Greg Thompson, Head of Architecture, Apps Enablement, Alcatel-Lucent. @gmthomp, greg.thompson@alcatel-lucent.com
  • 2. About This Session: The target audience is backend application developers deploying infrastructure into a cloud environment. The session covers concepts for scalability and reliability, with the goal of helping application developers understand some key considerations when designing and building the backend.
  • 3. Design Time Decisions: When first building your application backend, consider a few important questions. How fast should the application be recovered if a failure occurs? What kind of downtime is acceptable? Is the application maintaining stateful data? What kind of information needs to be shared across multiple instances?
  • 5. What is Scalability? Scalability describes how the application will handle increased traffic volume.
  • 6. Scalability – Factors to Consider: horizontal vs. vertical; stateless vs. stateful; understanding limitations; connection management; segmentation of traffic; segmentation of responsibility (distributed architecture); clustering; messaging.
  • 7. What Type of Scalability? Vertical vs. Horizontal. Vertical: scaling up a single node; physical limitations (instances are very powerful but still have finite limits); resources such as the number of sockets can only go so high. Horizontal: scaling out across multiple nodes; ability to distribute traffic over a number of nodes; allows for more flexibility over time.
  • 8. Will the App Maintain State? Stateless applications: the application does not persist information about transactions; each transaction is independent and atomic. [Diagram: a request goes to the application and a response comes back.]
  • 9. Will the App Maintain State? Stateful applications: the application needs to maintain data about transactions in progress; this requires storage; persistence may also be required, depending on the ... [Diagram: a first request and a subsequent request reach the application, which is backed by a DB.]
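The difference between slides 8 and 9 can be sketched in a few lines of Python. This is a minimal illustration, not anything shown in the talk: `stateless_handler`, `stateful_handler`, and `SessionStore` are hypothetical names, and the in-memory dictionary stands in for whatever database or cache a real deployment would use. It also assumes the state is kept outside the application process, so that any instance can pick up the subsequent request.

```python
from dataclasses import dataclass, field


def stateless_handler(request: dict) -> dict:
    """Everything needed to answer is carried in the request itself."""
    return {"total": sum(request["items"])}


@dataclass
class SessionStore:
    """Hypothetical stand-in for shared storage (a database or cache in practice)."""
    _data: dict = field(default_factory=dict)

    def load(self, session_id: str) -> list:
        return list(self._data.get(session_id, []))

    def save(self, session_id: str, items: list) -> None:
        self._data[session_id] = list(items)


def stateful_handler(request: dict, store: SessionStore) -> dict:
    """State lives in the store, not in the handler, so it survives between requests."""
    items = store.load(request["session_id"])
    items.extend(request["items"])
    store.save(request["session_id"], items)
    return {"total": sum(items)}


store = SessionStore()
first = stateful_handler({"session_id": "s1", "items": [5]}, store)
second = stateful_handler({"session_id": "s1", "items": [7]}, store)
```

The second call sees the item added by the first one only because both share the same store; the stateless handler has no such dependency and can run on any node without coordination.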
  • 10. Understanding Limitations: Thorough testing is key to understanding bottlenecks. Test real-world scenarios, including latency. Push the system to the max to understand how it ...
  • 11. Connection Management (Mobile Device Connections): Mobile devices don't always behave the way you expect. Connectivity is often very dynamic: devices move between 4G, 3G, 2G, no coverage, and Wi-Fi. Not all TCP events get reported, and sockets can remain open. If not handled correctly, these factors can be a time bomb no matter how far you scale a component vertically.
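One common way to defuse the half-open socket problem from slide 11 is to track the last activity per connection and evict connections that have been silent too long. The sketch below is an assumption about how that could look, not the speaker's design: `ConnectionRegistry`, the 30-second timeout, and the device names are all hypothetical, and a fake clock is injected so the behaviour is deterministic.

```python
import time


class ConnectionRegistry:
    """Tracks last-activity time per connection and evicts silent ones."""

    def __init__(self, idle_timeout_s: float, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_seen = {}

    def touch(self, conn_id: str) -> None:
        """Record traffic (data or an application-level heartbeat)."""
        self._last_seen[conn_id] = self._clock()

    def reap(self) -> list:
        """Remove and return connections idle longer than the timeout."""
        now = self._clock()
        stale = [conn_id for conn_id, seen in self._last_seen.items()
                 if now - seen > self.idle_timeout_s]
        for conn_id in stale:
            # A real server would also close the underlying socket here.
            del self._last_seen[conn_id]
        return stale

    def active(self) -> int:
        return len(self._last_seen)


# Deterministic walk-through using a fake clock instead of real time.
fake_now = [0.0]
registry = ConnectionRegistry(idle_timeout_s=30, clock=lambda: fake_now[0])
registry.touch("phone-a")
fake_now[0] = 20.0
registry.touch("phone-b")
fake_now[0] = 45.0
reaped = registry.reap()  # phone-a silent for 45 s, phone-b for 25 s
```

Relying on application-level activity rather than TCP close events is the point: a device that drops off the network without a clean shutdown never sends one.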
  • 12. Segmenting Traffic: Once the application can be scaled out, traffic can be segmented in different ways: by location (e.g. east coast vs. west coast); by pre-assigned criteria (user ID, IP, or other dynamic criteria); or by load balancing.
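As a rough illustration of combining two of these criteria, the sketch below first picks a pool by location and then assigns a user to a node in that pool by hashing the user ID. The pool names, node names, and the `route` and `pick_node` functions are made up for the example.

```python
import hashlib

# Hypothetical node pools per location.
REGION_POOLS = {
    "east": ["east-node-1", "east-node-2", "east-node-3"],
    "west": ["west-node-1", "west-node-2"],
}


def pick_node(user_id: str, nodes: list) -> str:
    """Map a user ID to one node; the same ID always lands on the same node."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def route(user_id: str, region: str) -> str:
    """Segment by location first, then by user ID within that location."""
    return pick_node(user_id, REGION_POOLS[region])


east_choice = route("alice", "east")
west_choice = route("alice", "west")
```

A plain modulo mapping like this reshuffles most users whenever the pool size changes; schemes such as consistent hashing reduce that churn at the cost of more bookkeeping.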
  • 13. Segmenting Responsibility: Segmenting responsibility allows for a distributed architecture. Each component can be scaled independently, which allows for more flexibility in scaling, but it adds more complexity and potential messaging overhead.
  • 14. Clustering: Clustering is the concept of having a group of nodes working together to provide the same capability. Nodes are typically co-located. Common data is shared as needed across the cluster. Communication may be needed between nodes. [Diagram: four app nodes connected to shared data.]
  • 15. Messaging: Once a clustered and/or distributed architecture is used, messaging will be needed between the various components and/or nodes. Types of messaging: JMS; open source MQ packages; custom designed; use of APIs.
  • 16. Example of Scaled Architecture. [Diagram: Site 1 and Site 2, each showing load balancers, web servers, Component 1 and Component 2 instances, and a database.]
  • 18. What is Reliability/Availability? Availability is typically measured by the amount of downtime your application has in a given year; unplanned downtime and planned downtime are both counted. Reliability is described by the likelihood of failure based on actual measurements. We'll focus more on availability.
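The yearly-downtime definition converts directly into the percentage figures often quoted for availability. The helper names below are illustrative, and the calculation assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability_percent(downtime_hours_per_year: float) -> float:
    """Availability implied by total (planned plus unplanned) downtime in a year."""
    return 100.0 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)


def allowed_downtime_hours(target_percent: float) -> float:
    """Yearly downtime budget for a given availability target."""
    return HOURS_PER_YEAR * (1 - target_percent / 100.0)


three_nines_budget = allowed_downtime_hours(99.9)   # about 8.76 hours per year
achieved = availability_percent(8.76)               # about 99.9 percent
```

Because planned maintenance counts against the same budget, a target of 99.9% leaves under nine hours a year for upgrades and failures combined.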
  • 19. Reliability/Availability Factors to Consider: cost vs. need; problem detection; automation for recovery; active/standby, active/active, hot standby vs. cold standby; local and geo-redundancy; multi-zone, multi-cloud; test until you break the system.
  • 20. Reliability Requirements. Cost considerations: number of instances; bandwidth requirements between sites; complexity of software; monitoring. Need: user experience; customer requirements; negative publicity.
  • 21. Problem Detection: Effective monitoring of the application is key to minimizing downtime. This includes event reporting in the software; external monitoring that tests for successful behavior; and automatic detection and alerting to minimize the cost of operations personnel.
  • 22. Automation for Recovery: The more quickly a failed component recovers, the more reliable the application. Use automatic detection and automatic recovery. Automated installation is key to minimizing setup time during recovery.
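Slides 21 and 22 together describe a detect-then-recover loop. A minimal sketch of that loop, under the assumption that recovery means restarting the component after a few consecutive failed health checks, might look like the following. `Supervisor`, the threshold of three, and the simulated probe results are all hypothetical; real deployments usually delegate this to an orchestrator or process manager.

```python
from typing import Callable


class Supervisor:
    """Runs a health check and triggers recovery after repeated failures."""

    def __init__(self, check: Callable[[], bool], restart: Callable[[], None],
                 failure_threshold: int = 3):
        self.check = check
        self.restart = restart
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def tick(self) -> str:
        """Perform one monitoring cycle and report what happened."""
        if self.check():
            self.consecutive_failures = 0
            return "healthy"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.restart()
            self.consecutive_failures = 0
            return "restarted"
        return "failing"


# Simulated probe results: one success, three failures, then recovery.
probe_results = iter([True, False, False, False, True])
restart_log = []
supervisor = Supervisor(check=lambda: next(probe_results),
                        restart=lambda: restart_log.append("restart"))
outcomes = [supervisor.tick() for _ in range(5)]
```

Requiring several consecutive failures before acting is a guard against restarting on a single dropped probe; the right threshold depends on how much downtime the application can tolerate.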
  • 23. Availability Models: N = the number of nodes required for normal processing. N+1 = one additional node to provide redundancy in case of failure. N+K = K additional nodes to provide further redundancy.
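The N+K notation can be turned into a quick capacity check. The figures below (4,500 requests per second at peak, 1,000 per node) are invented for the example, and the function names are illustrative.

```python
import math


def nodes_required(peak_load_rps: float, per_node_capacity_rps: float) -> int:
    """N: the number of nodes needed to carry the peak load with none to spare."""
    return math.ceil(peak_load_rps / per_node_capacity_rps)


def survives(deployed_nodes: int, failed_nodes: int, n_required: int) -> bool:
    """True if the remaining nodes can still carry the normal load."""
    return deployed_nodes - failed_nodes >= n_required


n = nodes_required(peak_load_rps=4500, per_node_capacity_rps=1000)  # N = 5
k = 2
deployed = n + k  # an N+K deployment with K = 2
```

With K = 2 the deployment tolerates two simultaneous node failures at peak load; a third failure leaves it short of N.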
  • 24. Redundancy Models. Active/cold standby: the backup site is booted up when needed. Active/hot standby: the backup site is running and ready to take over. Active/active: both sites are active and processing traffic.
  • 25. Local and Geo-Redundancy. Local: backup instances are available within the same location; using availability zones within a region is very similar. Geographic: backup instances are available in another geographic location, typically in a separate region to account for events such as natural disasters.
  • 26. Availability to the Max. Multi-zone/multi-region: multi-zone typically provides instances running in different physical locations, but in the same region; multi-region provides different geographic regions of availability. Multi-cloud: if your application requires the maximum possible availability, run in different cloud providers in different regions.
  • 27. Test Until You Break the System: Push the system to the max and observe the breaking points. Fix the problem, then repeat. The best way to find the problems that cause unplanned downtime is to test thoroughly with a mindset to break the system.
  • 28. Q&A