Source: http://ir.netflix.com
(I’m skipping all the cloud intro etc. Netflix runs in the
cloud; if you hadn’t figured that out already, you aren’t
paying attention and should go to the other Netflix
talks at AWS re:Invent or read slideshare.net/netflix)
In production at Netflix
[Timeline figure: entries dated 2009 (two), 2010 (five), and 2011 (one); entry names not recoverable]
Architecture applies to any cloud or datacenter
  Illustrated today using real world examples
[Diagram: Netflix streaming architecture]
Tiers: Consumer Electronics, AWS Cloud Services, CDN Edge Locations
Customer Device (PC, PS3, TV…):
  Browse: Web Site or Discovery API
  Play: Streaming API
  Watch: OpenConnect CDN Boxes
Supporting services: User Data, Personalization, DRM, QoS Logging,
  CDN Management and Steering, Content Encoding
[Diagram: Personalization movie group chooser, with a "Start Here" marker]
Legend: Cassandra, memcached, Web service, S3 bucket
Each icon is three to a few hundred instances across three AWS zones
Deployed in Three Balanced Availability Zones
[Diagram: Load Balancers in front of Zone A, Zone B, and Zone C; each zone holds Cassandra and Evcache replicas]
Triple Replicated Persistence
[Diagram: Load Balancers in front of Zone A, Zone B, and Zone C; each zone holds Cassandra and Evcache replicas]
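One way to see why a replica in each of three zones keeps data available is to count surviving replicas. The sketch below assumes quorum-style reads and writes (a majority of replicas must answer), which the slides do not state; the zone names and the `quorum` and `can_serve` helpers are hypothetical.

```python
# Sketch only: assumes quorum-style consistency, which the slides do not specify.
ZONES = ["zone-a", "zone-b", "zone-c"]  # one replica per zone, as in the diagram


def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def can_serve(failed_zones):
    """True if enough replicas survive to reach a quorum."""
    alive = [zone for zone in ZONES if zone not in failed_zones]
    return len(alive) >= quorum(len(ZONES))


if __name__ == "__main__":
    print(can_serve(set()))                 # no failures
    print(can_serve({"zone-b"}))            # one zone lost
    print(can_serve({"zone-a", "zone-b"}))  # two zones lost
```

Under this assumption, losing any single zone still leaves two of three replicas, which is a majority; losing two zones does not.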
Isolated Regions
[Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of Zone A, Zone B, and Zone C; each zone holds Cassandra replicas]
Failure Mode          Probability   Mitigation Plan
Application Failure   High          Automatic degraded response
AWS Region Failure    Low           Wait for region to recover
AWS Zone Failure      Medium        Continue to run on 2 out of 3 zones
Datacenter Failure    Medium        Migrate more functions to cloud
Data store failure    Low           Restore from S3 backups
S3 failure            Low           Restore from remote archive
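The "Automatic degraded response" mitigation in the first row can be illustrated with a small fallback wrapper. This is a minimal sketch, not the Netflix implementation: `with_fallback`, `personalized_titles`, and the default title list are all hypothetical names and data.

```python
# Sketch only: every name here is hypothetical, not a Netflix API.
from typing import Callable, List

DEFAULT_TITLES = ["popular-1", "popular-2", "popular-3"]  # made-up fallback data


def with_fallback(call: Callable[[], List[str]], fallback: List[str]) -> List[str]:
    """Return the dependency's answer, or a degraded default if the call fails."""
    try:
        return call()
    except Exception:
        # Degrade instead of propagating the failure to the customer.
        return fallback


def personalized_titles() -> List[str]:
    """Stand-in for a call to a dependency that is currently failing."""
    raise TimeoutError("personalization service unavailable")


if __name__ == "__main__":
    print(with_fallback(personalized_titles, DEFAULT_TITLES))
```

The caller still gets a usable, if less personalized, response when the dependency is down.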
Run what you wrote
 Rapid detection
 Rapid Response
http://techblog.netflix.com/2012/06/annoucing-archaius-dynamic-properties.html
http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html
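The Chaos Monkey post linked above is about randomly terminating instances to check that services tolerate the loss. A minimal sketch of just the selection step might look like this; `pick_victims` and the group and instance names are hypothetical, and the real tool does considerably more (scheduling, opt-outs, the actual termination call).

```python
# Sketch only: selection logic in the spirit of Chaos Monkey, not its actual code.
import random
from typing import Dict, List, Optional


def pick_victims(groups: Dict[str, List[str]],
                 rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Choose one instance per group to terminate; empty groups are skipped."""
    rng = rng or random.Random()
    return {
        name: rng.choice(instances)
        for name, instances in groups.items()
        if instances
    }


if __name__ == "__main__":
    groups = {"api": ["i-1", "i-2"], "web": ["i-3"], "empty": []}
    print(pick_victims(groups))
```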



[Diagram labels: Eureka Services metadata; AWS Instances, ASGs, etc.; AppDynamics Request flow; Edda; Monkeys]
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Classify and name the types of things that
might go wrong in the platform or infrastructure
Zone Network Outage
Zone Power Outage
Zone Dependent Service Outage
[Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of Zone A, Zone B, and Zone C; each zone holds Cassandra replicas]
Dependent Service could be @NetflixOSS platform or underlying infrastructure
Regional Network Outage
Control Plane Overload
[Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of Zone A, Zone B, and Zone C; each zone holds Cassandra replicas]
Cascading Capacity Overload
[Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of Zone A, Zone B, and Zone C; each zone holds Cassandra replicas]
Capacity demand migrates to services in another zone that don’t scale up fast enough to take the load
Platform and Infrastructure Software Bugs and Global Configuration Errors: “Oops…”
Migrating demand across regions may just spread the problem further…
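The capacity note on this slide can be put in numbers. The sketch below assumes traffic is spread evenly across the surviving zones, which is a simplification; `load_multiplier` and `survives` are hypothetical helpers and the figures are illustrative.

```python
# Sketch only: assumes load is rebalanced evenly over the surviving zones.
def load_multiplier(total_zones: int, failed_zones: int) -> float:
    """Factor by which per-zone load grows after some zones fail."""
    surviving = total_zones - failed_zones
    if surviving <= 0:
        raise ValueError("no surviving zones")
    return total_zones / surviving


def survives(capacity_per_zone: float, normal_load_per_zone: float,
             total_zones: int = 3, failed_zones: int = 1) -> bool:
    """True if each surviving zone has enough capacity for its new share."""
    needed = normal_load_per_zone * load_multiplier(total_zones, failed_zones)
    return capacity_per_zone >= needed


if __name__ == "__main__":
    print(load_multiplier(3, 1))  # growth factor with one of three zones lost
    print(survives(capacity_per_zone=140, normal_load_per_zone=100))
```

With three zones and one lost, each survivor carries 3/2 of its normal load, so a service needs roughly 50% headroom per zone (or scaling that reacts fast enough) to avoid the overload the slide describes.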
Hardening the cloud
 Lessons Learned at Scale
Why Netflix Stays Up (Mostly)
http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
http://googleappengine.blogspot.com/2012/10/about-todays-app-engine-outage.html
http://aws.amazon.com/message/67457/
http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html
@NetflixOSS Eureka service directory failed to mark down dead instances due to a configuration error
[Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of Zone A, Zone B, and Zone C with Cassandra replicas; one zone marked "Zone Power Outage"]
Applications not using zone-aware routing kept trying to talk to dead instances and timing out
Effect: higher latency and errors
Mitigation: fixed configuration, and made zone-aware routing the default
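The zone-aware routing mitigation can be sketched as a preference rule on the client side. This is an illustration under assumptions, not the Eureka or NetflixOSS client code: the `Instance` record and `candidates` function are hypothetical.

```python
# Sketch only: hypothetical types, not the NetflixOSS client implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    host: str
    zone: str
    up: bool  # status as reported by the service directory


def candidates(instances: List[Instance], my_zone: str) -> List[Instance]:
    """Prefer instances in the caller's zone; otherwise use any reported as up."""
    healthy = [i for i in instances if i.up]
    same_zone = [i for i in healthy if i.zone == my_zone]
    return same_zone or healthy


if __name__ == "__main__":
    registry = [
        Instance("a1", "zone-a", True),   # stale entry: zone-a has lost power
        Instance("b1", "zone-b", True),
        Instance("b2", "zone-b", False),
    ]
    print([i.host for i in candidates(registry, "zone-b")])
```

Even when the directory's status is stale, as in the incident described, a caller in a surviving zone that stays within its own zone never reaches the dead instances.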
Zone Enable DNS
Command Queue
Per-Zone Control Plane Command Queues
[Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of Zone A, Zone B, and Zone C; each zone holds Cassandra replicas]
A highly scalable, available and durable
          deployment pattern
Single function Cassandra Cluster
  Managed by Priam
  Between 6 and 72 nodes
Stateless Data Access REST Service
  Astyanax Cassandra Client
Many Different Single-Function REST Clients
Optional Datacenter Update Flow
Each icon represents a horizontally scaled service of three to hundreds of instances deployed over three availability zones
AppDynamics Service Flow Visualization
Linux Base AMI (CentOS or Ubuntu)
Optional Apache frontend, memcached, non-java apps
Java (JDK 6 or 7)
AppDynamics appagent monitoring
Tomcat
  Application war file, base servlet, platform, client interface jars, Astyanax
  Healthcheck, status servlets, JMX interface, Servo autoscale
Monitoring: AppDynamics machineagent, Epic/Atlas
Log rotation to S3
GC and thread dump logging
http://github.com/netflix
Linux Base AMI (CentOS or Ubuntu)
Tomcat and Priam on JDK
  Healthcheck, Status
Java (JDK 7)
AppDynamics appagent monitoring
Cassandra Server
  Local Ephemeral Disk Space: 2TB of SSD or 1.6TB disk holding Commit log and SSTables
Monitoring: AppDynamics machineagent, Epic/Atlas
GC and thread dump logging
http://github.com/netflix
[Diagram: eleven Cassandra nodes arranged around a central S3 Backup, plus an Archive]
@NetflixOSS
http://techblog.netflix.com
Legend: Github / Techblog; Apache Contributions; Techblog Post Only; Coming Soon

Priam: Cassandra as a Service
Astyanax: Cassandra client for Java
CassJMeter: Cassandra test suite
Cassandra Multi-region EC2 datastore support
Aegisthus: Hadoop ETL for Cassandra
Explorers
Governator: Library lifecycle and dependency injection
Odin: Workflow orchestration
Blitz4j: Async logging

Exhibitor: Zookeeper as a Service
Curator: Zookeeper Patterns
EVCache: Memcached as a Service
Eureka / Discovery: Service Directory
Archaius: Dynamic Properties Service
Edda: Queryable config history
Server-side latency/error injection
REST Client + mid-tier LB
Configuration REST endpoints

Servo and Autoscaling Scripts
Honu: Log4j streaming to Hadoop
Circuit Breaker - Hystrix: Robust service pattern
Asgard: AutoScaleGroup based AWS console
Chaos Monkey: Robustness verification
Latency Monkey
Janitor Monkey
Bakeries and AMI
Build dynaslaves
http://github.com/Netflix
       http://techblog.netflix.com
       http://slideshare.net/Netflix

http://www.linkedin.com/in/adriancockcroft
We are sincerely eager to hear your FEEDBACK on this presentation and on re:Invent.
Please fill out an evaluation form when you have a chance.

More Related Content

What's hot

Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon Web Services
 
(DVO202) DevOps at Amazon: A Look At Our Tools & Processes
(DVO202) DevOps at Amazon: A Look At Our Tools & Processes(DVO202) DevOps at Amazon: A Look At Our Tools & Processes
(DVO202) DevOps at Amazon: A Look At Our Tools & ProcessesAmazon Web Services
 
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...Amazon Web Services Korea
 
Microservices architecture
Microservices architectureMicroservices architecture
Microservices architectureAbdelghani Azri
 
Microservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitectureMicroservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitecturePaul Mooney
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyLeonid Nekhymchuk
 
[AWS Builders] AWS상의 보안 위협 탐지 및 대응
[AWS Builders] AWS상의 보안 위협 탐지 및 대응[AWS Builders] AWS상의 보안 위협 탐지 및 대응
[AWS Builders] AWS상의 보안 위협 탐지 및 대응Amazon Web Services Korea
 
Cloud native integration
Cloud native integrationCloud native integration
Cloud native integrationKim Clark
 
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저Amazon Web Services Korea
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 
Deep Dive on Amazon Elastic Container Service (ECS) and Fargate
Deep Dive on Amazon Elastic Container Service (ECS) and FargateDeep Dive on Amazon Elastic Container Service (ECS) and Fargate
Deep Dive on Amazon Elastic Container Service (ECS) and FargateAmazon Web Services
 
Deploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
Deploy, Manage, and Scale your Apps with AWS Elastic BeanstalkDeploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
Deploy, Manage, and Scale your Apps with AWS Elastic BeanstalkAmazon Web Services
 
Serverless Computing: build and run applications without thinking about servers
Serverless Computing: build and run applications without thinking about serversServerless Computing: build and run applications without thinking about servers
Serverless Computing: build and run applications without thinking about serversAmazon Web Services
 
Pets vs. Cattle: The Elastic Cloud Story
Pets vs. Cattle: The Elastic Cloud StoryPets vs. Cattle: The Elastic Cloud Story
Pets vs. Cattle: The Elastic Cloud StoryRandy Bias
 
CodeBuild CodePipeline CodeDeploy CodeCommit in AWS | Edureka
CodeBuild CodePipeline CodeDeploy CodeCommit in AWS | EdurekaCodeBuild CodePipeline CodeDeploy CodeCommit in AWS | Edureka
CodeBuild CodePipeline CodeDeploy CodeCommit in AWS | EdurekaEdureka!
 

What's hot (20)

Microservices and Amazon ECS
Microservices and Amazon ECSMicroservices and Amazon ECS
Microservices and Amazon ECS
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for Kubernetes
 
(DVO202) DevOps at Amazon: A Look At Our Tools & Processes
(DVO202) DevOps at Amazon: A Look At Our Tools & Processes(DVO202) DevOps at Amazon: A Look At Our Tools & Processes
(DVO202) DevOps at Amazon: A Look At Our Tools & Processes
 
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
 
Microservices architecture
Microservices architectureMicroservices architecture
Microservices architecture
 
Microservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitectureMicroservice vs. Monolithic Architecture
Microservice vs. Monolithic Architecture
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case study
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
Why to Cloud Native
Why to Cloud NativeWhy to Cloud Native
Why to Cloud Native
 
[AWS Builders] AWS상의 보안 위협 탐지 및 대응
[AWS Builders] AWS상의 보안 위협 탐지 및 대응[AWS Builders] AWS상의 보안 위협 탐지 및 대응
[AWS Builders] AWS상의 보안 위협 탐지 및 대응
 
Cloud native integration
Cloud native integrationCloud native integration
Cloud native integration
 
Introduction to microservices
Introduction to microservicesIntroduction to microservices
Introduction to microservices
 
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
 
Why Microservice
Why Microservice Why Microservice
Why Microservice
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 
Deep Dive on Amazon Elastic Container Service (ECS) and Fargate
Deep Dive on Amazon Elastic Container Service (ECS) and FargateDeep Dive on Amazon Elastic Container Service (ECS) and Fargate
Deep Dive on Amazon Elastic Container Service (ECS) and Fargate
 
Deploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
Deploy, Manage, and Scale your Apps with AWS Elastic BeanstalkDeploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
Deploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
 
Serverless Computing: build and run applications without thinking about servers
Serverless Computing: build and run applications without thinking about serversServerless Computing: build and run applications without thinking about servers
Serverless Computing: build and run applications without thinking about servers
 
Pets vs. Cattle: The Elastic Cloud Story
Pets vs. Cattle: The Elastic Cloud StoryPets vs. Cattle: The Elastic Cloud Story
Pets vs. Cattle: The Elastic Cloud Story
 
CodeBuild CodePipeline CodeDeploy CodeCommit in AWS | Edureka
CodeBuild CodePipeline CodeDeploy CodeCommit in AWS | EdurekaCodeBuild CodePipeline CodeDeploy CodeCommit in AWS | Edureka
CodeBuild CodePipeline CodeDeploy CodeCommit in AWS | Edureka
 

Viewers also liked

Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud ArchitectureAdrian Cockcroft
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconAdrian Cockcroft
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Sourceaspyker
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSFAdrian Cockcroft
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Aws multi-region High Availability
Aws multi-region High Availability Aws multi-region High Availability
Aws multi-region High Availability Adam Book
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Adrian Cockcroft
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to RootsBrendan Gregg
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleSudhir Tonse
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Adrian Cockcroft
 
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)Amazon Web Services
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformAdrian Cockcroft
 

Viewers also liked (20)

Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Aws multi-region High Availability
Aws multi-region High Availability Aws multi-region High Availability
Aws multi-region High Availability
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scale
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
 
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
 
Gluecon keynote
Gluecon keynoteGluecon keynote
Gluecon keynote
 
How to Design for High Availability & Scale with AWS
How to Design for High Availability & Scale with AWSHow to Design for High Availability & Scale with AWS
How to Design for High Availability & Scale with AWS
 
NetflixOSS Meetup
NetflixOSS MeetupNetflixOSS Meetup
NetflixOSS Meetup
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 

Similar to AWS Re:Invent - High Availability Architecture at Netflix

ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012Amazon Web Services
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsAcunu
 
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...DataStax Academy
 
Servers fail, who cares?
Servers fail, who cares? Servers fail, who cares?
Servers fail, who cares? greggulrich
 
CloudFest Denver Windows Azure Design Patterns
CloudFest Denver Windows Azure Design PatternsCloudFest Denver Windows Azure Design Patterns
CloudFest Denver Windows Azure Design PatternsDavid Pallmann
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSAdrian Cockcroft
 
Disaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudDisaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudAmazon Web Services
 
Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSAcquia
 
The Netflix Open Source Platform
The Netflix Open Source PlatformThe Netflix Open Source Platform
The Netflix Open Source PlatformRuslan Meshenberg
 
1 Introduction at CloudStack Developer Day
1 Introduction at CloudStack Developer Day 1 Introduction at CloudStack Developer Day
1 Introduction at CloudStack Developer Day Kimihiko Kitase
 
Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013MassTLC
 
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC Amazon Web Services
 
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...DataStax Academy
 
Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram Chinta
 
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...Amazon Web Services
 
AWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAmazon Web Services
 
Windows Azure Design Patterns
Windows Azure Design PatternsWindows Azure Design Patterns
Windows Azure Design PatternsDavid Pallmann
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowAdrian Cockcroft
 
Disaster Recovery and Business Continuity - Toronto FSI Symposium - October 2016
Disaster Recovery and Business Continuity - Toronto FSI Symposium - October 2016Disaster Recovery and Business Continuity - Toronto FSI Symposium - October 2016
Disaster Recovery and Business Continuity - Toronto FSI Symposium - October 2016Amazon Web Services
 

AWS Re:Invent - High Availability Architecture at Netflix

  • 3. (I’m skipping all the cloud intro etc. Netflix runs in the cloud, if you hadn’t figured that out already you aren’t paying attention and should go to the other Netflix talks at AWS Re:Invent or read slideshare.net/netflix)
  • 9. Architecture applies to any cloud or datacenter. Illustrated today using real-world examples.
  • 10. Diagram labels: Customer Device (PC, PS3, TV…). Browse: Web Site or Discovery API (User Data, Personalization). Play: Streaming API (DRM, QoS Logging). Watch: OpenConnect CDN Boxes (CDN Management and Steering, Content Encoding). Side legend: Consumer Electronics, AWS Cloud Services, CDN Edge Locations.
  • 11. Each icon is three to a few hundred instances across three AWS zones. Legend: Cassandra, memcached, Web service, S3 bucket. Start Here: Personalization movie group chooser.
  • 13. Deployed in Three Balanced Availability Zones. Load Balancers; Zone A, Zone B, Zone C, each with Cassandra and Evcache Replicas.
  • 14. Triple Replicated Persistence. Load Balancers; Zone A, Zone B, Zone C, each with Cassandra and Evcache Replicas.
  • 15. Isolated Regions. US-East Load Balancers and EU-West Load Balancers; each region has Zone A, Zone B, Zone C, each with Cassandra Replicas.
  • 16. Failure modes and mitigation plans:
        Failure Mode          Probability   Mitigation Plan
        Application Failure   High          Automatic degraded response
        AWS Region Failure    Low           Wait for region to recover
        AWS Zone Failure      Medium        Continue to run on 2 out of 3 zones
        Datacenter Failure    Medium        Migrate more functions to cloud
        Data store failure    Low           Restore from S3 backups
        S3 failure            Low           Restore from remote archive
  • 17. Run what you wrote. Rapid detection. Rapid response.
  • 21. Edda (http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html). Diagram labels: Eureka (services metadata); AWS (instances, ASGs, etc.); AppDynamics (request flow); Edda; Monkeys.
  • 26. Classify and name the types of things that might go wrong in the platform or infrastructure
  • 27. Zone failure types: Zone Network Outage; Zone Power Outage; Zone Dependent Service Outage. The dependent service could be the @NetflixOSS platform or underlying infrastructure. (Diagram: US-East and EU-West load balancers, each over Zones A, B, C with Cassandra Replicas.)
  • 29. Regional failure types: Regional Network Outage; Control Plane Overload. (Diagram: US-East and EU-West load balancers, each over Zones A, B, C with Cassandra Replicas.)
  • 31. Cascading Capacity Overload: capacity demand migrates to services in another zone that don’t scale up fast enough to take the load. Platform and Infrastructure Software Bugs and Global Configuration Errors: “Oops…” Migrating demand across regions may just spread the problem further… (Diagram: US-East and EU-West load balancers, each over Zones A, B, C with Cassandra Replicas.)
  • 33. Hardening the cloud Lessons Learned at Scale Why Netflix Stays Up (Mostly)
  • 38. Zone Power Outage. The @NetflixOSS Eureka service directory failed to mark down dead instances due to a configuration error. Applications not using zone-aware routing kept trying to talk to dead instances and timing out. Effect: higher latency and errors. Mitigation: fixed the configuration, and made zone-aware routing the default. (Diagram: US-East and EU-West load balancers, each over Zones A, B, C with Cassandra Replicas.)
  • 40. Per-Zone Control Plane Command Queues. Diagram labels: Zone Enable DNS Command Queue. (Diagram: US-East and EU-West load balancers, each over Zones A, B, C with Cassandra Replicas.)
  • 41. A highly scalable, available and durable deployment pattern
  • 42. Many Different Single-Function REST Clients. Stateless Data Access REST Service (Astyanax Cassandra Client). Single function Cassandra Cluster, managed by Priam, between 6 and 72 nodes. Optional Datacenter Update Flow. Each icon represents a horizontally scaled service of three to hundreds of instances deployed over three availability zones. AppDynamics Service Flow Visualization.
  • 43. Linux Base AMI (CentOS or Ubuntu). Optional Apache frontend, memcached, non-java apps. Java (JDK 6 or 7): AppDynamics appagent monitoring; GC and thread dump logging. Tomcat: Application war file, base servlet, platform, client interface jars, Astyanax; Healthcheck, status servlets, JMX interface, Servo autoscale. Monitoring: Log rotation to S3; AppDynamics machineagent; Epic/Atlas.
  • 46. Linux Base AMI (CentOS or Ubuntu). Tomcat and Priam on JDK: Healthcheck, Status. Java (JDK 7): AppDynamics appagent monitoring; GC and thread dump logging. Cassandra Server. Local Ephemeral Disk Space: 2TB of SSD or 1.6TB disk holding Commit log and SSTables. Monitoring: AppDynamics machineagent; Epic/Atlas.
  • 48. Diagram labels: Cassandra (repeated for each node); S3 Backup; Archive.
  • 51. Legend: Github / Techblog; Apache Contributions; Techblog Post Only; Coming Soon. Projects: Priam (Cassandra as a Service); Astyanax (Cassandra client for Java); CassJMeter (Cassandra test suite); Cassandra Multi-region EC2 datastore support; Aegisthus (Hadoop ETL for Cassandra); Explorers; Governator (Library lifecycle and dependency injection); Odin (Workflow orchestration); Blitz4j (Async logging); Exhibitor (Zookeeper as a Service); Curator (Zookeeper Patterns); EVCache (Memcached as a Service); Eureka / Discovery (Service Directory); Archaius (Dynamics Properties Service); Edda (Queryable config history); REST Client + mid-tier LB; Configuration REST endpoints; Servo and Autoscaling Scripts; Honu (Log4j streaming to Hadoop); Circuit Breaker - Hystrix (Robust service pattern); Asgard (AutoScaleGroup based AWS console); Chaos Monkey (Robustness verification); Latency Monkey (Server-side latency/error injection); Janitor Monkey; Bakeries and AMI; Build dynaslaves.
  • 53. http://github.com/Netflix http://techblog.netflix.com http://slideshare.net/Netflix http://www.linkedin.com/in/adriancockcroft
  • 54. We are sincerely eager to hear your FEEDBACK on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
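
Slide 38 names zone-aware routing as the mitigation that became the default after the zone power outage. The following is a minimal Python sketch of that idea, not Netflix's implementation: the `Instance` record and the `pick_instance` function are hypothetical names invented for illustration and do not reflect the Eureka or REST client APIs. It assumes the directory's health flag is accurate, which is exactly the assumption that failed in the incident on the slide.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Instance:
    """One registered service instance (hypothetical shape, for illustration)."""
    host: str
    zone: str
    healthy: bool


def pick_instance(
    instances: Sequence[Instance],
    client_zone: str,
    rng: Optional[random.Random] = None,
) -> Optional[Instance]:
    """Prefer a healthy instance in the caller's zone, else any healthy one."""
    rng = rng or random.Random()
    # Drop instances the directory has marked down.
    live = [i for i in instances if i.healthy]
    if not live:
        return None  # nothing to route to; the caller should degrade gracefully
    # Keep traffic inside the caller's zone when that zone still has capacity.
    local = [i for i in live if i.zone == client_zone]
    # Fall back to the surviving zones when the local zone is empty.
    return rng.choice(local or live)


if __name__ == "__main__":
    fleet = [
        Instance("a1", "zone-a", healthy=False),  # zone A has lost power
        Instance("b1", "zone-b", healthy=True),
        Instance("c1", "zone-c", healthy=True),
    ]
    print(pick_instance(fleet, client_zone="zone-a"))
```

With zone A marked down, a caller in zone A is routed to zone B or zone C, which matches the "continue to run on 2 out of 3 zones" plan on slide 16. If the directory fails to mark the dead instance down, as on slide 38, this selection alone does not help, and callers still need timeouts and a fallback response.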