SlideShare a Scribd company logo
Source: http://ir.netflix.com
(I’m skipping all the cloud intro etc. Netflix runs in the
cloud, if you hadn’t figured that out already you aren’t
   paying attention and should go to the other Netflix
talks at AWS Re:Invent or read slideshare.net/netflix)
In production at
Netflix
2009
2009
2010
2010
2010
2010
2010
2011
Architecture applies to any cloud or datacenter
  Illustrated today using real world examples
Consumer                                               User Data
Electronics
                                      Web Site or
                       Browse        Discovery API
AWS Cloud
 Services                                            Personalization

CDN Edge
Locations
                                                          DRM
               Customer       Play
              Device (PC,            Streaming API
              PS3, TV…)
                                                      QoS Logging


                                                         CDN
                                                      Management
                                                      and Steering
                            Watch    OpenConnect
                                      CDN Boxes
                                                        Content
                                                       Encoding
Each icon is three to a
 few hundred
 instances across                    Cassandra
 three AWS zones

                                                 memcached
                                             Web service
                        Start Here
                                                 S3 bucket




Personalization movie
group chooser
Deployed in Three Balanced Availability Zones

                           Load Balancers




        Zone A                 Zone B                  Zone C
Cassandra and Evcache   Cassandra and Evcache   Cassandra and Evcache
      Replicas                Replicas                Replicas
Triple Replicated Persistence

                             Load Balancers




       Zone A                    Zone B                  Zone C
Cassandra and Evcache     Cassandra and Evcache   Cassandra and Evcache
      Replicas                  Replicas                Replicas
Isolated Regions


                     US-East Load Balancers                                                EU-West Load Balancers




     Zone A                     Zone B                Zone C               Zone A                     Zone B               Zone C

Cassandra Replicas         Cassandra Replicas    Cassandra Replicas   Cassandra Replicas         Cassandra Replicas   Cassandra Replicas
Failure Mode          Probability   Mitigation Plan
Application Failure   High          Automatic degraded response
AWS Region Failure    Low           Wait for region to recover
AWS Zone Failure      Medium        Continue to run on 2 out of 3 zones
Datacenter Failure    Medium        Migrate more functions to cloud
Data store failure    Low           Restore from S3 backups
S3 failure            Low           Restore from remote archive
Run what you wrote
 Rapid detection
 Rapid Response
http://techblog.netflix.com/2012/06/annoucing-archaius-dynamic-properties.html
http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html



                                             Eureka Services
                                                metadata




                      AWS Instances, ASGs,                     AppDynamics Request
                              etc.                                    flow




                                             Edda              Monkeys
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Classify and name the types of things that
might go wrong in the platform or infrastructure
Zone Network Outage


                         US-East Load Balancers                                                   EU-West Load Balancers




         Zone A                     Zone B                   Zone C               Zone A                     Zone B               Zone C

    Cassandra Replicas         Cassandra Replicas       Cassandra Replicas   Cassandra Replicas         Cassandra Replicas   Cassandra Replicas




                                                    Zone Dependent
Zone Power Outage
                                                    Service Outage


                                               Dependent Service could be @NetflixOSS
                                                 platform or underlying infrastructure
Regional Network Outage


                     US-East Load Balancers                                                  EU-West Load Balancers




     Zone A                     Zone B                  Zone C               Zone A                     Zone B               Zone C

Cassandra Replicas         Cassandra Replicas      Cassandra Replicas   Cassandra Replicas         Cassandra Replicas   Cassandra Replicas




                                         Control Plane Overload
Cascading Capacity Overload


                         US-East Load Balancers                                                     EU-West Load Balancers




         Zone A                     Zone B               Zone C                     Zone A                      Zone B               Zone C

    Cassandra Replicas         Cassandra Replicas   Cassandra Replicas         Cassandra Replicas          Cassandra Replicas   Cassandra Replicas




Capacity demand migrates to services                    Platform and Infrastructure
                                                                                                    Migrating demand across regions may
in another zone that don’t scale up fast                Software Bugs and Global
                                                                                                    just spread the problem further…
enough to take the load                                    Configuration Errors
                                                                     “Oops…”
Hardening the cloud
 Lessons Learned at Scale
Why Netflix Stays Up (Mostly)
http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
http://googleappengine.blogspot.com/2012/10/about-todays-app-engine-outage.html
http://aws.amazon.com/message/67457/
http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html
@NetflixOSS Eureka service directory failed to mark
                                   down dead instances due to a configuration error

                         US-East Load Balancers                                                      EU-West Load Balancers




         Zone A                     Zone B                    Zone C                  Zone A                    Zone B               Zone C

    Cassandra Replicas         Cassandra Replicas        Cassandra Replicas     Cassandra Replicas         Cassandra Replicas   Cassandra Replicas




                                                                                           Effect: higher latency and errors
Zone Power Outage                                                                          Mitigation: Fixed configuration, and made
                                               Applications not using Zone-
                                                                                           zone aware routing the default
                                               aware routing kept trying to talk to
                                               dead instances and timing out
Zone Enable DNS
Command Queue                                     Per-Zone Control Plane
                                                  Command Queues


                      US-East Load Balancers                                               EU-West Load Balancers




      Zone A                     Zone B               Zone C               Zone A                     Zone B               Zone C

 Cassandra Replicas         Cassandra Replicas   Cassandra Replicas   Cassandra Replicas         Cassandra Replicas   Cassandra Replicas
A highly scalable, available and durable
          deployment pattern
Single function Cassandra Cluster
  Many Different Single-Function REST Clients                                Managed by Priam
                                                                             Between 6 and 72 nodes

                                            Stateless Data Access REST Service
                                            Astyanax Cassandra Client




                                                                                         Optional
Each icon represents a horizontally scaled service of three to hundreds of               Datacenter
instances deployed over three availability zones                                         Update Flow
                                    Appdynamics Service Flow Visualization
Linux Base AMI (CentOS or Ubuntu)

Optional Apache
    frontend,        Java (JDK 6 or 7)
memcached, non-
   java apps
                      AppDynamics
                        appagent
                       monitoring     Tomcat
   Monitoring
Log rotation to S3                    Application war file, base servlet,
                                                                            Healthcheck, status servlets, JMX
  AppDynamics         GC and thread    platform, client interface jars,
                                                                               interface, Servo autoscale
 machineagent         dump logging                Astyanax
   Epic/Atlas
http://github.com/netflix
Linux Base AMI (CentOS or Ubuntu)

Tomcat and Priam on
        JDK
                       Java (JDK 7)
 Healthcheck, Status

                          AppDynamics
                            appagent
                           monitoring       Cassandra Server
    Monitoring
   AppDynamics                              Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and
                       GC and thread dump                                    SSTables
   machineagent             logging
    Epic/Atlas
http://github.com/netflix
Cassandra

              Cassandra                     Cassandra




  Cassandra                                             Cassandra




                               S3
                             Backup
Cassandra                                                 Cassandra




       Cassandra                                  Cassandra




                     Cassandra       Cassandra




 Archive
@NetflixOSS
http://techblog.netflix.com
Legend
 Github / Techblog                Priam                                Exhibitor
                                                                                                     Servo and Autoscaling Scripts
                           Cassandra as a Service                Zookeeper as a Service
Apache Contributions
                                Astyanax                                Curator                                  Honu
Techblog Post Only
                          Cassandra client for Java                Zookeeper Patterns                 Log4j streaming to Hadoop
   Coming Soon
                                CassJMeter                           EVCache                            Circuit Breaker - Hystrix
                             Cassandra test suite               Memcached as a Service                  Robust service pattern

                         Cassandra Multi-region EC2                Eureka / Discovery             Asgard - AutoScaleGroup based AWS
                             datastore support                      Service Directory                           console

                                 Aegisthus                            Archaius                             Chaos Monkey
                          Hadoop ETL for Cassandra            Dynamics Properties Service               Robustness verification
                                                                        Edda
                                   Explorers                                                               Latency Monkey
                                                                Queryable config history

                       Governator - Library lifecycle and
                                                            Server-side latency/error injection             Janitor Monkey
                            dependency injection

                                    Odin
                                                                REST Client + mid-tier LB                  Bakeries and AMI
                            Workflow orchestration

                            Blitz4j - Async logging          Configuration REST endpoints                  Build dynaslaves
http://github.com/Netflix
       http://techblog.netflix.com
       http://slideshare.net/Netflix

http://www.linkedin.com/in/adriancockcroft
We are sincerely eager to
hear your FEEDBACK on this
presentation and on re:Invent.

 Please fill out an evaluation
   form when you have a
            chance.

More Related Content

What's hot

The journey to GitOps
The journey to GitOpsThe journey to GitOps
The journey to GitOps
Nicola Baldi
 
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
KCDItaly
 
Docker & Kubernetes 기초 - 최용호
Docker & Kubernetes 기초 - 최용호Docker & Kubernetes 기초 - 최용호
Docker & Kubernetes 기초 - 최용호
용호 최
 
Containers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red HatContainers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red Hat
Amazon Web Services
 
(Draft) Kubernetes - A Comprehensive Overview
(Draft) Kubernetes - A Comprehensive Overview(Draft) Kubernetes - A Comprehensive Overview
(Draft) Kubernetes - A Comprehensive Overview
Bob Killen
 
Kubecon US 2019: Kubernetes Multitenancy WG Deep Dive
Kubecon US 2019: Kubernetes Multitenancy WG Deep DiveKubecon US 2019: Kubernetes Multitenancy WG Deep Dive
Kubecon US 2019: Kubernetes Multitenancy WG Deep Dive
Sanjeev Rampal
 
Infrastructure-as-Code (IaC) Using Terraform (Advanced Edition)
Infrastructure-as-Code (IaC) Using Terraform (Advanced Edition)Infrastructure-as-Code (IaC) Using Terraform (Advanced Edition)
Infrastructure-as-Code (IaC) Using Terraform (Advanced Edition)
Adin Ermie
 
OpenShift Introduction
OpenShift IntroductionOpenShift Introduction
OpenShift Introduction
Red Hat Developers
 
20211109 bleaの使い方(基本編)
20211109 bleaの使い方(基本編)20211109 bleaの使い方(基本編)
20211109 bleaの使い方(基本編)
Amazon Web Services Japan
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험
NHN FORWARD
 
ハイブリッドクラウドの現実とAzureの使いどころ
ハイブリッドクラウドの現実とAzureの使いどころハイブリッドクラウドの現実とAzureの使いどころ
ハイブリッドクラウドの現実とAzureの使いどころ
Toru Makabe
 
Aks pimarox from zero to hero
Aks pimarox from zero to heroAks pimarox from zero to hero
Aks pimarox from zero to hero
Johan Biere
 
VMware vSphere 6.0 - Troubleshooting Training - Day 4
VMware vSphere 6.0 - Troubleshooting Training - Day 4VMware vSphere 6.0 - Troubleshooting Training - Day 4
VMware vSphere 6.0 - Troubleshooting Training - Day 4
Sanjeev Kumar
 
Temel linux
Temel linuxTemel linux
Temel linux
emreberber07
 
Git ops & Continuous Infrastructure with terra*
Git ops  & Continuous Infrastructure with terra*Git ops  & Continuous Infrastructure with terra*
Git ops & Continuous Infrastructure with terra*
Haggai Philip Zagury
 
インメモリーデータグリッドの選択肢
インメモリーデータグリッドの選択肢インメモリーデータグリッドの選択肢
インメモリーデータグリッドの選択肢
Masaki Yamakawa
 
ジュニパーアイコン集
ジュニパーアイコン集ジュニパーアイコン集
ジュニパーアイコン集
Juniper Networks (日本)
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開
Rakuten Group, Inc.
 
Red Hat OpenShift Container Platform Overview
Red Hat OpenShift Container Platform OverviewRed Hat OpenShift Container Platform Overview
Red Hat OpenShift Container Platform Overview
James Falkner
 

What's hot (20)

The journey to GitOps
The journey to GitOpsThe journey to GitOps
The journey to GitOps
 
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
 
Docker & Kubernetes 기초 - 최용호
Docker & Kubernetes 기초 - 최용호Docker & Kubernetes 기초 - 최용호
Docker & Kubernetes 기초 - 최용호
 
Containers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red HatContainers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red Hat
 
(Draft) Kubernetes - A Comprehensive Overview
(Draft) Kubernetes - A Comprehensive Overview(Draft) Kubernetes - A Comprehensive Overview
(Draft) Kubernetes - A Comprehensive Overview
 
Kubecon US 2019: Kubernetes Multitenancy WG Deep Dive
Kubecon US 2019: Kubernetes Multitenancy WG Deep DiveKubecon US 2019: Kubernetes Multitenancy WG Deep Dive
Kubecon US 2019: Kubernetes Multitenancy WG Deep Dive
 
Infrastructure-as-Code (IaC) Using Terraform (Advanced Edition)
Infrastructure-as-Code (IaC) Using Terraform (Advanced Edition)Infrastructure-as-Code (IaC) Using Terraform (Advanced Edition)
Infrastructure-as-Code (IaC) Using Terraform (Advanced Edition)
 
OpenShift Introduction
OpenShift IntroductionOpenShift Introduction
OpenShift Introduction
 
20211109 bleaの使い方(基本編)
20211109 bleaの使い方(基本編)20211109 bleaの使い方(基本編)
20211109 bleaの使い方(基本編)
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험
 
ハイブリッドクラウドの現実とAzureの使いどころ
ハイブリッドクラウドの現実とAzureの使いどころハイブリッドクラウドの現実とAzureの使いどころ
ハイブリッドクラウドの現実とAzureの使いどころ
 
Aks pimarox from zero to hero
Aks pimarox from zero to heroAks pimarox from zero to hero
Aks pimarox from zero to hero
 
VMware vSphere 6.0 - Troubleshooting Training - Day 4
VMware vSphere 6.0 - Troubleshooting Training - Day 4VMware vSphere 6.0 - Troubleshooting Training - Day 4
VMware vSphere 6.0 - Troubleshooting Training - Day 4
 
Temel linux
Temel linuxTemel linux
Temel linux
 
Git ops & Continuous Infrastructure with terra*
Git ops  & Continuous Infrastructure with terra*Git ops  & Continuous Infrastructure with terra*
Git ops & Continuous Infrastructure with terra*
 
インメモリーデータグリッドの選択肢
インメモリーデータグリッドの選択肢インメモリーデータグリッドの選択肢
インメモリーデータグリッドの選択肢
 
ジュニパーアイコン集
ジュニパーアイコン集ジュニパーアイコン集
ジュニパーアイコン集
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開
 
Red Hat OpenShift Container Platform Overview
Red Hat OpenShift Container Platform OverviewRed Hat OpenShift Container Platform Overview
Red Hat OpenShift Container Platform Overview
 

Viewers also liked

Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
Adrian Cockcroft
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
Adrian Cockcroft
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
aspyker
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
Adrian Cockcroft
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
Adrian Cockcroft
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Adrian Cockcroft
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Adrian Cockcroft
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
Adrian Cockcroft
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
Adrian Cockcroft
 
Aws multi-region High Availability
Aws multi-region High Availability Aws multi-region High Availability
Aws multi-region High Availability
Adam Book
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Adrian Cockcroft
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
Adrian Cockcroft
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
Brendan Gregg
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scale
Sudhir Tonse
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Adrian Cockcroft
 
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
Amazon Web Services
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
Adrian Cockcroft
 
Gluecon keynote
Gluecon keynoteGluecon keynote
Gluecon keynote
Adrian Cockcroft
 
How to Design for High Availability & Scale with AWS
How to Design for High Availability & Scale with AWSHow to Design for High Availability & Scale with AWS
How to Design for High Availability & Scale with AWS
Blazeclan Technologies Private Limited
 
NetflixOSS Meetup
NetflixOSS MeetupNetflixOSS Meetup
NetflixOSS Meetup
Adrian Cockcroft
 

Viewers also liked (20)

Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Aws multi-region High Availability
Aws multi-region High Availability Aws multi-region High Availability
Aws multi-region High Availability
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scale
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
 
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
 
Gluecon keynote
Gluecon keynoteGluecon keynote
Gluecon keynote
 
How to Design for High Availability & Scale with AWS
How to Design for High Availability & Scale with AWSHow to Design for High Availability & Scale with AWS
How to Design for High Availability & Scale with AWS
 
NetflixOSS Meetup
NetflixOSS MeetupNetflixOSS Meetup
NetflixOSS Meetup
 

Similar to AWS Re:Invent - High Availability Architecture at Netflix

ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
Amazon Web Services
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Acunu
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
Adrian Cockcroft
 
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
DataStax Academy
 
Servers fail, who cares?
Servers fail, who cares? Servers fail, who cares?
Servers fail, who cares?
greggulrich
 
CloudFest Denver Windows Azure Design Patterns
CloudFest Denver Windows Azure Design PatternsCloudFest Denver Windows Azure Design Patterns
CloudFest Denver Windows Azure Design Patterns
David Pallmann
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWS
Adrian Cockcroft
 
Disaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudDisaster Recovery with the AWS Cloud
Disaster Recovery with the AWS Cloud
Amazon Web Services
 
CloudStack technical overview
CloudStack technical overviewCloudStack technical overview
Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWS
Acquia
 
The Netflix Open Source Platform
The Netflix Open Source PlatformThe Netflix Open Source Platform
The Netflix Open Source Platform
Ruslan Meshenberg
 
1 Introduction at CloudStack Developer Day
1 Introduction at CloudStack Developer Day 1 Introduction at CloudStack Developer Day
1 Introduction at CloudStack Developer Day
Kimihiko Kitase
 
Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013
MassTLC
 
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Amazon Web Services
 
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
DataStax Academy
 
Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1
Ram Chinta
 
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...
Amazon Web Services
 
AWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go Squared
Amazon Web Services
 
Windows Azure Design Patterns
Windows Azure Design PatternsWindows Azure Design Patterns
Windows Azure Design Patterns
David Pallmann
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
Adrian Cockcroft
 

Similar to AWS Re:Invent - High Availability Architecture at Netflix (20)

ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
 
Servers fail, who cares?
Servers fail, who cares? Servers fail, who cares?
Servers fail, who cares?
 
CloudFest Denver Windows Azure Design Patterns
CloudFest Denver Windows Azure Design PatternsCloudFest Denver Windows Azure Design Patterns
CloudFest Denver Windows Azure Design Patterns
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWS
 
Disaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudDisaster Recovery with the AWS Cloud
Disaster Recovery with the AWS Cloud
 
CloudStack technical overview
CloudStack technical overviewCloudStack technical overview
CloudStack technical overview
 
Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWS
 
The Netflix Open Source Platform
The Netflix Open Source PlatformThe Netflix Open Source Platform
The Netflix Open Source Platform
 
1 Introduction at CloudStack Developer Day
1 Introduction at CloudStack Developer Day 1 Introduction at CloudStack Developer Day
1 Introduction at CloudStack Developer Day
 
Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013
 
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
 
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
 
Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1
 
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...
AWS re:Invent 2016: How to Migrate Microsoft Windows Applications to AWS Quic...
 
AWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go Squared
 
Windows Azure Design Patterns
Windows Azure Design PatternsWindows Azure Design Patterns
Windows Azure Design Patterns
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 

More from Adrian Cockcroft

Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
Adrian Cockcroft
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3)
Adrian Cockcroft
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Adrian Cockcroft
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Adrian Cockcroft
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Adrian Cockcroft
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
Adrian Cockcroft
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011
Adrian Cockcroft
 
Migrating to Public Cloud
Migrating to Public CloudMigrating to Public Cloud
Migrating to Public Cloud
Adrian Cockcroft
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connect
Adrian Cockcroft
 
Netflix in the cloud 2011
Netflix in the cloud 2011Netflix in the cloud 2011
Netflix in the cloud 2011
Adrian Cockcroft
 
Cmg06 utilization is useless
Cmg06 utilization is uselessCmg06 utilization is useless
Cmg06 utilization is useless
Adrian Cockcroft
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and Ops
Adrian Cockcroft
 
NoSQL for Netflix
NoSQL for NetflixNoSQL for Netflix
NoSQL for Netflix
Adrian Cockcroft
 

More from Adrian Cockcroft (13)

Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3)
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011
 
Migrating to Public Cloud
Migrating to Public CloudMigrating to Public Cloud
Migrating to Public Cloud
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connect
 
Netflix in the cloud 2011
Netflix in the cloud 2011Netflix in the cloud 2011
Netflix in the cloud 2011
 
Cmg06 utilization is useless
Cmg06 utilization is uselessCmg06 utilization is useless
Cmg06 utilization is useless
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and Ops
 
NoSQL for Netflix
NoSQL for NetflixNoSQL for Netflix
NoSQL for Netflix
 

Recently uploaded

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
Claudio Di Ciccio
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 

Recently uploaded (20)

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 

AWS Re:Invent - High Availability Architecture at Netflix

  • 1.
  • 3. (I’m skipping all the cloud intro etc. Netflix runs in the cloud, if you hadn’t figured that out already you aren’t paying attention and should go to the other Netflix talks at AWS Re:Invent or read slideshare.net/netflix)
  • 4.
  • 6.
  • 7.
  • 8.
  • 9. Architecture applies to any cloud or datacenter Illustrated today using real world examples
  • 10. Consumer User Data Electronics Web Site or Browse Discovery API AWS Cloud Services Personalization CDN Edge Locations DRM Customer Play Device (PC, Streaming API PS3, TV…) QoS Logging CDN Management and Steering Watch OpenConnect CDN Boxes Content Encoding
  • 11. Each icon is three to a few hundred instances across Cassandra three AWS zones memcached Web service Start Here S3 bucket Personalization movie group chooser
  • 12.
  • 13. Deployed in Three Balanced Availability Zones Load Balancers Zone A Zone B Zone C Cassandra and Evcache Cassandra and Evcache Cassandra and Evcache Replicas Replicas Replicas
  • 14. Triple Replicated Persistence Load Balancers Zone A Zone B Zone C Cassandra and Evcache Cassandra and Evcache Cassandra and Evcache Replicas Replicas Replicas
  • 15. Isolated Regions US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas
  • 16. Failure Mode Probability Mitigation Plan Application Failure High Automatic degraded response AWS Region Failure Low Wait for region to recover AWS Zone Failure Medium Continue to run on 2 out of 3 zones Datacenter Failure Medium Migrate more functions to cloud Data store failure Low Restore from S3 backups S3 failure Low Restore from remote archive
  • 17. Run what you wrote Rapid detection Rapid Response
  • 21. http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html Eureka Services metadata AWS Instances, ASGs, AppDynamics Request etc. flow Edda Monkeys
  • 22.
  • 23.
  • 24.
  • 26. Classify and name the types of things that might go wrong in the platform or infrastructure
  • 27. Zone Network Outage US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Zone Dependent Zone Power Outage Service Outage Dependent Service could be @NetflixOSS platform or underlying infrastructure
  • 28.
  • 29. Regional Network Outage US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Control Plane Overload
  • 30.
  • 31. Cascading Capacity Overload US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Capacity demand migrates to services Platform and Infrastructure Migrating demand across regions may in another zone that don’t scale up fast Software Bugs and Global just spread the problem further… enough to take the load Configuration Errors “Oops…”
  • 32.
  • 33. Hardening the cloud Lessons Learned at Scale Why Netflix Stays Up (Mostly)
  • 34.
  • 38. @NetflixOSS Eureka service directory failed to mark down dead instances due to a configuration error US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Effect: higher latency and errors Zone Power Outage Mitigation: Fixed configuration, and made Applications not using Zone- zone aware routing the default aware routing kept trying to talk to dead instances and timing out
  • 39.
  • 40. Zone Enable DNS Command Queue Per-Zone Control Plane Command Queues US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas
  • 41. A highly scalable, available and durable deployment pattern
  • 42. Single function Cassandra Cluster Many Different Single-Function REST Clients Managed by Priam Between 6 and 72 nodes Stateless Data Access REST Service Astyanax Cassandra Client Optional Each icon represents a horizontally scaled service of three to hundreds of Datacenter instances deployed over three availability zones Update Flow Appdynamics Service Flow Visualization
  • 43. Linux Base AMI (CentOS or Ubuntu) Optional Apache frontend, Java (JDK 6 or 7) memcached, non- java apps AppDynamics appagent monitoring Tomcat Monitoring Log rotation to S3 Application war file, base servlet, Healthcheck, status servlets, JMX AppDynamics GC and thread platform, client interface jars, interface, Servo autoscale machineagent dump logging Astyanax Epic/Atlas
  • 45.
  • 46. Linux Base AMI (CentOS or Ubuntu) Tomcat and Priam on JDK Java (JDK 7) Healthcheck, Status AppDynamics appagent monitoring Cassandra Server Monitoring AppDynamics Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and GC and thread dump SSTables machineagent logging Epic/Atlas
  • 48. Cassandra Cassandra Cassandra Cassandra Cassandra S3 Backup Cassandra Cassandra Cassandra Cassandra Cassandra Cassandra Archive
  • 51. Legend Github / Techblog Priam Exhibitor Servo and Autoscaling Scripts Cassandra as a Service Zookeeper as a Service Apache Contributions Astyanax Curator Honu Techblog Post Only Cassandra client for Java Zookeeper Patterns Log4j streaming to Hadoop Coming Soon CassJMeter EVCache Circuit Breaker - Hystrix Cassandra test suite Memcached as a Service Robust service pattern Cassandra Multi-region EC2 Eureka / Discovery Asgard - AutoScaleGroup based AWS datastore support Service Directory console Aegisthus Archaius Chaos Monkey Hadoop ETL for Cassandra Dynamics Properties Service Robustness verification Edda Explorers Latency Monkey Queryable config history Governator - Library lifecycle and Server-side latency/error injection Janitor Monkey dependency injection Odin REST Client + mid-tier LB Bakeries and AMI Workflow orchestration Blitz4j - Async logging Configuration REST endpoints Build dynaslaves
  • 52.
  • 53. http://github.com/Netflix http://techblog.netflix.com http://slideshare.net/Netflix http://www.linkedin.com/in/adriancockcroft
  • 54. We are sincerely eager to hear your FEEDBACK on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.