A State of Xen
Chaos Monkey & Cassandra
Who we are
Jean-Sebastien Jeannotte – JS
Senior Software Engineer
Platform Automation Engineering
jjeannotte@netflix.com
@jsjeannotte
http://www.linkedin.com/in/jsjeannotte
Nir Alfasi
Senior Software Engineer
Platform Automation Engineering
alfasi@netflix.com
@niralfasi
http://www.linkedin.com/in/alfasin
Christos Kalantzis
Director of Engineering
Cloud Database Engineering
Cassandra MVP
ckalantzis@netflix.com
@chriskalan
http://www.linkedin.com/in/christoskalantzis
AWS Re:boot
September 2014, every AZ
Our stack during Re:boot 2014
[Diagram: three C* + Priam nodes; REST + SSH]
Our stack during Re:boot 2014
[Diagram: three C* + Priam nodes; REST + SSH; Atlas; App 1; App 2]
Our stack during Re:boot 2014
[Flowchart: healthcheck flow, run every 30 min. Box labels in extraction order:]
Disappearing instance?
Launch new instance
All good
Is the C* ring healthy?
Are all instances healthy?
All good
Can we fix automatically?
Replace bad instance
All good
Is there an offline maintenance?
First failure?
Sleep for X minutes and retry
PagerDuty
Is there an offline maintenance?
First failure?
All good
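The healthcheck flowchart above can be read as a small decision function. The following is only a sketch: the arrows between boxes did not survive extraction, so the branch order here is an assumption, and `ClusterState`, its fields, and `decide` are hypothetical names, not the actual Netflix healthcheck.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALL_GOOD = "all good"
    LAUNCH_NEW_INSTANCE = "launch new instance"
    REPLACE_BAD_INSTANCE = "replace bad instance"
    SLEEP_AND_RETRY = "sleep for X minutes and retry"
    PAGE = "PagerDuty"


@dataclass
class ClusterState:
    # Hypothetical inputs; each field mirrors one question box of the flowchart.
    instance_missing: bool = False
    ring_healthy: bool = True
    all_instances_healthy: bool = True
    auto_fixable: bool = False
    offline_maintenance: bool = False
    first_failure: bool = True


def decide(state: ClusterState) -> Action:
    """Return the action for one healthcheck run (assumed branch order)."""
    # Automatically remediated scenario 1: an instance disappeared.
    if state.instance_missing:
        return Action.LAUNCH_NEW_INSTANCE
    if state.ring_healthy and state.all_instances_healthy:
        return Action.ALL_GOOD
    # Automatically remediated scenario 2: a bad instance we can replace.
    if state.auto_fixable:
        return Action.REPLACE_BAD_INSTANCE
    # Planned offline maintenance is not treated as a failure.
    if state.offline_maintenance:
        return Action.ALL_GOOD
    # Give a transient problem one chance to clear before paging a human.
    if state.first_failure:
        return Action.SLEEP_AND_RETRY
    return Action.PAGE
```

A scheduler would call `decide` once per run (every 30 minutes on the slide) and dispatch on the returned `Action`.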
Our stack during Re:boot 2014
AWS Re:boot
September 2014, every AZ
Gaps we identified
New direction
New direction – What others are doing
New direction – What we decided to do
[Diagram: three C* + Priam nodes; Atlas; App 1; App 2]
New direction – What we learned (principles)
Synchronous | Asynchronous
SSH | HTTP / REST
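The speaker notes for these principle slides also call for "retries with timeouts and exponential back-off". A minimal sketch of that idea, assuming an overall time budget rather than a per-call timeout; `retry_with_backoff` and all of its parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    deadline: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call action until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception:
            # Delays grow as base_delay * 1, 2, 4, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            out_of_attempts = attempt == max_attempts - 1
            out_of_time = (clock() - start) + delay > deadline
            if out_of_attempts or out_of_time:
                raise  # Give up and surface the last error to the caller.
            sleep(delay)
    raise AssertionError("unreachable: the loop returns or raises")
```

Passing `sleep` and `clock` as parameters keeps the function testable without real waiting. Production code would usually add random jitter to the delay so that many clients do not retry in lockstep.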
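Two more principles from the speaker notes, idempotence (check the installed nodetool version before downloading the C* tarball) and serving fallbacks (a dynamic property service with hard-coded defaults), can be sketched as below. The function names and the callables they take are hypothetical stand-ins for illustration, not Priam or Netflix APIs.

```python
from typing import Callable, Mapping, Optional


def ensure_version(
    current_version: Callable[[], str],
    install: Callable[[], None],
    wanted: str,
) -> bool:
    """Idempotent step: install only when the node is not already at `wanted`.

    Returns True if an install was performed, False if nothing had to change.
    """
    if current_version() == wanted:
        return False  # Already in the desired state; running again is a no-op.
    install()
    if current_version() != wanted:
        raise RuntimeError(f"install finished but version is not {wanted}")
    return True


def get_property(
    name: str,
    fetch: Callable[[str], Optional[str]],
    defaults: Mapping[str, str],
) -> str:
    """Serving fallback: use the dynamic value, else the hard-coded default."""
    try:
        value = fetch(name)
    except Exception:
        value = None  # The property service is unavailable; fall back.
    return defaults[name] if value is None else value
```

Because `ensure_version` describes a target state rather than a command to run, a retried or re-triggered job can call it safely any number of times.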
What does the future look like?
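The speaker notes for these slides mention using trend analysis on disk usage to predict when a cluster needs to grow. One simple way to illustrate the idea is a least-squares line through recent usage samples, extrapolated to a threshold. This is a sketch under assumed inputs (daily samples in GB, an 80% threshold); `days_until_threshold` is a hypothetical helper, not part of Atlas or any Netflix tool.

```python
from typing import Optional, Sequence, Tuple


def days_until_threshold(
    samples: Sequence[Tuple[float, float]],
    capacity_gb: float,
    threshold: float = 0.8,
) -> Optional[float]:
    """Estimate days from the last sample until usage crosses the threshold.

    samples are (day, used_gb) pairs. Returns None when usage is flat or
    shrinking, and 0.0 when the fitted line is already past the threshold.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    limit = capacity_gb * threshold
    crossing_day = (limit - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, crossing_day - last_day)
```

An automation rule could compare the returned value with the lead time needed to resize a cluster and trigger the resize proactively.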
Check out our https://jobs.netflix.com page for current openings

Cassandra Summit 2015 - A State of Xen - Chaos Monkey & Cassandra

Editor's Notes

  1. Building a house of cards on a solid database foundation.
  2. I lead Cloud Database Engineering at Netflix. Among other things, we offer C* as a service within Netflix. Feel free to follow me on Twitter or link up on LinkedIn.
  3. Talk about the Simian Army (introduce the Simian Army). Netflix LOVES chaos. We love it so much that we generate it. Monkey: runs in prod. Kong: exercise. We run it on most Netflix services, and even on C*.
  4. Talk about the Simian Army (introduce the Simian Army). Netflix LOVES chaos. We love it so much that we generate it. Monkey: runs in prod. Kong: exercise. We run it on most Netflix services, and even on C*.
  5. Talk about the Simian Army (introduce the Simian Army). Netflix LOVES chaos. We love it so much that we generate it. Monkey: runs in prod. Kong: exercise. We run it on most Netflix services, and even on C*.
  6. CDE has Chaos Monkey enabled on our C* clusters. Maximum 1 node per day, during business hours. Our healthcheck detects the missing instance and replaces it.
  7. 218 C* nodes rebooted. 22 nodes didn’t start and were automatically terminated by the AWS internal healthcheck. Our healthcheck identified the missing nodes and automatically remediated the issue. 0 downtime.
  8. - Bunch of Python/Shell scripts - Jenkins as job scheduler (HC, node replacements, repairs, upgrades, etc.) - On C* nodes: C* + Priam - Is something missing? Monitoring? OpsCenter?
  9. - Why not OpsCenter? - Didn’t exist when Netflix started using C* - Redundant in our stack
  10. (Continuation on why not OpsCenter) - Change slide according to Christos's feedback - Atlas is already a very powerful metrics and alerting tool, and our metric systems add non-C* related metrics (App metrics, for example) that help in correlation. Alerts can be a combination of C* and App metrics. - How it behaved during the Re:boot - How did the healthcheck behave, how does it work, and how does it react to Chaos Monkey?
  11. (Continuation on why not OpsCenter) Atlas is already a very powerful metrics and alerting tool, and our metric systems add non-C* related metrics (App metrics, for example) that help in correlation.
  12. (Continuation on why not OpsCenter) Alerts can be a combination of C* and App metrics.
  13. Healthcheck flow: 2 scenarios are automatically remediated.
  14. How did the healthcheck behave during Re:boot
  15. HC: big monolith. About 100k lines of Python/Bash scripts. Hard to maintain.
  16. Lack of chaining (statefulness: if this job failed run that, else…). Stateless. Lack of native support for TRIGGERING jobs based on events, like listening to SQS queues.
  17. High Availability: The Jenkins master node is a Single Point of Failure. Long-running processes may crash due to a transient connection issue between the slave & the master.
  18. High Availability: The Jenkins master node is a Single Point of Failure. Long-running processes may crash due to a transient connection issue between the slave & the master.
  19. What we learned, and what we decided to focus on (Principles)
  20. What others are doing: Facebook (FBAR) / LinkedIn (Nurse) / Dropbox (Naoru)
  21. Do our own or adopt an existing solution? We started with our own POC, then we decided to go with StackStorm, an event-driven automation platform. Facilitated troubleshooting/event handling. Automated remediation (Discovery example).
  22. Do our own or adopt an existing solution? We started with our own POC, then we decided to go with StackStorm, an event-driven automation platform. Facilitated troubleshooting/event handling. Automated remediation (Discovery example).
  23. What we decided to do: new env. StackStorm description (rules/actions…). Example of the Disk Space Alert. Gap recap.
  24. Idempotence (make a stateless system feel like a stateful system). Automation tools need to ensure that you reach a certain state. Example: downloading the C* tarball: first, check the nodetool version.
  25. K.I.S.S. - “Simplicity is the ultimate sophistication” (Example: resumable repairs - make more concise)
  26. Prefer HTTP over SSH and Async over Sync
  27. Retries with Timeouts and exponential back-off
  28. Serving fallbacks. Examples: a dynamic property service with hard-coded defaults; Netflix personalized recommendations falling back to default recommendations.
  29. Audit trail: use Logstash to index data into Elasticsearch for trend analysis - Talk about the fact that we already use Logstash at Netflix, but we want to plug it into our automated remediation system
  30. Metadata / statistics / long-term metrics. Use trend analysis to be proactive instead of reactive: disk usage to predict when we need to increase the cluster size with automated resizing.
  31. I lead Cloud Database Engineering at Netflix. Among other things, we offer C* as a service within Netflix. Feel free to follow me on Twitter or link up on LinkedIn.