Netflix Built Its Own
Monitoring System
(And You Probably Shouldn’t)
Roy Rapoport
rsr@netflix.com @royrapoport
6 March 2015
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/netflix-monitoring-system
Presented at QCon London
www.qconlondon.com
Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Not So Much About Telemetry
• I ♥ telemetry
• Architecture track Open Space, 11:30AM, Fleming 3rd Floor
The Knights Who Say NIH
Agenda
• Introductions
• On Judgment
• Your Problem
• Your (no, really) Solution
• Mitigation and Anecdotes
• (Not) building your own monitoring system
Introductions: Me
• About 23 years in technology
• Systems engineering, networking, software development, QA, release management
• Time at Netflix: 2076 days (5y:8m:7d)
• At Netflix:
• Systems Engineering, Service Delivery in IT
• Troubleshooter and Builder of Python Things in Product Engineering
• Now: Engineering Manager, Insight Engineering
Introductions: Netflix
• Optimize speed of innovation
• Constrained by availability
• Cost is what it is
• Hire smart people, get out of their way
• Anti-process bias
“Freedom and Responsibility”
Judgment
You Have a Problem
(Your job would likely be boring otherwise)
• Are you the first
• To have it?
• To care?
• Are you sure?
One that looks nice
And not too expensive
You Have a Problem
(Your job would likely be boring otherwise)
• You’re not the first, or only
• Good news!
• Then what?
Adventures in IT-Land
• (import disclaimer)
• Not developers
• Cautious about ongoing support load
• Not well-trusted
A Little Bit of …
• Time, courage, knowledge, pride
• Cynicism, hubris, fear
Technical Reasons for Rejection
(Or: It’s Not You, It’s … Actually, It’s You)
• Financial Cost
• Technical incompatibility
Overqualified!
• https://www.flickr.com/photos/54945394@N00
A Moment for Pedantry
Or: Requirements for “Not Invented Here”
The Knights Who Say IbPWAU
A Question of Trust
• Technical: I don’t trust your product
• Organizational: I don’t trust you
I Don’t Trust You
To Care About Me as a Customer
• You’re selling me something
• I’m not your only customer
• I’m not an important customer
• You don’t care about your customers
I Don’t Trust You
To build a good product
• Past performance …
• “Good for me”
• Because you said so, that’s why!
I Don’t Trust You
To build it fast enough
• Unpredictable velocity
• When best-case is too slow
• Or maybe ever (OSS)
What Now?
Eventual Consistency
• Fork n’ merge
• THE model for OSS
• Works better for incremental changes
• Requires alignment of goals
Eventual Consistency
No Fork Required
• Start With a New Idea
• Eventually merge concepts
Eventual Consistency Example
[Timeline diagram, built up over several slides: Mainline Cloud Orchestration (2011); Insight Engineering CD Automation (2013); Mainline CD Automation (2014); the timeline extends to 2015.]
Composability
• Want this anyway
• Map scope to options’ scopes
Composability: Example
Netflix’s Atlas Telemetry Platform
[Architecture diagram, built up over several slides: a Global Query Endpoint in front of four Regional Query Endpoints, separated by a regional boundary. Backends shown behind the regional endpoints: Memory, Epic and CloudWatch; later builds drop Epic and add OpenTSDB and InfluxDB.]
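The Atlas slides show a global query endpoint composed over regional endpoints, each backed by swappable stores. The Python sketch below illustrates only that shape of composition; it is not Atlas's API. The class names (`Backend`, `InMemoryBackend`, `RegionalEndpoint`, `GlobalEndpoint`), the region names, the `(timestamp, value)` series format, and the choice to merge regions by summing values per timestamp are all hypothetical assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

# A time series as (timestamp, value) pairs; a hypothetical format, not Atlas's.
Series = List[Tuple[int, float]]


class Backend(ABC):
    """Hypothetical pluggable storage interface for one region."""

    @abstractmethod
    def query(self, metric: str) -> Series:
        ...


class InMemoryBackend(Backend):
    """In-memory backend; other stores would be further Backend subclasses."""

    def __init__(self, data: Dict[str, Series]) -> None:
        self._data = data

    def query(self, metric: str) -> Series:
        return list(self._data.get(metric, []))


class RegionalEndpoint:
    """Answers a query for one region by concatenating its backends' points.
    Assumes the backends hold disjoint data, so nothing is deduplicated."""

    def __init__(self, region: str, backends: List[Backend]) -> None:
        self.region = region
        self.backends = backends

    def query(self, metric: str) -> Series:
        points: Series = []
        for backend in self.backends:
            points.extend(backend.query(metric))
        return sorted(points)


class GlobalEndpoint:
    """Fans a query out to every region and sums values that share a timestamp."""

    def __init__(self, regions: List[RegionalEndpoint]) -> None:
        self.regions = regions

    def query(self, metric: str) -> Series:
        totals: Dict[int, float] = {}
        for region in self.regions:
            for ts, value in region.query(metric):
                totals[ts] = totals.get(ts, 0.0) + value
        return sorted(totals.items())


if __name__ == "__main__":
    us_east = RegionalEndpoint("us-east-1", [InMemoryBackend({"requests": [(0, 10.0), (60, 12.0)]})])
    eu_west = RegionalEndpoint("eu-west-1", [InMemoryBackend({"requests": [(0, 5.0), (60, 4.0)]})])
    print(GlobalEndpoint([us_east, eu_west]).query("requests"))  # [(0, 15.0), (60, 16.0)]
```

Under this assumed design, plugging in another store (an OpenTSDB or InfluxDB adapter, say) means writing one more `Backend` subclass; neither endpoint class changes, which is the point the slides make about mapping scope to each option's scope.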
Composability: Example
Deployments and Automated Canary Analysis at Netflix
[Diagram, built up over several slides: the Edge Systems Deployment Automation Platform and Edge Systems Canary Analysis sit alongside the Mainline Deployment Automation Platform, with connectors labelled API (and, in one build, Email). Insight Engineering Canary Analysis is then added, and the Edge Systems components and connectors are removed one by one until only Insight Engineering Canary Analysis and the Mainline Deployment Automation Platform remain.]
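The canary-analysis slides show one team's analysis component being replaced by another's while the deployment platform stays in place. A minimal Python sketch of that kind of seam, assuming the platform depends only on a narrow scoring interface: `CanaryAnalyzer`, `ToleranceAnalyzer`, `DeploymentPlatform`, the metric names, the relative-tolerance rule and the 0.9 promotion threshold are all hypothetical and do not describe how Netflix's canary analysis actually scores a deployment.

```python
from typing import Dict, Protocol

Metrics = Dict[str, float]  # metric name -> observed value (hypothetical shape)


class CanaryAnalyzer(Protocol):
    """Hypothetical contract the deployment platform depends on."""

    def score(self, baseline: Metrics, canary: Metrics) -> float:
        """Return a score in [0, 1]; higher means the canary looks healthier."""
        ...


class ToleranceAnalyzer:
    """Toy analyzer: fraction of shared metrics where the canary stays
    within a relative tolerance of the baseline."""

    def __init__(self, tolerance: float = 0.1) -> None:
        self.tolerance = tolerance

    def score(self, baseline: Metrics, canary: Metrics) -> float:
        shared = [name for name in baseline if name in canary]
        if not shared:
            return 0.0  # nothing to compare: treat as failing
        within = sum(
            1
            for name in shared
            if abs(canary[name] - baseline[name]) <= self.tolerance * abs(baseline[name])
        )
        return within / len(shared)


class DeploymentPlatform:
    """Depends only on the CanaryAnalyzer interface, so analyzers can be swapped."""

    def __init__(self, analyzer: CanaryAnalyzer, threshold: float = 0.9) -> None:
        self.analyzer = analyzer
        self.threshold = threshold

    def should_promote(self, baseline: Metrics, canary: Metrics) -> bool:
        return self.analyzer.score(baseline, canary) >= self.threshold


if __name__ == "__main__":
    platform = DeploymentPlatform(ToleranceAnalyzer(tolerance=0.1))
    baseline = {"latency_ms": 100.0, "errors_per_min": 2.0}
    print(platform.should_promote(baseline, {"latency_ms": 105.0, "errors_per_min": 2.1}))  # True
    print(platform.should_promote(baseline, {"latency_ms": 150.0, "errors_per_min": 2.1}))  # False
```

Because `DeploymentPlatform` only sees `score()`, replacing one analyzer with another (the Edge Systems one with the Insight Engineering one, in the slides' terms) would be a constructor argument change rather than a platform change.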
One More Reason
“Think of the glory. Think of your reputation. Think how great it'll look on your next resume.”
- Lois McMaster Bujold
Judgment
The Grand Example
Netflix’s Monitoring Platform
• Prior system owned by IT
• No great OSS products
• Ridiculous scale
• Seriously, how hard can it be?
The Grand Example
Netflix’s Monitoring Platform
• Took longer than expected
• Ongoing maintenance
• UI only recent priority
The Grand Example
Netflix’s Monitoring Platform
• Scales efficientlyish
• Impedance match with dev lifestyle
• Nicely pluggable*
• Aggressivish OSS efforts
* Ask me about Real-Time Analytics!
The Grand Example
Netflix’s Monitoring Platform
• Still the right solution
• Worried about Sunk Cost Fallacy
• Most shouldn’t do this
Can You Repeat That?
Or: What’s Your Point?
Or: I was Tweeting. Did I miss something?
• What’s important to you?
• Is this a technical decision? Really?
• Honest and non-judgmental
• Any mitigation?
• Don’t build your own monitoring system. Seriously.
Name This Group
• United States
• Europe
• China
• Russia
• India
• Japan
• Blue Origin
• SpaceX
• Virgin Galactic
11:30am Frasier Room (3rd Floor)
@royrapoport
rsr@netflix.com
Netflix Built Its Own Monitoring System - and Why You Probably Shouldn't

  • 1. Netflix Built Its Own Monitoring System (And You Probably Shouldn’t) Roy Rapoport rsr@netflix.com @royrapoport 6 March 2015
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /netflix-monitoring-system
  • 3. Presented at QCon London www.qconlondon.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Not So Much About Telemetry • I telemetry • Architecture track Open Space, 11:30AM, Fleming 3rd Floor
  • 6. Agenda • Introductions • On Judgment • Your Problem • Your (no, really) Solution • Mitigation and Anecdotes • (Not) building your own monitoring system
  • 7. Introductions: Me • About 23 years in technology • Systems engineering, networking, software development, QA, release management • Time at Netflix: 2076 days (5y:8m:7d) • At Netflix: • Systems Engineering, Service Delivery in IT • Troubleshooter and Builder of Python Things in Product Engineering • Now: Engineering Manager, Insight Engineering
  • 8. Introductions: Netflix • Optimize speed of innovation • Constrain availability • Cost is what it is • Hire smart people,
 get out of their way • Anti-process bias “Freedom and Responsibility”
  • 10. You Have a Problem (Your job would likely be boring otherwise) • Are you the first • To have it? • To care? • Are you sure? One that looks nice And not too expensive
  • 11. You Have a Problem (Your job would likely be boring otherwise) • You’re not the first, or only • Good news! • Then what?
  • 12. Adventures in IT-Land • (import disclaimer) • Not developers • Cautious about ongoing support load • Not well-trusted
  • 14. A Little Bit of … • Time, courage, knowledge, pride • Cynicism, hubris, fear
  • 15.
  • 16. Technical Reasons for Rejection (Or: It’s Not You, It’s … Actually, It’s You) • Financial Cost • Technical incompatibility
  • 19. A Moment for Pedantry Or: Requirements for “Not Invented Here”
  • 21. A Question of Trust • Technical: I don’t trust your product • Organizational: I don’t trust you
  • 22. I Don’t Trust You To Care About Me as a Customer • You’re selling me something • I’m not your only customer • I’m not an important customer • You don’t care about your customers
  • 23. I Don’t Trust You To build a good product • Past performance … • “Good for me” • Because you said so, that’s why!
  • 24. I Don’t Trust You To build it fast enough • Unpredictable velocity • When best-case is too slow • Or maybe ever (OSS)
  • 26. Eventual Consistency • Fork n’ merge • THE model for OSS • Works better for incremental changes • Requires alignment of goals
  • 27. Eventual Consistency No Fork Required • Start With a New Idea • Eventually merge concepts
  • 30. Eventual Consistency Example Mainline Cloud Orchestration 2011 2013 Insight Engineering CD Automation
  • 31. Eventual Consistency Example Mainline Cloud Orchestration 2011 2013 Insight Engineering CD Automation 2014 Mainline CD Automation
  • 32. Eventual Consistency Example Mainline Cloud Orchestration 2011 2013 Insight Engineering CD Automation 2014 Mainline CD Automation 2015
  • 33. Eventual Consistency Example Mainline Cloud Orchestration 2011 2013 2014 Mainline CD Automation 2015 Insight Engineering CD Automation
  • 34. Composability • Want this anyway • Map scope to options’ scopes
  • 35–39. Composability Example: Netflix’s Atlas Telemetry Platform (diagram built up across five slides): a Global Query Endpoint above four Regional Query Endpoints separated by a regional boundary, drawing on Memory and Cloudwatch backends; Epic appears in one build and is dropped in the next, and OpenTSDB and InfluxDB are added in the last.
  • 40–46. Composability Example: Deployments and Automated Canary Analysis at Netflix (diagram built up across seven slides): the first build shows the Edge Systems Deployment Automation Platform, Edge Systems Canary Analysis, and the Mainline Deployment Automation Platform, with two API links; Insight Engineering Canary Analysis (and, in one build, Email) then appears, and successive builds remove the API links, then Edge Systems Canary Analysis, then the Edge Systems Deployment Automation Platform, leaving only the Mainline Deployment Automation Platform and Insight Engineering Canary Analysis.
  • 47. One More Reason: “Think of the glory. Think of your reputation. Think how great it’ll look on your next resume.” (Lois McMaster Bujold)
  • 49–52. The Grand Example: Netflix’s Monitoring Platform • Prior system owned by IT • No great OSS products • Ridiculous scale • Seriously, how hard can it be?
  • 53. The Grand Example: Netflix’s Monitoring Platform • Took longer than expected • Ongoing maintenance • UI only a recent priority
  • 54. The Grand Example: Netflix’s Monitoring Platform • Scales efficientlyish • Impedance match with dev lifestyle • Nicely pluggable* • Aggressivish OSS efforts (*Ask me about Real-Time Analytics!)
  • 55. The Grand Example: Netflix’s Monitoring Platform • Still the right solution • Worried about Sunk Cost Fallacy • Most shouldn’t do this
  • 56. Can You Repeat That? (Or: What’s Your Point? Or: I Was Tweeting. Did I Miss Something?) • What’s important to you? • Is this a technical decision? Really? • Honest and non-judgmental • Any mitigation? • Don’t build your own monitoring system. Seriously.
  • 57. Name This Group • United States • Europe • China • Russia • India • Japan • Blue Origin • SpaceX • Virgin Galactic
  • 58. 11:30am Frasier Room (3rd Floor) @royrapoport rsr@netflix.com
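The Atlas composability slides show a global query endpoint sitting above regional query endpoints, each drawing on pluggable backends (Memory, Cloudwatch, and later OpenTSDB and InfluxDB). A minimal sketch of why that layout composes well, assuming every layer exposes the same query interface: a regional endpoint fans out to its backends, and a global endpoint is just the same fan-out applied to regional endpoints. All names here (`QuerySource`, `MemoryBackend`, `FanOutEndpoint`, the region labels) are hypothetical and are not Atlas’s actual API; summing child answers is a simplifying assumption that suits counters but not every metric type.

```python
from typing import Dict, List, Protocol


class QuerySource(Protocol):
    """Hypothetical common interface: anything that can answer a metric query."""

    def query(self, metric: str) -> float: ...


class MemoryBackend:
    """Toy in-memory store standing in for a real backend (Memory, Cloudwatch, ...)."""

    def __init__(self, data: Dict[str, float]) -> None:
        self._data = data

    def query(self, metric: str) -> float:
        # Unknown metrics contribute nothing rather than raising.
        return self._data.get(metric, 0.0)


class FanOutEndpoint:
    """Answers a query by summing the answers of its children.

    Because it exposes the same query() method as a backend, one class can act
    as a regional endpoint (children = backends) or as a global endpoint
    (children = regional endpoints). Summation is an assumed aggregation.
    """

    def __init__(self, name: str, children: List[QuerySource]) -> None:
        self.name = name
        self.children = children

    def query(self, metric: str) -> float:
        return sum(child.query(metric) for child in self.children)


# Hypothetical wiring: two regions, each with its own backends, under one global endpoint.
us_east = FanOutEndpoint("us-east-1", [MemoryBackend({"requests": 120.0}),
                                       MemoryBackend({"requests": 30.0})])
eu_west = FanOutEndpoint("eu-west-1", [MemoryBackend({"requests": 50.0})])
global_endpoint = FanOutEndpoint("global", [us_east, eu_west])

print(global_endpoint.query("requests"))  # 200.0
```

The design point is the one the slides make: because each layer speaks the same interface, a backend can be added or dropped (as Epic was, and as OpenTSDB and InfluxDB were added) without the layers above it changing.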