SlideShare a Scribd company logo

Velocity NY 2018: Monitoring Containers Correctly

Michael Kehoe walks you through building a small monitoring utility for cgroup containers to illustrate best practices in container monitoring. You'll explore various cgroup constraints and learn how to specifically monitor for each of them to ensure that your application is behaving as expected. Along the way, Michael shares tricks and tips about monitoring containerized applications.

1 of 27
Download to read offline
Monitoring Containers Correctly
Michael Kehoe
Staff Site Reliability Engineer
https://github.com/michael-kehoe/container-monitoring-workshop
Getting Started
• Setup your workshop platform:
• https://app.strigo.io/event/QXDpmTiR
AufQ4LBis
• Token: F7C7
• Background slides:
https://bit.ly/2NcEBQN
• Code repo: https://github.com/michael-
kehoe/container-monitoring-workshop
• Please let me know ASAP if you’re
Today’s
agenda
1 Introductions
2 Container Primitives
3 What we’ll monitor
4 Cgroup interface file formats
5 Exercises
Today’s
agenda
Exercises
100 CPU Basics
101 CPU Enhanced
102 CPU Advanced
200 Memory Basics
201 Memory Enhanced
300 IO Basics
400 PID
Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years
American
• Worked on:
• Networks
• Micro-services
• Traffic Engineering
• Databases
Production-SRE Team @ LinkedIn
$ WHOAMI
• Disaster Recovery - Planning &
Automation
• Incident Response – Process &
Automation
• Visibility Engineering – Making use of
operational data
• Reliability Principles – Defining best
practice & automating it

Recommended

Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayMichael Kehoe
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container BasicsMichael Kehoe
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsMichael Kehoe
 

More Related Content

More from Michael Kehoe

What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringMichael Kehoe
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFMichael Kehoe
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...Michael Kehoe
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsMichael Kehoe
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleMichael Kehoe
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInMichael Kehoe
 
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...Michael Kehoe
 
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInCouchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInMichael Kehoe
 
Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016Michael Kehoe
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsMichael Kehoe
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016Michael Kehoe
 
Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Michael Kehoe
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 

More from Michael Kehoe (18)

What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
 
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
 
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInCouchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
 
Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production Systems
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016
 
Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 

Recently uploaded

biofilm fouling of the membrane present in aquaculture
biofilm fouling of the membrane present in aquaculturebiofilm fouling of the membrane present in aquaculture
biofilm fouling of the membrane present in aquacultureVINETUBE2
 
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEM
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEMMAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEM
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEMArunkumar Tulasi
 
SATHVIKA A AD21049 SELF INTRODUCTION.pdf
SATHVIKA A AD21049 SELF INTRODUCTION.pdfSATHVIKA A AD21049 SELF INTRODUCTION.pdf
SATHVIKA A AD21049 SELF INTRODUCTION.pdfSathvikaAlagar
 
Introduction about Technology roadmap for Industry 4.0
Introduction about Technology roadmap for Industry 4.0Introduction about Technology roadmap for Industry 4.0
Introduction about Technology roadmap for Industry 4.0RaishKhanji
 
Into the World of AI GDSC YCCE PPTX.pptx
Into the World of AI GDSC YCCE PPTX.pptxInto the World of AI GDSC YCCE PPTX.pptx
Into the World of AI GDSC YCCE PPTX.pptxGDSCYCCE
 
Student Challange as Google Developers at NKOCET
Student Challange as Google Developers at NKOCETStudent Challange as Google Developers at NKOCET
Student Challange as Google Developers at NKOCETGDSCNKOCET
 
Pointers and Array, pointer and String.pptx
Pointers and Array, pointer and String.pptxPointers and Array, pointer and String.pptx
Pointers and Array, pointer and String.pptxAnanthi Palanisamy
 
PM24_Oral_Presentation_Template_Guidelines.pptx
PM24_Oral_Presentation_Template_Guidelines.pptxPM24_Oral_Presentation_Template_Guidelines.pptx
PM24_Oral_Presentation_Template_Guidelines.pptxnissamant
 
Pre-assessment & Data Sheet presentation template - 2023.pptx
Pre-assessment & Data Sheet presentation template - 2023.pptxPre-assessment & Data Sheet presentation template - 2023.pptx
Pre-assessment & Data Sheet presentation template - 2023.pptxssuserc79a6f
 
UNIT I INTRODUCTION TO INTERNET OF THINGS
UNIT I INTRODUCTION TO INTERNET OF THINGSUNIT I INTRODUCTION TO INTERNET OF THINGS
UNIT I INTRODUCTION TO INTERNET OF THINGSbinuvijay1
 
20CE501PE – INDUSTRIAL WASTE MANAGEMENT.ppt
20CE501PE – INDUSTRIAL WASTE MANAGEMENT.ppt20CE501PE – INDUSTRIAL WASTE MANAGEMENT.ppt
20CE501PE – INDUSTRIAL WASTE MANAGEMENT.pptMohanumar S
 
Critical Literature Review Final -MW.pdf
Critical Literature Review Final -MW.pdfCritical Literature Review Final -MW.pdf
Critical Literature Review Final -MW.pdfMollyWinterbottom
 
Architectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi ArabiaArchitectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi ArabiaIgnacio J. Palma, Arch PhD.
 
Plant Design for bioplastic production from Microalgae in Pakistan.pdf
Plant Design for bioplastic production from Microalgae in Pakistan.pdfPlant Design for bioplastic production from Microalgae in Pakistan.pdf
Plant Design for bioplastic production from Microalgae in Pakistan.pdfMianHusnainIqbal2
 
SR Globals Profile - Building Vision, Exceeding Expectations.
SR Globals Profile -  Building Vision, Exceeding Expectations.SR Globals Profile -  Building Vision, Exceeding Expectations.
SR Globals Profile - Building Vision, Exceeding Expectations.srglobalsenterprises
 
Metrology Measurements and All units PPT
Metrology Measurements and  All units PPTMetrology Measurements and  All units PPT
Metrology Measurements and All units PPTdinesh babu
 
Sample Case Study of industry 4.0 and its Outcome
Sample Case Study of industry 4.0 and its OutcomeSample Case Study of industry 4.0 and its Outcome
Sample Case Study of industry 4.0 and its OutcomeHarshith A S
 
Presentation of Helmet Detection Using Machine Learning.pptx
Presentation of Helmet Detection Using Machine Learning.pptxPresentation of Helmet Detection Using Machine Learning.pptx
Presentation of Helmet Detection Using Machine Learning.pptxasmitaTele2
 
chap. 3. lipid deterioration oil and fat processign
chap. 3. lipid deterioration oil and fat processignchap. 3. lipid deterioration oil and fat processign
chap. 3. lipid deterioration oil and fat processignteddymebratie
 

Recently uploaded (20)

biofilm fouling of the membrane present in aquaculture
biofilm fouling of the membrane present in aquaculturebiofilm fouling of the membrane present in aquaculture
biofilm fouling of the membrane present in aquaculture
 
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEM
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEMMAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEM
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEM
 
SATHVIKA A AD21049 SELF INTRODUCTION.pdf
SATHVIKA A AD21049 SELF INTRODUCTION.pdfSATHVIKA A AD21049 SELF INTRODUCTION.pdf
SATHVIKA A AD21049 SELF INTRODUCTION.pdf
 
Introduction about Technology roadmap for Industry 4.0
Introduction about Technology roadmap for Industry 4.0Introduction about Technology roadmap for Industry 4.0
Introduction about Technology roadmap for Industry 4.0
 
Into the World of AI GDSC YCCE PPTX.pptx
Into the World of AI GDSC YCCE PPTX.pptxInto the World of AI GDSC YCCE PPTX.pptx
Into the World of AI GDSC YCCE PPTX.pptx
 
Student Challange as Google Developers at NKOCET
Student Challange as Google Developers at NKOCETStudent Challange as Google Developers at NKOCET
Student Challange as Google Developers at NKOCET
 
Pointers and Array, pointer and String.pptx
Pointers and Array, pointer and String.pptxPointers and Array, pointer and String.pptx
Pointers and Array, pointer and String.pptx
 
PM24_Oral_Presentation_Template_Guidelines.pptx
PM24_Oral_Presentation_Template_Guidelines.pptxPM24_Oral_Presentation_Template_Guidelines.pptx
PM24_Oral_Presentation_Template_Guidelines.pptx
 
Pre-assessment & Data Sheet presentation template - 2023.pptx
Pre-assessment & Data Sheet presentation template - 2023.pptxPre-assessment & Data Sheet presentation template - 2023.pptx
Pre-assessment & Data Sheet presentation template - 2023.pptx
 
UNIT I INTRODUCTION TO INTERNET OF THINGS
UNIT I INTRODUCTION TO INTERNET OF THINGSUNIT I INTRODUCTION TO INTERNET OF THINGS
UNIT I INTRODUCTION TO INTERNET OF THINGS
 
AC DISTRIBUTION - ELECTRICAL POWER SYSTEM
AC DISTRIBUTION - ELECTRICAL POWER SYSTEMAC DISTRIBUTION - ELECTRICAL POWER SYSTEM
AC DISTRIBUTION - ELECTRICAL POWER SYSTEM
 
20CE501PE – INDUSTRIAL WASTE MANAGEMENT.ppt
20CE501PE – INDUSTRIAL WASTE MANAGEMENT.ppt20CE501PE – INDUSTRIAL WASTE MANAGEMENT.ppt
20CE501PE – INDUSTRIAL WASTE MANAGEMENT.ppt
 
Critical Literature Review Final -MW.pdf
Critical Literature Review Final -MW.pdfCritical Literature Review Final -MW.pdf
Critical Literature Review Final -MW.pdf
 
Architectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi ArabiaArchitectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi Arabia
 
Plant Design for bioplastic production from Microalgae in Pakistan.pdf
Plant Design for bioplastic production from Microalgae in Pakistan.pdfPlant Design for bioplastic production from Microalgae in Pakistan.pdf
Plant Design for bioplastic production from Microalgae in Pakistan.pdf
 
SR Globals Profile - Building Vision, Exceeding Expectations.
SR Globals Profile -  Building Vision, Exceeding Expectations.SR Globals Profile -  Building Vision, Exceeding Expectations.
SR Globals Profile - Building Vision, Exceeding Expectations.
 
Metrology Measurements and All units PPT
Metrology Measurements and  All units PPTMetrology Measurements and  All units PPT
Metrology Measurements and All units PPT
 
Sample Case Study of industry 4.0 and its Outcome
Sample Case Study of industry 4.0 and its OutcomeSample Case Study of industry 4.0 and its Outcome
Sample Case Study of industry 4.0 and its Outcome
 
Presentation of Helmet Detection Using Machine Learning.pptx
Presentation of Helmet Detection Using Machine Learning.pptxPresentation of Helmet Detection Using Machine Learning.pptx
Presentation of Helmet Detection Using Machine Learning.pptx
 
chap. 3. lipid deterioration oil and fat processign
chap. 3. lipid deterioration oil and fat processignchap. 3. lipid deterioration oil and fat processign
chap. 3. lipid deterioration oil and fat processign
 

Velocity NY 2018: Monitoring Containers Correctly

Editor's Notes

  1. So I’m apart of a team at LinkedIn called Production-SRE The key tenants of production-sre at LinkedIn is: Assist in restoring stability during site-critical issues Developing applications to reduce MTTD and MTTR Provide direction and guidelines for site-troubleshooting Build tools for efficient site-issue troubleshooting, issue detection and correlation As this presentation goes on, you’ll notice how an Event Correlation system fits into these
  2. Cgroups Kernel >= 2.6.24 Namespaces Kernel >= 2.4.19 Copy-on-Write Linux Security Modules
  3. Resource limiting – groups can be set to not exceed a configured memory limit, which also includes the file system cache[8][9] Prioritization – some groups may get a larger share of CPU utilization[10] or disk I/O throughput[11] Accounting – measures a group's resource usage, which may be used, for example, for billing purposes[12] Control – freezing groups of processes, their checkpointing and restarting[12]
  4. Nlsv Ssv Fk nk