Velocity NY 2018: Monitoring Containers Correctly

Michael Kehoe walks you through building a small monitoring utility for cgroup containers to illustrate best practices in container monitoring. You'll explore various cgroup constraints and learn how to specifically monitor for each of them to ensure that your application is behaving as expected. Along the way, Michael shares tricks and tips about monitoring containerized applications.

Monitoring Containers Correctly
Michael Kehoe
Staff Site Reliability Engineer
https://github.com/michael-kehoe/container-monitoring-workshop

Getting Started
• Set up your workshop platform:
• https://app.strigo.io/event/QXDpmTiRAufQ4LBis
• Token: F7C7
• Background slides: https://bit.ly/2NcEBQN
• Code repo: https://github.com/michael-kehoe/container-monitoring-workshop
• Please let me know ASAP if you’re

Today’s agenda
1 Introductions
2 Container Primitives
3 What we’ll monitor
4 Cgroup interface file formats
5 Exercises

Today’s agenda: Exercises
100 CPU Basics
101 CPU Enhanced
102 CPU Advanced
200 Memory Basics
201 Memory Enhanced
300 IO Basics
400 PID

Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years American
• Worked on:
• Networks
• Micro-services
• Traffic Engineering
• Databases

Production-SRE Team @ LinkedIn
$ WHOAMI
• Disaster Recovery - Planning & Automation
• Incident Response - Process & Automation
• Visibility Engineering - Making use of operational data
• Reliability Principles - Defining best practice & automating it

Containers
• cgroups: Limiting the resources that can be used by a process/set of processes
• Namespaces: Isolating filesystem resources
• Copy on Write: Implicit sharing or shadowing
• Linux Security Modules: Locking down container privileges

Cgroup
• Abbreviation for ‘Control Groups’
• Provides
• Resource Limiting
• Prioritization
• Accounting
• Control

What we’ll monitor: CPU
• 100: Basic cgroup CPU utilization
• 101: Enhanced cgroup CPU utilization (with percentiles)
• 102: cgroup throttles
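
As a rough illustration of exercise 100, CPU utilization can be derived from two readings of a cgroup's cumulative CPU time. This is a minimal sketch, not the workshop's code: it assumes the cgroup v2 `cpu.stat` file with its `usage_usec` counter (as described in the kernel cgroup-v2 document), and the sample strings and function names below are hypothetical.

```python
def parse_flat_keyed(text: str) -> dict:
    """Parse a flat-keyed cgroup file such as cpu.stat into {key: int}."""
    stats = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            stats[key] = int(value)
    return stats


def cpu_utilization(before: str, after: str, interval_s: float) -> float:
    """Return CPU use as a fraction of one core between two cpu.stat samples."""
    used_usec = (
        parse_flat_keyed(after)["usage_usec"]
        - parse_flat_keyed(before)["usage_usec"]
    )
    return used_usec / (interval_s * 1_000_000)


# Hypothetical cpu.stat contents, read 1 second apart.
sample_1 = "usage_usec 1000000\nuser_usec 600000\nsystem_usec 400000\n"
sample_2 = "usage_usec 1250000\nuser_usec 800000\nsystem_usec 450000\n"

print(cpu_utilization(sample_1, sample_2, interval_s=1.0))  # 0.25
```

On a real host the two strings would come from reading the container's `cpu.stat` file twice; the location of that file depends on how the cgroup hierarchy is mounted.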

What we’ll monitor: Memory
• 200: Memory Basics
• Cgroup utilization
• 201: Enhanced Memory Metrics
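
A basic memory check compares current usage against the configured limit. The sketch below assumes the cgroup v2 files `memory.current` and `memory.max`, where `memory.max` holds the literal string `max` when no limit is set; the function name and sample values are hypothetical.

```python
from typing import Optional


def memory_utilization(current_text: str, max_text: str) -> Optional[float]:
    """Return usage as a fraction of the limit, or None if the cgroup is unlimited."""
    limit = max_text.strip()
    if limit == "max":  # no memory limit configured for this cgroup
        return None
    return int(current_text.strip()) / int(limit)


# Hypothetical contents of memory.current and memory.max (256 MiB of 1 GiB).
print(memory_utilization("268435456\n", "1073741824\n"))  # 0.25
print(memory_utilization("268435456\n", "max\n"))          # None
```

Returning `None` for the unlimited case keeps callers from dividing by a meaningless limit; an alerting rule would typically skip such cgroups.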

What we’ll monitor: Disk/Network
• 300: Disk IO Monitoring
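
Disk IO counters are cumulative, so a monitor usually reports rates between two samples. This sketch assumes the cgroup v2 `io.stat` file, which lists one block device per line followed by `key=value` counters such as `rbytes` and `wbytes`; the helper names and sample data are hypothetical.

```python
def parse_io_stat(text: str) -> dict:
    """Parse io.stat lines like '8:0 rbytes=1 wbytes=2' into {device: {key: int}}."""
    devices = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        device, *pairs = line.split()
        devices[device] = {k: int(v) for k, v in (p.split("=", 1) for p in pairs)}
    return devices


def io_rates(before: str, after: str, interval_s: float) -> dict:
    """Return per-device read/write throughput in bytes per second."""
    prev, curr = parse_io_stat(before), parse_io_stat(after)
    rates = {}
    for device, counters in curr.items():
        base = prev.get(device, {})  # a device may appear for the first time
        rates[device] = {
            "read_bytes_per_s": (counters["rbytes"] - base.get("rbytes", 0)) / interval_s,
            "write_bytes_per_s": (counters["wbytes"] - base.get("wbytes", 0)) / interval_s,
        }
    return rates


# Hypothetical io.stat contents, read 2 seconds apart.
io_before = "8:0 rbytes=1000000 wbytes=4000000 rios=10 wios=40\n"
io_after = "8:0 rbytes=3000000 wbytes=5000000 rios=30 wios=50\n"

print(io_rates(io_before, io_after, interval_s=2.0))
# {'8:0': {'read_bytes_per_s': 1000000.0, 'write_bytes_per_s': 500000.0}}
```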

What we’ll monitor: PID
• 400: PID Utilization
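
PID utilization follows the same pattern as memory: compare the number of tasks in the cgroup with its limit. The sketch assumes the `pids.current` and `pids.max` files of the pids controller, with `max` meaning no limit; the 90% threshold and the function name are illustrative choices, not values from the talk.

```python
def near_pid_limit(current_text: str, max_text: str, threshold: float = 0.9) -> bool:
    """Return True when the cgroup has used at least `threshold` of its PID limit."""
    limit = max_text.strip()
    if limit == "max":  # unlimited cgroups can never hit the limit
        return False
    return int(current_text.strip()) / int(limit) >= threshold


# Hypothetical contents of pids.current and pids.max.
print(near_pid_limit("95\n", "100\n"))  # True
print(near_pid_limit("20\n", "100\n"))  # False
print(near_pid_limit("20\n", "max\n"))  # False
```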

Cgroup interface file formats
https://www.kernel.org/doc/Documentation/cgroup-v2.txt
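
The linked kernel document describes four interface file formats: new-line separated values, space separated values, flat keyed, and nested keyed. The parsers below are a minimal sketch of each, leaving values as strings; the function names are hypothetical, and the file names in the docstrings are examples of files that use each format in cgroup v2.

```python
def parse_nlsv(text: str) -> list:
    """New-line separated values: one value per line (e.g. cgroup.procs)."""
    return [line for line in text.splitlines() if line]


def parse_ssv(text: str) -> list:
    """Space separated values on a single line (e.g. cpu.max)."""
    return text.split()


def parse_flat_keyed(text: str) -> dict:
    """Flat keyed: one 'KEY VALUE' pair per line (e.g. cpu.stat)."""
    return dict(line.split(None, 1) for line in text.splitlines() if line)


def parse_nested_keyed(text: str) -> dict:
    """Nested keyed: 'KEY SUBKEY=VALUE ...' per line (e.g. io.stat)."""
    result = {}
    for line in text.splitlines():
        if not line:
            continue
        key, *pairs = line.split()
        result[key] = dict(pair.split("=", 1) for pair in pairs)
    return result


# Hypothetical file contents for each format.
print(parse_nlsv("101\n102\n"))                      # ['101', '102']
print(parse_ssv("50000 100000\n"))                   # ['50000', '100000']
print(parse_flat_keyed("usage_usec 1250000\n"))      # {'usage_usec': '1250000'}
print(parse_nested_keyed("8:16 rbytes=1459200\n"))   # {'8:16': {'rbytes': '1459200'}}
```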

17. Exercises

18. 100: CPU Monitoring

19. 101: Enhanced CPU Monitoring

20. Enhanced CPU Monitoring
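
The slide text does not show how the percentiles are computed. One common approach, sketched here under that caveat, is to keep a window of utilization samples and report nearest-rank percentiles, which expose short spikes that an average hides. The sample values and function name are hypothetical.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Return the nearest-rank percentile (0 < pct <= 100) of the samples."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical CPU utilization samples (fraction of one core), one spike included.
cpu_samples = [0.10, 0.12, 0.11, 0.95, 0.13, 0.12, 0.10, 0.14, 0.11, 0.12]

print(percentile(cpu_samples, 50))  # 0.12
print(percentile(cpu_samples, 99))  # 0.95
```

Here the median stays near 12% while the 99th percentile shows the 95% spike.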

21. 102: CPU Advanced Monitoring

22. Advanced CPU Monitoring
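
For throttling, a useful signal is the share of scheduler periods in which the cgroup was throttled. This sketch assumes the `nr_periods` and `nr_throttled` counters that `cpu.stat` exposes when a CPU limit is in effect; the function names and sample data are hypothetical.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse cpu.stat ('KEY VALUE' per line) into {key: int}."""
    return {k: int(v) for k, v in (line.split() for line in text.splitlines() if line.strip())}


def throttle_ratio(before: str, after: str) -> float:
    """Return the fraction of enforcement periods that were throttled between samples."""
    prev, curr = parse_cpu_stat(before), parse_cpu_stat(after)
    periods = curr["nr_periods"] - prev["nr_periods"]
    if periods == 0:  # no enforcement periods elapsed, so nothing was throttled
        return 0.0
    return (curr["nr_throttled"] - prev["nr_throttled"]) / periods


# Hypothetical cpu.stat excerpts from two consecutive reads.
stat_before = "nr_periods 100\nnr_throttled 10\nthrottled_usec 500000\n"
stat_after = "nr_periods 200\nnr_throttled 35\nthrottled_usec 900000\n"

print(throttle_ratio(stat_before, stat_after))  # 0.25
```

A container can show modest average CPU utilization and still be throttled in a quarter of its periods, which is why this metric is worth tracking separately.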

23. 200: Memory Basics

24. 201: Memory Enhanced
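
The slides do not list which metrics exercise 201 covers. One plausible candidate, shown here as an assumption, is the cgroup v2 `memory.events` file, whose counters (`low`, `high`, `max`, `oom`, `oom_kill`) record how often the cgroup hit its memory boundaries. The sketch reports which counters increased between two reads; the names and sample data are hypothetical.

```python
def parse_events(text: str) -> dict:
    """Parse memory.events ('KEY VALUE' per line) into {key: int}."""
    return {k: int(v) for k, v in (line.split() for line in text.splitlines() if line.strip())}


def new_memory_events(before: str, after: str) -> dict:
    """Return only the counters that increased, with the size of the increase."""
    prev, curr = parse_events(before), parse_events(after)
    deltas = {key: value - prev.get(key, 0) for key, value in curr.items()}
    return {key: delta for key, delta in deltas.items() if delta > 0}


# Hypothetical memory.events contents from two consecutive reads.
events_before = "low 0\nhigh 2\nmax 5\noom 0\noom_kill 0\n"
events_after = "low 0\nhigh 2\nmax 9\noom 1\noom_kill 1\n"

print(new_memory_events(events_before, events_after))
# {'max': 4, 'oom': 1, 'oom_kill': 1}
```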

25. 300: Disk IO Basics

26. 400: PID Monitoring

Editor's Notes

I’m a part of a team at LinkedIn called Production-SRE. The key tenets of Production-SRE at LinkedIn are:
• Assist in restoring stability during site-critical issues
• Develop applications to reduce MTTD and MTTR
• Provide direction and guidelines for site troubleshooting
• Build tools for efficient site-issue troubleshooting, issue detection and correlation
As this presentation goes on, you’ll notice how an Event Correlation system fits into these.
• Cgroups: Kernel >= 2.6.24
• Namespaces: Kernel >= 2.4.19
• Copy-on-Write
• Linux Security Modules
• Resource limiting: groups can be set to not exceed a configured memory limit, which also includes the file system cache
• Prioritization: some groups may get a larger share of CPU utilization or disk I/O throughput
• Accounting: measures a group's resource usage, which may be used, for example, for billing purposes
• Control: freezing groups of processes, their checkpointing and restarting
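Interface file formats: new-line separated values (nlsv), space separated values (ssv), flat keyed (fk), nested keyed (nk)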
