Heinrich.Hartmann@Circonus.com
Linux System Monitoring with eBPF
DevOpsDays Kiel, 2018-05-16
Heinrich Hartmann
System Monitoring is about Kernel & Hardware
Best Practice: The USE Method
https://www.circonus.com/2017/08/system-monitoring-with-the-use-dashboard
Resources: CPU, Memory, Network, Disks
Metrics: Utilization, Saturation, Errors
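As an illustration of the "Utilization" column for the CPU row, here is a minimal sketch that derives CPU utilization from two samples of the aggregate `cpu` line in `/proc/stat`. The function names `parse_cpu_line` and `cpu_utilization` are hypothetical helpers, not part of the dashboard linked above, and counting `iowait` as idle time is an assumption of this sketch.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep the first eight counters:
    # user nice system idle iowait irq softirq steal.
    # guest and guest_nice are already accounted in user and nice.
    return [int(x) for x in fields[1:9]]


def cpu_utilization(before, after):
    """Fraction of non-idle CPU time between two /proc/stat samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Assumption: idle and iowait both count as "not utilized".
    idle = deltas[3] + deltas[4]
    return 1.0 - idle / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()
```

On a Linux host, calling `read_cpu_line()` twice a few seconds apart and passing both lines to `cpu_utilization` gives the utilization over that interval.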
Lots of Unknowns remaining
https://www.circonus.com/2017/08/system-monitoring-with-the-use-dashboard
[Figure: USE grid (CPU, Memory, Network, Disks by Utilization, Saturation, Errors) with several cells marked "?" or "~"]
eBPF allows unparalleled insights
https://github.com/iovisor/bcc
Credits:
- Brendan Gregg @ Netflix (Sun)
- Sasha Goldshtein @ Sela, Microsoft
- Brenden Blanco @ VMware
- Linus Torvalds, et al.
CPU: Scheduling Latency
Disk: Block-I/O Latency
Disk: Block-I/O Latency over time
Don’t shout in the Datacenter
Brendan Gregg (2008) https://www.youtube.com/watch?v=tDacjrSCeq4
System Calls: The Kernel API
Monitor at the System Call API: Rate, Errors, Duration
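Rate, errors and duration per system call can be collected with a small bcc program. The following is a minimal sketch, assuming bcc is installed and the script runs as root on a kernel that exposes the `raw_syscalls` tracepoints. It is not the plugin code from the talk; the map names, `summarize` and `run` are hypothetical.

```python
import time

# eBPF program (C), compiled by bcc at load time.
BPF_TEXT = r"""
BPF_HASH(start, u64, u64);      // pid_tgid -> entry timestamp (ns)
BPF_HASH(calls, u32, u64);      // syscall id -> completed calls
BPF_HASH(errors, u32, u64);     // syscall id -> calls with negative return
BPF_HASH(total_ns, u32, u64);   // syscall id -> summed duration (ns)

TRACEPOINT_PROBE(raw_syscalls, sys_enter) {
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&pid_tgid, &ts);
    return 0;
}

TRACEPOINT_PROBE(raw_syscalls, sys_exit) {
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&pid_tgid);
    if (tsp == 0)
        return 0;               // entry event was not seen
    u64 delta = bpf_ktime_get_ns() - *tsp;
    start.delete(&pid_tgid);

    u32 id = args->id;
    calls.increment(id);
    total_ns.increment(id, delta);
    if (args->ret < 0)
        errors.increment(id);
    return 0;
}
"""


def summarize(calls, errors, total_ns, interval_s):
    """Turn raw per-syscall counters into rate / errors / mean duration rows."""
    rows = []
    for sc_id, n in calls.items():
        rows.append({
            "id": sc_id,
            "rate": n / interval_s,                      # calls per second
            "errors": errors.get(sc_id, 0),              # failed calls
            "avg_us": total_ns.get(sc_id, 0) / n / 1000.0,
        })
    return sorted(rows, key=lambda r: r["rate"], reverse=True)


def run(interval_s=5):
    """Trace all syscalls for interval_s seconds and print the top ten."""
    from bcc import BPF                    # imported here: needs root + bcc
    from bcc.syscall import syscall_name

    b = BPF(text=BPF_TEXT)                 # tracepoints attach automatically
    time.sleep(interval_s)

    def as_dict(table):
        return {k.value: v.value for k, v in table.items()}

    rows = summarize(as_dict(b["calls"]), as_dict(b["errors"]),
                     as_dict(b["total_ns"]), interval_s)
    for r in rows[:10]:
        name = syscall_name(r["id"]).decode()
        print("%-20s %10.1f/s %8d err %10.2f us" %
              (name, r["rate"], r["errors"], r["avg_us"]))
```

Calling `run()` as root prints the ten busiest syscalls. The sketch reports only a mean duration per syscall; a histogram map would be needed to see the full distribution.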
Syscalls: Rate / Count
sched_yield (2tn)
clock_time (1.5tn)
recvfrom (300bn)
394 Metrics
Syscalls: Duration
[Figure: syscall duration histogram; axis ticks at 1 us and 10 us]
Syscall durations span >8 orders of magnitude
[Figure: syscall duration histogram; axis ticks at 10 us, 100 ms and 1 s; 1.5 tn events total]
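Durations that span this many orders of magnitude are usually recorded in logarithmic buckets rather than fixed-width ones. The sketch below uses power-of-two buckets, similar in spirit to the log2 histograms that bcc tools print. It is an illustration only; the histogram format used for the plots in this talk may differ, and the function names are hypothetical.

```python
from collections import Counter


def log2_bucket(value_us):
    """Bucket index floor(log2(v)) for v >= 1; values below 1 go to bucket 0."""
    if value_us < 1:
        return 0
    return int(value_us).bit_length() - 1


def bucket_range(index):
    """Inclusive integer range of values covered by a bucket."""
    return (1 << index, (1 << (index + 1)) - 1)


def histogram(samples_us):
    """Count samples (in microseconds) per log2 bucket."""
    return Counter(log2_bucket(v) for v in samples_us)


# Durations from 1 us up to 100 s (10**8 us) fit into 27 buckets.
h = histogram([1, 3, 1000, 100_000_000])
for idx in sorted(h):
    lo, hi = bucket_range(idx)
    print("%12d .. %12d us : %d" % (lo, hi, h[idx]))
```

The trade-off is resolution: each bucket covers a factor of two, so relative error is bounded while the number of buckets grows only with the logarithm of the range.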
File System: Latency
Memory: Allocation Latency
Further Reading
Slides: @HeinrichHartman / #dodkiel18
Code: https://github.com/circonus-labs/nad/.../bccbpf
Blog: http://www.circonus.com/2018/05/linux-system-monitoring-with-ebpf/