Twan Koot - Beyond the % usage, an in-depth look into monitoring

Since its beginning, the Performance Advisory Council has aimed to promote engagement between experts from around the world and to create relevant, value-added content shared between members. For Neotys, it is a way to strengthen our position as a thought leader in load and performance testing. During this event, 12 participants convened in Chamonix (France) to explore several topics on the minds of today's performance testers, such as DevOps, Shift Left/Right, Test Automation, Blockchain and Artificial Intelligence.

  1. 1. Beyond the % usage, an in-depth look into monitoring. Twan Koot
  2. 2. Introduction • Twan Koot, 26 years old • Senior performance tester / engineer @ • 5 years of IT experience • Loves: fast IT solutions on small hardware • Hates: unfounded decisions on IT architecture • Apart from working, I love photography and riding motorcycles.
  3. 3. Beyond the % usage, an in-depth look into monitoring. • Topics: • A method for analyzing resource monitoring. • Showcasing some monitoring metrics. • Introduction to BCC tools.
  4. 4. Monitoring
  5. 5. Monitoring – the basics – Analyze • So, you have run a performance test and even had monitoring running. • Now we can do the basic three-step dance: • Check CPU, RAM and I/O. • Check whether any counter exceeds a static threshold, such as % usage. • Check whether peaks in resource usage overlap with peaks in response times.
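
The three-step check above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's tooling: the function names, the sample series and both limits (80 % CPU, 400 ms response time) are assumptions chosen for the example.

```python
from typing import List


def exceeds_threshold(samples: List[float], limit: float) -> List[int]:
    """Return the sample indexes where a counter is above a static limit."""
    return [i for i, value in enumerate(samples) if value > limit]


def peaks_overlap(resource: List[float], response_ms: List[float],
                  resource_limit: float, response_limit: float) -> List[int]:
    """Return the indexes where a resource peak coincides with a slow response."""
    busy = set(exceeds_threshold(resource, resource_limit))
    slow = set(exceeds_threshold(response_ms, response_limit))
    return sorted(busy & slow)


# Hypothetical samples taken at the same timestamps during a test run.
cpu_pct = [35.0, 42.0, 91.0, 95.0, 60.0, 38.0]
resp_ms = [120.0, 130.0, 480.0, 510.0, 140.0, 125.0]

print(exceeds_threshold(cpu_pct, 80.0))
print(peaks_overlap(cpu_pct, resp_ms, 80.0, 400.0))
```

An overlap only shows that the two peaks occur together; it does not say why the resource was busy, which is where the rest of the deck picks up.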
  6. 6. Monitoring – the basics – Dashboard hype
  7. 7. USE
  8. 8. Monitoring – USE – Brendan Gregg • The USE method enables a methodical approach to analyzing resources. • It was developed by Brendan Gregg. http://www.brendangregg.com "Industry expert in computing performance and cloud computing. Solves hard problems. Makes things faster."
  9. 9. Monitoring – USE – USE method • Utilization – the average time a resource was busy servicing work. • Saturation – the degree to which work cannot be handled and is queued instead. • Errors – the number of error events.
     Resource | Utilization (easy) | Saturation (moderate) | Errors (hard)
     CPU | CPU utilization (%) | Run-queue length / scheduler latency | Correctable CPU cache ECC events or faulted CPUs
     Memory | Available free memory | Anonymous paging or thread swapping | Failed malloc()s
     Storage device I/O | Device busy % | Wait queue length | Device errors
  10. 10. Monitoring – USE – The flow • How do we apply the USE method? • We can use the following flow:
  11. 11. Monitoring
  12. 12. Monitoring – Let's go deeper • Lots of tools
  13. 13. Monitoring – Let's go deeper – CPU utilisation • One of the most measured metrics during a performance test. • What does 80% utilization even mean? • Overloaded? • One of the most misread metrics!
  14. 14. Monitoring – Let's go deeper – CPU utilisation • When checking the utilization counter we may observe: busy / idle. • What is actually happening: busy / waiting ("stalled") / waiting ("idle"). • What are we waiting for? • I/O • Memory • So, is CPU utilization wrong? It's a good starting point to begin monitoring.
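
To make the "% usage" number concrete, here is a hedged sketch of how a utilization percentage can be derived from two samples of the aggregate `cpu` line in Linux `/proc/stat`. The two sample lines are invented, and the helper names are hypothetical. Note the limit of this view: time a core spends stalled on memory is not visible in these counters and is still reported as busy.

```python
def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counts."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def cpu_breakdown(before: str, after: str) -> dict:
    """Split the ticks between two samples into busy, iowait and idle shares."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = [y - x for x, y in zip(b, a)]
    # Field order: user nice system idle iowait irq softirq steal [guest ...].
    # Guest time is already included in user time, so only 8 fields are summed.
    total = sum(delta[:8])
    idle, iowait = delta[3], delta[4]
    busy = total - idle - iowait
    return {
        "busy_pct": 100.0 * busy / total,
        "iowait_pct": 100.0 * iowait / total,
        "idle_pct": 100.0 * idle / total,
    }


# Invented samples, taken some interval apart.
sample_before = "cpu  1000 0 500 8000 400 0 100 0 0 0"
sample_after = "cpu  1600 0 700 8100 500 0 100 0 0 0"

print(cpu_breakdown(sample_before, sample_after))
```

On a live Linux system the two lines would come from reading the first line of `/proc/stat` twice, a fixed interval apart.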
  15. 15. Monitoring – Let's go deeper – CPU saturation • CPU saturation -> run queue • Nmon "k" • We see a run queue of 9; this will cause latency.
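
As a rough sketch of the same idea without nmon: on Linux, the fourth field of `/proc/loadavg` holds the number of currently runnable tasks and the total number of tasks. Comparing the runnable count with the number of CPUs gives a first estimate of how many tasks are queued. The sample text and the CPU count of 4 are assumptions for illustration.

```python
def runnable_tasks(loadavg_text: str) -> int:
    """Extract the runnable-task count from /proc/loadavg content."""
    fields = loadavg_text.split()
    running, _total = fields[3].split("/")
    return int(running)


def waiting_for_cpu(runnable: int, cpu_count: int) -> int:
    """Estimate how many runnable tasks cannot be on a CPU right now."""
    return max(0, runnable - cpu_count)


# Invented /proc/loadavg content showing 9 runnable tasks.
sample_loadavg = "8.52 6.10 3.75 9/412 23817"

runnable = runnable_tasks(sample_loadavg)
print(runnable, waiting_for_cpu(runnable, cpu_count=4))
```

On a real host, `os.cpu_count()` could supply the CPU count. A single reading is only a snapshot, so sampling repeatedly gives a fairer picture.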
  16. 16. Monitoring – Let's go deeper – Memory utilisation • Using Nmon "m"
  17. 17. Monitoring – Let's go deeper – Memory saturation • Using Nmon "m" and "V" • We can see a lot of paging activity and heavy use of swap space.
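
Swap activity can also be read directly from the `pswpin` and `pswpout` counters in Linux `/proc/vmstat`. The sketch below turns two snapshots into pages swapped per second. The snapshot contents and the 2-second interval are made up for the example, and the helper names are hypothetical.

```python
def parse_vmstat(text: str) -> dict:
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before: str, after: str, interval_s: float) -> dict:
    """Return pages swapped in and out per second between two snapshots."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return {key: (a[key] - b[key]) / interval_s for key in ("pswpin", "pswpout")}


# Invented snapshots taken 2 seconds apart.
vmstat_before = "pswpin 1000\npswpout 5000\npgfault 100"
vmstat_after = "pswpin 1600\npswpout 7400\npgfault 900"

print(swap_rate(vmstat_before, vmstat_after, interval_s=2.0))
```

Sustained non-zero rates suggest the system is short on memory; a brief burst is less conclusive.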
  18. 18. Monitoring – Let's go deeper – I/O utilisation • Using Nmon "d" • We can see multiple counters for measuring I/O utilization. • We can measure the amount of data read from and written to the disk and compare this to the disk's specs. • Reading: 3726.2 transfers/sec.
  19. 19. Monitoring – Let's go deeper – I/O saturation • Using iostat -d to filter to a specific disk. • We can see we had a queue of ~43 requests (1-second interval). • Queueing means latency.
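
The average queue size can be approximated from the raw counters in Linux `/proc/diskstats`: the increase in the "weighted time spent doing I/Os" field divided by the interval gives the average number of requests in flight, and the increase in "time spent doing I/Os" gives the busy percentage. The sketch below assumes the standard field order of that file; the two sample lines and the helper names are invented for illustration.

```python
def parse_diskstats_line(line: str) -> dict:
    """Pick the device name and the two timing counters from a diskstats line."""
    f = line.split()
    # f[0], f[1]: major and minor number; f[2]: device name; f[3:]: counters.
    # f[12] is time spent doing I/Os (ms), f[13] the weighted time (ms).
    return {"device": f[2], "io_ms": int(f[12]), "weighted_io_ms": int(f[13])}


def io_saturation(before: str, after: str, interval_ms: float) -> dict:
    """Derive busy % and average queue size from two samples of one device."""
    b, a = parse_diskstats_line(before), parse_diskstats_line(after)
    if b["device"] != a["device"]:
        raise ValueError("samples belong to different devices")
    return {
        "device": a["device"],
        "util_pct": 100.0 * (a["io_ms"] - b["io_ms"]) / interval_ms,
        "avg_queue": (a["weighted_io_ms"] - b["weighted_io_ms"]) / interval_ms,
    }


# Invented samples for device "sda", taken 1000 ms apart.
disk_before = "8 0 sda 1000 0 80000 500 2000 0 160000 900 0 1200 1400"
disk_after = "8 0 sda 2500 0 200000 19500 4226 0 340000 24900 40 2190 44400"

print(io_saturation(disk_before, disk_after, interval_ms=1000.0))
```

With these invented numbers the device is busy 99 % of the interval with an average of 43 requests in flight, the kind of queue described on the slide.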
  20. 20. (e)BPF/BCC
  21. 21. Monitoring – Let's go deeper – eBPF • "Extended" Berkeley Packet Filter. • An in-kernel virtual machine that runs small, sandboxed programs. • Gives access to many new metrics about the kernel, performance, the scheduler and more. • Some use cases: • Deep performance analysis • Network tracing • DDoS mitigation/detection
  22. 22. Monitoring – Let's go deeper – BCC • BPF Compiler Collection (BCC) • "BCC is a toolkit for creating efficient kernel tracing and manipulation programs and includes several useful tools and examples."
  23. 23. Monitoring – BCC – Overview • Lots of tools
  24. 24. Monitoring – BCC – CPU saturation / Runqlat • Runqlat is used to measure scheduler latency. • We can even filter to a specific PID.
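
Runqlat summarizes scheduler latency as a histogram with power-of-two buckets (0 -> 1, 2 -> 3, 4 -> 7, and so on, in microseconds). The following sketch reproduces only that bucketing in plain Python so the output is easier to read. It does not trace anything, and the latency samples are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bucket(value_us: int) -> Tuple[int, int]:
    """Return the (low, high) power-of-two bucket that contains value_us."""
    if value_us < 0:
        raise ValueError("latency cannot be negative")
    if value_us < 2:
        return (0, 1)
    low = 1 << (value_us.bit_length() - 1)
    return (low, 2 * low - 1)


def log2_histogram(latencies_us: List[int]) -> Dict[Tuple[int, int], int]:
    """Count samples per bucket, ordered from the lowest bucket upwards."""
    counts = Counter(bucket(v) for v in latencies_us)
    return dict(sorted(counts.items()))


# Invented run-queue latencies in microseconds.
latency_samples = [0, 1, 3, 5, 6, 9, 12, 40, 700, 900]

for (low, high), count in log2_histogram(latency_samples).items():
    print(f"{low:>6} -> {high:<6} : {count}")
```

A long tail in the upper buckets means tasks regularly wait a long time before they get on a CPU, even when the average looks fine.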
  25. 25. Monitoring – BCC – Cachestat / Cachetop • With Cachestat we can observe cache stats. • With Cachetop we can show the same stats with more information.
  26. 26. Monitoring – BCC – Biolatency / Filetop • BCC contains powerful tools such as Biolatency, which measures disk I/O latency. • Using Filetop we can observe metrics about file activity.
  27. 27. Monitoring – BCC – Fileslower / Filelife • With Fileslower we can measure file reads and writes that are slower than a threshold. • With Filelife we can also trace the lifespan of short-lived files.
  28. 28. Monitoring – BCC – TCPlife • Used for tracing TCP sessions that open and close.
  29. 29. Monitoring – Hopes for after this showcase • More performance engineers start using a methodical approach for analyzing resource monitoring. • Performance testers/engineers use this presentation as a starting point to learn more about in-depth monitoring. • We can gaze upon more dashboards with metrics from eBPF or that follow the USE method.
  30. 30. Monitoring – Let's recap • Use USE for analyzing resources. • Begin by analyzing the utilization of the resources. • Go deeper by checking the saturation metrics. • When available, check the error metrics for the resources. • Use BCC tools to analyze even more metrics. • When analyzing monitoring data, keep Yoda in mind. Thank you!
  31. 31. Thank you
