
The power of linux advanced tracer [POUG18]

Published in: Technology

  1. 1. THE POWER OF LINUX ADVANCED TRACER HATEM MAHMOUD HTTPS://MAHMOUDHATEM.WORDPRESS.COM HIGH FIVE POUG
  2. 2. 2 WHO AM I Oracle DBA Oracle experience: 7 years Located in TUNISIA Oracle Certified Master Oracle geek https://mahmoudhatem.wordpress.com
  3. 3. 3 TAKEAWAYS • Better understanding of the Linux tracing landscape • Getting an idea of what can be done. As someone else said: “Knowing what can be done is more important than knowing how to do it - you can always google that”
  4. 4. 4 AGENDA 1. Linux tracing landscape 2. Static tracing 3. Dynamic tracing 4. Monkey patching 5. Deeper look at CPU utilization
  5. 5. 5 LINUX TRACING LANDSCAPE
  6. 6. 6 LINUX TRACING TIMELINE http://www.brendangregg.com/Slides/SCALE2017_perf_analysis_eBPF.pdf
  7. 7. 7 LINUX TRACING LANDSCAPE eBPF kprobe uprobe tracepoints software events hardware events systemtap perf_events bcc/ebpf USDT Ftrace
  8. 8. 8 LINUX TRACING SYSTEMS • Front-end tools: systemtap, perf, bcc, pmu-tools, etc. • Mechanisms for extracting data: stap module, eBPF, perf_events (perf_event_open syscall), ftrace (/sys/kernel/debug/tracing), etc. • Event sources: kprobes and uprobes (dynamic tracing); tracepoints, software events and USDT (static tracing); PMCs (hardware counters); etc. Breakdown as suggested by Brendan Gregg and Julia Evans: https://jvns.ca/blog/2017/07/05/linux-tracing-systems/
  12. 12. 12 STATIC TRACING
  13. 13. 13 STATIC TRACING Tracepoints: • Predefined kernel trace probes • Inserted by kernel developers at important locations in the code (system calls, disk I/O, etc.) User Statically-Defined Tracing (USDT): • Predefined application trace probes • Inserted by application developers at important locations in the code Software events: • Kernel counters (CPU migrations, minor faults, major faults, etc.) http://www.brendangregg.com/perf.html
  14. 14. 14 KERNEL STATIC TRACEPOINT
  15. 15. 15 BCC/TOOLS : BIOLATENCY SUMMARIZE BLOCK DEVICE I/O LATENCY AS A HISTOGRAM https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt • Traditional tools such as iostat and sar show average latency, which can be misleading (it hides latency outliers) • The full latency distribution needs to be studied • biolatency instruments kernel block I/O functions (blk_start_request, blk_account_io_completion, etc.)
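
The point about averages hiding outliers can be illustrated with a minimal sketch. It is not biolatency itself: the sample latencies are invented for illustration, and `log2_histogram` is a hypothetical helper that only mimics the power-of-two bucketing style such tools print.

```python
from collections import Counter


def log2_histogram(latencies_us):
    """Count latencies in power-of-two buckets: bucket k holds [2**k, 2**(k+1))."""
    buckets = Counter()
    for value in latencies_us:
        # bit_length() - 1 equals floor(log2(value)) for integers >= 1
        buckets[max(int(value), 1).bit_length() - 1] += 1
    return dict(sorted(buckets.items()))


# Hypothetical sample: 99 fast I/Os and a single slow outlier (microseconds).
samples = [100] * 99 + [50_000]
mean_us = sum(samples) / len(samples)
hist = log2_histogram(samples)

print(f"mean latency: {mean_us} us")
for k, count in hist.items():
    print(f"{2 ** k:>6} -> {2 ** (k + 1) - 1:<6} : {count}")
```

The mean sits far from both the typical I/O and the outlier, while the histogram shows the two populations separately.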
  16. 16. 16 BCC/TOOLS : EXT4SLOWER TRACE SLOW EXT4 OPERATIONS. https://github.com/iovisor/bcc/blob/master/tools/ext4slower_example.txt • A better measure of the latency suffered by applications reading from the file system. • The measured latency spans: • block device I/O (disk I/O) • file system CPU cycles • file system locks • run queue latency • etc. Great CPU saturation metric!
  17. 17. 17 BCC/TOOLS : RUNQLAT: RUN QUEUE (SCHEDULER) LATENCY AS A HISTOGRAM https://github.com/iovisor/bcc/blob/master/tools/runqlat_example.txt • The best CPU saturation metrics are measures of run queue (or scheduler) latency. • Run queue latency is the time a task spends waiting on a run queue for a turn on-CPU. • It is better than the run queue length metric for estimating the magnitude of CPU saturation!
  18. 18. 18 BCC/TOOLS : RUNQLAT: RUN QUEUE (SCHEDULER) LATENCY AS A HISTOGRAM https://github.com/iovisor/bcc/blob/master/tools/runqlat_example.txt
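
runqlat traces scheduler events to build its histogram. A much coarser view of the same quantity can be sketched from `/proc/<pid>/schedstat`, whose three fields are documented in the kernel scheduler statistics as on-CPU time (ns), run queue wait time (ns) and number of timeslices. This is a different, simpler technique, not what runqlat does; the helper names are hypothetical, the file's availability depends on the kernel configuration, and the per-timeslice average is only an approximation.

```python
def parse_schedstat(text):
    """Parse the three fields of a /proc/<pid>/schedstat line."""
    on_cpu_ns, runq_wait_ns, timeslices = (int(f) for f in text.split()[:3])
    return {
        "on_cpu_ns": on_cpu_ns,
        "runq_wait_ns": runq_wait_ns,
        "timeslices": timeslices,
    }


def avg_runq_wait_us(before, after):
    """Average run queue wait per timeslice between two snapshots, in microseconds."""
    slices = after["timeslices"] - before["timeslices"]
    if slices <= 0:
        return 0.0
    return (after["runq_wait_ns"] - before["runq_wait_ns"]) / slices / 1000.0


# Two invented snapshots; on a real system each would come from
# open(f"/proc/{pid}/schedstat").read(), taken some interval apart.
first = parse_schedstat("1000000 200000 10\n")
second = parse_schedstat("3000000 800000 40\n")
print(avg_runq_wait_us(first, second), "us average wait per timeslice")
```

Unlike a histogram, this gives a single average, so it shares the weakness of averages noted for iostat and sar.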
  19. 19. 19 https://github.com/brendangregg/systemtap-lwtools https://github.com/iovisor/bcc https://github.com/brendangregg/perf-tools
  20. 20. 20 SYSTEMTAP : SCHEDTIMES_WSI.STP : TRACK TIME PROCESSES SPEND IN VARIOUS STATES https://mahmoudhatem.wordpress.com/2017/02/06/extending-systemtap-scripts-with-oracle-session-info/ • Bring application context to your monitoring tools !
  21. 21. 21 USERSPACE STATIC TRACEPOINT/USDT
  22. 22. 22 BCC/TOOLS : DBSLOWER: TRACE MYSQL/POSTGRESQL QUERIES SLOWER THAN A THRESHOLD https://github.com/iovisor/bcc/blob/master/tools/dbslower_example.txt • dbslower is based on USDT probes (it needs MySQL or PostgreSQL built with USDT (DTrace) support).
  23. 23. 23 The ORACLE database doesn’t have USDT support
  24. 24. 24 DYNAMIC TRACING
  25. 25. 25 DYNAMIC TRACING • Dynamically instrumenting (creating events in) any software location. • kprobes: kernel dynamic tracing • uprobes: user-level dynamic tracing • No need to modify the probed process's binaries or restart the program.
  26. 26. 26 DYNAMIC TRACING (UPROBE) • Function prologue of “kskthewt” (called at the end of an Oracle wait event) before inserting a probe point: • After inserting a probe point at the function entry: the original opcode was replaced with int3 (a software breakpoint interrupt). https://mahmoudhatem.wordpress.com/2017/03/21/uprobes-issue-with-oracle-12c/
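
The opcode replacement described on this slide can be mimicked on a plain byte buffer. This is only a toy model under stated assumptions: the prologue bytes are a common x86-64 `push %rbp; mov %rsp,%rbp` sequence used as a stand-in (not the actual kskthewt bytes), the helper names are hypothetical, and the real kernel mechanism does more work, such as running the displaced instruction after the probe handler fires.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


def insert_probe(code, offset):
    """Overwrite one byte with int3 and return the original byte for later restore."""
    original = code[offset]
    code[offset] = INT3
    return original


def remove_probe(code, offset, original):
    """Put the saved byte back, removing the probe."""
    code[offset] = original


# Stand-in prologue: push %rbp (0x55); mov %rsp,%rbp (0x48 0x89 0xe5).
prologue = bytearray([0x55, 0x48, 0x89, 0xE5])
print("before:", prologue.hex())
saved = insert_probe(prologue, 0)
print("after: ", prologue.hex())
```

Calling `remove_probe(prologue, 0, saved)` restores the buffer, which is why no binary modification or restart is needed.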
  27. 27. 27 TRACING ORACLE KERNEL FUNCTIONS kcbgtcr kcbgcur kcbzib kskthbwt kskthewt kcbzgb kcbzvb opiexe delrow qerdlFetch kpoal8
  28. 28. 28 SYSTEMTAP : TRACING ORACLE WAIT EVENTS https://externaltable.blogspot.com/2014/09/systemtap-into-oracle-for-fun-and-profit.html
  29. 29. 29 SYSTEMTAP : AGGREGATIONS AND FILTERING OF WAIT EVENT DATA https://externaltable.blogspot.com/2014/09/systemtap-into-oracle-for-fun-and-profit.html Collect and display microsecond-precision histograms for all Oracle versions (note: 12.1.0.2 has V$EVENT_HISTOGRAM_MICRO). What are this wait event and the other I/O wait events really measuring?
  30. 30. 30 SYSTEMTAP : WHAT ARE THE I/O-RELATED WAIT EVENTS REALLY MEASURING? [TRACING LOGICAL AND PHYSICAL I/O ] https://externaltable.blogspot.com/2014/11/life-of-oracle-io-tracing-logical-and.html The elapsed time for the wait event "direct path read" does not accurately reflect I/O latency
  31. 31. 31 TRACING BEYOND FUNCTION BOUNDARY PROBE AT SPECIFIC ORACLE KERNEL FUNCTION OFFSET
  32. 32. 32 SYSTEMTAP : A SIMPLE USER/PASSWORD SNIFFER https://mahmoudhatem.wordpress.com/2018/03/23/systemtap-probe-at-specific-oracle-function-offset-bonus/ • Powerful and scary at the same time !
  33. 33. 33 TRACING PL/SQL
  34. 34. 34 SYSTEMTAP : TRACING PL/SQL WITH LINE NUMBER https://mahmoudhatem.wordpress.com/2017/09/15/geeky-plsql-tracerprofiler-first-step/
  35. 35. 35 SYSTEMTAP : TRACING PL/SQL SUBPROGRAM CALLS WITH PARAMETERS VALUES https://mahmoudhatem.wordpress.com/2017/11/29/tracing-pl-sql-subprogram-calls-with-parameters-values-dynamic-tracing/
  36. 36. 36 SYSTEMTAP : FROM MEMORY REQUEST TO PL/SQL SOURCE LINE https://mahmoudhatem.wordpress.com/2018/01/15/from-memory-request-to-pl-sql-source-line/ Based on v$process_memory_detail
  37. 37. 37 MONKEY PATCHING ACTIVE MANIPULATIONS OF STATE
  38. 38. 38 SYSTEMTAP : A MINI ORACLE DB FIREWALL [LIVE PATCHING] https://mahmoudhatem.wordpress.com/2016/04/18/systemtap-a-mini-oracle-db-firewall/ https://externaltable.blogspot.com/2016/03/systemtap-guru-mode-and-oracle-sql.html
  39. 39. 39 SYSTEMTAP : PLAYING WITH ORACLE DB 18C ON-PREMISES BEFORE OFFICIAL RELEASE https://mahmoudhatem.wordpress.com/2018/03/01/playing-with-oracle-db-18c-on-premises-before-official-release/
  40. 40. 40 DEEPER LOOK AT CPU UTILIZATION • Which code paths are causing high CPU usage? • What is my CPU bottleneck? • How much are my CPUs stalled? On what resource?
  41. 41. 41 CPU PROFILING • Linux advanced tracing tools are capable of lightweight CPU profiling by stack sampling, for example: • SystemTap • perf • bcc • To quickly understand CPU usage, the collected profiling data can be visualized as a flame graph. http://www.brendangregg.com/flamegraphs.html
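
Between sampling and drawing, the stacks are usually collapsed into a "folded" text form, one line per unique stack followed by its sample count, which is the input the flame graph script consumes. A minimal sketch of that folding step, assuming the samples are already available as lists of frame names; the stacks below are invented for illustration and `fold_stacks` is a hypothetical helper:

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled stacks (root -> leaf frame lists) into 'a;b;c count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Invented samples; the frame names are placeholders, not a real Oracle profile.
samples = [
    ["main", "opiexe", "kcbgtcr"],
    ["main", "opiexe", "kcbgtcr"],
    ["main", "opiexe", "qerdlFetch"],
]
folded = fold_stacks(samples)
for line in folded:
    print(line)
```

In a flame graph, the width of each frame is proportional to these counts, so the widest towers are the hottest code paths.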
  42. 42. 42 FLAMEGRAPH https://fr.slideshare.net/ennael/kernel-recipes-2017-using-linux-perf-at-netflix-brendan-gregg
  43. 43. 43 EXTENDED FLAMEGRAPH : WAIT EVENTS https://mahmoudhatem.wordpress.com/2016/09/23/perf_events-offonmixed-cpu-flamegraph-extended-with-oracle-wait-events/ https://db-blog.web.cern.ch/blog/luca-canali/2015-11-oracle-wait-events-investigated-extended-stack-profiling-and-flame-graphs
  44. 44. 44 EXTENDED FLAMEGRAPH : PL/SQL PROGRAM AND LINE NUMBER https://mahmoudhatem.wordpress.com/2017/09/22/geeky-plsql-tracerprofiler-another-step/
  45. 45. 45 BUT WHAT WERE THOSE FUNCTIONS DOING WHEN THEY WERE ON-CPU? RUNNING OR STALLED?
  46. 46. 46 CPU UTILIZATION IS WRONG http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html
  47. 47. 47 WHEN THE CPU UTILIZATION DOES NOT TELL YOU THE UTILIZATION OF THE CPU. PERFORMANCE MONITORING COUNTERS: A BETTER WAY TO MEASURE CPU UTILIZATION *The next sections cover Intel platforms only
  48. 48. 48 HARDWARE EVENTS (PMC) • PMCs instrument low-level processor activity • They can be used to understand how efficiently a workload uses processor resources (CPU caches, MMU, memory buses, CPU interconnects, execution units, etc.) • PMCs: • Core: values measured on a single core only • Uncore: values shared socket-wide
  49. 49. 49 HARDWARE EVENTS https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-m • PMCs are documented in the Intel Software Developer's Manual Volume 3B: System Programming Guide, Part 2
  50. 50. 50 HARDWARE EVENTS • Not all of them are listed when using perf list !
  51. 51. 51 HIGH-LEVEL METRICS (IPC: A GENERAL EFFICIENCY METRIC) • Events can be observed and combined to create useful high-level metrics such as Instructions per Cycle (IPC) * Modern superscalar processors can issue multiple instructions per cycle
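
As a worked example of combining two events into a high-level metric, IPC is retired instructions divided by unhalted cycles, and CPI is its inverse. A minimal sketch, assuming the two counts were collected elsewhere (for instance with `perf stat -e instructions,cycles`); the numbers are invented:

```python
def ipc(instructions, cycles):
    """Instructions per cycle from two raw counter values."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def cpi(instructions, cycles):
    """Cycles per instruction, the inverse of IPC."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Invented counter values for illustration.
print("IPC:", ipc(2_000_000, 1_000_000))
print("CPI:", cpi(2_000_000, 1_000_000))
```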
  52. 52. 52 CPI FLAME GRAPH • The color now shows what each function was doing when it was on-CPU: running or stalled • Highest CPI: blue (slowest instructions) • Lowest CPI: red (fastest instructions) • Visualization of CPU efficiency by function. https://mahmoudhatem.wordpress.com/2017/10/26/deeper-look-at-cpu-utilization-the-power-of-pmu-events/ (figure annotation: get consistent read)
  53. 53. 53 IPC INTERPRETATION AND ACTIONABLE ITEMS http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html • A good starting point for identifying what the CPU is really doing is IPC (instructions per cycle)
  54. 54. 54 WHERE ARE WE REALLY WASTING OUR PRECIOUS CPU CYCLES? • False data sharing • Split stores • Loads blocked by store forwarding • 4K aliasing • DTLB miss • Microcode assists • Memory bandwidth • Memory latency • Bad speculation • Port utilization • L1 miss • L2 miss • Vectorization • Remote DRAM
  55. 55. 55 PMC-CLOUD-TOOLS/PMCARCH https://github.com/brendangregg/pmc-cloud-tools
  56. 56. 56 PMC-CLOUD-TOOLS/TLBSTAT https://github.com/brendangregg/pmc-cloud-tools
  57. 57. 57 PMC-CLOUD-TOOLS/CPUCACHE https://github.com/brendangregg/pmc-cloud-tools
  58. 58. 58 MEASURING IPC IS A GOOD STARTING POINT, BUT HOW TO DRILL DOWN FURTHER? A specific microarchitecture may expose hundreds of events through its PMU! Which events are useful for detecting the true bottleneck? Answering that requires in-depth knowledge of both the microarchitecture design and the PMU specifications! “Analysis without a methodology can become a fishing expedition, where metrics are examined ad hoc, until the issue is found –if it is at all.” Source: Brendan D. Gregg, http://www.brendangregg.com/methodology.html
  59. 59. 59 TOP-DOWN MICRO-ARCHITECTURE ANALYSIS METHOD [ TMAM ] • Systematically finds the true bottleneck (eliminates guesswork) • Provides a hierarchical breakdown of execution cycles (CPI breakdown) • Avoids the steep µ-arch learning curve • Correctly characterizes all workloads • Frequent performance bottlenecks are organized in a hierarchical structure https://software.intel.com/en-us/vtune-amplifier-help-tuning-applications-using-a-top-down-microarchitecture-analysis-method
  60. 60. 60 THE TMAM HIERARCHY https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
  61. 61. 61 PERF http://cs.haifa.ac.il/~yosi/PARC/yasin.pdf Linux perf supports TopDown Level-1 metrics since Linux kernel 4.8
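
The Level-1 breakdown can be sketched from five raw counts using the formulas published with the Top-Down method for a 4-wide Intel core. Treat this as an illustration under assumptions: the event names in the comments follow the paper linked above and vary between microarchitectures, the sample counts are invented, and `topdown_level1` is a hypothetical helper, not part of perf or pmu-tools.

```python
PIPELINE_WIDTH = 4  # issue slots per cycle assumed by the published formulas


def topdown_level1(clocks, uops_not_delivered, uops_issued,
                   uops_retired_slots, recovery_cycles, width=PIPELINE_WIDTH):
    """Split pipeline slots into the four Level-1 Top-Down categories."""
    slots = width * clocks                      # CPU_CLK_UNHALTED.THREAD * width
    frontend = uops_not_delivered / slots       # IDQ_UOPS_NOT_DELIVERED.CORE
    bad_spec = (uops_issued - uops_retired_slots
                + width * recovery_cycles) / slots  # UOPS_ISSUED.ANY, INT_MISC.RECOVERY_CYCLES
    retiring = uops_retired_slots / slots       # UOPS_RETIRED.RETIRE_SLOTS
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Invented counts for illustration only.
breakdown = topdown_level1(clocks=1000, uops_not_delivered=800,
                           uops_issued=2200, uops_retired_slots=2000,
                           recovery_cycles=50)
for name, share in breakdown.items():
    print(f"{name:16s} {share:.1%}")
```

The four shares sum to one by construction, which is what lets the method point to a single dominant category before drilling into the next level.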
  62. 62. 62 PMU-TOOLS : TOPLEV.PY https://github.com/andikleen/pmu-tools/wiki/toplev-manual • toplev is a tool, part of pmu-tools, that implements TopDown on top of Linux perf
  63. 63. 63 INTEL VTUNE : GENERAL EXPLORATION https://software.intel.com/en-us/intel-vtune-amplifier-xe
  64. 64. 64 INTEL VTUNE : GROUPING BY FUNCTION/CALL STACK https://software.intel.com/en-us/intel-vtune-amplifier-xe (figure annotations: get consistent read; kernel data scan table full)
  65. 65. 65 TMAM EXAMPLE Test environment: Oracle 12.2.0.1 / OEL 7.0 / kernel-3.10 / i5-6500 processor / 2 × DDR3-1600 (2 × 4 GB). Testing the impact of huge pages with a SLOB LIO test and Intel VTune
  66. 66. 66 SLOB CONF
  67. 67. 67 WITHOUT HUGEPAGES : LIOPS 3 099 420 DTLB overhead was measured using the following formula
  68. 68. 68 WITH HUGEPAGES : LIOPS 3 415 969 About 10% improvement Workload Characterization How much ??
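
The "about 10%" figure follows directly from the two LIOPS values reported on these slides. A minimal sketch of the arithmetic; `relative_gain` is a hypothetical helper name:

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


liops_without_hugepages = 3_099_420
liops_with_hugepages = 3_415_969
gain = relative_gain(liops_without_hugepages, liops_with_hugepages)
print(f"improvement: {gain:.1%}")
```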
  69. 69. 69 MEASURING MEMORY THROUGHPUT https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Tools_Linux_Memory_Perf_Measure.md • Other tools that can be used to measure memory throughput and many other metrics (QPI utilisation, power consumption, local and remote memory bandwidth, etc.): • Intel Processor Counter Monitor (PCM) • Likwid • pmu-tools • perf (e.g. MEM_BW_READS = CAS_COUNT.RD * 64, where 64 bytes is the cache line size) https://yunmingzhang.wordpress.com/2015/07/22/measure-memory-bandwidth-using-uncore-counters/ High memory bandwidth utilization can have an impact on main memory latency!
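
The perf formula on this slide converts a count of read CAS commands into bytes, since each one transfers a 64-byte cache line. A minimal sketch, assuming the CAS_COUNT.RD total has already been summed over the memory channels for a known interval; the sample count is invented and the helper name is hypothetical:

```python
CACHE_LINE_BYTES = 64  # bytes moved per CAS command, as stated on the slide


def read_bandwidth_bytes_per_s(cas_count_rd, interval_s):
    """Memory read bandwidth from an uncore CAS_COUNT.RD total over an interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return cas_count_rd * CACHE_LINE_BYTES / interval_s


# Invented example: 100 million read CAS commands counted over 1 second.
bw = read_bandwidth_bytes_per_s(100_000_000, 1.0)
print(f"{bw / 1e9:.1f} GB/s read bandwidth")
```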
  70. 70. 70 MEMORY BANDWIDTH VS LATENCY RESPONSE CURVE • Although these two concepts are often described independently, they are inherently interrelated. • According to Bruce Jacob in “The memory system: you can’t avoid it, you can’t ignore it, you can’t fake it”, the bandwidth vs latency response curve for a system has three regions: • Constant region: the latency response is fairly constant for the first 40% of the sustained bandwidth. • Linear region: between 40% and 80% of the sustained bandwidth, the latency response increases almost linearly with the bandwidth demand of the system, due to contention overhead from numerous memory requests. • Exponential region: between 80% and 100% of the sustained bandwidth, the memory latency is dominated by the contention latency, which can be as much as twice the idle latency or more. • Maximum sustained bandwidth: 65% to 75% of the theoretical maximum bandwidth. https://mahmoudhatem.wordpress.com/2017/11/07/memory-bandwidth-vs-latency-response-curve/
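
The three regions above can be turned into a small lookup. This is a sketch that simply applies the 40% and 80% thresholds quoted on the slide; real systems will not have sharp boundaries, the bandwidth figures are invented, and `latency_region` is a hypothetical helper.

```python
def latency_region(used_bandwidth, sustained_bandwidth):
    """Classify a bandwidth demand into the constant, linear or exponential region."""
    if sustained_bandwidth <= 0:
        raise ValueError("sustained bandwidth must be positive")
    ratio = used_bandwidth / sustained_bandwidth
    if ratio < 0.4:
        return "constant"
    if ratio < 0.8:
        return "linear"
    return "exponential"


# Invented figures, in GB/s, against a sustained bandwidth of 10 GB/s.
for used in (3, 6, 9):
    print(used, "GB/s ->", latency_region(used, 10))
```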
  71. 71. 71 MEMORY BANDWIDTH VS LATENCY RESPONSE CURVE • Visualization of how memory latency is affected as memory bandwidth consumption increases. • Armed with Intel Memory Latency Checker (MLC), let’s check our current system! https://mahmoudhatem.wordpress.com/2017/11/07/memory-bandwidth-vs-latency-response-curve/
  72. 72. 72 “PMCS ARE CRUCIAL FOR ANALYZING A (IF NOT THE) MODERN SYSTEM BOTTLENECK: MEMORY I/O.” http://www.brendangregg.com/blog/2017-05-04/the-pmcs-of-ec2.html Brendan Gregg
  73. 73. 73 THANK YOU FOR YOUR ATTENTION https://mahmoudhatem.wordpress.com @Hatem__Mahmoud https://linkedin.com/in/mahmoudhatemoracle
