Delivered as a plenary talk at USENIX LISA 2013. Video: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
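As a rough illustration of the underlying idea (my own sketch, not taken from the talk), the Python below folds sampled stack traces into the semicolon-separated "folded" format that flamegraph.pl consumes; the sample data is invented:

```python
from collections import Counter

# Hypothetical sampled stacks: each sample is a list of frames, root first.
samples = [
    ["main", "parse_request", "read"],
    ["main", "parse_request", "read"],
    ["main", "render_response", "write"],
]

# Fold each stack into "frame1;frame2;..." and count identical stacks.
folded = Counter(";".join(stack) for stack in samples)

# flamegraph.pl takes one "stack count" pair per line as its input.
for stack, count in folded.items():
    print(f"{stack} {count}")
```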
Talk for SCaLE13x. Video: https://www.youtube.com/watch?v=_Ik8oiQvWgo . Profiling can show what your Linux kernel and applications are doing in detail, across all software stack layers. This talk shows how we are using Linux perf_events (aka "perf") and flame graphs at Netflix to understand CPU usage in detail, to optimize our cloud usage, solve performance issues, and identify regressions. This will be more than just an intro: profiling difficult targets, including Java and Node.js, will be covered, which includes ways to resolve JITed symbols and broken stacks. Included are the easy examples, the hard, and the cutting edge.
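For readers who have not run this workflow, the sketch below shows one common way to produce a CPU flame graph with perf and the FlameGraph scripts; the 99 Hz rate, 30-second duration, and ./FlameGraph path are assumptions, and Java typically also needs frame pointers preserved (e.g. -XX:+PreserveFramePointer) plus a JIT symbol map from a tool such as perf-map-agent, as the abstract hints:

```python
import subprocess

# Sample all CPUs at 99 Hz for 30 seconds, capturing stack traces (-g). Needs root.
subprocess.run(["perf", "record", "-F", "99", "-a", "-g", "--", "sleep", "30"], check=True)

# Dump the samples as text, fold the stacks, and render an interactive SVG.
samples = subprocess.run(["perf", "script"], capture_output=True, text=True, check=True)
folded = subprocess.run(["./FlameGraph/stackcollapse-perf.pl"], input=samples.stdout,
                        capture_output=True, text=True, check=True)
svg = subprocess.run(["./FlameGraph/flamegraph.pl"], input=folded.stdout,
                     capture_output=True, text=True, check=True)

with open("cpu.svg", "w") as f:
    f.write(svg.stdout)
```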
USENIX ATC 2017: Visualizing Performance with Flame Graphs - Brendan Gregg
Talk by Brendan Gregg for USENIX ATC 2017.
"Flame graphs are a simple stack trace visualization that helps answer an everyday problem: how is software consuming resources, especially CPUs, and how did this change since the last software version? Flame graphs have been adopted by many languages, products, and companies, including Netflix, and have become a standard tool for performance analysis. They were published in "The Flame Graph" article in the June 2016 issue of Communications of the ACM, by their creator, Brendan Gregg.
This talk describes the background for this work, and the challenges encountered when profiling stack traces and resolving symbols for different languages, including for just-in-time compiler runtimes. Instructions will be included for generating mixed-mode flame graphs on Linux, and examples from our use at Netflix with Java. Advanced flame graph types will be described, including differential, off-CPU, chain graphs, memory, and TCP events. Finally, future work and unsolved problems in this area will be discussed."
Talk by Brendan Gregg for USENIX LISA 2019: Linux Systems Performance. Abstract: "
Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas of Linux systems performance: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (Ftrace, bcc/BPF, and bpftrace/BPF), and much advice about what is and isn't important to learn. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
Kernel Recipes 2017: Using Linux perf at Netflix - Brendan Gregg
Talk for Kernel Recipes 2017 by Brendan Gregg. "Linux perf is a crucial performance analysis tool at Netflix, and is used by a self-service GUI for generating CPU flame graphs and other reports. This sounds like an easy task, however, getting perf to work properly in VM guests running Java, Node.js, containers, and other software, has been at times a challenge. This talk summarizes Linux perf, how we use it at Netflix, the various gotchas we have encountered, and a summary of advanced features."
Linux Performance Analysis: New Tools and Old Secrets - Brendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many tools, from high level to low level, to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit and which are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
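To give a flavor of those "creative hacks with existing kernel features", here is a rough sketch of how a tool such as funcgraph from the perf-tools collection drives ftrace by writing to tracefs; it needs root, the tracefs mount point varies by system, and vfs_read is just an example target:

```python
import os

TRACEFS = "/sys/kernel/debug/tracing"   # on newer systems: /sys/kernel/tracing

def write(path, value):
    with open(os.path.join(TRACEFS, path), "w") as f:
        f.write(value)

# Trace calls beneath vfs_read() with the function_graph tracer (needs root).
write("set_graph_function", "vfs_read")
write("current_tracer", "function_graph")
write("tracing_on", "1")

# Read a few lines of live output, then stop and reset the tracer.
with open(os.path.join(TRACEFS, "trace_pipe")) as pipe:
    for _ in range(20):
        print(pipe.readline(), end="")

write("tracing_on", "0")
write("current_tracer", "nop")
```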
Video: https://www.facebook.com/atscaleevents/videos/1693888610884236/ . Talk by Brendan Gregg from Facebook's Performance @Scale: "Linux performance analysis has been the domain of ancient tools and metrics, but that's now changing in the Linux 4.x series. A new tracer is available in the mainline kernel, built from dynamic tracing (kprobes, uprobes) and enhanced BPF (Berkeley Packet Filter), aka, eBPF. It allows us to measure latency distributions for file system I/O and run queue latency, print details of storage device I/O and TCP retransmits, investigate blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. This talk will summarize this new technology and some long-standing issues that it can solve, and how we intend to use it at Netflix."
Velocity 2017: Performance analysis superpowers with Linux eBPF - Brendan Gregg
Talk for Velocity 2017 by Brendan Gregg: Performance analysis superpowers with Linux eBPF.
"Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will investigate this new technology, which sooner or later will be available to everyone who uses Linux. The talk will dive deep on these new tracing, observability, and debugging capabilities. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
Video: https://www.youtube.com/watch?v=JRFNIKUROPE . Talk for linux.conf.au 2017 (LCA2017) by Brendan Gregg, about Linux enhanced BPF (eBPF). Abstract:
A world of new capabilities is emerging for the Linux 4.x series, thanks to enhancements that have been made in Linux to Berkeley Packet Filter (BPF): an in-kernel virtual machine that can execute user space-defined programs. It is finding uses for security auditing and enforcement, enhancing networking (including eXpress Data Path), and performance observability and troubleshooting. Many new open source tools that use BPF have been written in the past 12 months for performance analysis. Tracing superpowers have finally arrived for Linux!
For its use with tracing, BPF provides the programmable capabilities to the existing tracing frameworks: kprobes, uprobes, and tracepoints. In particular, BPF allows timestamps to be recorded and compared from custom events, allowing latency to be studied in many new places: kernel and application internals. It also allows data to be efficiently summarized in-kernel, including as histograms. This has allowed dozens of new observability tools to be developed so far, including measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more.
This talk will summarize BPF capabilities and use cases so far, and then focus on its use to enhance Linux tracing, especially with the open source bcc collection. bcc includes BPF versions of old classics, and many new tools, including execsnoop, opensnoop, funccount, ext4slower, and more (many of which I developed). Perhaps you'd like to develop new tools, or use the existing tools to find performance wins large and small, especially when instrumenting areas that previously had zero visibility. I'll also summarize how we intend to use these new capabilities to enhance systems analysis at Netflix.
Note: When you view the slide deck via a web browser, the screenshots may be blurred. You can download and view them offline (the screenshots are clear).
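To make the in-kernel timestamp-and-histogram approach described in this abstract concrete, here is a minimal bcc sketch (my own illustration, not one of the talk's tools) that times vfs_read() and prints a log2 latency histogram; it assumes bcc is installed, must run as root, and kernel function names can differ across versions:

```python
from time import sleep
from bcc import BPF

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);
BPF_HISTOGRAM(dist);

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);                  // record the entry timestamp
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                             // missed the entry probe
    dist.increment(bpf_log2l(bpf_ktime_get_ns() - *tsp));
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

print("Tracing vfs_read() latency for 10 seconds...")
sleep(10)
b["dist"].print_log2_hist("nsecs")
```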
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial updates and extends an earlier talk that summarized the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems — important context that is typically not included in the standard documentation.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Linux 4.x Tracing Tools: Using BPF Superpowers - Brendan Gregg
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formerly the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."
USENIX LISA2021 talk by Brendan Gregg (https://www.youtube.com/watch?v=_5Z2AU7QTH4). This talk is a deep dive that describes how BPF (eBPF) works internally on Linux, and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.
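As a small taste of the user-space side described above, the sketch below runs two classic bpftrace one-liners from Python; it assumes bpftrace is installed and is run as root, and probe names can vary by kernel version:

```python
import subprocess

# Per-event output: print each openat() with the calling process name and filename.
open_trace = ('tracepoint:syscalls:sys_enter_openat '
              '{ printf("%s %s\\n", comm, str(args->filename)); }')

# Aggregated map output: a power-of-two histogram of vfs_read() request sizes.
read_hist = 'kprobe:vfs_read { @bytes = hist(arg2); }'

# Run each one-liner for 10 seconds; SIGINT makes bpftrace print its maps on exit.
for prog in (open_trace, read_hist):
    subprocess.run(["timeout", "-s", "INT", "10", "bpftrace", "-e", prog])
```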
Video: http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x ; This talk for SCaLE11x covers system performance analysis methodologies and the Linux tools to support them, so that you can get the most out of your systems and solve performance issues quickly. This includes a wide variety of tools, including basics like top(1), advanced tools like perf, and new tools like the DTrace for Linux prototypes.
Performance Wins with BPF: Getting Started - Brendan Gregg
Keynote by Brendan Gregg for the eBPF Summit 2020. How to get started finding performance wins using the BPF (eBPF) technology. This short talk covers the quickest and easiest way to find performance wins using BPF observability tools on Linux.
LinuxCon 2015: Linux Kernel Networking Walkthrough - Thomas Graf
This presentation features a walk through the Linux kernel networking stack for users and developers. It covers both existing essential networking features and recent developments, and shows how to use them properly. Our starting point is the network card driver as it feeds a packet into the stack. We will follow the packet as it traverses through various subsystems such as packet filtering, routing, protocol stacks, and the socket layer. We will pause here and there to look into concepts such as networking namespaces, segmentation offloading, TCP small queues, and low latency polling and will discuss how to configure them.
BPF, the Berkeley Packet Filter mechanism, was first introduced in Linux in 1997, in version 2.1.75. It has seen a number of extensions over the years. Recently, in versions 3.15 - 3.19, it received a major overhaul which drastically expanded its applicability. This talk will cover how the instruction set looks today and why: its architecture, capabilities, interface, and just-in-time compilers. We will also talk about how it is being used in different areas of the kernel, such as tracing and networking, and about future plans.
Talk for AWS re:Invent 2014. Video: https://www.youtube.com/watch?v=7Cyd22kOqWc . Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances, while reducing latency outliers. This session explores the various Xen modes (e.g., HVM, PV, etc.) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics and receive a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix’s use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance.
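As a hedged illustration of the kind of kernel tunables such sessions discuss (the parameter and value below are placeholders for illustration, not Netflix's settings):

```python
import subprocess

# Read a tunable, then set it (needs root); the value shown is illustrative only.
subprocess.run(["sysctl", "net.core.somaxconn"])
subprocess.run(["sudo", "sysctl", "-w", "net.core.somaxconn=1024"])

# The transparent hugepage policy is exposed via sysfs.
with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
    print("THP:", f.read().strip())
```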
SREcon 2016: Performance Checklists for SREs - Brendan Gregg
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
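For reference, a minimal sketch of the kind of 60-second command checklist the talk describes, wrapped in a script so it runs as one step; the exact commands and durations approximate the published Netflix checklist, and several of them require the sysstat package:

```python
import subprocess

# Roughly the "60-second" checklist: load, errors, CPU, memory, disk, and network.
CHECKLIST = [
    "uptime",
    "dmesg | tail",
    "vmstat 1 5",
    "mpstat -P ALL 1 5",
    "pidstat 1 5",
    "iostat -xz 1 5",
    "free -m",
    "sar -n DEV 1 5",
    "sar -n TCP,ETCP 1 5",
    "top -b -n 1 | head -20",
]

for cmd in CHECKLIST:
    print(f"\n=== {cmd} ===")
    subprocess.run(cmd, shell=True)
```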
ACM Applicative System Methodology 2016 - Brendan Gregg
Video: https://youtu.be/eO94l0aGLCA?t=3m37s . Talk by Brendan Gregg for ACM Applicative 2016
"System Methodology - Holistic Performance Analysis on Modern Systems
Traditional systems performance engineering makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. For modern systems, we can choose the metrics, and can choose ones we need to support new holistic performance analysis methodologies. These methodologies provide faster, more accurate, and more complete analysis, and can provide a starting point for unfamiliar systems.
Methodologies are especially helpful for modern applications and their workloads, which can pose extremely complex problems with no obvious starting point. There are also continuous deployment environments such as the Netflix cloud, where these problems must be solved in shorter time frames. Fortunately, with advances in system observability and tracers, we have virtually endless custom metrics to aid performance analysis. The problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
System methodologies provide a starting point for analysis, as well as guidance for quickly moving through the metrics to root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. The focus is on single systems (any operating system), including single cloud instances, and quickly locating performance issues or exonerating the system. Many methodologies will be discussed, along with recommendations for their implementation, which may be as documented checklists of tools, or custom dashboards of supporting metrics. In general, you will learn to think differently about your systems, and how to ask better questions."
Stop the Guessing: Performance Methodologies for Production Systems - Brendan Gregg
Talk presented at Velocity 2013. Description: When faced with performance issues on complex production systems and distributed cloud environments, it can be difficult to know where to begin your analysis, or to spend much time on it when it isn’t your day job. This talk covers various methodologies, and anti-methodologies, for systems analysis, which serve as guidance for finding fruitful metrics from your current performance monitoring products. Such methodologies can help check all areas in an efficient manner, and find issues that can be easily overlooked, especially for virtualized environments which impose resource controls. Some of the tools and methodologies covered, including the USE Method, were developed by the speaker and have been used successfully in enterprise and cloud environments.
Surge 2014: From Clouds to Roots: root cause performance analysis at Netflix. Brendan Gregg.
At Netflix, high scale and fast deployment rule. The possibilities for failure are endless, and the environment excels at handling this, regularly tested and exercised by the simian army. But, when this environment automatically works around systemic issues that aren’t root-caused, they can grow over time. This talk describes the challenge of not just handling failures of scale on the Netflix cloud, but also new approaches and tools for quickly diagnosing their root cause in an ever changing environment.
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years are, in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including giving the Linux Performance Tools talk originally at SCALE. This is an anti-version of that talk, focusing on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Linux 4.x Tracing: Performance Analysis with bcc/BPF - Brendan Gregg
Talk about bcc/eBPF for SCALE15x (2017) by Brendan Gregg. "BPF (Berkeley Packet Filter) has been enhanced in the Linux 4.x series and now powers a large collection of performance analysis and observability tools ready for you to use, included in the bcc (BPF Compiler Collection) open source project. BPF nowadays can do system tracing, software defined networks, and kernel fast path: much more than just filtering packets! This talk will focus on the bcc/BPF tools for performance analysis, which make use of other built in Linux capabilities: dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). There are now bcc tools for measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived, built in to Linux."
G1 Garbage Collector: Details and Tuning - Simone Bordet
The G1 Garbage Collector is the low-pause replacement for CMS, available since JDK 7u4. This session explores in detail how the G1 garbage collector works (from region layout, to remembered sets, to refinement threads, etc.), its current weaknesses, the command line options you should enable when you use it, and advanced tuning examples extracted from real world applications.
A 2015 performance study by Brendan Gregg, Nitesh Kant, and Ben Christensen. Original is in https://github.com/Netflix-Skunkworks/WSPerfLab/tree/master/test-results
ISO SQL:2016 introduced Row Pattern Matching: a feature to apply (limited) regular expressions on table rows and perform analysis on each match. As of writing, this feature is only supported by the Oracle Database 12c.
Video: https://www.youtube.com/watch?v=uibLwoVKjec . Talk by Brendan Gregg for Sysdig CCWFS 2016. Abstract:
"You have a system with an advanced programmatic tracer: do you know what to do with it? Brendan has used numerous tracers in production environments, and has published hundreds of tracing-based tools. In this talk he will share tips and know-how for creating CLI tracing tools and GUI visualizations, to solve real problems effectively. Programmatic tracing is an amazing superpower, and this talk will show you how to wield it!"
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak (PROIDEA)
Speaker: Andrzej Dyjak
Language: English
In recent years the security industry has grown fond of Apple’s iOS and OS X platforms. This talk will cover one of XNU's flagship debugging utilities: DTrace, a dynamic tracing framework for troubleshooting kernel and application problems on production systems in real time. It will be shown how it can be used to ease various tasks within the realm of dynamic binary analysis and beyond.
CONFidence: http://confidence.org.pl/
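To show the flavor of such usage, here is a classic DTrace one-liner that counts system calls by process name, wrapped in Python; it requires root, and on recent macOS releases System Integrity Protection restricts dtrace, so treat it as a sketch for the era of the talk:

```python
import subprocess

# Count system calls by process name for roughly ten seconds (requires root).
one_liner = "syscall:::entry { @calls[execname] = count(); }"
subprocess.run(["sudo", "dtrace", "-n", one_liner, "-c", "sleep 10"])
```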
Simple, fast, and scalable torch7 tutorial - Jin-Hwa Kim
A tutorial based on basic information about Torch7. It covers installation, simple runnable code, tensor manipulation, a sweep of key packages, and post-hoc audience Q&A.
TensorFrames: Google Tensorflow on Apache Spark - Databricks
Presentation at Bay Area Spark Meetup by Databricks Software Engineer and Spark committer Tim Hunter.
This presentation covers how you can use TensorFrames with TensorFlow for distributed computing on GPUs.
Performance Analysis: new tools and concepts from the cloud - Brendan Gregg
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance analysis, for both customers and operators of the cloud. Apart from monitoring a scaling environment, issues within a system can be complicated when tenants are competing for the same resources, and are invisible to each other. Other factors include rapidly changing production code and wildly unpredictable traffic surges. For performance analysis in the Joyent public cloud, we use a variety of tools including Dynamic Tracing, which allows us to create custom tools and metrics and to explore new concepts. In this presentation I'll discuss a collection of these tools and the metrics that they measure. While these are DTrace-based, the focus of the talk is on which metrics are proving useful for analyzing real cloud issues.
Introduction to DTrace (Dynamic Tracing), written by Brendan Gregg and delivered in 2007. While aimed at a Solaris-based audience, this introduction is still largely relevant today (2012). Since then, DTrace has appeared in other operating systems (Mac OS X, FreeBSD, and is being ported to Linux), and, many user-level providers have been developed to aid tracing of other languages.
This is the speech Shen Li gave at GopherChina 2017.
TiDB is an open source distributed database. Inspired by the design of Google F1/Spanner, TiDB features infinite horizontal scalability, strong consistency, and high availability. The goal of TiDB is to serve as a one-stop solution for data storage and analysis.
In this talk, we will mainly cover the following topics:
- What is TiDB
- TiDB Architecture
- SQL Layer Internal
- Golang in TiDB
- Next Step of TiDB
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: An Algebra for Reactive Pr... - GeeksLab Odessa
SubScript is an extension of the Scala language that adds support for the constructs and syntax of the Algebra of Communicating Processes (ACP). SubScript is a promising extension, suitable both for developing highly loaded parallel systems and for simple personal applications.
Similar to Blazing Performance with Flame Graphs
Talk by Brendan Gregg for YOW! 2021. "The pursuit of faster performance in computing is the driving reason for many new technologies and updates. This talk discusses performance improvements now underway that you will likely be adopting soon, for processors (including 3D stacking and cloud vendor CPUs), memory (including DDR5 and high-bandwidth memory [HBM]), disks (including 3D Xpoint as a 3D NAND accelerator), networking (including QUIC and eXpress Data Path [XDP]), runtimes, hypervisors, and more. The future of performance is increasingly cloud-based, with hardware hypervisors and custom processors, meaningful observability of everything down to cycle stalls (even as cloud guests), and high-speed syscall-avoiding applications that use eBPF, FPGAs, and io_uring. The talk also discusses where future performance improvements might be expected, with predictions for new technologies."
Talk for Facebook Systems@Scale 2021 by Brendan Gregg: "BPF (eBPF) tracing is the superpower that can analyze everything, helping you find performance wins, troubleshoot software, and more. But with many different front-ends and languages, and years of evolution, finding the right starting point can be hard. This talk will make it easy, showing how to install and run selected BPF tools in the bcc and bpftrace open source projects for some quick wins. Think like a sysadmin, not like a programmer."
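In that "think like a sysadmin, not like a programmer" spirit, here is a minimal sketch of installing and running two of the packaged bcc tools on Ubuntu; the package and tool names (with the -bpfcc suffix) are Ubuntu-specific and differ on other distributions:

```python
import subprocess

# Install the packaged bcc tools (Ubuntu package name; other distros differ).
subprocess.run(["sudo", "apt-get", "install", "-y", "bpfcc-tools"], check=True)

# Trace new process execution for 10 seconds, then print one 10-second
# histogram of block I/O latency (both tools need root).
subprocess.run(["sudo", "timeout", "-s", "INT", "10", "execsnoop-bpfcc"])
subprocess.run(["sudo", "biolatency-bpfcc", "10", "1"])
```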
Computing Performance: On the Horizon (2021) - Brendan Gregg
Talk by Brendan Gregg for USENIX LISA 2021. https://www.youtube.com/watch?v=5nN1wjA_S30 . "The future of computer performance involves clouds with hardware hypervisors and custom processors, servers running a new type of BPF software to allow high-speed applications and kernel customizations, observability of everything in production, new Linux kernel technologies, and more. This talk covers interesting developments in systems and computing performance, their challenges, and where things are headed."
Talk for YOW! by Brendan Gregg. "Systems performance studies the performance of computing systems, including all physical components and the full software stack to help you find performance wins for your application and kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (ftrace, bcc/BPF, and bpftrace/BPF), advice about what is and isn't important to learn, and case studies to see how it is applied. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
re:Invent 2019: BPF Performance Analysis at Netflix - Brendan Gregg
Talk by Brendan Gregg at AWS re:Invent 2019. Abstract: "Extended BPF (eBPF) is an open source Linux technology that powers a whole new class of software: mini programs that run on events. Among its many uses, BPF can be used to create powerful performance analysis tools capable of analyzing everything: CPUs, memory, disks, file systems, networking, languages, applications, and more. In this session, Netflix's Brendan Gregg tours BPF tracing capabilities, including many new open source performance analysis tools he developed for his new book "BPF Performance Tools: Linux System and Application Observability." The talk includes examples of using these tools in the Amazon EC2 cloud."
UM2019 Extended BPF: A New Type of Software - Brendan Gregg
Keynote for Ubuntu Masters 2019 by Brendan Gregg, Netflix. Video https://www.youtube.com/watch?v=7pmXdG8-7WU&feature=youtu.be . "Extended BPF is a new type of software, and the first fundamental change to how kernels are used in 50 years. This new type of software is already in use by major companies: Netflix has 14 BPF programs running by default on all of its cloud servers, which run Ubuntu Linux. Facebook has 40 BPF programs running by default. Extended BPF is composed of an in-kernel runtime for executing a virtual BPF instruction set through a safety verifier and with JIT compilation. So far it has been used for software defined networking, performance tools, security policies, and device drivers, with more uses planned and more we have yet to think of. It is changing how we use and think about systems. This talk explores the past, present, and future of BPF, with BPF performance tools as a use case."
YOW2018 Cloud Performance Root Cause Analysis at Netflix - Brendan Gregg
Keynote by Brendan Gregg for YOW! 2018. Video: https://www.youtube.com/watch?v=03EC8uA30Pw . Description: "At Netflix, improving the performance of our cloud means happier customers and lower costs, and involves root cause analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in this analysis. In this session we will summarize the Netflix environment, procedures, and tools we use and build to do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service GUIs such as our open source Atlas tool, or focused on individual instances, and use our open source Vector tool, flame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source tools in the same way to find performance wins in your own environment."
Talk by Brendan Gregg and Martin Spier for the Linkedin Performance Engineering meetup on Nov 8, 2018. FlameScope is a visualization for performance profiles that helps you study periodic activity, variance, and perturbations, with a heat map for navigation and flame graphs for code analysis.
Talk by Brendan Gregg for All Things Open 2018. "At over one thousand code commits per week, it's hard to keep up with Linux developments. This keynote will summarize recent Linux performance features, for a wide audience: the KPTI patches for Meltdown, eBPF for performance observability and the new open source tools that use it, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more. This is about exposure: knowing what exists, so you can learn and use it later when needed. Get the most out of your systems with the latest Linux kernels and exciting features."
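As a quick, hedged illustration of trying two of the features mentioned (BBR congestion control and the Kyber I/O scheduler) on a kernel that includes them; the device name sda is a placeholder:

```python
import subprocess

# TCP congestion control: list what is available, then select BBR (needs root).
subprocess.run(["sysctl", "net.ipv4.tcp_available_congestion_control"])
subprocess.run(["sudo", "sysctl", "-w", "net.ipv4.tcp_congestion_control=bbr"])

# Block I/O scheduler: show the choices for one device, then pick Kyber.
with open("/sys/block/sda/queue/scheduler") as f:
    print(f.read().strip())
subprocess.run("echo kyber | sudo tee /sys/block/sda/queue/scheduler", shell=True)
```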
Linux Performance 2018 (PerconaLive keynote) - Brendan Gregg
Keynote for PerconaLive 2018 by Brendan Gregg. Video: https://youtu.be/sV3XfrfjrPo?t=30m51s . "At over one thousand code commits per week, it's hard to keep up with Linux developments. This keynote will summarize recent Linux performance features, for a wide audience: the KPTI patches for Meltdown, eBPF for performance observability, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more. This is about exposure: knowing what exists, so you can learn and use it later when needed. Get the most out of your systems, whether they are databases or application servers, with the latest Linux kernels and exciting features."
How Netflix Tunes EC2 Instances for Performance - Brendan Gregg
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg. "
At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such as SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
Talk for USENIX LISA17: "Containers pose interesting challenges for performance monitoring and analysis, requiring new analysis methodologies and tooling. Resource-oriented analysis, as is common with systems performance tools and GUIs, must now account for both hardware limits and soft limits, as implemented using cgroups. A reverse diagnosis methodology can be applied to identify whether a container is resource constrained, and by which hard or soft resource. The interaction between the host and containers can also be examined, and noisy neighbors identified or exonerated. Performance tooling can need special usage or workarounds to function properly from within a container or on the host, to deal with different privilege levels and name spaces. At Netflix, we're using containers for some microservices, and care very much about analyzing and tuning our containers to be as fast and efficient as possible. This talk will show you how to identify bottlenecks in the host or container configuration, in the applications by profiling in a container environment, and how to dig deeper into kernel and container internals."
Kernel Recipes 2017: Performance Analysis with BPF - Brendan Gregg
Talk by Brendan Gregg at Kernel Recipes 2017 (Paris): "The in-kernel Berkeley Packet Filter (BPF) has been enhanced in recent kernels to do much more than just filtering packets. It can now run user-defined programs on events, such as on tracepoints, kprobes, uprobes, and perf_events, allowing advanced performance analysis tools to be created. These can be used in production as the BPF virtual machine is sandboxed and will reject unsafe code, and are already in use at Netflix.
Beginning with the bpf() syscall in 3.18, enhancements have been added in many kernel versions since, with major features for BPF analysis landing in Linux 4.1, 4.4, 4.7, and 4.9. Specific capabilities these provide include custom in-kernel summaries of metrics, custom latency measurements, and frequency counting kernel and user stack traces on events. One interesting case involves saving stack traces on wake up events, and associating them with the blocked stack trace: so that we can see the blocking stack trace and the waker together, merged in kernel by a BPF program (that particular example is in the kernel as samples/bpf/offwaketime).
This talk will discuss the new BPF capabilities for performance analysis and debugging, and demonstrate the new open source tools that have been developed to use it, many of which are in the Linux Foundation iovisor bcc (BPF Compiler Collection) project. These include tools to analyze the CPU scheduler, TCP performance, file system performance, block I/O, and more."
EuroBSDcon 2017: System Performance Analysis Methodologies - Brendan Gregg
Keynote by Brendan Gregg. "Traditional performance monitoring makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. Modern BSD has advanced tracers and PMC tools, providing virtually endless metrics to aid performance analysis. It's time we really used them, but the problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
There's a new way to approach performance analysis that can guide you through the metrics. Instead of starting with traditional metrics and figuring out their use, you start with the questions you want answered then look for metrics to answer them. Methodologies can provide these questions, as well as a starting point for analysis and guidance for locating the root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, chain graphs, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. Many methodologies will be discussed, from the production proven to the cutting edge, along with recommendations for their implementation on BSD systems. In general, you will learn to think differently about analyzing your systems, and make better use of the modern tools that BSD provides."
3. My Previous Visualizations Include
• Latency Heat Maps (and other heat map types), including:
• Quotes from LISA'13 yesterday:
• "Heat maps are a wonderful thing, use them" – Caskey Dickson
• "If you do distributed systems, you need this" – Theo Schlossnagle
• I did heat maps and visualizations in my LISA'10 talk
4. Audience
• This is for developers, sysadmins, support staff, and
performance engineers
• This is a skill-up for everyone: beginners to experts
• This helps analyze all software: kernels and applications
5. whoami
• G’Day, I’m Brendan
• Recipient of the LISA 2013 Award
for Outstanding Achievement
in System Administration!
(Thank you!)
• Work/Research: tools,
methodologies, visualizations
• Author of Systems Performance,
primary author of DTrace
(Prentice Hall, 2011)
• Lead Performance Engineer
@joyent; also teach classes:
Cloud Perf coming up: http://www.joyent.com/developers/training-services
6. Joyent
• High-Performance Cloud Infrastructure
• Public/private cloud provider
• OS-Virtualization for bare metal performance
• KVM for Linux guests
• Core developers of
SmartOS and node.js
• Office walls decorated
with Flame Graphs:
7. Agenda: Two Talks in One
• 1. CPU Flame Graphs
• Example
• Background
• Flame Graphs
• Generation
• Types: CPU
• 2. Advanced Flame Graphs
• Types: Memory, I/O, Off-CPU, Hot/Cold, Wakeup
• Developments
• SVG demos: https://github.com/brendangregg/FlameGraph/demos
10. Example
• As a short example, I’ll describe the real world performance
issue that led me to create flame graphs
• Then I’ll explain them in detail
11. Example: The Problem
• A production MySQL database had poor performance
• It was a heavy CPU consumer, so I used a CPU profiler to see
why. It sampled stack traces at timed intervals
• The profiler condensed its output by only printing unique
stacks along with their occurrence counts, sorted by count
• The following shows the profiler command and the two most
frequently sampled stacks...
17. Example: Profile Data
• The most frequent stack, printed last, shows CPU usage in
add_to_status(), which is from the “show status” command.
Is that to blame?
• Hard to tell – it only accounts for < 2% of the samples
• I wanted a way to quickly understand stack trace profile data,
without browsing 500,000+ lines of output
18. Example: Visualizations
• To understand this profile data quickly, I created a visualization
that worked very well, named “Flame Graph” for its
resemblance to fire (also as it was showing a “hot” CPU issue)
(figure: Profile Data (.txt) → some Perl → Flame Graph (.svg))
20. Example: Flame Graph
(figure: the same profile data as a flame graph, with annotations: where CPU is really consumed; the "show status" stack; all stack samples along the x-axis; one stack sample as a single column)
21. Example: Flame Graph
• All data in one picture
• Interactive using JavaScript and a browser: mouse overs
• Stack elements that are frequent can be seen, read, and
compared visually. Frame width is relative to sample count
• CPU usage was now understood properly and quickly,
leading to a 40% performance win
23. Background: Stack Frame
• A stack frame shows a location in code
• Profilers usually show them on a single line. Eg:
libc.so.1`mutex_trylock_adaptive+0x112
(format: module`function+instruction offset)
24. Background: Stack Trace
• A stack trace is a list of frames. Their index is the stack depth:
	24  libc.so.1`mutex_trylock_adaptive+0x112   (current)
	23  libc.so.1`mutex_lock_impl+0x165          (parent)
	22  libc.so.1`mutex_lock+0xc                 (grandparent)
	[...]
	 0  libc.so.1`_lwp_start
27. Background: Stack Modes
• Two types of stacks can be profiled:
• user-level for applications (user mode)
• kernel-level for the kernel (kernel mode)
• During a system call, an application may have both
28. Background: Software Internals
• You don’t need to be a programmer to understand stacks.
• Some function names are self explanatory, others require
source code browsing (if available). Not as bad as it sounds:
• MySQL has ~15,000 functions in > 0.5 million lines of code
• The earlier stack has 20 MySQL functions. To understand
them, you may need to browse only 0.13%
(20 / 15000) of the code. Might take hours, but it is doable.
• If you have C++ signatures, you can use a demangler first:
mysqld`_ZN4JOIN4execEv+0x482
→ (c++filt, demangler.com) →
mysqld`JOIN::exec()+0x482
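• For example, a hedged sketch using the standard c++filt tool (not a command from the original slides):
# echo '_ZN4JOIN4execEv' | c++filt
JOIN::exec()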
29. Background: Stack Visualization
• Stack frames can be visualized as rectangles (boxes)
• Function names can be truncated to fit
• In this case, color is chosen randomly (from a warm palette)
to differentiate adjacent frames
libc.so.1`mutex_trylock_adaptive+0x112  →  libc.so.1`mutex_trylock_...
libc.so.1`mutex_lock_impl+0x165         →  libc.so.1`mutex_lock_imp...
libc.so.1`mutex_lock+0xc                →  libc.so.1`mutex_lock+0xc
mysqld`key_cache_read+0x741             →  mysqld`key_cache_read+0x741
• A stack trace becomes a column of colored rectangles
30. Background: Time Series Stacks
• Time series ordering allows time-based pattern identification
• However, stacks can change thousands of times per second
(figure: x-axis is time in seconds, y-axis is stack depth; slide 31 highlights one stack sample as a single column)
32. Background: Frame Merging
• When zoomed out, stacks appear as narrow stripes
• Adjacent identical functions can be merged to improve
readability, eg:
(figure: columns of narrow frames such as "mu...", "ge...", and "ke..." merge into readable frames: mutex_lock_impl(), mutex_lock(), key_cache_read())
• This sometimes works: eg, a repetitive single threaded app
• Often does not (previous slide already did this), due to code
execution between samples or parallel thread execution
33. Background: Frame Merging
• Time-series ordering isn’t necessary for the primary use case:
identify the most common (“hottest”) code path or paths
• By using a different x-axis sort order, frame merging can be
greatly improved...
35. Flame Graphs
• Flame Graphs sort stacks alphabetically. This sort is applied
from the bottom frame upwards. This increases merging and
visualizes code paths.
(figure: y-axis is stack depth, x-axis is alphabetical sort)
36. Flame Graphs: Definition
• Each box represents a function (a merged stack frame)
• y-axis shows stack depth
• top function led directly to the profiling event
• everything beneath it is ancestry (explains why)
• x-axis spans the sample population, sorted alphabetically
• Box width is proportional to the total time a function was
profiled directly or its children were profiled
• All threads can be shown in the same Flame Graph (the
default), or as separate per-thread Flame Graphs
• Flame Graphs can be interactive: mouse over for details
37. Flame Graphs: Variations
• Profile data can be anything: CPU, I/O, memory, ...
• Naming suggestion: [event] [units] Flame Graph
• Eg: "FS Latency Flame Graph"
• By default, Flame Graphs == CPU Sample Flame Graphs
• Colors can be used for another dimension
• by default, random colors are used to differentiate boxes
• --hash for hash-based on function name
• Distributed applications can be shown in the same Flame
Graph (merge samples from multiple systems)
38. Flame Graphs: A Simple Example
• A CPU Sample Flame Graph:
(example flame graph: a() spans the base; above it sit b(), which is wide, and g(), which is narrow; above b() is c(), which splits into d() and e(); f() sits on top of e(); h() sits on top of g())
• I’ll illustrate how these are read by posing various questions
39. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph)
• Q: which function is on-CPU the most?
40. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph; annotation: the top edge shows who is on-CPU directly)
• Q: which function is on-CPU the most?
• A: f()
• e() is on-CPU a little, but its runtime is mostly spent in f(), which is on-CPU directly
41. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph)
• Q: why is f() on-CPU?
42. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph; annotations show ancestry: f() was called by e(), e() was called by c(), ...)
• Q: why is f() on-CPU?
• A: a() → b() → c() → e() → f()
43. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph)
• Q: how does b() compare to g()?
44. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph; annotation: visually compare lengths)
• Q: how does b() compare to g()?
• A: b() looks like it is running (present) about 10 times more
often than g()
45. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph; annotation: ... or mouse over; the status line or tool tip shows b() is 90%)
• Q: how does b() compare to g()?
• A: for interactive Flame Graphs, mouse over shows b() is
90%, g() is 10%
46. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph; annotation: mouse over shows g() is 10%)
• Q: how does b() compare to g()?
• A: for interactive Flame Graphs, mouse over shows b() is
90%, g() is 10%
47. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph)
• Q: why are we running f()?
48. Flame Graphs: How to Read
• A CPU Sample Flame Graph:
(same example flame graph; annotation: look for branches)
• Q: why are we running f()?
• A: code path branches can reveal key functions:
• a() chose the b() path
• c() chose the e() path
49. Flame Graphs: Example 1
• Customer alerting software periodically checks a log, however,
it is taking too long (minutes).
• It includes grep(1) of an ~18 Mbyte log file, which takes
around 10 minutes!
• grep(1) appears to be on-CPU for this time. Why?
51. Flame Graphs: Example 1
• CPU Sample Flame Graph for grep(1) user-level stacks (annotation: UTF8?):
• 82% of samples are in check_multibyte_string() or its children.
This seems odd as the log file is plain ASCII.
• And why is UTF8 on the scene? ... Oh, LANG=en_US.UTF-8
52. Flame Graphs: Example 1
• CPU Sample Flame Graph for grep(1) user-level stacks (annotation: UTF8?):
• Switching to LANG=C improved performance by 2000x
• A simple example, but I did spot this from the raw profiler text
before the Flame Graph. You really need Flame Graphs when
the text gets too long and unwieldy.
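• As an illustration only (a hedged sketch; the pattern, log path, and actual speedup are hypothetical and depend on the data):
# time LANG=en_US.UTF-8 grep PATTERN /var/log/app.log
# time LANG=C grep PATTERN /var/log/app.log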
53. Flame Graphs: Example 2
• A potential customer benchmarks disk I/O on a cloud instance.
The performance is not as fast as hoped.
• The host has new hardware and software. Issues with the new
type of disks are suspected.
54. Flame Graphs: Example 2
• I take a look, and notice CPU time in the kernel is modest.
• I’d normally assume this was I/O overheads and not profile it
yet, instead beginning with I/O latency analysis.
• But Flame Graphs make it easy, and it may be useful to see
what code paths (illumos kernel) are on the table.
56. Flame Graphs: Example 2
• 24% in tsc_read()? Time Stamp Counter? Checking ancestry...
57. Flame Graphs: Example 2
• 62% in zfs_zone_io_throttle? Oh, we had forgotten that this
new platform had ZFS I/O throttles turned on by default!
58. Flame Graphs: Example 3
• Application performance is about half that of a competitor
• Everything is believed identical (H/W, application, config,
workload) except for the OS and kernel
• Application is CPU busy, nearly 100% in user-mode. How can
the kernel cause a 2x delta when the app isn't in kernel-mode?
• Flame graphs on both platforms for user-mode were created:
• Linux, using perf
• SmartOS, using DTrace
• Added flamegraph.pl --hash option for consistent function
colors (not random), aiding comparisons
59. Flame Graphs: Example 3
(figure: two flame graphs, Linux and SmartOS; the Linux one shows an extra function: UnzipDocid())
• Function label formats are different, but that's just due to
different profilers/stackcollapse.pl's (should fix this)
• Widths slightly different, but we already know perf differs
• Extra function? This is executing different application software!
SphDocID_t
UnzipDocid ()
{ return UnzipOffset(); }
• Actually, a different compiler option was eliding this function
60. Flame Graphs: More Examples
• Flame Graphs are typically
more detailed, like the earlier
MySQL example
• Next, how to generate them,
then more examples
62. Generation
• I’ll describe the original Perl version I wrote and shared on
github:
• https://github.com/brendangregg/FlameGraph
• There are other great Flame Graph implementations with
different features and usage, which I’ll cover in the last section
64. Generation: Overview
• Full command line example. This uses DTrace for CPU
profiling of the kernel:
# dtrace -x stackframes=100 -n 'profile-997 /arg0/ {
@[stack()] = count(); } tick-60s { exit(0); }' -o out.stacks
# stackcollapse.pl < out.stacks > out.folded
# flamegraph.pl < out.folded > out.svg
• Then, open out.svg in a browser
• Intermediate files could be avoided (piping), but they can be
handy for some manual processing if needed (eg, using vi)
65. Generation: Profiling Data
• The profile data, at a minimum, is a series of stack traces
• These can also include stack trace counts. Eg:
mysqld`_Z13add_to_statusP17system_status_varS0_+0x47
mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222
mysqld`_Z10do_commandP3THD+0x198
mysqld`handle_one_connection+0x1a6
libc.so.1`_thrp_setup+0x8d
libc.so.1`_lwp_start
5530
# of occurrences for this stack
• This example is from DTrace, which prints a series of these.
The format of each group is: stack, count, newline
• Your profiler needs to print full (not truncated) stacks, with
symbols. This may be step 0: get the profiler to work!
67. Generation: Profiling Examples: DTrace
• CPU profile kernel stacks at 997 Hertz, for 60 secs:
# dtrace -x stackframes=100 -n 'profile-997 /arg0/ {
@[stack()] = count(); } tick-60s { exit(0); }' -o out.kern_stacks
• CPU profile user-level stacks for PID 12345 at 97 Hertz, 60s:
# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345 && arg1/ {
@[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
• Should also work on Mac OS X, but is pending some fixes
preventing stack walking (use Instruments instead)
• Should work for Linux one day with the DTrace ports
68. Generation: Profiling Examples: perf
• CPU profile full stacks at 97 Hertz, for 60 secs:
# perf record -a -g -F 97 sleep 60
# perf script > out.stacks
• Need debug symbol packages installed (*dbgsym), otherwise
stack frames may show as hexadecimal
• May need compilers to cooperate (-fno-omit-frame-pointer)
• Has both user and kernel stacks, and the kernel idle thread.
Can filter the idle thread after stackcollapse-perf.pl using:
# stackcollapse-perf.pl < out.stacks | grep -v cpu_idle | ...
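• Putting it together, a hedged end-to-end sketch (output file names are illustrative):
# perf record -a -g -F 97 sleep 60
# perf script > out.stacks
# ./stackcollapse-perf.pl out.stacks | grep -v cpu_idle | \
    ./flamegraph.pl --titletext="CPU Sample Flame Graph" > out.svg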
69. Generation: Profiling Examples: SystemTap
• CPU profile kernel stacks at 100 Hertz, for 60 secs:
# stap -s 32 -D MAXTRACE=100 -D MAXSTRINGLEN=4096 -D MAXMAPENTRIES=10240
-D MAXACTION=10000 -D STP_OVERLOAD_THRESHOLD=5000000000 --all-modules
-ve 'global s; probe timer.profile { s[backtrace()] <<< 1; }
probe end { foreach (i in s+) { print_stack(i);
printf("t%dn", @count(s[i])); } } probe timer.s(60) { exit(); }'
> out.kern_stacks
• Need debug symbol packages installed (*dbgsym), otherwise
stack frames may show as hexadecimal
• May need compilers to cooperate (-fno-omit-frame-pointer)
70. Generation: Dynamic Languages
• C or C++ are usually easy to profile; runtime environments
(JVM, node.js, ...) usually are not, often lacking a way to show
program stacks and not just runtime internals.
• Eg, DTrace’s ustack helper for node.js:
0xfc618bc0
0xfc61bd62
0xfe870841
0xfc61c1f3
0xfc617685
0xfe870841
0xfc6154d7
0xfe870e1a
[...]
libc.so.1`gettimeofday+0x7
Date at position
<< adaptor >>
<< constructor >>
(anon) as exports.active at timers.js position 7590
(anon) as Socket._write at net.js position 21336
(anon) as Socket.write at net.js position 19714
<< adaptor >>
(anon) as OutgoingMessage._writeRaw at http.js p...
(anon) as OutgoingMessage._send at http.js posit...
<< adaptor >>
(anon) as OutgoingMessage.end at http.js pos...
[...]
http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/
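• As an aside (not from the original talk): newer node.js on Linux can emit a JIT symbol map (/tmp/perf-<PID>.map) that perf can use to resolve JITed frames. A hedged sketch, with app.js as a hypothetical program:
# node --perf-basic-prof app.js &
# perf record -F 97 -g -p $! -- sleep 30
# perf script > out.node_stacks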
71. Generation: stackcollapse.pl
• Converts profile data into single-line records
• Variants exist for DTrace, perf, SystemTap, Instruments, Xperf
• Eg, DTrace:
unix`i86_mwait+0xd
unix`cpu_idle_mwait+0xf1
unix`idle+0x114
unix`thread_start+0x8
19486
# stackcollapse.pl < out.stacks > out.folded
unix`thread_start;unix`idle;unix`cpu_idle_mwait;unix`i86_mwait 19486
72. Generation: stackcollapse.pl
• In the folded output, each record is one line: the stack trace with frames ';' delimited, then a space, then the count
73. Generation: stackcollapse.pl
• Full output is many lines, one line per stack
• Bonus: can be grepped
# ./stackcollapse-stap.pl out.stacks | grep ext4fs_dirhash
system_call_fastpath;sys_getdents;vfs_readdir;ext4_readdir;ext4_htree_fill_tree;htree_dirblock_to_tree;ext4fs_dirhash 100
system_call_fastpath;sys_getdents;vfs_readdir;ext4_readdir;ext4_htree_fill_tree;htree_dirblock_to_tree;ext4fs_dirhash;half_md4_transform 505
system_call_fastpath;sys_getdents;vfs_readdir;ext4_readdir;ext4_htree_fill_tree;htree_dirblock_to_tree;ext4fs_dirhash;str2hashbuf_signed 353
[...]
• That shows all stacks containing ext4fs_dirhash(); useful
debug aid by itself
• grep can also be used to filter stacks before Flame Graphs
• eg: grep -v cpu_idle
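• For example, hedged sketches of filtering folded output before rendering (file names are illustrative):
# grep ext4fs_dirhash out.folded | ./flamegraph.pl > ext4fs_dirhash.svg
# grep -v cpu_idle out.folded | ./flamegraph.pl > no_idle.svg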
74. Generation: Final Output
• Desires:
• Full control of output
• High density detail
• Portable: easily viewable
• Interactive
75. Generation: Final Output
• Desires: full control of output, high density detail, portable (easily viewable), interactive
• Candidate output formats: PNG, SVG+JS
• SVG+JS: Scalable Vector Graphics with embedded JavaScript
• Common standards, and supported by web browsers
• Can print poster size (scalable); but loses interactivity!
• Can be emitted by a simple Perl program...
76. Generation: flamegraph.pl
• Converts folded stacks into an interactive SVG. Eg:
# flamegraph.pl --titletext="Flame Graph: MySQL" out.folded > graph.svg
• Options:
--titletext   change the title text (default is “Flame Graph”)
--width       width of image (default is 1200)
--height      height of each frame (default is 16)
--minwidth    omit functions smaller than this width (default is 0.1 pixels)
--fonttype    font type (default “Verdana”)
--fontsize    font size (default 12)
--countname   count type label (default “samples”)
--nametype    name type label (default “Function:”)
--colors      color palette: "hot", "mem", "io"
--hash        colors are keyed by function name hash
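• For example, a hedged sketch combining several of these options (the input file and title are illustrative):
# flamegraph.pl --titletext="MySQL CPU Sample Flame Graph" --width=1600 \
    --minwidth=0.5 --countname=samples --hash out.folded > mysql_cpu.svg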
80. CPU
• Measure code paths that consume CPU
• Helps us understand and optimize CPU usage, improving
performance and scalability
• Commonly performed by sampling CPU stack traces at a
timed interval (eg, 100 Hertz: one sample every 10 ms), on all CPUs
• DTrace/perf/SystemTap examples shown earlier
• Can also be performed by tracing function execution
81. CPU: Sampling
CPU stack sampling:
(figure: a timeline where user-level A() calls B(), which makes a syscall into the kernel and blocks, later returning; samples taken at a fixed interval land on A and B while on-CPU, and capture nothing ("-") while the thread is blocked off-CPU)
82. CPU: Tracing
CPU function tracing:
(figure: the same timeline, but instead of samples, function entry and return events are traced: A(, B(, B), A))
83. CPU: Profiling
• Sampling:
• Coarse but usually effective
• Can also be low overhead, depending on the stack type
and sample rate, which is fixed (eg, 100 Hz x CPU count)
• Tracing:
• Overheads can be too high, distorting results and hurting
the target (eg, millions of trace events per second)
• Most Flame Graphs are generated using stack sampling
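• For example: sampling at 100 Hertz across 16 CPUs produces at most 100 x 16 = 1,600 stack samples per second, regardless of workload; tracing a function called one million times per second produces one million trace events per second.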
85. CPU: Profiling Results
• Example results. Could you do this?
As an experiment to investigate the performance of the
resulting TCP/IP implementation ... the 11/750 is CPU
saturated, but the 11/780 has about 30% idle time. The time
spent in the system processing the data is spread out among
handling for the Ethernet (20%), IP packet processing (10%),
TCP processing (30%), checksumming (25%), and user
system call handling (15%), with no single part of the handling
dominating the time in the system.
– Bill Joy, 1981, TCP-IP Digest, Vol 1 #6
• An impressive report, that even today would be difficult to do
• Flame Graphs make this a lot easier
86. CPU: Another Example
• A file system is archived using tar(1).
• The files and directories are cached, and the run time is
mostly on-CPU in the kernel (Linux). Where exactly?
91. CPU: Recognition
• Once you start profiling a target, you begin to recognize the
common stacks and patterns
• Linux getdents() ext4 path:
• The next slides show similar
example kernel-mode CPU
Sample Flame Graphs
97. CPU: Both Stacks
• Apart from showing either user- or kernel-level stacks, both
can be included by stacking kernel on top of user
• Linux perf does this by default
• DTrace can, by aggregating @[stack(), ustack()] (see the sketch below)
• The different stacks can be highlighted in different ways:
• different colors or hues
• separator: flamegraph.pl will color gray any functions
called "-", which can be inserted as stack separators
• Kernel stacks are only present during syscalls or interrupts
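• A hedged sketch of that DTrace aggregation (the target PID is hypothetical):
# dtrace -x stackframes=100 -x ustackframes=100 -n 'profile-97 /pid == 12345/ {
    @[stack(), ustack()] = count(); } tick-60s { exit(0); }' -o out.both_stacks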
98. CPU: Both Stacks Example: KVM/qemu
(figure: a KVM/qemu flame graph with some user-only columns, and other columns where a kernel stack is stacked on top of a user stack)
100. Other Targets
• Apart from CPU samples, stack traces can be collected for
any event; eg:
• disk, network, or FS I/O
• CPU events, including cache misses
• lock contention and holds
• memory allocation
• Other values, instead of sample counts, can also be used:
• latency
• bytes
• The next sections demonstrate memory allocation, I/O tracing,
and then all blocking types via off-CPU tracing
102. Memory
• Analyze memory growth or leaks by tracing one of the
following memory events:
• 1. Allocator functions: malloc(), free()
• 2. brk() syscall
• 3. mmap() syscall
• 4. Page faults
• Instead of stacks and sample counts, measure stacks with byte counts
• Merging shows total bytes by code path
104. Memory: Allocator
• Trace malloc(), free(), realloc(), calloc(), ...
• These operate on virtual memory
• *alloc() stacks show why memory was first allocated (as
opposed to populated): Memory Allocation Flame Graphs
• With free()/realloc()/..., suspected memory leaks during tracing
can be identified: Memory Leak Flame Graphs!
• Down side: allocator functions are frequent, so tracing can
slow the target somewhat (eg, 25%)
• For comparison: Valgrind memcheck is more thorough, but its
CPU simulation can slow the target 20 - 30x
105. Memory: Allocator: malloc()
• As a simple example, just tracing malloc() calls with user-level
stacks and bytes requested, using DTrace:
# dtrace -x ustackframes=100 -n 'pid$target::malloc:entry {
@[ustack()] = sum(arg0); } tick-60s { exit(0); }' -p 529 -o out.malloc
• malloc() Bytes Flame Graph:
# stackcollapse.pl out.malloc | flamegraph.pl --title="malloc() bytes"
--countname="bytes" --colors=mem > out.malloc.svg
• The options customize the title, countname, and color palette
107. Memory: Allocator: Leaks
• Yichun Zhang developed Memory Leak Flame Graphs using
SystemTap to trace allocator functions, and applied them to
leaks in Nginx (web server):
108. Memory: brk()
• Many apps grow their virtual memory size using brk(), which
sets the heap pointer
• A stack trace on brk() shows what triggered growth
• Eg, this script (brkbytes.d) traces brk() growth for “mysqld”:
#!/usr/sbin/dtrace -s
inline string target = "mysqld";
uint brk[int];
syscall::brk:entry /execname == target/ { self->p = arg0; }
syscall::brk:return /arg0 == 0 && self->p && brk[pid]/ {
@[ustack()] = sum(self->p - brk[pid]);
}
syscall::brk:return /arg0 == 0 && self->p/ { brk[pid] = self->p; }
syscall::brk:return /self->p/ { self->p = 0; }
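• A hedged usage sketch, following the same pattern as the earlier malloc() example (file names are illustrative):
# ./brkbytes.d -n 'tick-60s { exit(0); }' -o out.brk
# stackcollapse.pl out.brk | flamegraph.pl --countname=bytes --colors=mem \
    --titletext="Heap Expansion (brk) Flame Graph" > out.brk.svg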
110. Memory: brk()
• brk() tracing has low overhead: these calls are typically
infrequent
• Reasons for brk():
• A memory growth code path
• A memory leak code path
• An innocent application code path, that happened to spill over the current heap size
• An asynchronous allocator code path, that grew the
application in response to diminishing free space
111. Memory: mmap()
• mmap() may be used by the application or its user-level
allocator to map in large regions of virtual memory
• It may be followed by munmap() to free the area, which can
also be traced
• Eg, mmap() tracing, similar to brk tracing, to show bytes and
the stacks responsible:
# dtrace -n 'syscall::mmap:entry /execname == "mysqld"/ {
@[ustack()] = sum(arg1); }' -o out.mmap
# stackcollapse.pl out.mmap | flamegraph.pl --countname="bytes"
--title="mmap() bytes Flame Graph" --colors=mem > out.mmap.svg
• This should be low overhead – depends on the frequency
112. Memory: Page Faults
• brk() and mmap() expand virtual memory
• Page faults expand physical memory (RSS). This is demand-based allocation, deferring mapping to the actual write
• Tracing page faults shows the stack responsible for consuming
(writing to) memory:
# dtrace -x ustackframes=100 -n 'vminfo:::as_fault /execname == "mysqld"/ {
@[ustack()] = count(); } tick-60s { exit(0); }' > out.fault
# stackcollapse.pl out.fault | flamegraph.pl --countname=pages
--title="Page Fault Flame Graph" --colors=mem > mysqld_fault.svg
115. I/O
• Show time spent in I/O, eg, storage I/O
• Measure I/O completion events with stacks and their latency;
merging to show total time waiting by code path
(figure, the I/O stack from top to bottom: Application → system calls → VFS → FS → Block Device Interface → Disks.
Logical I/O: measure near the system call/VFS layer for user stacks and real application latency.
Physical I/O: measure at the block device interface for kernel stacks and disk I/O latency.)
116. I/O: Logical I/O Latency
• For example, ZFS call latency using DTrace (zfsustacks.d):
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option ustackframes=100
fbt::zfs_read:entry, fbt::zfs_write:entry,
fbt::zfs_readdir:entry, fbt::zfs_getattr:entry,
fbt::zfs_setattr:entry
{
self->start = timestamp;
}
fbt::zfs_read:return, fbt::zfs_write:return,
fbt::zfs_readdir:return, fbt::zfs_getattr:return,
fbt::zfs_setattr:return
/self->start/
{
this->time = timestamp - self->start;
@[ustack(), execname] = sum(this->time);
self->start = 0;
}
dtrace:::END
{
printa("%k%sn%@dn", @);
}
(annotations: the timestamp is taken at function start (entry), and the elapsed time is computed at function end (return))
117. I/O: Logical I/O Latency
• Making an I/O Time Flame Graph:
# ./zfsustacks.d -n 'tick-10s { exit(0); }' -o out.iostacks
# stackcollapse.pl out.iostacks | awk '{ print $1, $2 / 1000000 }' |
flamegraph.pl --title="FS I/O Time Flame Graph" --color=io
--countname=ms --width=500 > out.iostacks.svg
• DTrace script measures all processes, for 10 seconds
• awk to convert ns to ms
118. I/O: Time Flame Graph: gzip
• gzip(1) spends more time waiting in write()s than read()s
120. I/O: Flame Graphs
• I/O latency tracing: hugely useful
• But once you pick an I/O type, there usually aren't that many
different code paths calling it
• Flame Graphs are nice, but often not necessary
123. Off-CPU: Performance Analysis
• Generic approach for all blocking events, including I/O
• An advanced performance analysis methodology:
• http://dtrace.org/blogs/brendan/2011/07/08/off-cpu-performance-analysis/
• Counterpart to (on-)CPU profiling
• Measure time a thread spent off-CPU, along with stacks
• Off-CPU reasons:
• Waiting (sleeping) on I/O, locks, timers
• Runnable waiting for CPU
• Runnable waiting for page/swap-ins
• The stack trace will explain which
124. Off-CPU: Time Flame Graphs
• Off-CPU profiling data (durations and stacks) can be rendered
as Off-CPU Time Flame Graphs
• As this involves many more code paths, Flame Graphs are
usually really useful
• Yichun Zhang created these, and has been using them on
Linux with SystemTap to collect the profile data. See:
• http://agentzh.org/misc/slides/off-cpu-flame-graphs.pdf
• Which describes their uses for Nginx performance analysis
125. Off-CPU: Profiling
• Example of off-CPU profiling for the bash shell:
# dtrace -x ustackframes=100 -n '
sched:::off-cpu /execname == "bash"/ { self->ts = timestamp; }
sched:::on-cpu /self->ts/ {
@[ustack()] = sum(timestamp - self->ts); self->ts = 0; }
tick-30s { exit(0); }' -o out.offcpu
• Traces time from when a thread switches off-CPU to when it
returns on-CPU, with user-level stacks. ie, time blocked or
sleeping
• Off-CPU Time Flame Graph:
# stackcollapse.pl < out.offcpu | awk '{ print $1, $2 / 1000000 }' |
flamegraph.pl --title="Off-CPU Time Flame Graph" --color=io
--countname=ms --width=600 > out.offcpu.svg
• This uses awk to convert nanoseconds into milliseconds
128. Off-CPU: Bash Shell
• For that simple example, the trace data was so short it could have just been read (54 lines, 4 unique stacks):
• For multithreaded applications, idle thread time can dominate
• For example, an idle MySQL server...
libc.so.1`__forkx+0xb
libc.so.1`fork+0x1d
bash`make_child+0xb5
bash`execute_simple_command+0xb02
bash`execute_command_internal+0xae6
bash`execute_command+0x45
bash`reader_loop+0x240
bash`main+0xaff
bash`_start+0x83
19052
libc.so.1`syscall+0x13
bash`file_status+0x19
bash`find_in_path_element+0x3e
bash`find_user_command_in_path+0x114
bash`find_user_command_internal+0x6f
bash`search_for_command+0x109
bash`execute_simple_command+0xa97
bash`execute_command_internal+0xae6
bash`execute_command+0x45
bash`reader_loop+0x240
bash`main+0xaff
bash`_start+0x83
7557782
libc.so.1`__waitid+0x15
libc.so.1`waitpid+0x65
bash`waitchld+0x87
bash`wait_for+0x2ce
bash`execute_command_internal+0x1758
bash`execute_command+0x45
bash`reader_loop+0x240
bash`main+0xaff
bash`_start+0x83
1193160644
libc.so.1`__read+0x15
bash`rl_getc+0x2b
bash`rl_read_key+0x22d
bash`readline_internal_char+0x113
bash`readline+0x49
bash`yy_readline_get+0x52
bash`shell_getc+0xe1
bash`read_token+0x6f
bash`yyparse+0x4b9
bash`parse_command+0x67
bash`read_command+0x52
bash`reader_loop+0xa5
bash`main+0xaff
bash`_start+0x83
12588900307
130. Off-CPU: MySQL Idle
• Columns from _thrp_setup are threads or thread groups
• MySQL gives thread routines descriptive names (thanks!)
• Mouse over each to identify (profiling time was 30s)
132. Off-CPU: MySQL Idle
• Some thread columns are wider than the measurement time:
evidence of multiple threads
• This can be shown a number of ways. Eg, adding process
name, PID, and TID to the top of each user stack:
#!/usr/sbin/dtrace -s
#pragma D option ustackframes=100
sched:::off-cpu /execname == "mysqld"/ { self->ts = timestamp; }
sched:::on-cpu
/self->ts/
{
@[execname, pid, curlwpsinfo->pr_lwpid, ustack()] =
sum(timestamp - self->ts);
self->ts = 0;
}
dtrace:::END { printa("\n%s-%d/%d%k%@d\n", @); }
133. Off-CPU: MySQL Idle
(figure annotations: columns labeled by thread ID (TID); some columns are 1 thread, 2 threads, or many threads; 4 threads are doing work and show less idle time)
134. Off-CPU: Challenges
• Including multiple threads in one Flame Graph might still be
confusing. Separate Flame Graphs for each can be created
• Off-CPU stacks often don't explain themselves:
• This is blocked on a condition variable. The real reason it is
blocked and taking time isn't visible here
• Now let's look at a busy MySQL server, which presents
another challenge...
137. Off-CPU: MySQL Busy
• Those were user-level stacks only. The kernel-level stack,
which can be included, will usually explain what happened
• eg, involuntary context switch due to time slice expired
• Those paths are likely hot in the CPU Sample Flame Graph
140. Hot/Cold: Profiling
• Profiling both on-CPU and off-CPU stacks shows everything
• In my LISA'12 talk I called this the Stack Profile Method:
profile all stacks
• Both on-CPU ("hot") and off-CPU ("cold") stacks can be
included in the same Flame Graph, colored differently:
Hot Cold Flame Graphs!
• Merging multiple threads gets even weirder. Creating a
separate graph per thread makes much more sense, allowing
comparisons of how a thread's time is divided between
on- and off-CPU activity
• For example, a single web server thread with kernel stacks...
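• As an aside (not from the original slides), a minimal hedged sketch of combining on- and off-CPU profiles into one graph, assuming both folded files have first been converted to the same units (here milliseconds, with CPU samples taken at 100 Hertz, so each sample is ~10 ms; file names are illustrative). Note this simple concatenation does not color hot vs cold differently:
# awk '{ $2 = $2 * 10; print }' out.cpu.folded > out.cpu.ms.folded
# cat out.cpu.ms.folded out.offcpu.ms.folded | \
    flamegraph.pl --countname=ms --titletext="Hot/Cold Flame Graph" > hotcold.svg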
143. Hot/Cold: Challenges
• Sadly, this often doesn't work well for two reasons:
• 1. On-CPU time columns get compressed by off-CPU time
• Previous example dominated by the idle path – waiting for
a new connection – which is not very interesting!
• Works better with zoomable Flame Graphs, but then we've
lost the ability to see key details on first glance
• Pairs of on-CPU and off-CPU Flame Graphs may be the
best approach, giving both the full width
• 2. Has the same challenge from off-CPU Flame Graphs:
real reason for blocking may not be visible
144. State of the Art
• That was the end of Flame Graphs, but I can't stop here –
we're so close
• On + Off-CPU Flame Graphs can attack any issue
• 1. The compressed problem is solvable via one or more of:
• zoomable Flame Graphs
• separate on- and off-CPU Flame Graphs
• per-thread Flame Graphs
• 2. How do we show the real reason for blocking?
146. Tracing Wakeups
• The system knows who woke up whom
• Tracing who performed the wakeup – and their stack – can
show the real reason for waiting
• Wakeup Latency Flame Graph
• Advanced activity
• Consider overheads – might trace too much
• Eg, consider ssh, starting with the Off CPU Time Flame Graph
147. Off-CPU Time Flame Graph: ssh
(figure annotations: waiting on a condition variable; but why did we wait this long?; the object being slept on)
150. Tracing Wakeup, Example (DTrace)
• This example targets sshd (the previous example also matched vmstat, after discovering that sshd was blocked on vmstat, which it was: "vmstat 1")
• sched:::sleep to sched:::wakeup measures the time from sleep to wakeup; stack()/ustack() record who is doing the waking
• Aggregate if possible instead of dumping, to minimize overheads
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option ustackframes=100
#pragma D option stackframes=100
int related[uint64_t];
sched:::sleep
/execname == "sshd"/
{
	ts[curlwpsinfo->pr_addr] = timestamp;
}
sched:::wakeup
/ts[args[0]->pr_addr]/
{
	this->d = timestamp - ts[args[0]->pr_addr];
	@[args[1]->pr_fname, args[1]->pr_pid, args[0]->pr_lwpid, args[0]->pr_wchan,
	    stack(), ustack(), execname, pid, curlwpsinfo->pr_lwpid] = sum(this->d);
	ts[args[0]->pr_addr] = 0;
}
dtrace:::END
{
	printa("\n%s-%d/%d-%x%k-%k%s-%d/%d\n%@d\n", @);
}
151. Following Stack Chains
• 1st level of wakeups often not enough
• Would like to programmatically follow multiple chains of
wakeup stacks, and visualize them
• I've discussed this with others before – it's a hard problem
• The following is in development!: Chain Graph
153. Chain Graph
(figure, reading a chain graph bottom-up: Off-CPU Stacks: why I blocked; Wakeup Stacks: why I waited; Wakeup Thread 1: "I woke up"; Wakeup Thread 2: "I woke up"; ...)
154. Chain Graph Visualization
• New, experimental; check for later improvements
• Stacks associated based on sleeping object address
• Retains the value of relative widths equals latency
• Wakeup stack frames can be listed in reverse (may be less
confusing when following towers bottom-up)
• Towers can get very tall, tracing wakeups through different
software threads, back to metal
155. Following Wakeup Chains, Example (DTrace)
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option ustackframes=100
#pragma D option stackframes=100
int related[uint64_t];
sched:::sleep
/execname == "sshd" || related[curlwpsinfo->pr_addr]/
{
ts[curlwpsinfo->pr_addr] = timestamp;
}
sched:::wakeup
/ts[args[0]->pr_addr]/
{
this->d = timestamp - ts[args[0]->pr_addr];
@[args[1]->pr_fname, args[1]->pr_pid, args[0]->pr_lwpid, args[0]->pr_wchan,
stack(), ustack(), execname, pid, curlwpsinfo->pr_lwpid] = sum(this->d);
ts[args[0]->pr_addr] = 0;
related[curlwpsinfo->pr_addr] = 1;
}
dtrace:::END
{
printa("n%s-%d/%d-%x%k-%k%s-%d/%dn%@dn", @);
}
Also follow who
wakes up the waker
157. Developments
• There have been many other great developments in the world
of Flame Graphs. The following is a short tour.
158. node.js Flame Graphs
• Dave Pacheco developed the DTrace ustack helper for v8,
and created Flame Graphs with node.js functions
http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/
159. OS X Instruments Flame Graphs
• Mark Probst developed a
way to produce Flame
Graphs from Instruments
1. Use the Time Profile instrument
2. Instrument -> Export Track
3. stackcollapse-instruments.pl
4. flamegraph.pl
http://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/
160. Ruby Flame Graphs
• Sam Saffron developed Flame Graphs with the Ruby
MiniProfiler
• These stacks are very
deep (many frames),
so the function names
have been dropped
and only the rectangles
are drawn
• This preserves the
value of seeing the
big picture at first
glance!
http://samsaffron.com/archive/2013/03/19/flame-graphs-in-ruby-miniprofiler
161. Windows Xperf Flame Graphs
• Bruce Dawson developed Flame Graphs from Xperf data, and
an xperf_to_collapsedstacks.py script
Visual Studio CPU Usage
http://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/
162. WebKit Web Inspector Flame Charts
• Available in Google Chrome developer tools, these show
JavaScript CPU stacks as colored rectangles
• Inspired by Flame Graphs but
not the same: they show the
passage of time on the x-axis!
• This generally works here as:
• the target is single-threaded apps, often with repetitive code paths
• there is the ability to zoom
• Can a "Flame Graph" mode be
provided for the same data?
https://bugs.webkit.org/show_bug.cgi?id=111162
163. Perl Devel::NYTProf Flame Graphs
• Tim Bunce has been adding Flame Graph features, and
included them in the Perl profiler: Devel::NYTProf
http://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/
164. Leak and Off-CPU Time Flame Graphs
• Yichun Zhang (agentzh) has created Memory Leak and Off-CPU Time Flame Graphs, and has given good talks to explain
how Flame Graphs work
http://agentzh.org/#Presentations
http://agentzh.org/misc/slides/yapc-na-2013-flame-graphs.pdf
http://www.youtube.com/watch?v=rxn7HoNrv9A
http://agentzh.org/misc/slides/off-cpu-flame-graphs.pdf
http://agentzh.org/misc/flamegraph/nginx-leaks-2013-10-08.svg
https://github.com/agentzh/nginx-systemtap-toolkit
... these also provide examples of using SystemTap on Linux
165. Color Schemes
• Colors can be used to convey data, instead of the default
random color scheme. This example from Dave Pacheco
colors each function by its degree of direct on-CPU execution
• A Flame Graph tool could let you select different color schemes
• Another can be: color by a hash on the function name, so colors are consistent
https://npmjs.org/package/stackvis
166. Zoomable Flame Graphs
• Dave Pacheco has also used d3 to provide click to zoom!
https://npmjs.org/package/stackvis
167. Flame Graph Differentials
• Robert Mustacchi has been experimenting with showing the
difference between two Flame Graphs, as a Flame Graph.
Great potential for non-regression testing, and comparisons!
168. Flame Graphs as a Service
• Pedro Teixeira has a project for node.js Flame Graphs as a
service: automatically generated for each github push!
http://www.youtube.com/watch?v=sMohaWP5YqA
169. References & Acknowledgements
• Neelakanth Nadgir (realneel): developed SVGs using Ruby
and JavaScript of time-series function trace data with stack
levels, inspired by Roch's work
• Roch Bourbonnais: developed Call Stack Analyzer, which
produced similar time-series visualizations
• Edward Tufte: inspired me to explore visualizations that show all the data at once, as Flame Graphs do
• Thanks to all who have developed Flame Graphs further!
realneel's function_call_graph.rb visualization
170. Thank you!
• Questions?
• Homepage: http://www.brendangregg.com (links to everything)
• Resources and further reading:
• http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/: see "Updates"
• http://dtrace.org/blogs/brendan/2012/03/17/linux-kernel-performance-flame-graphs/
• http://dtrace.org/blogs/brendan/2013/08/16/memory-leak-growth-flame-graphs/
• http://dtrace.org/blogs/brendan/2011/07/08/off-cpu-performance-analysis/
• http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/