bpftrace is a relatively new eBPF-based open source tracer for modern Linux versions (kernels 5.x.y) that is useful for analyzing production performance problems and troubleshooting software. Basic usage of the tool is presented, along with bpftrace one-liners and advanced scripts useful for MariaDB DBAs. Problems of dynamic tracing of MariaDB Server with bpftrace, some possible solutions, and alternative tracing tools are also discussed.
USENIX LISA2021 talk by Brendan Gregg (https://www.youtube.com/watch?v=_5Z2AU7QTH4). This talk is a deep dive that describes how BPF (eBPF) works internally on Linux, and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.
Talk by Brendan Gregg for USENIX LISA 2019: Linux Systems Performance. Abstract: "
Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas of Linux systems performance: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (Ftrace, bcc/BPF, and bpftrace/BPF), and much advice about what is and isn't important to learn. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
BPF, the Berkeley Packet Filter mechanism, was first introduced into Linux in 1997, in version 2.1.75. It has seen a number of extensions over the years. Recently, in versions 3.15 - 3.19, it received a major overhaul which drastically expanded its applicability. This talk will cover how the instruction set looks today and why: its architecture, capabilities, interfaces, and just-in-time compilers. We will also talk about how it is being used in different areas of the kernel, such as tracing and networking, and about future plans.
Video: https://www.youtube.com/watch?v=JRFNIKUROPE . Talk for linux.conf.au 2017 (LCA2017) by Brendan Gregg, about Linux enhanced BPF (eBPF). Abstract:
A world of new capabilities is emerging for the Linux 4.x series, thanks to enhancements that have been included in Linux to the Berkeley Packet Filter (BPF): an in-kernel virtual machine that can execute user space-defined programs. It is finding uses for security auditing and enforcement, enhancing networking (including eXpress Data Path), and performance observability and troubleshooting. Many new open source performance analysis tools that use BPF have been written in the past 12 months. Tracing superpowers have finally arrived for Linux!
For its use with tracing, BPF provides the programmable capabilities to the existing tracing frameworks: kprobes, uprobes, and tracepoints. In particular, BPF allows timestamps to be recorded and compared from custom events, allowing latency to be studied in many new places: kernel and application internals. It also allows data to be efficiently summarized in-kernel, including as histograms. This has allowed dozens of new observability tools to be developed so far, including measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more.
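The in-kernel histogram summarization mentioned above typically buckets latency values into power-of-two ranges (as bpftrace's hist() does). As a rough user-space illustration of that bucketing — the sample values below are invented — a sketch in Python:

```python
def log2_hist(samples):
    """Bucket values into power-of-two ranges, like bpftrace's hist()."""
    buckets = {}
    for v in samples:
        # Bucket [2^k, 2^(k+1) - 1] for v >= 1; a single [0, 0] bucket for 0.
        idx = v.bit_length() - 1 if v > 0 else 0
        lo, hi = (0, 0) if v == 0 else (1 << idx, (1 << (idx + 1)) - 1)
        buckets[(lo, hi)] = buckets.get((lo, hi), 0) + 1
    return dict(sorted(buckets.items()))

# Invented latency samples in microseconds.
latencies = [3, 5, 9, 17, 18, 40, 130]
for (lo, hi), count in log2_hist(latencies).items():
    print(f"[{lo}, {hi}]  {count}")
```

The key point is that only a handful of bucket counters, not every raw sample, has to cross the kernel/user boundary.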
This talk will summarize BPF capabilities and use cases so far, and then focus on its use to enhance Linux tracing, especially with the open source bcc collection. bcc includes BPF versions of old classics, and many new tools, including execsnoop, opensnoop, funccount, ext4slower, and more (many of which I developed). Perhaps you'd like to develop new tools, or use the existing tools to find performance wins large and small, especially when instrumenting areas that previously had zero visibility. I'll also summarize how we intend to use these new capabilities to enhance systems analysis at Netflix.
An Introduction to eBPF (and cBPF). Topics covered include history, implementation, program types & maps. Also gives a brief introduction to XDP and DPDK
Note: When you view the slide deck via a web browser, the screenshots may be blurred. You can download and view them offline (the screenshots are clear).
Dataplane programming with eBPF: architecture and tools - Stefano Salsano
eBPF is definitely a complex technology. Developing complex systems based on eBPF is challenging due to the intrinsic limitations of the model and the known shortcomings of the tool chain.
The learning curve of this technology is very steep, and newcomers need continuous coaching from experts. This tutorial will investigate:
What eBPF is and why it has gained a prominent position among the solutions to improve packet processing performance in Linux/x86 nodes. We will briefly present some important use case scenarios for eBPF, like Kubernetes' Cilium
The architecture of eBPF and its programming toolchain (e.g. bcc
What are the frameworks for eBPF programming, such as Polycube and InKeV.
How to make eBPF programming easier, more flexible and modular with HIKe/eCLAT
How to implement a custom application logic in eBPF with eCLAT using a python-like script
How to extend the framework and develop new modules
Note: When you view the slide deck via a web browser, the screenshots may be blurred. You can download and view them offline (the screenshots are clear).
re:Invent 2019 BPF Performance Analysis at Netflix - Brendan Gregg
Talk by Brendan Gregg at AWS re:Invent 2019. Abstract: "Extended BPF (eBPF) is an open source Linux technology that powers a whole new class of software: mini programs that run on events. Among its many uses, BPF can be used to create powerful performance analysis tools capable of analyzing everything: CPUs, memory, disks, file systems, networking, languages, applications, and more. In this session, Netflix's Brendan Gregg tours BPF tracing capabilities, including many new open source performance analysis tools he developed for his new book "BPF Performance Tools: Linux System and Application Observability." The talk includes examples of using these tools in the Amazon EC2 cloud."
Using the new extended Berkeley Packet Filter capabilities in Linux to improve the performance of auditing security-relevant kernel events around network, file, and process actions.
eBPF is an exciting new technology that is poised to transform Linux performance engineering. eBPF enables users to dynamically and programmatically trace any kernel or user space code path, safely and efficiently. However, understanding eBPF is not so simple. The goal of this talk is to give audiences a fundamental understanding of eBPF, how it interconnects existing Linux tracing technologies, and how it provides a powerful platform to solve any Linux performance problem.
Talk for SCaLE13x. Video: https://www.youtube.com/watch?v=_Ik8oiQvWgo . Profiling can show what your Linux kernel and applications are doing in detail, across all software stack layers. This talk shows how we are using Linux perf_events (aka "perf") and flame graphs at Netflix to understand CPU usage in detail, to optimize our cloud usage, solve performance issues, and identify regressions. This will be more than just an intro: profiling difficult targets, including Java and Node.js, will be covered, which includes ways to resolve JITed symbols and broken stacks. Included are the easy examples, the hard, and the cutting edge.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
Cilium - Container Networking with BPF & XDP - Thomas Graf
This talk demonstrates that programmability and performance do not require user space networking; they can be achieved in the kernel by generating BPF programs and leveraging the existing kernel subsystems. We will demo an early prototype which provides fast IPv6 & IPv4 connectivity to containers, container-label-based security policy with average cost O(1), and debugging and monitoring based on the per-CPU perf ring buffer. We encourage a lively discussion on the approach taken and next steps.
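The average-O(1) label-based policy mentioned above amounts to a single hash lookup keyed by a (source label, destination label) pair. A minimal user-space sketch with invented labels — a real implementation keys a BPF hash map in the kernel the same way:

```python
# Invented label pairs; default is deny.
policy = {
    ("frontend", "backend"): "allow",
    ("backend", "db"): "allow",
}

def check(src_label, dst_label):
    """Average O(1) policy decision: one hash lookup, default deny."""
    return policy.get((src_label, dst_label), "deny")

print(check("frontend", "backend"))  # allow
print(check("frontend", "db"))       # deny
```

Because the cost is a hash lookup on labels rather than a linear scan of per-endpoint rules, the decision cost stays flat as the number of containers grows.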
Linux 4.x Tracing: Performance Analysis with bcc/BPF - Brendan Gregg
Talk about bcc/eBPF for SCALE15x (2017) by Brendan Gregg. "BPF (Berkeley Packet Filter) has been enhanced in the Linux 4.x series and now powers a large collection of performance analysis and observability tools ready for you to use, included in the bcc (BPF Compiler Collection) open source project. BPF nowadays can do system tracing, software defined networks, and kernel fast path: much more than just filtering packets! This talk will focus on the bcc/BPF tools for performance analysis, which make use of other built in Linux capabilities: dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). There are now bcc tools for measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived, built in to Linux."
Using eBPF for High-Performance Networking in Cilium - ScyllaDB
The Cilium project is a popular networking solution for Kubernetes, based on eBPF. This talk uses eBPF code and demos to explore the basics of how Cilium makes network connections, and manipulates packets so that they can avoid traversing the kernel's built-in networking stack. You'll see how eBPF enables high-performance networking as well as deep network observability and security.
UM2019 Extended BPF: A New Type of Software - Brendan Gregg
Keynote for Ubuntu Masters 2019 by Brendan Gregg, Netflix. Video https://www.youtube.com/watch?v=7pmXdG8-7WU&feature=youtu.be . "Extended BPF is a new type of software, and the first fundamental change to how kernels are used in 50 years. This new type of software is already in use by major companies: Netflix has 14 BPF programs running by default on all of its cloud servers, which run Ubuntu Linux. Facebook has 40 BPF programs running by default. Extended BPF is composed of an in-kernel runtime for executing a virtual BPF instruction set through a safety verifier and with JIT compilation. So far it has been used for software defined networking, performance tools, security policies, and device drivers, with more uses planned and more we have yet to think of. It is changing how we use and think about systems. This talk explores the past, present, and future of BPF, with BPF performance tools as a use case."
Dynamic tracing of MariaDB on Linux - problems and solutions (MariaDB Server ... - Valeriy Kravchuk
Linux with kernels 2.6+ provides different ways to dynamically add user probes to almost any line of code, and to collect the resulting trace and profiling data in a safe and efficient way. This session discusses basic use of ftrace, perf, bcc tools and the bpftrace utility, highlights typical problems MariaDB DBAs and developers may hit while trying to apply them, and presents solutions to some of them.
More on bpftrace for MariaDB DBAs and Developers - FOSDEM 2022 MariaDB Devroom - Valeriy Kravchuk
bpftrace is a relatively new open source tracer for modern Linux (kernels 5.x.y) that may help to troubleshoot performance issues in production, as well as to get insights into how software really works. I have used it for a couple of years and would like to present more details on how to do this efficiently, including, but not limited to, adding user probes to different lines of code inside functions, checking values of local variables, and using bpftrace as a code coverage tool.
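The code-coverage use mentioned above boils down to counting how often each line of a function fires a probe. As a loose user-space analogy (this is not bpftrace itself), Python's sys.settrace can do the same per-line hit counting inside one process:

```python
import sys

def trace_lines(func, *args):
    """Run func(*args) and count executed line numbers, like per-line probes."""
    hits = {}

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            hits[frame.f_lineno] = hits.get(frame.f_lineno, 0) + 1
        return tracer  # keep receiving line events for new frames

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, hits

def clamp(x, lo, hi):          # the traced target
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

result, hits = trace_lines(clamp, 5, 0, 10)
print(result, sorted(hits))    # which lines of clamp actually ran
```

Lines of clamp that never appear in hits were never executed — the same signal a bpftrace probe on each line of a MariaDB function would give, without recompiling the target.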
From Zero to Hero - All you need to do serious deep learning stuff in R - Kai Lichtenberg
Slides from my talk at the useR Group Münster on 04/17/18 about how to get started with GPU-enabled deep learning in R. First I show how to create an NVIDIA Docker-based image with RStudio, TensorFlow, and Keras for R, followed by an introduction to deep learning (classic MNIST classification with an MLP and a CNN).
Linux /proc filesystem for MySQL DBAs - FOSDEM 2021 - Valeriy Kravchuk
Tools and approaches based on /proc sampling (like 0x.tools by Tanel Poder, or ad hoc scripts) make it possible to measure individual thread-level activity in a MySQL server on Linux: thread sleep states, currently executing system calls, and kernel wait locations. If needed, you can drill down into the CPU usage of any thread or of the system as a whole. Historical data can be captured for later analysis, without much impact on the system and with no need to install or change anything in its configuration. In this presentation I summarize what's possible with /proc and show useful examples for MySQL DBAs.
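The /proc sampling described above ultimately comes down to parsing files such as /proc/<pid>/stat. A hedged sketch of that parsing, using an invented but format-correct stat line (field positions follow the proc(5) man page):

```python
def parse_proc_stat(line):
    """Parse a /proc/<pid>/stat line into a few named fields (see proc(5))."""
    # comm may contain spaces, so split around the parentheses first.
    pid, rest = line.split(" ", 1)
    comm = rest[rest.index("(") + 1:rest.rindex(")")]
    fields = rest[rest.rindex(")") + 2:].split()
    return {
        "pid": int(pid),
        "comm": comm,
        "state": fields[0],        # R, S, D, Z, ...
        "utime": int(fields[11]),  # user-mode CPU time, in clock ticks
        "stime": int(fields[12]),  # kernel-mode CPU time, in clock ticks
    }

# An invented, but format-correct, stat line for a mysqld thread.
line = "1234 (mysqld) S 1 1234 1234 0 -1 4194304 500 0 0 0 250 120 0 0 20 0 30 0 100 0 0"
print(parse_proc_stat(line))
```

Sampling such lines periodically (per thread, via /proc/<pid>/task/<tid>/stat) and diffing utime/stime between samples is what gives the low-overhead, no-configuration view of thread activity.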
eBPF and dynamic tracing for MariaDB DBAs (MariaDB Day during FOSDEM 2020) - Valeriy Kravchuk
eBPF on Linux 4.9+ is probably the best way to study performance problems. Basic usage of the ftrace interface, bcc tools and bpftrace, as well as the main bpftrace features and commands, are presented. Several typical use cases (including adding dynamic probes to MariaDB servers, working with stack traces and creating Flame Graphs) are discussed.
Linux kernel tracing superpowers in the cloud - Andrea Righi
The Linux 4.x series introduced a powerful new engine for programmable tracing (BPF) that lets you actually look inside the kernel at runtime. This talk will show you how to exploit this engine in order to debug problems or identify performance bottlenecks in a complex environment like a cloud. It covers the latest Linux superpowers that let you see what is happening “under the hood” of the Linux kernel at runtime, and explains how to use them to measure and trace complex events in a cloud environment. For example, we will see how to measure the latency distribution of filesystem I/O and details of storage device operations, like individual block I/O request timeouts or TCP buffer allocations; how to investigate stack traces of certain events; and how to identify memory leaks, performance bottlenecks, and a whole lot more.
Linux Server Deep Dives (DrupalCon Amsterdam) - Amin Astaneh
Over the past few years the Linux kernel has gained features that allow us to learn more about what's really happening on our servers and the applications that run on them.
This talk will explore how these new features, particularly perf_events and eBPF, enable us to answer questions about what a Drupal site is doing in real time, beyond what the standard logs, server performance tools, and even strace will reveal. Attendees will get a brief introduction to these tools and example uses of them to diagnose performance problems.
This talk is intended for attendees that are familiar with Linux, the command line, and have used host observability tools in the past (top, netstat, etc).
Building Network Functions with eBPF & BCC - Kernel TLV
eBPF (Extended Berkeley Packet Filter) is an in-kernel virtual machine that allows running user-supplied sandboxed programs inside of the kernel. It is especially well-suited to network programs and it's possible to write programs that filter traffic, classify traffic and perform high-performance custom packet processing.
BCC (BPF Compiler Collection) is a toolkit for creating efficient kernel tracing and manipulation programs. It makes use of eBPF.
BCC provides an end-to-end workflow for developing eBPF programs and supplies Python bindings, making eBPF programs much easier to write.
Together, eBPF and BCC allow you to develop and deploy network functions safely and easily, focusing on your application logic (instead of kernel datapath integration).
In this session, we will introduce eBPF and BCC, explain how to implement a network function using BCC, discuss some real-life use-cases and show a live demonstration of the technology.
About the speaker
Shmulik Ladkani, Chief Technology Officer at Meta Networks,
Long time network veteran and kernel geek.
Shmulik started his career at Jungo (acquired by NDS/Cisco) implementing residential gateway software, focusing on embedded Linux, Linux kernel, networking and hardware/software integration.
Some billions of forwarded packets later, Shmulik left his position as Jungo's lead architect and joined Ravello Systems (acquired by Oracle) as tech lead, developing a virtual data center as a cloud-based service, focusing around virtualization systems, network virtualization and SDN.
Recently he co-founded Meta Networks where he's been busy architecting secure, multi-tenant, large-scale network infrastructure as a cloud-based service.
Linux Performance Analysis: New Tools and Old Secrets - Brendan Gregg
Talk for USENIX/LISA 2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high- to low-level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit and which are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
This presentation is for Go developers and operators of Go applications who are interested in reducing costs and latency, or debugging problems such as memory leaks, infinite loops, performance regressions, etc. of such applications. We'll start with a brief description of the unique aspects of the Go runtime, and then take a look at the builtin profilers as well as Go's execution tracer. Additionally we'll look at the interoperability with popular observability tools such as Linux perf and bpftrace. After this presentation you should have a good idea of the various tools you can use, and which ones might be the most useful to you in a production environment.
Similar to Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021 (20)
Flame Graphs for MySQL DBAs - FOSDEM 2022 MySQL Devroom - Valeriy Kravchuk
A flame graph is a way to visualize profiling data that allows the most frequent code paths to be identified quickly and accurately. Flame graphs can be generated using Brendan Gregg's open source programs at github.com/brendangregg/FlameGraph, which create interactive SVG files to be viewed in a browser. The source of the profiling data does not really matter: it can be the perf profiler, bpftrace, Performance Schema, EXPLAIN output, or any other source whose data can be converted into the expected format of a semicolon-separated stack plus a metric per line.
Different types of Flame Graphs (CPU, Off-CPU, Memory, Differential etc) are presented. Various tools and approaches to collect profile information of different aspects of MySQL server internal working are presented Several real-life use cases where Flame Graphs helped to understand and solve the problem are discussed.
MariaDB Server on macOS - FOSDEM 2022 MariaDB DevroomValeriy Kravchuk
Current MariaDB Server GA versions are formally not supported (and probably not even regularly built or tested) on macOS 10.x and 11.y. But it's relatively easy to set up the environment and build MariaDB Server from current 10.2 - 10.8 GitHub sources, with few minor issues to resolve in the process, depending on macOS and major server version used.
This talk is a summary of my related experience on 10.13 High Sierra that I had a chance to work on recently, with additional quick review of related fixed and open bugs, as well as some unique features like DTrace support that one may benefit from on macOS. Actually, studying DTrace in context of MariaDB Server troubleshooting and performance tuning was one of the goals why I started to use macOS again.
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)Valeriy Kravchuk
The recently released MariaDB 10.5 GA includes many new, useful features, but I’d like to concentrate on those helping DBAs and support engineers to find out what’s going on when a problem occurs.
Specifically I present and discuss the Performance Schema updates to match MySQL 5.7 instrumentation, new tables in the INFORMATION_SCHEMA to monitor the internals of a generic thread pool and improvements of ANALYZE for statements.
This session is about using GNU debugger (gdb) as a tool to study MySQL internals (namely, InnoDB locks and metadata locks) and as a last resort in cases when server hangs or has to be restarted for other reason. It never hurts to try a trick or two before giving up and restarting.
Sometimes MySQL DBAs have to work with stalled/hanged/unresponsive MySQL instance, where their usual SQL-based tricks do not work any more. Sometimes they can not even connect to check what's going on inside server.
In other cases they know what to do and everything still works, but they have to implement changes to read-only server variables. Server restart is often not an option in production, as it means some downtime and may cause negative performance impact.
In these cases one could do something given read and write access to server memory/internals. Here comes gdb, that, alone with careful reading of the source code helps to often resolve the problems described above. During this session I'll show what can be done with gdb when server already is in
troubles, and how to use gdb to "see" and understand MySQL internals (like InnoDB locks or metadata locks) better.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Enterprise Resource Planning System includes various modules that reduce any business's workload. Additionally, it organizes the workflows, which drives towards enhancing productivity. Here are a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing the work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
A Sighting of filterA in Typelevel Rite of Passage
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
1. Tracing MariaDB Server with
bpftrace
Problems and Solutions
Valerii Kravchuk, Principal Support Engineer, MariaDB
valerii.kravchuk@mariadb.com
2. www.percona.com
Who am I and What Do I Do?
Valerii (aka Valeriy) Kravchuk:
● MySQL Support Engineer in MySQL AB, Sun and Oracle, 2005-2012
● Principal Support Engineer in Percona, 2012-2016
● Principal Support Engineer in MariaDB Corporation since March 2016
● http://mysqlentomologist.blogspot.com - my blog about MariaDB and
MySQL (including some HowTos, not just MySQL bugs marketing any more)
● I often write about bpftrace in my blog...
● https://www.facebook.com/valerii.kravchuk - my Facebook page
● @mysqlbugs #bugoftheday
● MySQL Community Contributor of the Year 2019
● I speak about MySQL and MariaDB in public. Some slides from previous talks
are here and there…
● “I solve problems”, “I drink and I know things”
3. Disclaimers
● Since September, 2012 I am an Independent Consultant
providing services to different companies
● All views, ideas, conclusions, statements and approaches
in my presentations and blog posts are mine and may not
be shared by any of my previous, current and future
employers, customers and partners
● All examples are either based on public information or are
truly fictional and have nothing to do with any real persons or
companies. Any similarities are pure coincidence :)
● The information presented is true to the best of my
knowledge
4. Sources of tracing and profiling information for MariaDB server
● Trace files from -debug binaries
● Extended slow query log
● show [global] status, show engine innodb status\G
● InnoDB-related tables in the INFORMATION_SCHEMA
● userstat - operations per user, client, table or index
● show profiles;
● PERFORMANCE_SCHEMA that was supposed to be an
ultimate solution but is disabled by default in MariaDB
● OS-level tracing and profiling tools:
○ /proc sampling
○ ftrace and perf profiler
○ eBPF, bcc tools and bpftrace
● tcpdump analysis
5. What is this session about?
● It’s about tracing and profiling MariaDB server in
production on recent enough Linux versions like Ubuntu
20.04 (kernel 5.x.y, the newer the better) with eBPF-based
bpftrace tool
● I plan to present and discuss some (mostly resolvable)
dynamic tracing problems one may hit while working with
MariaDB server
● Performance impact of tracing and profiling in production
matters
● Why not about Performance Schema?
● Why not about perf and bcc tools?
6. So, what do I suggest?
● Use modern Linux tracing tools while troubleshooting MariaDB server!
● Yes, all those kernel and user probes and tracepoints, with bpftrace, if the
Linux kernel version allows it
● Brendan D. Gregg explained the role of bpftrace in the global picture:
7. Tracing events sources
● So, tracing is basically doing something whenever specific events occur
● Event data can come from the kernel or from userspace (apps and libraries).
Some of them are automatically available without further upstream
developer effort, others require manual annotations:
● Kprobe - the mechanism that allows tracing any function call inside the
kernel
● Kernel tracepoint - tracing custom events that the kernel developers have
defined (with TRACE_EVENT macros).
● Uprobe - for tracing user space function calls
● USDT (e.g. DTrace probes) stands for Userland Statically Defined Tracing
            Automatic   Manual annotations
Kernel      kprobes     Kernel tracepoints
Userspace   uprobes     USDT
8. www.percona.com
eBPF: extended Berkeley Packet Filter
● eBPF is a tiny language for a VM that can be executed inside Linux Kernel. eBPF instructions can
be JIT-compiled into a native code. eBPF was originally conceived to power tools like tcpdump
and implement programmable network packet dispatch and tracing. Since Linux 4.1, eBPF
programs can be attached to kprobes and later - uprobes, enabling efficient programmable tracing
● Brendan Gregg explained it here:
9. More about eBPF
● Julia Evans explained it here:
1. You write an “eBPF program” (often in C or Python, or you use a tool that generates
that program for you), compiled via LLVM. It’s the “probe”.
2. You ask the kernel to attach that probe to a kprobe/uprobe/tracepoint/dtrace probe
3. Your program writes out data to an eBPF map / ftrace / perf buffer
4. You have your precious preprocessed data exported to userspace!
● eBPF is a part of any modern Linux (kernel 4.9+):
4.1 - kprobes
4.3 - uprobes
4.6 - stack traces, count and hist builtins (use PER CPU maps for accuracy and efficiency)
4.7 - tracepoints
4.9 - timers/profiling
● You don’t have to install any kernel modules
● You can define your own programs to do any fancy aggregation you want, so
it’s really powerful
● DBAs usually use eBPF via some existing bcc frontend. Check some here.
● Recently a very convenient bpftrace frontend was added
10. bpftrace as a frontend for eBPF
● bpftrace (a frontend with a pattern/action-based programming language) lets
you define actions for the probes presented below in an easy and flexible way
● How to start using bpftrace? You need a recent enough kernel (5.x.y), then
install the package or build it from GitHub sources, and then...
11. Check bpftrace --help output
openxs@ao756:~$ bpftrace --version
bpftrace v0.13.0-120-gc671
openxs@ao756:~$ bpftrace --help
USAGE:
bpftrace [options] filename
bpftrace [options] - <stdin input>
bpftrace [options] -e 'program'
OPTIONS:
-B MODE output buffering mode ('full', 'none')
-f FORMAT output format ('text', 'json')
-o file redirect bpftrace output to file
-d debug info dry run
-dd verbose debug info dry run
-e 'program' execute this program
-h, --help show this help message
12. Check bpftrace --help output (more)
-I DIR add the directory to the include search path
--include FILE add an #include file before preprocessing
-l [search] list probes
-p PID enable USDT probes on PID
-c 'CMD' run CMD and enable USDT probes on resulting
process
--usdt-file-activation
activate usdt semaphores based on file path
--unsafe allow unsafe builtin functions (and more)
-q keep messages quiet
-v verbose messages
--info Print information about kernel BPF support
-k emit a warning when a bpf helper returns an
error (except read functions)
-kk check all bpf helper functions
-V, --version bpftrace version
--no-warnings disable all warning messages
13. Environment variables for bpftrace (--help output)
ENVIRONMENT:
BPFTRACE_STRLEN [default: 64] bytes on BPF stack per
str()
BPFTRACE_NO_CPP_DEMANGLE [default: 0] disable C++ symbol
demangling
BPFTRACE_MAP_KEYS_MAX [default: 4096] max keys in a map
BPFTRACE_CAT_BYTES_MAX [default: 10k] maximum bytes read by
cat builtin
BPFTRACE_MAX_PROBES [default: 512] max number of probes
BPFTRACE_LOG_SIZE [default: 1000000] log size in bytes
BPFTRACE_PERF_RB_PAGES [default: 64] pages per CPU to allocate
for ring buffer
BPFTRACE_NO_USER_SYMBOLS [default: 0] disable user symbol
resolution
BPFTRACE_CACHE_USER_SYMBOLS [default: auto] enable user symbol
cache
BPFTRACE_VMLINUX [default: none] vmlinux path used for
kernel symbol resolution
BPFTRACE_BTF [default: none] BTF file
14. Read the Reference Guide - Probes
● The program consists of one or more of the following sequences:
probe[,probe,...] [/filter/] { action }
● Probes:
○ BEGIN - triggered before all other probes are attached
○ kprobe - kernel function start
○ kretprobe - kernel function return
○ uprobe - user-level function start
○ uretprobe - user-level function return
○ tracepoint - kernel static tracepoints
○ usdt - user-level static tracepoints (not used in MySQL code any more)
○ profile - timed sampling
○ interval - timed output
○ software - kernel software events (see perf, like page-faults or cs)
○ hardware - processor-level events (see perf, like cycles or cache-misses)
○ watchpoint - memory (by address) watchpoints provided by the kernel (r:w:x)
○ END - triggered after all other probes are detached
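The probe types above can be combined in one small program. Here is a minimal sketch (the choice of the software event and the 10-second interval are arbitrary examples, not from the talk):

```bpftrace
#!/usr/bin/env bpftrace
// Sketch: count context switches per command for 10 seconds.
// BEGIN prints a header, software:cs fires on context-switch events,
// interval stops the trace; maps are printed automatically on exit.
BEGIN          { printf("Counting context switches for 10s...\n"); }
software:cs:1  { @cs[comm] = count(); }
interval:s:10  { exit(); }
```

Run it with sudo; on exit, bpftrace prints the @cs map with one counter per command name.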
15. Read the Reference Guide - Language
● For me the bpftrace “language” resembles awk (see the Reference Guide):
○ {...}: Action Blocks - you can group statements separated by ;
○ /.../: Filtering - optional, you can specify any condition using arguments, variables...
○ //, /*: Comments - single line and multi-line comments
○ ->: C Struct Navigation - you can refer to struct fields with -> operator
○ struct: Struct Declaration - you can declare structures like in C
○ ? :: ternary operators - like in C
○ if () {...} else {...} - conditional execution
○ unroll () {...}: repeat the body N times. For example, try this:
'BEGIN { $i = 1; unroll(5) { printf("i: %d\n", $i); $i = $i + 1; } exit() }'
○ ++ and --: increment operators - increment or decrement counters in maps or vars
○ []: Array access - to access one-dimensional, constant arrays
○ Integer casts of 64 bit signed integers: uint8, int8, uint16, … uint64, int64
○ Looping constructs - C style while loops, with continue and break, kernel 5.3+
○ return: Terminate Early - exit the current probe (vs the exit() function)
○ ( , ): Tuples - you can define variables with tuples, access with . index ($t.2)
16. Read the Reference Guide - Variables
● See “Variables”:
○ Builtins - pid, tid, uid, gid, cpu, comm, retval, func, probe, rand,
cgroup, arg0, arg1, ..., argN. - arguments to the traced function;
assumed to be 64 bits wide, and more …
○ nsecs, elapsed: - timestamp and elapsed time since bpftrace
initialization, in nanoseconds
○ @, $: Basic Variables - global (@a), per-thread (@a[tid]) and scratch ($a)
○ @[]: Associative Arrays - BPF map, @start[tid] = nsecs;
@memory[tid,ustack] = retval;
○ kstack: Stack Traces, Kernel - alias for kstack() function
○ ustack: Stack Traces, User - alias for ustack() function
○ $1, ..., $N, $#: Positional Parameters - command line arguments, may be
used in probe too
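A common pattern tying these variables together is per-thread latency measurement. A minimal sketch (tracing vfs_read here is just an illustrative choice):

```bpftrace
// Per-thread latency: @start is a BPF map keyed by thread id (tid),
// nsecs gives a timestamp, $d is a scratch variable.
kprobe:vfs_read    { @start[tid] = nsecs; }
kretprobe:vfs_read /@start[tid]/ {
    $d = nsecs - @start[tid];   // duration in nanoseconds
    @ns[comm] = hist($d);       // per-command log2 histogram
    delete(@start[tid]);        // clean up the map entry
}
```

The filter /@start[tid]/ skips returns for calls that started before the probes were attached.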
17. Read the Reference Guide - Functions
● See “Functions”:
○ printf(char *fmt, ...) - Print formatted
○ print(...) - Print a non-map value with default formatting
○ str(char *s [, int length]) - Returns the string pointed to by s
○ system(char *cmd) - Execute shell command
○ exit() - Quit bpftrace
○ cgroupid(char *path) - Resolve cgroup ID
○ kstack([StackMode mode, ][int level]) - Kernel stack trace
○ ustack([StackMode mode, ][int level]) - User stack trace
○ strncmp(char *s1, char *s2, int n) - Compare first n characters
of two strings
○ sizeof(...) - Return size of a type or expression
○ ...
18. Read the Reference Guide - Map Functions
● Maps are special BPF data types that can be used to store
counts, statistics, and histograms.
● When bpftrace exits all maps are printed
● See “Map Functions”:
○ count() - Count the number of times this function is called
○ sum(int n), avg(int n), min(int n), max(int n) - sum, average etc
○ stats(int n) - Return the count, average, and total for this value
○ hist(int n) - Produce a log2 histogram of values of n
○ lhist(int n, int min, int max, int step) - Produce a linear histogram of values of n
○ delete(@x[key]) - Delete the map element passed in as an argument
○ print(@x[, top [, div]]) - Print the map, optionally the top entries only and with a
divisor
○ clear(@x) - Delete all keys from the map
○ zero(@x) - Set all map values to zero
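Several of these map functions can be combined in a single program. A sketch (the bucket sizes and the 10-second interval are arbitrary examples):

```bpftrace
// Linear histogram of read() result sizes for mariadbd, printed
// and reset every 10 seconds: lhist() buckets, print() + clear().
tracepoint:syscalls:sys_exit_read /comm == "mariadbd"/ {
    @bytes = lhist(args->ret, 0, 65536, 4096);
}
interval:s:10 { print(@bytes); clear(@bytes); }
```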
19. Study at least one-liner bpftrace examples
● https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md
● Listing probes that match a template:
bpftrace -l 'tracepoint:syscalls:sys_enter_*'
● Tracing file opens may look as follows:
bpftrace -e 'tracepoint:syscalls:sys_enter_openat
{ printf("%s %s\n", comm, str(args->filename)); }'
● Syscall count by program:
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm]
= count(); }'
● Read size distribution by process:
bpftrace -e 'tracepoint:syscalls:sys_exit_read { @[comm]
= hist(args->ret); }'
● More from Brendan Gregg (as of August 2019) on it is here
20. Check and use existing bpftrace programs
● They are in the tools subdirectory:
[root@fc31 tools]# ls
bashreadline.bt loads_example.txt syscount_example.txt
bashreadline_example.txt mdflush.bt tcpaccept.bt
biolatency.bt mdflush_example.txt tcpaccept_example.txt
biolatency_example.txt naptime.bt tcpconnect.bt
biosnoop.bt naptime_example.txt tcpconnect_example.txt
biosnoop_example.txt oomkill.bt tcpdrop.bt
biostacks.bt oomkill_example.txt tcpdrop_example.txt
biostacks_example.txt opensnoop.bt tcplife.bt
bitesize.bt opensnoop_example.txt tcplife_example.txt
bitesize_example.txt pidpersec.bt tcpretrans.bt
capable.bt pidpersec_example.txt tcpretrans_example.txt
capable_example.txt runqlat.bt tcpsynbl.bt
…
● Ready to use for ad hoc OS level tracing and monitoring
● Good examples on how to use kprobes and tracepoints, clean up everything,
use hist() and other built-in functions
● See my blog post for a lot more details
21. Code review of biosnoop.bt
● File starts with shebang line and #include directive:
#!/usr/bin/env bpftrace
#include <linux/blkdev.h>
● Comments:
/*
* biosnoop.bt Block I/O tracing tool, showing per I/O latency.
* For Linux, uses bpftrace, eBPF.
*
* TODO: switch to block tracepoints. Add offset and size columns.*/
● BEGIN probe to print the header line:
BEGIN
{
printf("%-12s %-7s %-16s %-6s %7s\n", "TIME(ms)", "DISK", "COMM",
"PID", "LAT(ms)");
}
23. Code review of biosnoop.bt (END probe etc)
● We do not want to print maps content at the end, as this is a monitoring tool
for interactive use
● So, we clear all the maps in the END probe:
END
{
clear(@start);
clear(@iopid);
clear(@iocomm);
clear(@disk);
}
● This approach is typical for interactive monitoring tools
● Other option (syscount.bt) is to collect the data and print at the end:
END {
printf("\nTop 10 syscalls IDs:\n");
print(@syscall, 10);
clear(@syscall); … }
24. Adding a uprobe to MariaDB with bpftrace
● The idea is to add dynamic probes to capture SQL queries (and their
execution times)
● This was done first on Fedora 31, see my blog post for the details
● First I had to find out, with gdb or from the code, where the query is stored/passed
● I already know that it is in the third argument in this call:
dispatch_command(enum_server_command, THD*, char*, ...)
● Then it’s just as easy as follows (note the mangled function name):
[openxs@fc31 ~]$ sudo bpftrace -e '
uprobe:/home/openxs/dbs/maria10.5/bin/mariadbd:_Z16dispatch_command19enum_server_commandP3THDPcjbb
{ @sql[tid] = str(arg2); @start[tid] = nsecs; }
uretprobe:/home/openxs/dbs/maria10.5/bin/mariadbd:_Z16dispatch_command19enum_server_commandP3THDPcjbb
/@start[tid] != 0/ {
printf("%s : %u %u ms\n", @sql[tid], tid, (nsecs - @start[tid])/1000000); } '
25. Adding a uprobe to MariaDB with bpftrace
● We have queries captured with the probe added on the previous slide:
Attaching 2 probes...
select sleep(1) : 4029 1000 ms
: 4029 0 ms
select sleep(2) : 4281 2000 ms
: 4281 0 ms
select sleep(3) : 4283 3000 ms
: 4283 0 ms
select sleep(4) : 4282 4000 ms
: 4282 0 ms
^C
...
● We do not need to find addresses or understand the way parameters are
passed via CPU registers, and we can usually access structure fields etc., but
studying the source code of the specific version is still essential
● Versions 0.11+ understand non-mangled C++ function signatures… (demo)
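With wildcard support one may avoid typing the full mangled name at all. A hedged sketch (the binary path is the one from the examples above; check first that the pattern matches only the function you want):

```bpftrace
// Match the mangled dispatch_command symbol by wildcard instead of
// spelling it out; list the matches first with:
//   bpftrace -l 'uprobe:/home/openxs/dbs/maria10.5/bin/mariadbd:*dispatch_command*'
uprobe:/home/openxs/dbs/maria10.5/bin/mariadbd:*dispatch_command* {
    @sql[tid] = str(arg2);
    @start[tid] = nsecs;
}
```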
26. bpftrace as a code coverage testing tool (lame…)
● What if we try to add uprobe for every function in MariaDB server?
openxs@ao756:~$ sudo bpftrace -e
'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:* { printf("%s\n", func);
}'
ERROR: Can't attach to 34765 probes because it exceeds the current limit
of 512 probes.
You can increase the limit through the BPFTRACE_MAX_PROBES environment
variable, but BE CAREFUL since a high number of probes attached can cause
your system to crash.
openxs@ao756:~$ sudo BPFTRACE_MAX_PROBES=40000 bpftrace -e
'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:* { printf("%s\n", func);
}'
Attaching 34765 probes...
bpf: Failed to load program: Too many open files
processed 17 insns (limit 1000000) max_states_per_insn 0 total_states 1
peak_states 1 mark_read 0
...
ERROR: Error loading program:
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:thd_client_ip (try -v)
Segmentation fault
...
27. bpftrace as a code coverage testing tool (PoC)
● What if we try to trace fewer functions in MariaDB server? It works!
openxs@ao756:~$ sudo BPFTRACE_MAX_PROBES=40000 bpftrace -e
'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:*do* { printf("%s\n", func);
}'
Attaching 1076 probes...
ERROR: Offset outside the function bounds ('__do_global_dtors_aux' size is
0)
openxs@ao756:~$ sudo bpftrace -e
'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:*command* { @cnt[func] += 1;
}'
Attaching 70 probes...
^C
@cnt[is_log_table_write_query(enum_sql_command)]: 2
@cnt[mysql_execute_command(THD*, bool)]: 4
...
@cnt[general_log_print(THD*, enum_server_command, char const*, ...)]: 6
@cnt[Opt_trace_start::Opt_trace_start(THD*, TABLE_LIST*, enum_sql_command,
List<set_var_base>*, char const*, unsigned long, charset_info_st const*)]: 8
@cnt[dispatch_command(enum_server_command, THD*, char*, unsigned int,
bool)]: 9
28. Getting stack traces with bpftrace
● See ustack() etc in the Reference Guide
● This is how we can use bpftrace as a poor man’s profiler:
sudo bpftrace -e 'profile:hz:99 /comm == "mariadbd"/
{printf("# %s\n", ustack(perf));}' > /tmp/ustack.txt
● We get output like this by default (perf argument adds address etc):
...
mysqld_stmt_execute(THD*, char*, unsigned int)+37
dispatch_command(enum_server_command, THD*, char*,
unsigned int, bool, bool)+5123
do_command(THD*)+368
tp_callback(TP_connection*)+314
worker_main(void*)+160
start_thread+234
● See my recent blog post for more details on what you may want to do next :)
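The collected stacks can also be folded into a flame graph with Brendan Gregg's FlameGraph scripts. A sketch, assuming the FlameGraph repository is cloned locally (paths and the 30-second duration are examples):

```shell
# Sample mariadbd user stacks at 99 Hz for 30s, then render an SVG flame
# graph; stackcollapse-bpftrace.pl folds bpftrace's ustack map output.
sudo bpftrace -e 'profile:hz:99 /comm == "mariadbd"/ { @[ustack] = count(); }
interval:s:30 { exit(); }' > /tmp/stacks.txt
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-bpftrace.pl /tmp/stacks.txt |
    ./FlameGraph/flamegraph.pl > /tmp/mariadbd.svg
```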
29. Tracing pthread_mutex_lock with bpftrace
● You can find more details in my recent blog post
● But basically we need to trace pthread_mutex_lock calls in the
libpthread.so.* and count different stack traces that led to them, then output
the summary:
[openxs@fc31 ~]$ ldd /home/openxs/dbs/maria10.5/bin/mariadbd | grep thread
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f3d957bf000)
[openxs@fc31 ~]$ sudo bpftrace -e
'uprobe:/lib64/libpthread.so.0:pthread_mutex_lock /comm == "mariadbd"/ {
@[ustack] = count(); }' > /tmp/bpfmutex.txt
^C
● Beware of the performance impact of tracing frequent events!
...
[ 10s ] thds: 32 tps: 658.05 qps: 13199.78 (r/w/o:
9246.09/2634.40/1319.30) lat (ms,95%): 227.40 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 32 tps: 737.82 qps: 14752.19 (r/w/o:
10325.44/2951.30/1475.45) lat (ms,95%): 193.38 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 32 tps: 451.18 qps: 9023.16 (r/w/o: 6316.56/1804.03/902.57)
lat (ms,95%): 320.17 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 32 tps: 379.09 qps: 7585.24 (r/w/o: 5310.19/1516.87/758.18)
lat (ms,95%): 390.30 err/s: 0.00 reconn/s: 0.00
...
30. Tracing time spent in __lll_lock_wait with bpftrace
● It turned out that tracing uncontended pthread_mutex_lock calls may give
false alarms. So we changed the approach to measuring time spent waiting...
● Main parts of the new lll_lock_wait2.bt tool:
...
interval:s:$1 { exit(); }
uprobe:/lib64/libpthread.so.0:__lll_lock_wait
/comm == "mariadbd"/ {
@start[tid] = nsecs;
@tidstack[tid] = ustack(perf); }
uretprobe:/lib64/libpthread.so.0:__lll_lock_wait
/comm == "mariadbd" && @start[tid] != 0/ {
$now = nsecs;
$time = $now - @start[tid];
@futexstack[@tidstack[tid]] += $time;
print(@futexstack);
delete(@futexstack[@tidstack[tid]]);
delete(@start[tid]);
delete(@tidstack[tid]); }
● Call stacks and times spent are printed, they are summarized externally!
31. Tracing memory allocations with bpftrace
● You can trace and monitor almost anything with bpftrace. For example, you
can easily figure out when (and where) memory areas are really allocated,
and how large they are:
openxs@ao756:~$ sudo bpftrace -e
'uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc / comm == "mysqld" / {
printf("Allocating %d bytes in thread %u...\n", arg0, tid); }'
Attaching 1 probe...
Allocating 32 bytes in thread 105713...
Allocating 32 bytes in thread 105713...
Allocating 40 bytes in thread 107448...
● Now when we run this:
select repeat('a',3000000);
● We see:
Allocating 3000040 bytes in thread 107448...
Allocating 3000048 bytes in thread 107448...
Allocating 40 bytes in thread 107448...
Allocating 16416 bytes in thread 107448...
Allocating 32 bytes in thread 105713...
^C
32. Tracing memory allocations with bpftrace (lame...)
● You can trace and monitor almost anything with bpftrace, but beware of lame
approaches that work, but with a cost that makes them impractical:
uprobe:/lib64/libc.so.6:malloc /comm == "mariadbd"/ {
@size[tid] += arg0;
/* printf("Allocating %d bytes in thread %u...\n", arg0, tid); */
}
uretprobe:/lib64/libc.so.6:malloc /comm == "mariadbd" && @size[tid] > 0/
{
@memory[tid,retval] = @size[tid];
@stack[ustack(perf)] += @size[tid];
print(@stack);
clear(@stack);
delete(@size[tid]); }
uprobe:/lib64/libc.so.6:free / comm == "mariadbd" / {
delete(@memory[tid, arg0]);
/* printf("Freeing %p...\n", arg0); */
}
...
33. Performance impact of pt-pmp vs perf vs bpftrace
● Consider sysbench (I/O bound) test on Q8300 @ 2.50GHz Fedora box:
sysbench /usr/local/share/sysbench/oltp_point_select.lua
--mysql-host=127.0.0.1 --mysql-user=root --mysql-port=3306 --threads=12
--tables=4 --table-size=1000000 --time=60 --report-interval=5 run
● I’ve executed it without tracing and then with each of the following
(comparable?) data collections running for the same 60 seconds:
1. sudo pt-pmp --interval=1 --iterations=60 --pid=`pidof mysqld`
2. sudo perf record -F 99 -a -g -- sleep 60
[ perf record: Woken up 17 times to write data ]
[ perf record: Captured and wrote 5.464 MB perf.data (23260 samples) ]
3. sudo bpftrace -e 'profile:hz:99 { @[ustack] = count(); }' >
/tmp/bpftrace-stack.txt
[openxs@fc29 tmp]$ ls -l /tmp/bpftrace-stack.txt
-rw-rw-r--. 1 openxs openxs 2980460 Jan 29 12:24 /tmp/bpftrace-stack.txt
● Average QPS: 27272 (no tracing) | 15279 (pt-pmp, 56%) | 26780 (perf, 98.2%) |
27237 (bpftrace, 99.87%)
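The percentages above are simply each tool's average QPS divided by the no-tracing baseline; checking the arithmetic:

```python
# Average QPS figures from the sysbench runs on the slide
baseline = 27272                      # no tracing
runs = {"pt-pmp": 15279, "perf": 26780, "bpftrace": 27237}

for tool, qps in runs.items():
    # Fraction of baseline throughput retained while tracing
    print(f"{tool}: {100 * qps / baseline:.2f}% of baseline QPS")
```

So pt-pmp roughly halved the throughput, while perf and especially bpftrace (with in-kernel aggregation) kept it close to the baseline.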
34. www.percona.com
Problems of dynamic tracing with bpftrace
● root/sudo access is required
● Limit memory and CPU usage while in kernel context
● Do as much aggregation as possible in the probes (performance impact?)
● How to add a dynamic probe to a specific line inside a function? (possible
via a probe at an offset within the function, but I had not tried it myself yet)
● C++ (mangled names, class members, virtual member functions) and
access to complex structures (bpftrace needs headers), no stable “API”
● eBPF tools rely on recent Linux kernels (4.9+). Use perf for older versions!
● -fno-omit-frame-pointer must be used everywhere to see reasonable stack
traces
● -debuginfo packages, symbolic information for binaries?
● More tools to install (and maybe build from source; I had to do this even on
Ubuntu 20.04)
● I have not (yet) used bpftrace for real-life Support issues on the customer
side (gdb and perf are already standard tools for many customers).
35. References to bpftrace in MySQL, MariaDB and
Percona public bug databases...
● MySQL (site://bugs.mysql.com bpftrace): zero hits! For perf - 288 hits
● MariaDB (site://jira.mariadb.org bpftrace):
○ MDEV-24316 - on migration pre-check tool, “bpftrace script that hooks mysql and
counts when incompatible functions are used”, Daniel Black
○ MDEV-20931 - on the lack of information about shutdown progress, “...no way to
find this out without OS level tools like gdb, perf or bpftrace”, yours truly…
○ MDEV-19552 - on DROP TABLE locking SHOW etc, incomplete. “Could you try
perf or BPF trace + https://github.com/brendangregg/FlameGraph?”, Eugene
Kosov
○ MDEV-23326 - Aria TRANSACTIONAL=1 significantly slow on timezone
initialisation, in review already. Daniel Black
bpftrace -e 'tracepoint:syscalls:sys_enter_fdatasync {
@start[args->fd] = nsecs; @fd = args->fd}
tracepoint:syscalls:sys_exit_fdatasync { @us[ustack, @fd] =
hist((nsecs - @start[@fd]) / 1000); delete(@start[@fd]) } ' -p
331087
● Percona (site://jira.percona.com bpftrace): zero hits! For perf - 147 hits
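The fdatasync one-liner quoted above relies on bpftrace's hist(), which counts values into power-of-two buckets. A tiny Python reimplementation of that bucketing (my own sketch, not bpftrace code; bpftrace's printed bucket edges differ slightly at the low end) shows how the latency histogram is built from microsecond values:

```python
from collections import Counter

def log2_hist(values):
    """Approximate bpftrace's hist(): count values per power-of-two bucket."""
    buckets = Counter()
    for v in values:
        b = 0 if v == 0 else v.bit_length() - 1   # floor(log2(v))
        buckets[b] += 1
    return buckets

# Hypothetical latencies in microseconds, as in
# hist((nsecs - @start[@fd]) / 1000)
lat_us = [3, 5, 9, 17, 18, 40, 500]
for b, count in sorted(log2_hist(lat_us).items()):
    print(f"[{2**b}, {2**(b + 1)}): {count}")
```

Keeping only bucket counters (rather than every sample) is why hist() stays cheap even on hot syscall paths.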
36. Am I crazy trying these and suggesting them to DBAs?
● Quite possible, maybe I just have too much free time :)
● Or maybe I do not know how to use Performance Schema properly :)
● But I am not alone… Markos Albe also speaks about bpftrace-based tools,
and MariaDB engineers start to “think” in terms of bpftrace...
● Dynamic tracers are proven tools for instrumenting OS calls (probes for
measuring I/O latency at microsecond precision, for example)
● Dynamic tracing of RDBMS userspace is a topic of growing interest: with a
lot of RAM available, workloads are often CPU-bound these days.
● For an open source RDBMS like MariaDB there is no good reason NOT to try
to use dynamic probes (at least while Performance Schema instrumentation
is not on every other line of the code :)
● eBPF with bpftrace makes it easier (to some extent) and safer to do this in
production