The document provides an overview of how to read and understand garbage collection (GC) log lines from different Java vendors and JVM versions. It begins by explaining the parts of a basic GC log line for the OpenJDK GC log format. It then discusses GC log lines for G1 GC and CMS GC in more detail. Finally, it shares examples of GC log formats from IBM JVMs and different levels of information provided. The document aims to help readers learn to correctly interpret GC logs and analyze GC behavior.
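As a rough illustration of the kind of parsing such a document teaches, here is a sketch that pulls the fields out of one classic (pre-unified-logging) OpenJDK GC log line. The sample line, the regex, and the field names are illustrative assumptions, not taken from the document:

```python
import re

# One classic OpenJDK GC log line: uptime, event kind, cause,
# heap occupancy before->after(total), and pause duration.
line = "2.356: [GC (Allocation Failure) 8192K->1356K(15872K), 0.0063255 secs]"

pattern = re.compile(
    r"(?P<uptime>[\d.]+): \[(?P<kind>GC|Full GC) \((?P<cause>[^)]+)\) "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<pause>[\d.]+) secs\]"
)

m = pattern.match(line)
event = {k: m.group(k) for k in ("uptime", "kind", "cause",
                                 "before", "after", "total", "pause")}

freed_kb = int(event["before"]) - int(event["after"])  # heap reclaimed by this GC
```

Real logs vary by collector, JVM version, and flags (which is exactly why a guide like this is needed), so a production parser would carry one pattern per format rather than this single regex.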
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood), Red Hat Developers
Just like a spoonful of sugar will cure your hiccups, running your JVM with -XX:+UseShenandoahGC will cure your Java garbage collection hiccups. Shenandoah GC is a new garbage collection algorithm developed for OpenJDK at Red Hat, which produces much better pause times than the currently available algorithms without a significant decrease in throughput. In this session, we'll explain how Shenandoah works and compare it to the currently available OpenJDK garbage collectors.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Video: https://www.facebook.com/atscaleevents/videos/1693888610884236/ . Talk by Brendan Gregg from Facebook's Performance @Scale: "Linux performance analysis has been the domain of ancient tools and metrics, but that's now changing in the Linux 4.x series. A new tracer is available in the mainline kernel, built from dynamic tracing (kprobes, uprobes) and enhanced BPF (Berkeley Packet Filter), aka, eBPF. It allows us to measure latency distributions for file system I/O and run queue latency, print details of storage device I/O and TCP retransmits, investigate blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. This talk will summarize this new technology and some long-standing issues that it can solve, and how we intend to use it at Netflix."
Shared Memory Performance: Beyond TCP/IP with Ben Cotton, JPMorgan (Hazelcast)
In this webinar you’ll learn the importance of advanced data locality and data IPC transports with respect to Java distributed cache data grids. This information is crucial to the HPC Linux supercomputing community. The presenter will show how, by using native /dev/shm as an IPC transport, we can achieve latencies 1,000x lower than TCP/IP.
We’ll cover the following topics:
-Why going off-heap is fundamental to meeting real-time SLAs
-Why traditional grid transports (TCP/IP) are not good enough for the HPC Linux supercomputing community
-Why we need something more than TCP/IP
-Live Q&A Session
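The /dev/shm idea above can be sketched with Python's standard library (an illustrative sketch, not the presenter's code; assumes Linux and Python 3.8+, where SharedMemory segments are backed by files under /dev/shm):

```python
from multiprocessing import shared_memory

# Create a named shared-memory segment; on Linux this appears as
# /dev/shm/demo_grid_slot, visible to any process that knows the name.
shm = shared_memory.SharedMemory(create=True, size=64, name="demo_grid_slot")
try:
    payload = b"price:101.25"
    shm.buf[:len(payload)] = payload        # "producer" writes in place

    # A second process would attach by name; we attach in-process to show it.
    # No socket, no TCP/IP stack traversal -- just a shared mapping.
    reader = shared_memory.SharedMemory(name="demo_grid_slot")
    value = bytes(reader.buf[:len(payload)])
    reader.close()
finally:
    shm.close()
    shm.unlink()                            # remove the /dev/shm backing file
```

The segment name and payload are made up for the example; a real grid transport would add framing, synchronization, and lifecycle management on top of the raw mapping.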
Presenter:
Ben Cotton, Consultant at JPMorgan
Ben has been an IT consultant to the financial services industry for nearly 20 years. His specializations lie in open source, transactions, caching, data grids, and fixed income & derivatives trading systems.
Ben is active in the following communities:
Java Community Process Member
JSR-156 expert group: Java XML Transactions API
JSR-107 expert group: Java Caching API
JSR-347 expert group: Java Data Grids API
RedHat Community Member (Fedora, Infinispan, JBoss XTS) Code Contributor
Performance Analysis: new tools and concepts from the cloud (Brendan Gregg)
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud computing introduces new challenges for performance analysis, for both customers and operators of the cloud. Apart from monitoring a scaling environment, issues within a system can be complicated when tenants are competing for the same resources and are invisible to each other. Other factors include rapidly changing production code and wildly unpredictable traffic surges. For performance analysis in the Joyent public cloud, we use a variety of tools including Dynamic Tracing, which allows us to create custom tools and metrics and to explore new concepts. In this presentation I'll discuss a collection of these tools and the metrics that they measure. While these are DTrace-based, the focus of the talk is on which metrics are proving useful for analyzing real cloud issues.
This talk is about a new interface we developed for getting information about processes, called task_diag.
Currently, the /proc file system is used to get information about the processes running on a system. All information is presented as text files, which is convenient for humans but not for programs such as ps and top. This incurs significant delays, especially on systems with many containers running, which is frequently the case nowadays.
Ideally, tools such as top and ps would get information in binary format, and would have a flexible way to specify which kinds of information are required and for which tasks. Presented is a new interface with all these features, called task_diag.
task_diag is based on netlink sockets and resembles socket-diag, which is used to get information about sockets. It uses a request-response model: a request specifies a set of processes and the properties required for them, and a response contains the requested information, divided into several netlink packets if it is too long.
task_diag is much faster than the /proc file system. For example, when reading from /proc, ps opens, reads, and closes many files, and repeats this for every single process. With task_diag, it is just sending one request and receiving a response.
Besides ps and top, the proposed interface is intended to be used by CRIU, a container checkpoint/restore and live migration mechanism. Developers of the perf tool also found it useful and implemented a prototype that showed a big performance improvement from using task_diag instead of procfs.
Our performance measurements show that the ps tool works at least four times faster with task_diag than with procfs.
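The per-process /proc churn described above can be sketched in Python (illustrative only; assumes Linux's /proc and the /proc/[pid]/stat layout from proc(5) — task_diag replaces this whole loop with a single netlink request/response):

```python
import os

def list_processes():
    """Return (pid, comm) pairs by scanning /proc, the way ps does."""
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        try:
            # One open/read/close per process -- and a real ps repeats this
            # for every field file it needs (stat, status, cmdline, ...).
            with open(f"/proc/{entry}/stat") as f:
                fields = f.read().split()
            # Field 2 of stat is the comm in parentheses
            # (simplified: comm names containing spaces are not handled).
            procs.append((int(entry), fields[1].strip("()")))
        except OSError:
            pass  # the process may have exited mid-scan
    return procs
```

Every call re-does one or more syscall round-trips per process, which is the overhead the talk measured; with thousands of tasks in containers the difference compounds quickly.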
Video: https://www.youtube.com/watch?v=uibLwoVKjec . Talk by Brendan Gregg for Sysdig CCWFS 2016. Abstract:
"You have a system with an advanced programmatic tracer: do you know what to do with it? Brendan has used numerous tracers in production environments, and has published hundreds of tracing-based tools. In this talk he will share tips and know-how for creating CLI tracing tools and GUI visualizations, to solve real problems effectively. Programmatic tracing is an amazing superpower, and this talk will show you how to wield it!"
Linux 4.x Tracing Tools: Using BPF Superpowers (Brendan Gregg)
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formerly the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."
Video: https://www.youtube.com/watch?v=JRFNIKUROPE . Talk for linux.conf.au 2017 (LCA2017) by Brendan Gregg, about Linux enhanced BPF (eBPF). Abstract:
A world of new capabilities is emerging for the Linux 4.x series, thanks to enhancements included in Linux for the Berkeley Packet Filter (BPF): an in-kernel virtual machine that can execute user space-defined programs. It is finding uses for security auditing and enforcement, enhancing networking (including eXpress Data Path), and performance observability and troubleshooting. Many new open source tools that use BPF have been written in the past 12 months for performance analysis. Tracing superpowers have finally arrived for Linux!
For its use with tracing, BPF provides the programmable capabilities to the existing tracing frameworks: kprobes, uprobes, and tracepoints. In particular, BPF allows timestamps to be recorded and compared from custom events, allowing latency to be studied in many new places: kernel and application internals. It also allows data to be efficiently summarized in-kernel, including as histograms. This has allowed dozens of new observability tools to be developed so far, including measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more.
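The in-kernel summarization idea can be modeled in a few lines: BPF tools such as biolatency bucket each latency sample into a power-of-two histogram instead of shipping every event to user space. This Python model of that bucketing is illustrative only, not BPF code:

```python
def log2_histogram(samples_us):
    """Bucket latency samples (microseconds) into power-of-two slots,
    mirroring what BPF histogram maps do in-kernel."""
    hist = {}
    for us in samples_us:
        # floor(log2(us)) for us >= 1; samples of 0-1 us share slot 0.
        slot = max(0, int(us).bit_length() - 1)
        hist[slot] = hist.get(slot, 0) + 1
    return hist

# Latencies of 1, 2-3, 4-7, ... microseconds land in slots 0, 1, 2, ...
print(log2_histogram([1, 3, 3, 7, 120]))  # → {0: 1, 1: 2, 2: 1, 6: 1}
```

The payoff is that only the small fixed-size histogram crosses the kernel/user boundary, no matter how many millions of events were measured.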
This talk will summarize BPF capabilities and use cases so far, and then focus on its use to enhance Linux tracing, especially with the open source bcc collection. bcc includes BPF versions of old classics, and many new tools, including execsnoop, opensnoop, funccount, ext4slower, and more (many of which I developed). Perhaps you'd like to develop new tools, or use the existing tools to find performance wins large and small, especially when instrumenting areas that previously had zero visibility. I'll also summarize how we intend to use these new capabilities to enhance systems analysis at Netflix.
Velocity 2017: Performance analysis superpowers with Linux eBPF (Brendan Gregg)
Talk for Velocity 2017 by Brendan Gregg: Performance analysis superpowers with Linux eBPF.
"Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will investigate this new technology, which sooner or later will be available to everyone who uses Linux. The talk will dive deep on these new tracing, observability, and debugging capabilities. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
Master class "Logical Replication and Avito" / Константин Евтеев, Михаил Тюр... (Ontico)
HighLoad++ 2017
"Cape Town" hall, November 7, 12:00
Abstract:
http://www.highload.ru/2017/abstracts/2868.html
At Avito, classified ads are stored in Postgres databases, and logical replication has been in active use for many years. It is used to address growth in data volume and query load, scaling and load distribution, data delivery to the DWH and search subsystems, cross-database and cross-service data synchronization, and more.
...
USENIX ATC 2017: Visualizing Performance with Flame Graphs (Brendan Gregg)
Talk by Brendan Gregg for USENIX ATC 2017.
"Flame graphs are a simple stack trace visualization that helps answer an everyday problem: how is software consuming resources, especially CPUs, and how did this change since the last software version? Flame graphs have been adopted by many languages, products, and companies, including Netflix, and have become a standard tool for performance analysis. They were published in "The Flame Graph" article in the June 2016 issue of Communications of the ACM, by their creator, Brendan Gregg.
This talk describes the background for this work, and the challenges encountered when profiling stack traces and resolving symbols for different languages, including for just-in-time compiler runtimes. Instructions will be included for generating mixed-mode flame graphs on Linux, along with examples from our use at Netflix with Java. Advanced flame graph types will be described, including differential, off-CPU, chain graphs, memory, and TCP events. Finally, future work and unsolved problems in this area will be discussed.
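The pipeline behind a flame graph can be sketched with its "folded" intermediate format: each profiled stack is joined with ';' and weighted by its sample count, which is what the stackcollapse scripts emit and flamegraph.pl consumes. The sample stacks below are made up for illustration:

```python
from collections import Counter

# Raw profiler samples: each is a stack, outermost frame first.
samples = [
    ["main", "parse", "read"],
    ["main", "parse", "read"],
    ["main", "render"],
]

# Collapse identical stacks into "frame;frame;frame count" lines.
folded = Counter(";".join(stack) for stack in samples)
for stack, count in sorted(folded.items()):
    print(f"{stack} {count}")
# main;parse;read 2
# main;render 1
```

Feeding lines like these to flamegraph.pl produces the interactive SVG; the width of each box is proportional to its count, which is what makes the "how is software consuming CPUs" question answerable at a glance.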
Talk for SCaLE13x. Video: https://www.youtube.com/watch?v=_Ik8oiQvWgo . Profiling can show what your Linux kernel and applications are doing in detail, across all software stack layers. This talk shows how we are using Linux perf_events (aka "perf") and flame graphs at Netflix to understand CPU usage in detail, to optimize our cloud usage, solve performance issues, and identify regressions. This will be more than just an intro: profiling difficult targets, including Java and Node.js, will be covered, which includes ways to resolve JITed symbols and broken stacks. Included are the easy examples, the hard, and the cutting edge.
Debugging Distributed Systems - Velocity Santa Clara 2016 (Donny Nadolny)
Despite our best efforts, our systems fail. Sometimes it’s our fault—code that we wrote, bugs that we caused. But sometimes the fault is with systems that we have no direct control over. Distributed systems are hard. They are complicated, hard to understand, and very challenging to manage. But they are critical to modern software, and when they have problems, we need to fix them.
ZooKeeper is a very useful distributed system that is often used as a building block for other distributed systems like Kafka and Spark. It is used by PagerDuty for many critical systems, and for five months it failed a lot. Donny Nadolny looks at what it takes to debug a problem in a distributed system like ZooKeeper, walking attendees through the process of finding and fixing one cause of many of these failures. Donny explains how to use various tools to stress test the network, some intricate details of how ZooKeeper works, and possibly more than you will want to know about TCP, including an example of machines having a different view of the state of a TCP stream.
http://conferences.oreilly.com/velocity/devops-web-performance-ca/public/schedule/detail/50058
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial updates and extends an earlier talk that summarized the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real-world problems — important context that is typically not included in the standard documentation.
G1 Garbage Collector: Details and Tuning (Simone Bordet)
The G1 garbage collector is the low-pause replacement for CMS, available since JDK 7u4. This session will explore in detail how the G1 garbage collector works (from region layout, to remembered sets, to refinement threads, etc.), its current weaknesses, which command-line options you should enable when you use it, and advanced tuning examples extracted from real-world applications.
This talk is about a new interface to get information about processes, called task_diag, which we developed.
Currently /proc file system is used to get information about the processes running on the system. All information are presented as text files, which is convenient for humans, but not for programs such as ps and top. This incurs significant delays, especially on a systems with lots of containers running, which is frequently the case nowdays.
Ideally, tools such top and ps would get information in binary format, and use flexible means to specify which kinds of information and for which tasks is required. Presented is a new interface with all these features, called task_diag.
task_diag is based on netlink sockets and looks like socket-diag, which is used to get information about sockets. It uses the request-response model. An request specifies a set of processes and required properties for them. A response contains requested information and can be divided into a few netlink packets if it's too long.
The task diag is much faster than the /proc file system. For example, when reading from /proc, ps opens, reads, and closes many files -- and iterates this for every single processes. With task_diag, it's just sending a request and getting a response.
Except for ps and top, the proposed interface is to be used by CRIU, a containers checkpoint/restore and live migration mechanism. Also, developers of perf tool found that it can be useful to them and implemented a prototype which show a big performance improvements in case of using task_diag instead of procfs.
Our performance measurements show that the ps tool works at least four times faster if task_diag is used instead of procfs.
Video: https://www.youtube.com/watch?v=uibLwoVKjec . Talk by Brendan Gregg for Sysdig CCWFS 2016. Abstract:
"You have a system with an advanced programmatic tracer: do you know what to do with it? Brendan has used numerous tracers in production environments, and has published hundreds of tracing-based tools. In this talk he will share tips and know-how for creating CLI tracing tools and GUI visualizations, to solve real problems effectively. Programmatic tracing is an amazing superpower, and this talk will show you how to wield it!"
Linux 4.x Tracing Tools: Using BPF SuperpowersBrendan Gregg
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formally the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."
Video: https://www.youtube.com/watch?v=JRFNIKUROPE . Talk for linux.conf.au 2017 (LCA2017) by Brendan Gregg, about Linux enhanced BPF (eBPF). Abstract:
A world of new capabilities is emerging for the Linux 4.x series, thanks to enhancements that have been included in Linux for to Berkeley Packet Filter (BPF): an in-kernel virtual machine that can execute user space-defined programs. It is finding uses for security auditing and enforcement, enhancing networking (including eXpress Data Path), and performance observability and troubleshooting. Many new open source tools that have been written in the past 12 months for performance analysis that use BPF. Tracing superpowers have finally arrived for Linux!
For its use with tracing, BPF provides the programmable capabilities to the existing tracing frameworks: kprobes, uprobes, and tracepoints. In particular, BPF allows timestamps to be recorded and compared from custom events, allowing latency to be studied in many new places: kernel and application internals. It also allows data to be efficiently summarized in-kernel, including as histograms. This has allowed dozens of new observability tools to be developed so far, including measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more.
This talk will summarize BPF capabilities and use cases so far, and then focus on its use to enhance Linux tracing, especially with the open source bcc collection. bcc includes BPF versions of old classics, and many new tools, including execsnoop, opensnoop, funcccount, ext4slower, and more (many of which I developed). Perhaps you'd like to develop new tools, or use the existing tools to find performance wins large and small, especially when instrumenting areas that previously had zero visibility. I'll also summarize how we intend to use these new capabilities to enhance systems analysis at Netflix.
Velocity 2017 Performance analysis superpowers with Linux eBPFBrendan Gregg
Talk by for Velocity 2017 by Brendan Gregg: Performance analysis superpowers with Linux eBPF.
"Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will investigate this new technology, which sooner or later will be available to everyone who uses Linux. The talk will dive deep on these new tracing, observability, and debugging capabilities. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...Ontico
HighLoad++ 2017
Зал «Кейптаун», 7 ноября, 12:00
Тезисы:
http://www.highload.ru/2017/abstracts/2868.html
В Avito объявления хранятся в базах данных Postgres. При этом уже на протяжении многих лет активно применяется логическая репликация. С помощью неё успешно решаются вопросы роста объема данных и количества запросов к ним, масштабирования и распределения нагрузки, доставки данных в DWH и поисковые подсистемы, меж-базные и меж-сервисные синхронизации данных и пр.
...
USENIX ATC 2017: Visualizing Performance with Flame GraphsBrendan Gregg
Talk by Brendan Gregg for USENIX ATC 2017.
"Flame graphs are a simple stack trace visualization that helps answer an everyday problem: how is software consuming resources, especially CPUs, and how did this change since the last software version? Flame graphs have been adopted by many languages, products, and companies, including Netflix, and have become a standard tool for performance analysis. They were published in "The Flame Graph" article in the June 2016 issue of Communications of the ACM, by their creator, Brendan Gregg.
This talk describes the background for this work, and the challenges encountered when profiling stack traces and resolving symbols for different languages, including for just-in-time compiler runtimes. Instructions will be included generating mixed-mode flame graphs on Linux, and examples from our use at Netflix with Java. Advanced flame graph types will be described, including differential, off-CPU, chain graphs, memory, and TCP events. Finally, future work and unsolved problems in this area will be discussed."
Talk for SCaLE13x. Video: https://www.youtube.com/watch?v=_Ik8oiQvWgo . Profiling can show what your Linux kernel and appliacations are doing in detail, across all software stack layers. This talk shows how we are using Linux perf_events (aka "perf") and flame graphs at Netflix to understand CPU usage in detail, to optimize our cloud usage, solve performance issues, and identify regressions. This will be more than just an intro: profiling difficult targets, including Java and Node.js, will be covered, which includes ways to resolve JITed symbols and broken stacks. Included are the easy examples, the hard, and the cutting edge.
Debugging Distributed Systems - Velocity Santa Clara 2016Donny Nadolny
Despite our best efforts, our systems fail. Sometimes it’s our fault—code that we wrote, bugs that we caused. But sometimes the fault is with systems that we have no direct control over. Distributed systems are hard. They are complicated, hard to understand, and very challenging to manage. But they are critical to modern software, and when they have problems, we need to fix them.
ZooKeeper is a very useful distributed system that is often used as a building block for other distributed systems like Kafka and Spark. It is used by PagerDuty for many critical systems, and for five months it failed a lot. Donny Nadolny looks at what it takes to debug a problem in a distributed system like ZooKeeper, walking attendees through the process of finding and fixing one cause of many of these failures. Donny explains how to use various tools to stress test the network, some intricate details of how ZooKeeper works, and possibly more than you will want to know about TCP, including an example of machines having a different view of the state of a TCP stream.
http://conferences.oreilly.com/velocity/devops-web-performance-ca/public/schedule/detail/50058
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial updates and extends an earlier talk that summarized the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real-world problems: important context that is typically not included in the standard documentation.
G1 Garbage Collector: Details and TuningSimone Bordet
The G1 Garbage Collector is the low-pause replacement for CMS, available since JDK 7u4. This session explores in detail how the G1 garbage collector works (from region layout, to remembered sets, to refinement threads, etc.), its current weaknesses, the command line options you should enable when you use it, and advanced tuning examples extracted from real-world applications.
A 2015 performance study by Brendan Gregg, Nitesh Kant, and Ben Christensen. Original is in https://github.com/Netflix-Skunkworks/WSPerfLab/tree/master/test-results
ISO SQL:2016 introduced Row Pattern Matching: a feature to apply (limited) regular expressions on table rows and perform analysis on each match. As of writing, this feature is only supported by the Oracle Database 12c.
SREcon 2016 Performance Checklists for SREsBrendan Gregg
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
Talk for AWS re:Invent 2014. Video: https://www.youtube.com/watch?v=7Cyd22kOqWc . Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances, while reducing latency outliers. This session explores the various Xen modes (e.g., HVM, PV, etc.) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics and receive a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix’s use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
Are you building a high-throughput, low-latency application? Are you trying to figure out the perfect JVM heap size? Are you struggling to choose the right garbage collection algorithm and settings? Are you striving to achieve pause-less GC? Do you know the right tools and best practices to tame the GC? Do you know how to troubleshoot memory problems using GC logs? You will get complete answers to several such questions in this presentation.
There are at least 40 to 50 different formats of GC logs. Here, we explain the commonly used GC log formats, along with tricks, patterns and tools to analyze them effectively.
Tier1app CEO & Founder, Ram Lakshmanan, spoke at All Day Devops 2017 about Java GC Logs. In this presentation, you can learn how to enable Java GC logs, commonly used GC log formats, tricks, patterns and tools to analyze them effectively.
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorGurpreet Sachdeva
One of the key strengths of the JVM is automatic memory management (garbage collection), and understanding it can help in writing better applications. This becomes all the more important as enterprise server applications have large amounts of live heap data and significant numbers of parallel threads. Until recently, the main collectors were the parallel collector and the concurrent mark sweep (CMS) collector. This presentation introduces the various garbage collectors and compares the CMS collector against its replacement, a new implementation in Java 7: G1. G1 is characterized by a single contiguous heap which is split into same-sized regions. In fact, if your application is still running on the 1.5 or 1.6 JVM, a compelling argument for upgrading to Java 7 is to leverage G1.
G1 Garbage Collector - Big Heaps and Low Pauses?C2B2 Consulting
Devoxx 2012 talk by Jaromir Hamala, C2B2 Senior Consultant.
The Garbage-First collector (G1) is the latest garbage collector in the JVM, aiming to be the long-term replacement for CMS. It targets machines with large memories and multiple processors, promising low and more predictable pause times while achieving high throughput.
The session will introduce the architecture and design of G1. The main focus of the talk will then be the performance characteristics observed under different loads, tuning capabilities, and common pitfalls, with the aim of answering the question: can you run big heaps and achieve low pauses?
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCErik Krogen
Erik Krogen of LinkedIn presents regarding Dynamometer, a system open sourced by LinkedIn for scale- and performance-testing HDFS. He discusses one major use case for Dynamometer, tuning NameNode GC, and discusses characteristics of NameNode GC such as why it is important, and how it interacts with various current and future GC algorithms.
This is taken from the Apache Hadoop Contributors Meetup on January 30, hosted by LinkedIn in Mountain View.
In Java 9, the Garbage First Garbage Collector (G1 GC) will be the default GC. This presentation aims to help HotSpot VM users understand the concepts of G1 GC and also provides some tuning advice.
This presentation was given to the system administration team to give them an idea of how GC works and what to look for when there is a bottleneck or trouble.
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...Monica Beckwith
Learn what you need to know to experience nirvana in the evaluation of G1 GC, whether you are migrating from Parallel GC to G1 GC or from CMS GC to G1 GC.
You also get a walk-through of some case study data.
G1 GC
Uncover the hidden challenges that plague production environments in this eye-opening session. Join us as we explore the five most common performance problems that emerge in live systems. Gain invaluable insights into detecting these issues early on, before they wreak havoc on your operations. Discover practical solutions that empower you to address these challenges head-on, ensuring optimal performance and seamless user experiences.
Am I Reading GC Logs Correctly?
1. Am I Reading GC Logs Correctly?
Ram Lakshmanan
Founder – GCEasy.io & FastThread.io
2. Agenda
• Let’s learn to read 1 GC Log line
• Let’s learn to read 1 more GC Log line
SUSPENSE
3. How to enable GC Logging?
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file-path>
Rotating the GC log file (available from Java 6 Update 34):
-XX:+UseGCLogFileRotation (must be used with -Xloggc:<filename>)
-XX:NumberOfGCLogFiles=<number of files> (must be >= 1; default is 1)
-XX:GCLogFileSize=<number>M (or K) (default is 512K)
Best practice: enable GC logging on all JVMs, always.
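Putting the flags above together, a full command line might look like the following sketch. The log path, file count, size, and jar name are placeholder values, and these flags apply to pre-Java-9 JVMs (Java 9 replaced them with the unified -Xlog:gc* option):

```shell
# Sketch: enable detailed, rotating GC logging on a pre-Java-9 JVM.
# /var/log/myapp/gc.log and myapp.jar are placeholder names.
java -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -Xloggc:/var/log/myapp/gc.log \
     -XX:+UseGCLogFileRotation \
     -XX:NumberOfGCLogFiles=5 \
     -XX:GCLogFileSize=20M \
     -jar myapp.jar
```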
4. GC Log
2016-03-21T13:46:58.371+0000: 500.203: [Full GC [PSYoungGen: 1047584K->0K(1048064K)] [ParOldGen: 2097151K->865378K(2097152K)] 3144735K->865378K(3145216K) [PSPermGen: 29674K->29634K(262144K)], 1.8485590 secs] [Times: user=3.17 sys=0.21, real=1.84 secs]
2016-03-21T13:46:58.371 – timestamp at which the GC event ran
500.203 – number of seconds since the application started
Full GC – type of GC
PSYoungGen: 1047584K->0K(1048064K) – Young Gen occupancy dropped from 1047584K (i.e. ~1GB) to 0K. Total allocated Young Gen size is 1048064K.
ParOldGen: 2097151K->865378K(2097152K) – Old Gen occupancy dropped from 2097151K (i.e. ~2GB) to 865378K (i.e. ~845MB). Total allocated Old Gen size is 2097152K (i.e. 2GB).
3144735K->865378K(3145216K) – overall heap occupancy dropped from 3144735K (i.e. ~2.99GB) to 865378K (i.e. ~845MB)
PSPermGen: 29674K->29634K(262144K) – Perm Gen occupancy dropped from 29674K to 29634K. Total allocated Perm Gen size is 262144K (i.e. 256MB).
[Times: user=3.17 sys=0.21, real=1.84 secs]
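The fields above can be pulled out mechanically. Below is a minimal sketch (mine, not from the talk) that uses a regular expression to extract the per-region before/after/total sizes from such a line; the exact layout varies across JVM versions, so treat the pattern as illustrative rather than a general-purpose parser:

```python
import re

# Hypothetical sketch: extract the per-region before->after(total)
# sizes from a -XX:+PrintGCDetails Full GC line.
line = ("2016-03-21T13:46:58.371+0000: 500.203: [Full GC "
        "[PSYoungGen: 1047584K->0K(1048064K)] "
        "[ParOldGen: 2097151K->865378K(2097152K)] "
        "3144735K->865378K(3145216K) "
        "[PSPermGen: 29674K->29634K(262144K)], 1.8485590 secs] "
        "[Times: user=3.17 sys=0.21, real=1.84 secs]")

# Matches "[<Region>: <before>K-><after>K(<total>K)]"
region_re = re.compile(r"\[(\w+): (\d+)K->(\d+)K\((\d+)K\)\]")
regions = {name: {"before_k": int(b), "after_k": int(a), "total_k": int(t)}
           for name, b, a, t in region_re.findall(line)}

print(regions["ParOldGen"]["after_k"])  # old gen occupancy after the Full GC
```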
6. GC Time
• Real is wall clock time – time from start to finish of the call. This is all elapsed time, including time slices used by other processes and time the process spends blocked (for example, while waiting for I/O to complete).
• Sys is the amount of CPU time spent in the kernel within the process. This means CPU time spent executing system calls within the kernel, as opposed to library code, which still runs in user space. Like ‘user’, this is only CPU time used by the process.
• User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes, and time the process spends blocked, do not count towards this figure.
• User + Sys tells you how much actual CPU time your process used. Note that this is across all CPUs, so if the process has multiple threads it could potentially exceed the wall clock time reported by Real.
[Times: user=3.09 sys=0.00, real=3.10 secs] – typical for Serial GC: user time roughly equals real time, since a single thread does the work
[Times: user=11.53 sys=1.38, real=1.03 secs] – user + sys far exceeds real, which indicates many GC threads working in parallel
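One practical use of these numbers: the ratio (user + sys) / real approximates how many GC threads were actively doing work during the pause. A small illustrative helper (my sketch, not from the talk), applied to the two examples above:

```python
# Sketch: estimate GC parallelism from a [Times: ...] entry.
# user + sys is total CPU time across all GC threads; real is the
# wall-clock pause, so their ratio approximates the active thread count.
def gc_parallelism(user: float, sys: float, real: float) -> float:
    return (user + sys) / real

# Serial GC example from the slide: one thread, so the ratio is near 1.
print(round(gc_parallelism(3.09, 0.00, 3.10), 1))   # 1.0
# Parallel example: many threads, so the ratio is well above 1.
print(round(gc_parallelism(11.53, 1.38, 1.03), 1))  # 12.5
```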
7. Agenda
• Let’s learn to read 1 GC Log line
• Let’s learn to read 1 more GC Log line
SUSPENSE
8. G1 GC Log Format
2015-09-14T12:32:24.398-0700: 0.356: [GC pause (G1 Evacuation Pause) (young), 0.0215287 secs]
   [Parallel Time: 20.0 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 355.9, Avg: 356.3, Max: 358.4, Diff: 2.4]
      [Processed Buffers: Min: 0, Avg: 1.1, Max: 5, Diff: 5, Sum: 9]
      :
      :
   [Free CSet: 0.0 ms]
   [Eden: 12.0M(12.0M)->0.0B(14.0M) Survivors: 0.0B->2048.0K Heap: 12.6M(252.0M)->7848.3K(252.0M)]
   [Times: user=0.08 sys=0.00, real=0.02 secs]
2015-09-14T12:32:24.398-0700: 0.356 – the time at which this GC event fired. Here 0.356 indicates that this GC event fired 356 milliseconds after the Java process was started.
GC pause (G1 Evacuation Pause) – an Evacuation Pause is a phase where live objects are copied from one set of regions (young, or young + old) to other regions.
(young) – indicates that this is a Young GC event.
GC Workers: 8 – the number of GC worker threads.
[Eden: 12.0M(12.0M)->0.0B(14.0M) Survivors: 0.0B->2048.0K Heap: 12.6M(252.0M)->7848.3K(252.0M)] – this line shows the heap size changes:
Eden: 12.0M(12.0M)->0.0B(14.0M) – the Eden generation’s capacity was 12MB and all of it was occupied. After this GC event, Eden occupancy came down to 0. The target capacity of the Eden generation has been increased to 14MB but not yet committed; additional regions are added to Eden as demand arises.
Survivors: 0.0B->2048.0K – Survivor space was 0 bytes before this GC event, but grew to 2048KB afterwards, indicating that objects were copied from Eden into Survivor space.
Heap: 12.6M(252.0M)->7848.3K(252.0M) – the heap capacity was 252MB, of which 12.6MB was utilized. After this GC event, heap utilization dropped to 7848.3KB, i.e. roughly 5MB (12.6MB minus 7848.3KB) of objects were garbage collected in this event. The heap capacity remained at 252MB.
[Times: user=0.08 sys=0.00, real=0.02 secs]
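The size tokens in the G1 heap-transition line mix units (B, K, M), so a tiny converter helps when computing how much a collection reclaimed. A hedged sketch (mine, not from the talk) that reproduces the roughly-5MB figure above:

```python
import re

# Hypothetical sketch: normalize G1 size tokens (B/K/M/G) to bytes and
# compute how much heap occupancy this young GC reclaimed.
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def to_bytes(token: str) -> int:
    value, unit = float(token[:-1]), token[-1]
    return int(value * UNITS[unit])

line = ("[Eden: 12.0M(12.0M)->0.0B(14.0M) Survivors: 0.0B->2048.0K "
        "Heap: 12.6M(252.0M)->7848.3K(252.0M)]")

# Capture the before/after heap occupancy: "Heap: <before>(<cap>)-><after>"
m = re.search(r"Heap: ([\d.]+[BKMG])\([\d.]+[BKMG]\)->([\d.]+[BKMG])", line)
before, after = to_bytes(m.group(1)), to_bytes(m.group(2))
freed_mb = (before - after) / 1024 ** 2
print(round(freed_mb, 1))  # roughly the ~5MB collected in this event
```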
9. Agenda
• Let’s learn to read 1 GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
SUSPENSE
10. Agenda
• Let’s learn to read 1 GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
• Let’s learn to read 1 more GC Log line
11. Agenda
• Let’s learn to read 1 GC Log line
• Let’s learn to read 1 more GC Log line
• How many more GC log lines should I learn?
• Troubleshoot real world GC Logs
13. G1 GC Log - another format
53957.956: [GC pause (G1 Evacuation Pause) (young) 104M->49M(118M), 0.0020080 secs]
14. CMS Log format
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 2524251
Max Chunk Size: 2519552
Number of Blocks: 13
Av. Block Size: 194173
Tree Height: 8
2016-05-03T04:27:37.503+0000: 30282.678: [ParNew
Desired survivor size 214728704 bytes, new threshold 1 (max 1)
- age 1: 85782640 bytes, 85782640 total
: 3510063K->100856K(3774912K), 0.0516290 secs] 9371816K->6022161K(14260672K)
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 530579346
Max Chunk Size: 332576815
Number of Blocks: 7178
Av. Block Size: 73917
Tree Height: 44
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 2524251
Max Chunk Size: 2519552
Number of Blocks: 13
Av. Block Size: 194173
Tree Height: 8
, 0.0552970 secs] [Times: user=0.67 sys=0.00, real=0.06 secs]
Heap after GC invocations=5898 (full 77):
par new generation total 3774912K, used 100856K [0x000000047ae00000, 0x000000057ae00000, 0x000000057ae00000)
eden space 3355520K, 0% used [0x000000047ae00000, 0x000000047ae00000, 0x0000000547ae0000)
from space 419392K, 24% used [0x0000000547ae0000, 0x000000054dd5e328, 0x0000000561470000)
to space 419392K, 0% used [0x0000000561470000, 0x0000000561470000, 0x000000057ae00000)
concurrent mark-sweep generation total 10485760K, used 5921305K [0x000000057ae00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 49380K, used 29537K [0x00000007fae00000, 0x00000007fde39000, 0x0000000800000000)
}
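The BinaryTreeDictionary statistics in the CMS output above are useful for spotting old-gen free-list fragmentation: Av. Block Size is simply Total Free Space divided by Number of Blocks, and a Max Chunk Size much smaller than Total Free Space means the free memory is scattered across many small chunks. A small sketch (mine, not from the talk) using the "After GC" numbers above:

```python
# Hypothetical sketch: summarize CMS BinaryTreeDictionary statistics.
# "Av. Block Size" is Total Free Space / Number of Blocks; a Max Chunk
# Size far below Total Free Space hints at free-list fragmentation.
def fls_summary(total_free: int, max_chunk: int, blocks: int) -> dict:
    return {
        "avg_block": total_free // blocks,
        "fragmentation": 1 - max_chunk / total_free,  # 0.0 = one big chunk
    }

# "After GC" numbers from the CMS log above
after_gc = fls_summary(total_free=530579346, max_chunk=332576815, blocks=7178)
print(after_gc["avg_block"])               # 73917, matching "Av. Block Size"
print(round(after_gc["fragmentation"], 2))
```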
20. Agenda
• Let’s learn to read 1 GC Log line
• Let’s learn to read 1 more GC Log line
• How many more GC log lines should I learn?
• Troubleshoot real world GC Logs