The document outlines a seminar on performance analysis in a multitenant cloud environment using Hadoop and Oracle Solaris 11. It discusses analyzing performance across abstraction layers, introducing Hadoop and its architecture, and demonstrating tools like zonestat, mpstat, fsstat, and prstat to monitor CPU, memory, disk I/O, and other metrics at both the system and per-zone level.
OSSNA 2017 Performance Analysis Superpowers with Linux BPF - Brendan Gregg
Talk by Brendan Gregg for OSSNA 2017. "Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will be a deep dive into these new tracing, observability, and debugging capabilities, which sooner or later will be available to everyone who uses Linux. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
Delivered as a plenary at USENIX LISA 2013. Video: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
Talk for QConSF 2015: "Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of system performance tools, touring common problems with system metrics, monitoring, statistics, visualizations, measurement overhead, and benchmarks. This will likely involve some unlearning, as you discover that tools you have been using for years are, in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many popular talks on operating system performance tools. This is an anti-version of these talks, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice and methodologies for verifying new performance tools, understanding how they work, and using them successfully."
Linux Tracing Superpowers by Eugene Pirogov - Pivorak MeetUp
For a long time Linux lagged behind Unix-family operating systems in debuggability, particularly on live production systems.
Over the course of 2016, however, Linux saw a series of patches that brought it on par with the Unix world: an old facility called BPF was extended into a powerful new one – eBPF. Some say eBPF marks the beginning of a true DTrace for Linux.
In this presentation I'm going to cover tracing basics, the series of events that led to the development of eBPF, and a comparison of eBPF with DTrace from the Unix world. We'll survey the current state of Linux tracing tools, and finally look together at some exciting examples of eBPF in action.
***
Eugene is well known in our Ruby (and Elixir) communities. The last time he was at #pivorak he gave a light and engaging intro to Elixir. You can check that talk out here: http://bit.ly/2evCd9R
Talk for YOW! by Brendan Gregg. "Systems performance studies the performance of computing systems, including all physical components and the full software stack to help you find performance wins for your application and kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (ftrace, bcc/BPF, and bpftrace/BPF), advice about what is and isn't important to learn, and case studies to see how it is applied. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
Analyzing OS X Systems Performance with the USE Method - Brendan Gregg
Talk for MacIT 2014. This talk is about systems performance on OS X, and introduces the USE Method to check for common performance bottlenecks and errors. This methodology can be used by beginners and experts alike, and begins by constructing a checklist of the questions we’d like to ask of the system, before reaching for tools to answer them. The focus is resources: CPUs, GPUs, memory capacity, network interfaces, storage devices, controllers, interconnects, as well as some software resources such as mutex locks. These areas are investigated by a wide variety of tools, including vm_stat, iostat, netstat, top, latency, the DTrace scripts in /usr/bin (which were written by Brendan), custom DTrace scripts, Instruments, and more. This is a tour of the tools needed to solve our performance needs, rather than understanding tools just because they exist. This talk will make you aware of many areas of OS X that you can investigate, which will be especially useful for the time when you need to get to the bottom of a performance issue.
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt - Anne Nicolas
Ftrace is the official tracer of the Linux kernel. It has been a part of Linux since 2.6.31, and has grown tremendously ever since. Ftrace’s name comes from its most powerful feature: function tracing. But the ftrace infrastructure is much more than that. It also encompasses the trace events that are used by perf, as well as kprobes, which can dynamically add trace events that the user defines.
This talk will focus on learning how the kernel works by using the ftrace infrastructure. It will show what happens within the kernel during a system call, how interrupts work, how one's processes are being scheduled, and more. Tools such as trace-cmd and KernelShark will also be briefly demonstrated.
Steven Rostedt, VMware
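The tracefs interface the talk is built around can be poked at directly from a shell. A minimal sketch (requires root; the mount point and event names below are the standard ones, but treat them as assumptions for your kernel):

```shell
# Minimal tracefs session (run as root); assumes the standard
# /sys/kernel/tracing mount (older kernels: /sys/kernel/debug/tracing).
cd /sys/kernel/tracing
cat available_tracers                       # e.g. function_graph function nop
echo function_graph > current_tracer        # trace kernel call graphs
echo 1 > events/sched/sched_switch/enable   # one of the trace events perf uses
sleep 1
head -n 20 trace                            # read the human-readable buffer
echo nop > current_tracer                   # stop tracing
echo 0 > events/sched/sched_switch/enable
```

trace-cmd wraps this same interface, e.g. `trace-cmd record -p function_graph` followed by `trace-cmd report`, with KernelShark as a GUI on top.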
Linux 4.x Tracing: Performance Analysis with bcc/BPF - Brendan Gregg
Talk about bcc/eBPF for SCALE15x (2017) by Brendan Gregg. "BPF (Berkeley Packet Filter) has been enhanced in the Linux 4.x series and now powers a large collection of performance analysis and observability tools ready for you to use, included in the bcc (BPF Compiler Collection) open source project. BPF nowadays can do system tracing, software defined networks, and kernel fast path: much more than just filtering packets! This talk will focus on the bcc/BPF tools for performance analysis, which make use of other built in Linux capabilities: dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). There are now bcc tools for measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived, built in to Linux."
zfsday talk (a video is on the last slide). The performance of the file system, or disks, is often the target of blame, especially in multi-tenant cloud environments. At Joyent we deploy a public cloud on ZFS-based systems, and frequently investigate performance with a wide variety of applications in growing environments. This talk is about ZFS performance observability, showing the tools and approaches we use to quickly show what ZFS is doing. This includes observing ZFS I/O throttling, an enhancement added to illumos-ZFS to isolate performance between neighbouring tenants, and the use of DTrace and heat maps to examine latency distributions and locate outliers.
Performance Wins with BPF: Getting Started - Brendan Gregg
Keynote by Brendan Gregg for the eBPF Summit 2020: how to get started finding performance wins using BPF (eBPF) technology. This short talk covers the quickest and easiest way to find performance wins using BPF observability tools on Linux.
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring - NETWAYS
Nowadays system administrators have great choices when it comes to Linux performance profiling and monitoring. The challenge is to pick the appropriate tools and interpret their results correctly.
This talk is a chance to take a tour through various performance profiling and benchmarking tools, focusing on their benefit for every sysadmin.
More than 25 different tools are presented, ranging from well-known tools like strace, iostat, tcpdump, or vmstat to newer features like Linux tracepoints or perf_events. You will also learn which of these tools can be integrated with Icinga, and which monitoring plugins are already available for that.
The goal is to leave you with reference points to consult whenever you are faced with performance problems.
Take the chance to close your knowledge gaps and learn how to get the most out of your system.
re:Invent 2019 BPF Performance Analysis at Netflix - Brendan Gregg
Talk by Brendan Gregg at AWS re:Invent 2019. Abstract: "Extended BPF (eBPF) is an open source Linux technology that powers a whole new class of software: mini programs that run on events. Among its many uses, BPF can be used to create powerful performance analysis tools capable of analyzing everything: CPUs, memory, disks, file systems, networking, languages, applications, and more. In this session, Netflix's Brendan Gregg tours BPF tracing capabilities, including many new open source performance analysis tools he developed for his new book 'BPF Performance Tools: Linux System and Application Observability.' The talk includes examples of using these tools in the Amazon EC2 cloud."
UKOUG version of a presentation trying to establish the sensible limits of parallelism on a couple of hardware configurations. Detailed white paper is at http://oracledoug.com/px_slaves.pdf
SREcon 2016 Performance Checklists for SREs - Brendan Gregg
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
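In the spirit of that 60-second idea, here is a minimal, unprivileged sketch that reads raw counters straight from /proc. (The actual checklist in the talk uses tools such as uptime, vmstat, mpstat, pidstat, iostat, sar, and top; this command set is a stand-in, not the slide deck's.)

```shell
# One-minute triage from raw /proc counters (Linux, no root needed).
cat /proc/loadavg                                  # 1/5/15-min load averages
head -n1 /proc/stat                                # aggregate CPU time since boot
grep -E '^(MemAvailable|SwapFree)' /proc/meminfo   # memory headroom
head -n5 /proc/diskstats                           # per-device I/O counters
awk '/^Tcp:/' /proc/net/snmp                       # TCP counters (incl. retransmits)
```

Each line answers one checklist question: is the system saturated, is memory exhausted, are disks or TCP misbehaving.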
When your whole system is unresponsive, how do you investigate the failure?
We'll see how to capture a memory dump for offline analysis with the kdump mechanism.
Then, how to analyze it with the crash utility.
And finally, how to use crash on a running system to modify kernel memory (at your own risk!).
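A hedged outline of that workflow (package names, paths, and the crashkernel size vary by distribution; every value below is an assumption, not from the slides):

```shell
# 1. Reserve memory for the capture kernel by adding to the kernel command
#    line, e.g. crashkernel=256M, then enable the kdump service:
sudo systemctl enable --now kdump
# 2. After a panic, the capture kernel writes a vmcore; analyze it offline
#    with crash and the matching debug-info vmlinux:
sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/vmcore
# 3. crash can also attach to the live kernel: writable, so at your own risk.
sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /proc/kcore
```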
Kernel Recipes 2017: Performance Analysis with BPF - Brendan Gregg
Talk by Brendan Gregg at Kernel Recipes 2017 (Paris): "The in-kernel Berkeley Packet Filter (BPF) has been enhanced in recent kernels to do much more than just filtering packets. It can now run user-defined programs on events, such as on tracepoints, kprobes, uprobes, and perf_events, allowing advanced performance analysis tools to be created. These can be used in production as the BPF virtual machine is sandboxed and will reject unsafe code, and are already in use at Netflix.
Beginning with the bpf() syscall in 3.18, enhancements have been added in many kernel versions since, with major features for BPF analysis landing in Linux 4.1, 4.4, 4.7, and 4.9. Specific capabilities these provide include custom in-kernel summaries of metrics, custom latency measurements, and frequency counting kernel and user stack traces on events. One interesting case involves saving stack traces on wake up events, and associating them with the blocked stack trace: so that we can see the blocking stack trace and the waker together, merged in kernel by a BPF program (that particular example is in the kernel as samples/bpf/offwaketime).
This talk will discuss the new BPF capabilities for performance analysis and debugging, and demonstrate the new open source tools that have been developed to use it, many of which are in the Linux Foundation iovisor bcc (BPF Compiler Collection) project. These include tools to analyze the CPU scheduler, TCP performance, file system performance, block I/O, and more."
The objective of this article is to describe what to monitor in and around Alfresco in order to have a good understanding of how the applications are performing and to be aware of potential issues.
A 2015 presentation to introduce users to Java profiling. The YourKit profiler is used for concrete examples. The following topics are covered:
1) When to profile
2) Profiler sampling
3) Profiler instrumentation
4) Where to Start
5) Macro vs micro benchmarking
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017 - Codemotion
The current craze of Docker has everyone sticking their processes inside a container… but do you really understand cgroups and how they work? Do you understand the difference between CPU Sets and CPU Shares? Spark is a Scala application that lives inside a Java Runtime; do you understand the impact cgroup constraints have on the JRE? This talk starts with a deep understanding of Java’s memory management and GC characteristics and how JRE behavior changes based on core count. We will then look at containers and how resource isolation works.
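The CPU Shares/Sets point can be made concrete: a container-aware runtime must read its CPU budget from cgroups rather than counting host CPUs. A minimal sketch (the v2/v1 paths below are the common mount locations, assumed here):

```shell
# Discover the container's CPU budget the way a cgroup-aware runtime would.
if [ -r /sys/fs/cgroup/cpu.max ]; then                 # cgroup v2
    read -r quota period < /sys/fs/cgroup/cpu.max
    echo "v2 cpu.max: quota=$quota period=$period"
elif [ -r /sys/fs/cgroup/cpu/cpu.cfs_quota_us ]; then  # cgroup v1
    echo "v1 cfs_quota_us: $(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)"
else
    echo "no cgroup CPU limit visible"
fi
# A cgroup-unaware JVM would instead size GC and thread pools from this:
nproc
```

The mismatch between `nproc` and the cgroup quota is exactly what causes oversized thread pools and GC stalls inside constrained containers.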
EuroBSDcon 2017 System Performance Analysis Methodologies - Brendan Gregg
Keynote by Brendan Gregg. "Traditional performance monitoring makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. Modern BSD has advanced tracers and PMC tools, providing virtually endless metrics to aid performance analysis. It's time we really used them, but the problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
There's a new way to approach performance analysis that can guide you through the metrics. Instead of starting with traditional metrics and figuring out their use, you start with the questions you want answered then look for metrics to answer them. Methodologies can provide these questions, as well as a starting point for analysis and guidance for locating the root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, chain graphs, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. Many methodologies will be discussed, from the production proven to the cutting edge, along with recommendations for their implementation on BSD systems. In general, you will learn to think differently about analyzing your systems, and make better use of the modern tools that BSD provides."
This document gives a brief idea of what a supercomputer is,
lists supercomputers around the world,
and briefly introduces computer clustering (HPC clusters) and the model behind it.
The biggest headline at the 2009 Oracle OpenWorld was Larry Ellison announcing that Oracle was entering the hardware business with a pre-built database machine engineered by Oracle. Since then, businesses around the world have started to use these engineered systems. This beginner/intermediate-level session will take you through my first 100 days of administering an Exadata machine, with all the roadblocks and all the successes I had along this new path.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Red Hat Enterprise Linux OpenStack Platform Director, by Orgad Kimchi
Red Hat Enterprise Linux OpenStack Platform director is a toolset for installing and managing a complete OpenStack environment. It is based primarily on the OpenStack project TripleO, which is an abbreviation for "OpenStack-On-OpenStack". This project takes advantage of OpenStack components to install a fully operational OpenStack environment. This includes new OpenStack components that provision and control bare metal systems to use as OpenStack nodes. This provides a simple method for installing a complete Red Hat Enterprise Linux OpenStack Platform environment that is both lean and robust.
Oracle Solaris 11.2 - Engineered for Cloud
Oracle Solaris provides an efficient, secure and compliant, simple, open, and affordable solution for
deploying your enterprise-grade clouds. More than just an operating system, Oracle Solaris 11.2 includes
features and enhancements that deliver no-compromise virtualization, application-driven software-defined
networking, and a complete OpenStack distribution for creating and managing an enterprise cloud, enabling
you to meet IT demands and redefine your business.
For more information: http://www.oracle.com/technetwork/server-storage/solaris11/overview/beta-2182985.html
Oracle Solaris 11 as a BIG Data Platform: Apache Hadoop Use Case, by Orgad Kimchi
The following are benefits of using Oracle Solaris Zones for a Hadoop cluster:
Fast provision of new cluster members using the zone cloning feature
Very high network throughput between the zones for data node replication
Optimized disk I/O utilization for better I/O performance with ZFS built-in compression
Secure data at rest using ZFS encryption
For more information see: http://www.oracle.com/technetwork/articles/servers-storage-admin/howto-setup-hadoop-zones-1899993.html
Oracle Solaris 11 is the first operating system engineered with cloud computing in mind. So what's new in Oracle Solaris 11, and how does that connect to the cloud? If you're involved in application life-cycle management, configuration management, cloud deployment, big data design, or application or infrastructure scaling, you will learn how to leverage the Solaris 11 technologies in order to build your cloud infrastructure.
For more information see: http://www.oracle.com/technetwork/systems/hands-on-labs/hol-oracle-solaris-remote-lab-1894053.html
The Art of the Pitch: WordPress Relationships and Sales, by Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl..., by DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova..., by Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Dev Dives: Train smarter, not harder - active learning and UiPath LLMs for do..., by UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front end. I have also often seen developers implement front-end features by simply following the standard rules of a framework, thinking that this is enough to successfully launch the project, and then the project fails. How do you prevent this, and which approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
Neuro-symbolic is not enough, we need neuro-*semantic*, by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T..., by BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
DevOps and Testing slides at DASA Connect, by Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We ended with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Smart TV Buyer Insights Survey 2024, by 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Kubernetes & AI - Beauty and the Beast!?! @ KCD Istanbul 2024, by Tobias Schneck
As AI technology pushes into IT, I wondered, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premises strategy we may need in order to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working for real.
Performance analysis in a multitenant cloud environment Using Hadoop Cluster and Oracle Solaris 11
1. Seminar:
Performance Analysis in a Multitenant
Cloud Environment
Using Hadoop Cluster and Oracle
Solaris 11
Presenter:
Orgad Kimchi
Principal Software Engineer
Oracle
2. The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions. The development, release, and timing of any
features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
3. Overview
Analyzing the performance of a virtualized multi-tenant cloud
environment can be challenging because of the layers of
abstraction.
• Each type of virtualization software adds an abstraction layer to enable
better manageability.
• Each Oracle Solaris Zone can have a different workload; it can be disk I/O,
network I/O, CPU, memory, or a combination of these. In addition, a single
Oracle Solaris Zone can overload the entire system's resources.
• It is very difficult to observe such an environment; you need to be able to
monitor it from the top level, seeing all the virtual instances
(non-global zones) in real time, with the ability to drill down into specific
resources.
4. Introduction to Hadoop
The Apache Hadoop software is a framework that allows for the
distributed processing of large data sets across clusters of computers
using simple programming models.
To store data, Hadoop uses the Hadoop Distributed File System
(HDFS), which provides high-throughput access to application data and
is suitable for applications that have large data sets.
The Hadoop cluster building blocks are as follows:
NameNode: The centerpiece of HDFS, which stores file system
metadata, directs the slave DataNode daemons to perform the low-level
I/O tasks, and also runs the JobTracker process.
Secondary NameNode: Performs internal checks of the NameNode
transaction log.
DataNodes: Nodes that store the data in HDFS, which are also known
as slaves and run the TaskTracker process.
7. Overview
Best practice for any performance analysis is to
get a bird's eye view of the running environment in
order to see which resource is the busiest, and
then drill down to each resource.
We will use the Oracle Solaris 11 zonestat and fsstat
commands in order to answer this question.
8. Benchmark Description
The first Hadoop benchmark that we are going to run to load our
environment is Pi Estimator. Pi Estimator is a MapReduce
program that employs a Monte Carlo method to estimate the
value of pi.
In this example, we're going to use 128 maps and each of the
maps will compute one billion samples (for a total of 128 billion
samples).
root@global_zone:~# zlogin -l hadoop name-node hadoop jar
/usr/local/hadoop/hadoop-examples-1.2.0.jar pi 128 1000000000
Where:
“zlogin -l hadoop name-node” runs the command as user hadoop in the name-node zone
“hadoop jar /usr/local/hadoop/hadoop-examples-1.2.0.jar pi” is the Hadoop examples jar file and program
“128” is the number of maps
“1000000000” is the number of samples per map
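The Monte Carlo method that Pi Estimator implements is easy to sketch outside of Hadoop. The following standalone Python toy (an illustration only, not the MapReduce code) does what each map task does, on a much smaller sample count:

```python
import random

def estimate_pi(samples, seed=42):
    """Estimate pi by sampling points in the unit square and counting
    how many land inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

print(estimate_pi(100_000))
```

With 128 billion samples, the error of the estimate shrinks far below what this toy achieves, which is why the benchmark is CPU bound rather than I/O bound.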
9. zonestat
The zonestat allows us to monitor all the Oracle Solaris Zones running in our
environment and provides real-time statistics for the CPU utilization, memory
utilization, and network utilization.
Run the zonestat command at 10-second intervals
root@global_zone:~# zonestat 10 10
Interval: 1, Duration: 0:00:10
SUMMARY
Cpus/Online: 128/128
PhysMem: 256G
VirtMem: 259G
                 ---CPU----  --PhysMem-- --VirtMem-- --PhysNet--
ZONE             USED %PART  USED %USED  USED %USED  PBYTE %PUSE
[total]        118.10 92.2% 24.6G 9.62% 60.0G 23.0%  18.4E  100%
[system]         0.00 0.00% 9684M 3.69% 40.5G 15.5%
data-node3      42.13 32.9% 4897M 1.86% 6146M 2.30%  18.4E  100%
data-node1      41.49 32.4% 4891M 1.86% 6173M 2.31%  18.4E  100%
data-node2      33.97 26.5% 4851M 1.85% 6145M 2.30%  18.4E  100%
global           0.34 0.27%  283M 0.10%  420M 0.15%   2192 0.00%
name-node        0.15 0.11%  419M 0.15%  718M 0.26%    126 0.00%
sec-name-node    0.00 0.00%  205M 0.07%  363M 0.13%      0 0.00%
As we can see from the zonestat output, the Pi program is a CPU-bound application
(%PART 92.2%).
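The ranking we just did by eye can also be scripted. A small Python sketch (the sample rows are copied from the zonestat output above; the helper itself is hypothetical, not part of Solaris):

```python
# Hypothetical helper: rank zones by CPU %PART.  Each sample row holds
# the zone name, CPU used, and %PART from the zonestat output above.
ZONESTAT_ROWS = """\
data-node3 42.13 32.9%
data-node1 41.49 32.4%
data-node2 33.97 26.5%
global 0.34 0.27%
name-node 0.15 0.11%
sec-name-node 0.00 0.00%
"""

def busiest_zones(text):
    rows = []
    for line in text.splitlines():
        zone, _used, part = line.split()
        rows.append((zone, float(part.rstrip("%"))))
    # Sort zones by CPU partition share, busiest first.
    return sorted(rows, key=lambda r: r[1], reverse=True)

print(busiest_zones(ZONESTAT_ROWS)[0])
```

The same pattern works for the memory and network columns, which is handy when feeding zonestat output into monitoring.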
10. mpstat
Another useful command, which shows whether CPU utilization is balanced evenly across the available
CPUs, is the mpstat command.
The following is the output of the Oracle Solaris mpstat(1M) command. Each line
represents one virtual CPU.
root@global_zone:~# mpstat 1
CPU minf mjf  xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0   85   0 10183  683   59  931   40  269  464   2  1315  30  14  0  56
  1   80   0 34872  484    9 1096   39  317  498   2  1437  34  14  0  51
  2   72   0 15632  325    4  669   30  166  334   1  1321  37   9  0  54
  3   42   0 13422  253    3  553   32  144  277   2   818  31   7  0  62
On a system with many CPUs the mpstat output can be very long; instead, we can monitor
CPU utilization per core.
root@global_zone:~# mpstat -A core 10
 COR minf mjf  xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl sze
3074  103   0 23654 1680  697 1264  644  277  502  10 11268 748  52  0   0   8
3078   95   0 32090  893  137 1228  635  281  439   8 10929 759  41  0   0   8
3082   94   0 31574  889  129 1245  629  308  560   9 12792 753  47  0   0   8
3086  111   0 20262  829  121 1200  615  277  512   7 12657 753  47  0   0   8
In addition, mpstat can print performance statistics per socket or per processor set; for
more examples, see mpstat(1M).
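The "balanced evenly" question can be made concrete with a few lines of Python. This is a hypothetical check (the numbers come from the first mpstat sample above; the 25% threshold is arbitrary, not a Solaris rule):

```python
# Hypothetical balance check: compare per-CPU busy time (usr + sys)
# from the first mpstat sample above to see whether utilization is
# spread evenly across the CPUs.  The 25% threshold is arbitrary.
busy = {0: 30 + 14, 1: 34 + 14, 2: 37 + 9, 3: 31 + 7}  # cpu -> usr + sys

mean = sum(busy.values()) / len(busy)
spread = max(busy.values()) - min(busy.values())
balanced = spread <= 0.25 * mean
print(mean, spread, balanced)
```

Here the spread between the busiest and least busy CPU is small relative to the mean, so the scheduler is distributing the Hadoop threads well.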
12. fsstat
The fsstat command allows us to monitor disk I/O activity per disk or per Oracle
Solaris Zone.
For example, we can monitor the writes to all ZFS file systems at 10-second intervals:
root@global_zone:~# fsstat -Z zfs 10 10
 new  name  name  attr  attr lookup rddir  read  read write write
file remov  chng   get   set    ops   ops   ops bytes   ops bytes
   0     0     0   744    14  11.4K     0 6.01K 5.87M     0     0 zfs:global
   0     0     0     0     0  3.27K     0 1.41K 1.94M     7 1.42K zfs:data-node1
   0     0   151     0     0  8.72K     0 2.75K 3.95M    22 4.06K zfs:data-node2
   0     0   359     0     0  9.03K     0 2.98K 4.22M    21 4.34K zfs:data-node3
   0     0   413     0     0     51     0     0     0     0     0 zfs:name-node
   0     0    14     0     0     51     0     0     0     0     0 zfs:sec-name-node
The default report shows general file system activity, combining similar
operations into the general categories shown above (new files, name removes and changes, attribute operations, lookups, directory reads, and data reads and writes).
13. vmstat
Based on the zonestat, mpstat, and fsstat output, the conclusion is that the Pi Estimator program
is a CPU-bound application.
So let's continue our CPU performance analysis. The next question that we are going to ask is
whether there is any idle CPU time.
root@global_zone:~# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s3 s4 s5 s6   in   sy   cs us sy id
 8 0 0 213772168 245340872 770 5954 0 0 0 0 0 0 0 0 0 17732 161637 39181 93 7 0
 12 0 0 213346168 244887200 134 2237 0 0 0 0 0 0 0 0 0 13689 140604 19640 96 4 0
 17 0 0 212974464 244353760 124 1939 0 0 0 0 0 0 0 0 0 12079 130895 17225 96 4 0
A value of 0 in the id column means that the system's CPU is 100 percent busy!
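This reasoning about the vmstat columns can be sketched programmatically. A hypothetical helper, using sample tuples taken from the vmstat lines above (the system has 128 CPUs):

```python
# Hypothetical sketch of the vmstat reasoning above: a system is CPU
# saturated when the idle column stays at 0 and the run queue (r)
# holds runnable threads waiting for a CPU.
samples = [            # (r, us, sy, id) taken from the vmstat lines above
    (8, 93, 7, 0),
    (12, 96, 4, 0),
    (17, 96, 4, 0),
]

def cpu_saturated(samples):
    """All intervals show 0% idle and a non-empty run queue."""
    return all(idle == 0 and runq > 0 for (runq, _, _, idle) in samples)

print(cpu_saturated(samples))
```

A growing r value alongside 0% idle is the classic signature of CPU saturation rather than a momentary spike.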
14. prstat
You can also track run queue latency by using the prstat -Lm command and noting
the value in the LAT column.
We can use the prstat command to see whether the CPU cycles are being consumed
in user mode or in system (kernel) mode:
root@global_zone:~# prstat -ZmL
Total: 310 processes, 8269 lwps, load averages: 47.63, 48.79, 36.98
  PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
19338 hadoop    100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0  73   0   0 java/2
19329 hadoop    100 0.0 0.0 0.0 0.0 0.0 0.0 0.4   0  86   0   0 java/2
19519 hadoop     84  15 0.1 0.0 0.2 0.0 0.0 0.8  56 153 29K   0 java/2
19503 hadoop     88  11 0.1 0.0 0.3 0.1 0.0 1.0  52 168 23K   3 java/2
The prstat output shows that the CPU cycles are being consumed mostly in user
mode (USR).
15. Virtualization-Aware
When you use the -Z option, prstat prints an additional ZONE column header with the name
of the zone each process is associated with.
Note: The command is aware that it's running within a non-global zone; thus, when run
from a non-global zone it can't see user processes in other zones.
For example, to print all the Hadoop processes that are running now:
root@global_zone:~# ps -efZ | grep hadoop
    ZONE     UID   PID  PPID  C    STIME TTY     TIME CMD
data-nod  hadoop 14024 11795  0 07:38:19 ?       0:20 /usr/jdk/instances/jdk1.6.0/jre/bin/java -Djava.library.path=/usr/local/hadoop
data-nod  hadoop 14026 11798  0 07:38:19 ?       0:19 /usr/jdk/instances/jdk1.6.0/jre/bin/java -Djava.library.path=/usr/local/hadoop
name-nod  hadoop 11621     1  0 07:20:12 ?       0:59 /usr/java/bin/java -Dproc_jobtracker -Xmx1000m -Dcom.sun.management.jmxremote -
(The zone names in the ZONE column are truncated to eight characters by ps.)
16. Virtualization-Aware -Cont'd
We want to observe the application that is responsible for the load.
For example, which code paths are making the CPUs busy?
And which process in each zone is responsible for the system load?
In the next example, we are going to drill down into one of the Oracle Solaris Zones to
understand which application or process is responsible for the load.
Let's log in to the data-node1 zone:
root@global_zone:~# zlogin data-node1
We can use the prstat command inside the zone to see which process is responsible
for the system load.
root@data-node1:~# prstat -mLc
  PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
22866 root      24  74 1.6 0.0 0.0 0.0 0.0 0.0 122 122 85K   0 prstat/1
22715 hadoop    80 3.3 0.1 0.0 0.0 4.0 0.1  12  45 201  4K   4 java/2
22704 hadoop    80 3.3 0.2 0.0 0.0 6.2 0.4  10  61 277  4K  10 java/2
18. DISK I/O -Cont'd
The first command that we are going to use for disk I/O performance observation
is the fsstat command, which allows us to analyze the disk I/O workload per Oracle
Solaris Zone and see file system statistics for each file system.
The following example shows per-zone statistics for zones data-node1, data-node2,
and data-node3, as well as a system-wide aggregate, for the tmpfs and zfs file
systems.
root@global_zone:~# fsstat -A -Z tmpfs zfs 10 10
 new  name  name  attr  attr lookup rddir  read  read write write
file remov  chng   get   set    ops   ops   ops bytes   ops bytes
 126     0   128 1.57K   512  15.9K     0     0     0   127 15.9K tmpfs
   0     0     0     0     0      0     0     0     0     0     0 tmpfs:global
  20     0    20   260    80  2.55K     0     0     0    20 2.50K tmpfs:data-node2
  52     0    52   612   208  6.36K     0     0     0    52 6.50K tmpfs:data-node3
   0     0     0    40     0     70     0     0     0     0     0 tmpfs:name-node
   0     0     0    40     0     70     0     0     0     0     0 tmpfs:sec-name-node
  54     0    56   656   224  6.83K     0     0     0    55 6.88K tmpfs:data-node1
 156     0   162 1.78K     0  22.9K     0    28 3.16K  175K 5.45G zfs
   0     0     0     0     0      3     0     2   599     0     0 zfs:global
  52     0    54   511     0  4.52K     0     0     0 58.3K 1.82G zfs:data-node2
  52     0    54   512     0  8.46K     0    12 1.28K 58.3K 1.82G zfs:data-node3
   0     0     0   140     0    514     0     1     4   106 19.2K zfs:name-node
   0     0     0   140     0    510     0     0     0     0     0 zfs:sec-name-node
  52     0    54   518     0  8.95K     0    13 1.29K 58.3K 1.81G zfs:data-node1
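The per-zone write-bytes figures use fsstat's unit suffixes, so a little conversion is needed before they can be totalled. A hypothetical helper (the zone figures are taken from the fsstat output above):

```python
# Hypothetical helper: fsstat prints byte counts with unit suffixes;
# converting them to numbers lets us total per-zone write traffic.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}

def to_bytes(text):
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)

writes = {                    # write bytes per zone, from the output above
    "zfs:data-node1": "1.81G",
    "zfs:data-node2": "1.82G",
    "zfs:data-node3": "1.82G",
    "zfs:name-node": "19.2K",
}
total_gib = sum(to_bytes(v) for v in writes.values()) / 2**30
print(round(total_gib, 2))
```

The three DataNode zones account for essentially all of the ZFS write traffic, which matches the HDFS replication behavior described earlier.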
19. DISK I/O -Cont'd
Next, we want to pinpoint our measurements to specific Oracle Solaris Zones.
The following example shows per-zone statistics for zones data-node1, data-node2, and
data-node3, as well as a system-wide aggregate, for the tmpfs and zfs file systems.
root@global_zone:~# fsstat -A -Z -z data-node1 -z data-node2 -z data-node3
tmpfs zfs 10 10
 new  name  name  attr  attr lookup rddir  read  read write write
file remov  chng   get   set    ops   ops   ops bytes   ops bytes
 140    13   116 3.16K   512  42.7K    16   242  926K   250  342K tmpfs
  20     0    20   266    80  2.56K     0     0     0    20 2.50K tmpfs:data-node2
  57     5    46 1.35K   204  19.2K     8   115  436K   113  170K tmpfs:data-node3
  63     8    50 1.47K   228  20.8K     8   127  491K   117  170K tmpfs:data-node1
 154     0    94 7.74K     0  85.6K    40 20.9K 29.8M  127K 3.96G zfs
  52     0    32   445     0  4.25K     0     0     0 43.0K 1.34G zfs:data-node2
  52     0    32 2.98K     0  31.0K    20 6.63K 10.9M 43.1K 1.34G zfs:data-node3
  50     0    30 3.04K     0  32.9K    20 7.21K 11.9M 41.0K 1.28G zfs:data-node1
20. DISK I/O -Cont'd
Next, we are going to drill down to watch individual disk read and write operations.
First let’s get the ZFS pool names.
root@global_zone:~# zpool list
NAME             SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
data-node1-pool  556G  56.7G  499G  10%  1.00x  ONLINE  -
data-node2-pool  556G  56.3G  500G  10%  1.00x  ONLINE  -
data-node3-pool  556G  56.4G  500G  10%  1.00x  ONLINE  -
rpool            278G  21.7G  256G   7%  1.00x  ONLINE  -
We can see that we have four ZFS zpools.
21. DISK I/O -Cont'd
We can monitor all the ZFS zpools at the same time using the following command:
root@global_zone:~# zpool iostat -v 10
                            capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
data-node1-pool            31.1G   525G      2      9   124K  6.49M
  c0t5000CCA0160D3264d0    31.1G   525G      2      9   124K  6.49M
-------------------------  -----  -----  -----  -----  -----  -----
data-node2-pool            31.0G   525G      2     10  91.0K  6.50M
  c0t5000CCA01612A4F0d0    31.0G   525G      2     10  91.0K  6.50M
-------------------------  -----  -----  -----  -----  -----  -----
data-node3-pool            31.0G   525G      1      9   103K  6.49M
  c0t5000CCA016295ABCd0    31.0G   525G      1      9   103K  6.49M
-------------------------  -----  -----  -----  -----  -----  -----
rpool                      22.0G   256G     10      7  95.0K  64.1K
  c0t5001517803D013B3d0s0  22.0G   256G     10      7  95.0K  64.1K
-------------------------  -----  -----  -----  -----  -----  -----
22. DISK I/O -Cont'd
We can also use the iostat command to see how fast the disk I/O operations are
being processed on a per-device basis.
root@global_zone:~# iostat -xnz 5 10
                    extended device statistics
 r/s   w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
 1.6  10.8    47.9  3765.1  0.0  0.2    0.1   16.4   0   2 c0t5001517803D013B3d0
 1.2   7.1   365.9  2238.4  0.0  0.2    0.1   19.6   0   2 c0t5000CCA0160D3264d0
 0.9   8.5   279.4  2237.7  0.0  0.2    0.1   16.7   0   2 c0t5000CCA01612A4F0d0
 1.1   8.8   335.9  2237.2  0.0  0.2    0.1   16.3   0   2 c0t5000CCA016295ABCd0
                    extended device statistics
 r/s   w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0  16.6     0.0    50.1  0.0  0.0    0.0    0.3   0   0 c0t5001517803D013B3d0
31.0  15.6 13346.7    44.4  0.0  0.8    0.0   17.1   0  12 c0t5000CCA0160D3264d0
 0.0  15.0     0.0    47.0  0.0  0.0    0.0    1.8   0   1 c0t5000CCA016295ABCd0
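Picking out the busiest device from such a sample is easily scripted. A hypothetical helper (the device/%b pairs come from the second iostat interval above):

```python
# Hypothetical helper: find the busiest disk in an iostat -xnz sample.
# The (device, %b) pairs below come from the second interval above.
busy_pct = {
    "c0t5001517803D013B3d0": 0,
    "c0t5000CCA0160D3264d0": 12,
    "c0t5000CCA016295ABCd0": 1,
}

busiest = max(busy_pct, key=busy_pct.get)
print(busiest, busy_pct[busiest])
```

Even at 12% busy, no disk is close to saturation here, confirming the workload is CPU bound rather than disk bound.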
23. DISK I/O -Cont'd
Another useful tool is the iotop DTrace script, which displays top disk I/O events by
process per Oracle Solaris Zone
root@global_zone:~# /usr/dtrace/DTT/iotop -Z 10 10
Tracing... Please wait.
2013 Oct  7 08:40:19,  load: 24.38,  disk_r:      0 KB,  disk_w:   1886 KB
ZONE   PID  PPID CMD               DEVICE  MAJ MIN D     BYTES
   0   717     0 zpool-data-node3  sd6      73  48 W    347648
   0     5     0 zpool-rpool       sd3      73  24 W    417280
   0   896     0 zpool-data-node1  sd4      73  32 W   1195520
You can see the zone ID (ZONE), process ID (PID), type of operation (read or write, D),
and total size of the operation (BYTES).
25. Monitoring Memory Utilization
First let’s print how much physical memory the system has.
root@global_zone:~# prtconf -v | grep Mem
Memory size: 262144 Megabytes
We can see that we have 256 GB of memory in the system.
Second, let's get more information about how the system memory is being allocated:
root@global_zone:~# echo ::memstat | mdb -k
Page Summary            Pages        MB   %Tot
----------------   ----------  --------   ----
Kernel                1473974     11515     4%
ZFS File Data         4990336     38987    15%
Anon                  2223697     17372     7%
Exec and libs            3342        26     0%
Page cache            5244141     40969    16%
Free (cachelist)        27122       211     0%
Free (freelist)      19591820    153061    58%
Total                33554432    262144
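The MB column follows directly from the page counts, assuming the 8 KB base page size of this SPARC system (an assumption on my part; page size differs by platform, see pagesize(1)):

```python
# Convert ::memstat page counts to megabytes, assuming an 8 KB base
# page size (platform-dependent assumption; see pagesize(1)).
PAGE_SIZE_KB = 8

def pages_to_mb(pages):
    return pages * PAGE_SIZE_KB // 1024

print(pages_to_mb(33554432))  # total pages reported above
```

The Total row checks out: 33554432 pages at 8 KB each is exactly the 262144 MB (256 GB) reported by prtconf.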
26. Monitoring Memory -Cont'd
To find how much free memory is currently available in the system, we can use the
vmstat command and look at the value in the free column (the
unit is KB) for any line other than the first line.
root@global_zone:~# vmstat 10
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s3 s4 s5 s6   in   sy   cs us sy id
 1 0 0 202844144 233325872 315 1311 0 0 0 0 1 15 19 19 18 23352 32919 46222 3 4 93
 4 0 0 110774160 142093304 347 3681 0 0 0 0 0 0 27 15 18 72275 48754 148884 1 11 88
 5 0 0 110862440 142055728 347 3671 0 0 0 0 0 19 15 22 16 72286 48292 148838 1 11 88
 3 0 0 111113056 142043608 331 3525 0 0 0 0 0 0 20 29 20 70099 49362 143970 1 11 88
The command output shows that the system has about 135 GB of free memory (roughly 142,000,000 KB in the free column).
This is memory that has no association with any file or process.
27. Monitoring Memory -Cont'd
We can use is the prstat command in order to see process statistics for the system
and virtual machines (non-global zones) is the prstat command
root@global_zone:~# prstat -ZmLc
   PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP
 20025 hadoop    293M  253M cpu60    59    0   0:00:49  12% java/68
 20739 hadoop    285M  241M sleep    59    0   0:00:49  10% java/68
 17206 hadoop    285M  237M sleep    59    0   0:01:07  10% java/68
 17782 hadoop    281M  229M sleep    59    0   0:00:57 7.4% java/67
 17356 hadoop    289M  241M sleep    59    0   0:01:04 7.0% java/68
 11621 hadoop    166M  126M sleep    59    0   0:02:32 5.9% java/90
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
     4       74 7246M 6133M   2.3%   2:31:34  43% data-node2
     3       53 7442M 6248M   2.4%   2:23:01  30% data-node1
     5       52 7108M 6001M   2.3%   2:27:40  22% data-node3
     2       32  675M  468M   0.1%   0:04:36 4.0% name-node
     0       82  870M  414M   0.1%   1:19:20 1.0% global
Total: 322 processes, 8024 lwps, load averages: 15.54, 18.25, 20.09
From the prstat output, we can see the following information for each Oracle Solaris Zone:
The SWAP column shows the total virtual memory size for each zone.
The RSS column shows the total zone-resident set size (main memory usage).
The MEMORY column shows the main memory consumed, as a percentage of system-wide resources.
The CPU column shows the CPU consumed, as a percentage of system-wide resources.
The ZONE column shows each zone's name.
28. Monitoring Memory -Cont'd
We can use to monitor memory utilization in a virtualized environment is the zvmstat
command, which prints vmstat output for each zone
root@global_zone:~# /usr/dtrace/DTT/Bin/zvmstat 10
ZONE            re   mf  fr  sr epi epo epf api apo apf fpi fpo fpf
global         273  218   0   0   0   0   0   0   0   0   0   0   0
sec-name-node    0    0   0   0   0   0   0   0   0   0   0   0   0
name-node        0    0   0   0   0   0   0   0   0   0   0   0   0
data-node1       0    0   0   0   0   0   0   0   0   0   0   0   0
data-node2       0    0   0   0   0   0   0   0   0   0   0   0   0
data-node3       0    0   0   0   0   0   0   0   0   0   0   0   0
29. Monitoring Network
• In the Hadoop cluster, most of the network traffic is HDFS
data replication between the DataNodes.
• The questions that we will be answering are as follows:
• Which zones are seeing the highest and lowest network traffic?
• Which is the busiest zone in terms of the number of network
connections that it currently handles?
• How can we monitor specific network resources, for example,
Oracle Solaris Zones, physical network cards, or virtual network
interface cards (VNICs)?
31. Monitoring Network -Cont'd
First, let's view our network setup by using the dladm command to
show how many physical network cards we have:
root@global_zone:~# dladm show-phys
LINK  MEDIA     STATE    SPEED  DUPLEX   DEVICE
net0  Ethernet  up       1000   full     ixgbe0
net2  Ethernet  unknown  0      unknown  ixgbe2
net1  Ethernet  unknown  0      unknown  ixgbe1
net3  Ethernet  unknown  0      unknown  ixgbe3
net4  Ethernet  up       10     full     usbecm0
Next, let's print the VNIC information:
root@global_zone:~# dladm show-vnic
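On a box with many interfaces it can help to filter this output down to the links that are actually up. A hypothetical sketch, assuming the dladm show-phys column order shown above:

```shell
# Hypothetical filter: keep only physical links that are up, assuming
# dladm show-phys column order LINK MEDIA STATE SPEED DUPLEX DEVICE.
awk 'NR > 1 && $3 == "up" { print $1, "(" $6 ")", $4 " Mbps" }' <<'EOF'
LINK MEDIA    STATE   SPEED DUPLEX  DEVICE
net0 Ethernet up      1000  full    ixgbe0
net2 Ethernet unknown 0     unknown ixgbe2
net1 Ethernet unknown 0     unknown ixgbe1
net3 Ethernet unknown 0     unknown ixgbe3
net4 Ethernet up      10    full    usbecm0
EOF
```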
32. Monitoring Network -Cont'd
We can use the zonestat command with the -r and -x options for extended networking
information to pinpoint our measurements to specific Oracle Solaris Zones, for example,
monitoring the network traffic on three DataNode zones (data-node1, data-node2, and
data-node3)
root@global_zone:~# zonestat -z data-node1 -z data-node2 -z data-node3 -r network -x 10
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
NETWORK-DEVICE  SPEED     STATE  TYPE
net0            1000mbps  up     phys

ZONE           LINK                TOBYTE  MAXBW  %MAXBW  PRBYTE  %PRBYTE  POBYTE  %POBYTE
[total]        net0                  269M      -       -     198    0.00%   18.4E     100%
global         net0                  2642      -       -     198    0.00%     284    0.00%
data-node1     data-node1/net0      93.6M      -       -       0    0.00%   18.4E     100%
data-node3     data-node3/net0      91.3M      -       -       0    0.00%   18.4E     100%
data-node2     data-node2/net0      84.4M      -       -       0    0.00%   18.4E     100%
name-node      name-node/net0        304K      -       -       0    0.00%   18.4E     100%
data-node3     data_node3            2340      -       -       0    0.00%       0    0.00%
sec-name-node  sec-name-node/net0    2340      -       -       0    0.00%       0    0.00%
data-node2     data_node2            2280      -       -       0    0.00%       0    0.00%
name-node      name_node1            2280      -       -       0    0.00%       0    0.00%
data-node1     data_node1            2220      -       -       0    0.00%       0    0.00%
sec-name-node  secondary_name1       2220      -       -       0    0.00%       0    0.00%
33. Monitoring Network -Cont'd
We can drill down to a specific network resource. For example,
we can monitor the physical network interface (net0):
root@global_zone:~# dlstat net0 -i 10
LINK   IPKTS   RBYTES  OPKTS  OBYTES
net0   39.41K   2.63M  8.16K   1.44M
net0       45   2.74K      1     198
net0       43   2.61K      1     150
net0       41   2.47K      1     150
^C
Note: To stop the dlstat command, press Ctrl-c.
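With -i, the first dlstat report typically shows totals since the link came up, and the following lines are per-interval deltas, so dividing an interval's RBYTES by the interval length gives an average receive rate. A hypothetical sketch (the interval and sample rows are assumptions based on the output above):

```shell
# Hypothetical rate helper: convert per-interval RBYTES figures from
# dlstat -i 10 into bytes per second. The first data line is treated as
# cumulative and skipped; K and M suffixes are expanded.
awk -v interval=10 '
function bytes(v,  n) {
    n = v + 0
    if (v ~ /K$/) n *= 1024
    else if (v ~ /M$/) n *= 1024 * 1024
    return n
}
NR > 2 { printf "%s rx %.1f B/s\n", $1, bytes($3) / interval }' <<'EOF'
LINK IPKTS  RBYTES OPKTS OBYTES
net0 39.41K 2.63M  8.16K 1.44M
net0 45     2.74K  1     198
net0 43     2.61K  1     150
EOF
```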
We can monitor only the VNIC that is associated with the data-node1
zone:
root@global_zone:~# dlstat data_node1 -i 10
LINK         IPKTS  RBYTES  OPKTS  OBYTES
data_node1  26.30K   1.59M      0       0
data_node1      42   2.70K      0       0
data_node1      43   2.58K      0       0
data_node1      31   1.86K      0       0
34. Monitoring Network -Cont'd
We can also monitor our network traffic on a specific TCP or UDP port.
This is useful if we want to monitor how data replication between two Hadoop
clusters is progressing; for example, data being replicated from Hadoop cluster A to
Hadoop cluster B, which is located in a different data center.
35. Monitoring Network -Cont'd
A flow is a sophisticated quality of service (QoS) mechanism built into the new Oracle
Solaris 11 network virtualization architecture. It allows us to measure or limit the
network bandwidth for a specific network port on a specific network interface.
In the following example, we will set up a flow that is associated with TCP port 8020
on the name_node1 network interface.
Create the flow:
root@name-node:~# flowadm add-flow -l name_node1 -a transport=TCP,local_port=8020 distcp-flow
Note: You don't need to reboot the zone in order to enable or disable the flow. This is
very useful when you need to debug network performance issues on a production
system!
Verify the flow creation:
root@name-node:~# flowadm show-flow
FLOW         LINK        IPADDR  PROTO  LPORT  RPORT  DSFLD
distcp-flow  name_node1  --      tcp    8020   --     --
36. Monitoring Network -Cont'd
To report the bandwidth on the distcp-flow flow, which monitors TCP port
8020, use the following command:
root@name-node:~# flowstat -i 1
FLOW IPKTS RBYTES IDROPS OPKTS OBYTES ODROPS
distcp-flow 24.72M 37.17G 0 3.09M 204.08M 0
distcp-flow 749.28K 1.13G 0 93.73K 6.19M 0
distcp-flow 783.68K 1.18G 0 98.03K 6.47M 0
distcp-flow 668.83K 1.01G 0 83.66K 5.52M 0
distcp-flow 783.87K 1.18G 0 98.07K 6.47M 0
distcp-flow 775.34K 1.17G 0 96.98K 6.40M 0
distcp-flow 777.15K 1.17G 0 97.21K 6.42M 0
^C
Note: To stop the flowstat command, press Ctrl-c.
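Averaging the per-second RBYTES samples gives a rough figure for the replication throughput through the flow. A hypothetical sketch, assuming flowstat-style rows where the first line is cumulative and G/M/K suffixes mark byte counts:

```shell
# Hypothetical throughput estimate: average the per-second RBYTES samples
# from flowstat -i 1 output, skipping the first (cumulative) line.
awk '
function bytes(v,  n) {
    n = v + 0
    if (v ~ /K$/) n *= 1024
    else if (v ~ /M$/) n *= 1024 * 1024
    else if (v ~ /G$/) n *= 1024 * 1024 * 1024
    return n
}
NR > 1 { sum += bytes($3); cnt++ }
END { printf "avg %.2f GB/s over %d samples\n", sum / cnt / (1024 * 1024 * 1024), cnt }' <<'EOF'
distcp-flow 24.72M  37.17G 0 3.09M  204.08M 0
distcp-flow 749.28K 1.13G  0 93.73K 6.19M   0
distcp-flow 783.68K 1.18G  0 98.03K 6.47M   0
distcp-flow 668.83K 1.01G  0 83.66K 5.52M   0
EOF
```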
37. Conclusion
In this slide deck, we saw how we can leverage the new Oracle Solaris 11 performance
analysis tools to observe and monitor a virtualized environment that hosts a Hadoop
cluster.
For more information:
My blog: https://blogs.oracle.com/vreality
Hands-on lab: http://www.oracle.com/technetwork/systems/hands-on-labs/hol-setuphadoop-solaris-2041770.html
How to Set Up a Hadoop Cluster Using Oracle Solaris Zones
http://www.oracle.com/technetwork/articles/servers-storage-admin/howto-setuphadoop-zones-1899993.html
Performance Analysis in a Multitenant Cloud Environment
http://www.oracle.com/technetwork/articles/servers-storage-admin/perf-analysismultitenant-cloud-2082193.html