A lot of effort has gone into cloud storage performance benchmarking, both of Swift and of other cloud stacks, and part of the result is a lot of confusion in the numbers, in large part because there is no standard. This is further complicated because some implementations are written in Java, some in Python, and some in raw curl. Furthermore, the underlying libraries themselves can introduce variance, since they do not all use the same buffer sizes, do not consistently enable or disable SSL compression, and probably differ in other parameters as well.
I would like to talk about our benchmarking methodologies at HP, describe a tool suite I've developed that implements them, and share some results from benchmarking our own OpenStack implementation. One thing I've discovered over the past months of testing is that both latency and CPU overhead can have a major impact on performance, so those are captured as well, something most tools typically don't report.
The tools are written in Python and use the OpenStack python-swiftclient library.
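Since no two client stacks measure quite the same thing, a concrete example helps pin down what "latency" means here. Below is a minimal sketch of the kind of timing loop such a tool needs, using python-swiftclient; it is not the speaker's actual suite, and the auth endpoint, credentials, container name, and object size are all placeholders.

```python
# Minimal sketch of a latency- and CPU-aware PUT/GET loop using
# python-swiftclient. Not the speaker's tool suite; endpoint,
# credentials, and object size below are placeholders.
import os
import time

from swiftclient import client as swift

conn = swift.Connection(
    authurl="http://swift.example.com/auth/v1.0",  # placeholder endpoint
    user="account:user",
    key="secret",
)

container = "bench"
conn.put_container(container)
payload = os.urandom(1 << 20)  # one 1 MiB object, reused for every PUT

put_lat, get_lat = [], []
cpu0 = sum(os.times()[:2])  # user + system CPU consumed by this process

for i in range(100):
    name = "obj-%04d" % i
    t0 = time.monotonic()
    conn.put_object(container, name, contents=payload)
    put_lat.append(time.monotonic() - t0)

    t0 = time.monotonic()
    hdrs, body = conn.get_object(container, name)
    get_lat.append(time.monotonic() - t0)

cpu_used = sum(os.times()[:2]) - cpu0

# Report medians and tails, not just averages: tail latency is where
# client libraries and SSL settings show their differences.
for label, lat in (("PUT", put_lat), ("GET", get_lat)):
    lat.sort()
    print("%s  median=%.3fs  p99=%.3fs"
          % (label, lat[len(lat) // 2], lat[int(len(lat) * 0.99)]))
print("client CPU: %.2fs for %d requests" % (cpu_used, 2 * len(put_lat)))
```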
Speakers
Mark Seger
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1N4GN6z.
Brendan Gregg focuses on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive. Gregg includes advice and methodologies for verifying new performance tools, understanding how they work, and using them successfully. Filmed at qconsf.com.
Brendan Gregg is a senior performance architect at Netflix, where he does large scale computer performance design, analysis, and tuning. He is the author of multiple technical books including Systems Performance published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration.
Accumulo includes a remarkable breadth of testing frameworks, which helps to ensure its correctness, performance, robustness, and protection of your vital data. This presentation takes you on a tour from Accumulo's basic unit testing up through performance and scalability testing exercised on running clusters. Learn the extent to which Accumulo is put through its paces before it is released, and get ideas for how you can similarly enhance testing of your own code.
Find this talk and others at http://www.slideshare.net/AccumuloSummit.
The document summarizes a study measuring water pressure and temperature at Gooseberry Island Causeway over multiple days. Sensors were placed on both sides of the causeway and recorded a differential of up to 12 inches in water height, indicating the causeway is restricting tidal flow. This supports the conclusion that the 1997 study underestimated the causeway's environmental impact by changing water circulation in the harbor. Additional observations of tide-temperature patterns and anomalies in temperature readings over time suggest more information could be gleaned with further analysis.
Real-time in the real world: DIRT in production (bcantrill)
This document discusses the challenges of building and debugging DIRT (data-intensive real-time) applications in production. It provides examples from the mobile push-to-talk app Voxer, which is described as a canonical DIRT app. Specific issues covered include application restarts inducing latency bubbles, dropped TCP connections causing latency outliers, and identifying sources of slow disk I/O. Tools like DTrace are highlighted as being essential for instrumentation and problem diagnosis in DIRT apps.
LinuxCon Europe, 2014. Video: https://www.youtube.com/watch?v=SN7Z0eCn0VY . There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This talk summarizes the three types of performance tools: observability, benchmarking, and tuning, providing a tour of what exists and why they exist. Advanced tools including those based on tracepoints, kprobes, and uprobes are also included: perf_events, ktap, SystemTap, LTTng, and sysdig. You'll gain a good understanding of the performance tools landscape, knowing what to reach for to get the most out of your systems.
DRC-HUBO is Rainbow Robotics' humanoid robot that competed in the DRC Finals. It uses a modular, lightweight exoskeletal design with effective cooling and power systems. PODO-RT is the real-time framework that controls DRC-HUBO. It uses a distributed architecture with independent processes communicating over shared memory for high-speed control. DRC-HUBO demonstrated a variety of autonomous tasks at the DRC Finals, including driving, opening doors, using tools, and traversing rough terrain.
The document provides an overview of common Linux performance analysis tools including top, mpstat, ps, sar, and others. It discusses how these tools can be used to monitor CPU, memory, disk, network, and process-level performance metrics. Examples are given showing output from top displaying real-time process activity, mpstat showing CPU utilization breakdown, ps displaying currently running processes, and sar historical CPU utilization. The tools can help identify potential performance issues like high CPU usage, memory pressures, and disk or network bottlenecks.
This document summarizes Marc Merlin's talk at Linuxcon 2013 about how Google migrated thousands of servers from an old Red Hat distribution to a newer Debian-based one over 10 years. It describes how they incrementally upgraded packages and libraries, converted RPMs to DEBs, and pushed the new image in a way that avoided disruption of services. Through stripping packages, rebuilding from source, and slowly merging the distributions, they were able to successfully complete the large-scale live upgrade.
The document discusses performance analysis methodologies, beginning with some anti-methodologies like blaming others or only using familiar tools. It then covers common methodologies like using ad hoc checklists of steps, characterizing the workload, and performing drill-down analysis using tools like the USE method and latency analysis to diagnose a database slowdown issue caused by memory pressure.
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014 (Amazon Web Services)
In this session, we explain how to measure the key performance-impacting metrics in a cloud-based application and best practices for a reliable benchmarking process. Measuring the performance of applications correctly can be challenging, and there are many tools available to measure and track performance. This session provides specific examples of good and bad tests, makes it clear how to get reliable measurements, and shows how to map benchmark results to your application. We also cover the importance of selecting tests wisely, repeating tests, and measuring variability. In addition, a customer provides real-life examples of how they developed their application testing stack, use it for repeatable testing, and identify bottlenecks.
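The advice about repeating tests and measuring variability is easy to operationalize; here is a minimal sketch, in which run_once is a hypothetical stand-in for whatever operation is under test.

```python
# Sketch: repeat a benchmark and report its variability, not just its mean.
# run_once() is a hypothetical stand-in for the operation under test.
import statistics
import time

def run_once():
    time.sleep(0.01)  # replace with the real operation under test

samples = []
for _ in range(30):  # repeat: a single run tells you almost nothing
    t0 = time.monotonic()
    run_once()
    samples.append(time.monotonic() - t0)

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print("mean=%.4fs stdev=%.4fs cv=%.1f%%" % (mean, stdev, 100 * stdev / mean))
# A high coefficient of variation means the test setup, not the system,
# may be dominating your numbers.
```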
This document provides information on monitoring Linux system resources and performance. It discusses tools like vmstat, sar, iostat for monitoring CPU usage, memory usage, I/O usage, and other metrics. It also covers Linux processes, memory management, and block device monitoring.
Delivered at the FISL13 conference in Brazil: http://www.youtube.com/watch?v=K9w2cipqfvc
This talk introduces the USE Method: a simple strategy for performing a complete check of system performance health, identifying common bottlenecks and errors. This methodology can be used early in a performance investigation to quickly identify the most severe system performance issues, and is a methodology the speaker has used successfully for years in both enterprise and cloud computing environments. Checklists have been developed to show how the USE Method can be applied to Solaris/illumos-based and Linux-based systems.
Many hardware and software resource types have been commonly overlooked, including memory and I/O busses, CPU interconnects, and kernel locks. Any of these can become a system bottleneck. The USE Method provides a way to find and identify these.
This approach focuses on the questions to ask of the system, before reaching for the tools. Tools that are ultimately used include all the standard performance tools (vmstat, iostat, top), and more advanced tools, including dynamic tracing (DTrace), and hardware performance counters.
Other performance methodologies are included for comparison: the Problem Statement Method, Workload Characterization Method, and Drill-Down Analysis Method.
Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung... (Continuent)
In this intermediate training session we look at resolving issues that may occur with replication (Tungsten Replicator) and clustering (Tungsten Clustering). This training is for all engineers using Continuent Tungsten solutions, as the tools we use will apply to any of our products and topologies. Some MySQL replication knowledge is assumed.
AGENDA
- How to identify an issue when scanning logs
- Tools to help identify issues
- Identifying replication lag / diagnosing replication latency
- Resolving common replication issues
- Resolving common clustering issues
What every Java developer should know about network? (aragozin)
This document summarizes key aspects of TCP networking that every Java developer should know. It covers TCP fundamentals like reliable in-order data transmission, connection establishment through binding and accepting sockets, graceful connection closure, and error handling. It also discusses tuning options like buffer sizes, Nagle's algorithm, and keepalive timeouts. The document explains TCP congestion control mechanisms like slow start and uses diagrams to illustrate concepts like the sliding window and congestion avoidance phases. It also cautions about network hazards that can be simulated for testing.
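The tuning options mentioned there map onto standard socket options in most languages; below is a sketch of the Linux spellings in Python (the TCP_KEEP* constants are Linux-specific, and the values chosen are illustrative).

```python
# Sketch of the TCP tuning knobs mentioned above, via Python's socket
# module. TCP_KEEPIDLE/KEEPINTVL/KEEPCNT are Linux-specific constants.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm: send small writes immediately instead of
# coalescing them (trades bandwidth efficiency for latency).
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Request larger kernel buffers (before connect); the effective window
# also depends on net.core.rmem_max/wmem_max and TCP autotuning.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1 << 20)

# Enable keepalive and shorten its timers (Linux only): first probe after
# 60s idle, then every 10s, dropping the connection after 5 failures.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)

s.connect(("example.com", 80))
```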
MeetBSDCA 2014 Performance Analysis for BSD, by Brendan Gregg. A tour of five relevant topics: observability tools, methodologies, benchmarking, profiling, and tracing. Tools summarized include pmcstat and DTrace.
This document summarizes tools and techniques for Java profiling and diagnostics. It discusses using JMX, JVMTI, and the Attach API to gather information on threading, memory usage, garbage collection, and perform actions like heap dumps. It also introduces the SJK toolkit which provides commands for profiling tasks and the Sigar and BTrace tools. Real-world uses of profiling techniques are presented, like benchmarking and diagnosing production systems. Future ideas proposed include a visual thread analyzer and scripting-based heap dump exploration.
This document discusses various tools for Java profiling and diagnostics including SJK, BTrace, JVM attach API, and perf counters. SJK is a command line tool that exploits JMX, attach API, and perf counters to provide commands for thread profiling, garbage collection analysis, heap dumps, and other diagnostic information. BTrace allows injecting code snippets to perform instrumentation profiling. The JVM attach API can be used to attach to running JVMs and perform operations like heap dumps and stack traces. Perf counters provide low-overhead access to JVM counters via shared memory. The document provides examples and links to documentation for these various Java profiling and diagnostic tools.
SREcon 2016 Performance Checklists for SREs (Brendan Gregg)
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
Talk for USENIX LISA17: "Containers pose interesting challenges for performance monitoring and analysis, requiring new analysis methodologies and tooling. Resource-oriented analysis, as is common with systems performance tools and GUIs, must now account for both hardware limits and soft limits, as implemented using cgroups. A reverse diagnosis methodology can be applied to identify whether a container is resource constrained, and by which hard or soft resource. The interaction between the host and containers can also be examined, and noisy neighbors identified or exonerated. Performance tooling can need special usage or workarounds to function properly from within a container or on the host, to deal with different privilege levels and name spaces. At Netflix, we're using containers for some microservices, and care very much about analyzing and tuning our containers to be as fast and efficient as possible. This talk will show you how to identify bottlenecks in the host or container configuration, in the applications by profiling in a container environment, and how to dig deeper into kernel and container internals."
Stop the Guessing: Performance Methodologies for Production Systems (Brendan Gregg)
Talk presented at Velocity 2013. Description: When faced with performance issues on complex production systems and distributed cloud environments, it can be difficult to know where to begin your analysis, or to spend much time on it when it isn’t your day job. This talk covers various methodologies, and anti-methodologies, for systems analysis, which serve as guidance for finding fruitful metrics from your current performance monitoring products. Such methodologies can help check all areas in an efficient manner, and find issues that can be easily overlooked, especially for virtualized environments which impose resource controls. Some of the tools and methodologies covered, including the USE Method, were developed by the speaker and have been used successfully in enterprise and cloud environments.
This document summarizes a presentation on flame graphs for profiling CPU and memory performance on FreeBSD. It introduces flame graphs as a way to visualize stack profiles to easily compare performance across systems. Examples are given profiling MySQL workload CPU usage on two hosts to identify a 30% performance difference. Commands are provided to generate flame graphs from DTrace profiles of CPU stack sampling and page faults.
This document discusses Java memory usage on Linux systems and how to monitor and troubleshoot Java applications running on Linux. It covers Java memory structures like heap, non-heap memory and thread stacks. It also discusses Linux memory management and key metrics like resident size. The document provides tips on setting up the JVM, tuning network and OS settings. It recommends tools like jstack, jstat and jcmd for diagnosing issues like high CPU usage, leaks or out of memory errors.
FPGA based 10G Performance Tester for HW OpenFlow Switch (Yutaka Yasuda)
SDN operators need to measure the performance of OpenFlow hardware switches on their own sites, because latency can differ by a factor of 1000 depending on the flow entry matched: an ASIC can forward in a few microseconds, while the software (CPU) path may take milliseconds.
To protect yourself from an unexpected performance plunge, monitor the health of the switches on your site.
Oracle Latch and Mutex Contention Troubleshooting (Tanel Poder)
This is an intro to latch & mutex contention troubleshooting which I've delivered at Hotsos Symposium, UKOUG Conference etc... It's also the starting point of my Latch & Mutex contention sections in my Advanced Oracle Troubleshooting online seminar - but we go much deeper there :-)
This document provides an overview of performance analysis tools for Linux systems. It describes Brendan Gregg's background and work analyzing performance at Netflix. It then discusses different types of tools, including observability tools to monitor systems, benchmarking tools to test performance, and tuning tools to optimize systems. A number of command line monitoring tools are outlined, such as vmstat, iostat, mpstat, and netstat, as well as more advanced tools like strace and tcpdump.
This document discusses the evolution of systems performance analysis tools from closed source to open source environments.
In the early 2000s with Solaris 9, performance analysis was limited due to closed source tools that provided only high-level metrics. Opening the Solaris kernel code with OpenSolaris in 2005 allowed deeper insight through understanding undocumented metrics and dynamic tracing tools like DTrace. This filled observability gaps across the entire software stack.
Modern performance analysis leverages both traditional Unix tools and new dynamic tracing tools. With many high-resolution metrics available, the focus is on visualization and collecting metrics across cloud environments. Overall open source improved systems analysis by providing full source code access.
Video: https://www.youtube.com/watch?v=uibLwoVKjec . Talk by Brendan Gregg for Sysdig CCWFS 2016. Abstract:
"You have a system with an advanced programmatic tracer: do you know what to do with it? Brendan has used numerous tracers in production environments, and has published hundreds of tracing-based tools. In this talk he will share tips and know-how for creating CLI tracing tools and GUI visualizations, to solve real problems effectively. Programmatic tracing is an amazing superpower, and this talk will show you how to wield it!"
Analyzing OS X Systems Performance with the USE Method (Brendan Gregg)
Talk for MacIT 2014. This talk is about systems performance on OS X, and introduces the USE Method to check for common performance bottlenecks and errors. This methodology can be used by beginners and experts alike, and begins by constructing a checklist of the questions we’d like to ask of the system, before reaching for tools to answer them. The focus is resources: CPUs, GPUs, memory capacity, network interfaces, storage devices, controllers, interconnects, as well as some software resources such as mutex locks. These areas are investigated by a wide variety of tools, including vm_stat, iostat, netstat, top, latency, the DTrace scripts in /usr/bin (which were written by Brendan), custom DTrace scripts, Instruments, and more. This is a tour of the tools needed to solve our performance needs, rather than understanding tools just because they exist. This talk will make you aware of many areas of OS X that you can investigate, which will be especially useful for the time when you need to get to the bottom of a performance issue.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Puppet Availability and Performance at 100K Nodes - PuppetConf 2014 (Puppet)
This document discusses challenges encountered when scaling Puppet to manage over 100,000 nodes, and proposes an alternative approach called PVC (Puppet Variable Control) to help address those challenges. Some key issues identified include uneven and unpredictable Puppet run times that cause system thrashing. PVC aims to run Puppet reactively based on available capacity and local file changes, using frequent pings and fact collection to determine run timing in a way that achieves a flat and consistent service curve. It also aims to improve security by running Puppet immediately when monitored files change outside of Puppet runs.
Writing Serverless Application in Java with comparison of 3 approaches: AWS S... (Andrew Zakordonets)
This presentation is from my Meetup talk about how we built a serverless application using Java and Step Functions. It compares three different approaches to building serverless Java: the AWS SDK, Micronaut, and Spring.
Talk for YOW! by Brendan Gregg. "Systems performance studies the performance of computing systems, including all physical components and the full software stack to help you find performance wins for your application and kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (ftrace, bcc/BPF, and bpftrace/BPF), advice about what is and isn't important to learn, and case studies to see how it is applied. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
The objective of this article is to describe what to monitor in and around Alfresco in order to have a good understanding of how the applications are performing and to be aware of potential issues.
One of the great challenges of monitoring any large cluster is how much data to collect and how often to collect it. Those responsible for managing the cloud infrastructure want to see everything collected centrally, which places limits on how much and how often. Developers, on the other hand, want to see as much detail as they can, at as high a frequency as is reasonable, without impacting overall cloud performance.
To address these seemingly conflicting requirements, we've chosen a hybrid model at HP. Like many others, we have a centralized monitoring system that records a set of key system metrics for all servers at a granularity of 1 minute, but at the same time we do fine-grained local monitoring on each server of hundreds of metrics every second, so when a problem needs more detail than is available centrally, one can go to the servers in question and see exactly what was going on at any specific time.
The tool of choice for this fine-grained monitoring is the open source tool collectl, which additionally has an extensible API. It is through this API that we've developed a Swift monitoring capability to not only capture the number of GETs, PUTs, etc. every second, but also, using collectl's colmux utility, display them in a top-like format to see exactly what all the object and/or proxy servers are doing in real time.
We've also developed a second capability that lets one see what the virtual machines on each compute node are doing in terms of CPU, disk, and network traffic. This data can also be displayed in real time with colmux.
This talk will briefly introduce the audience to collectl's capabilities, but more importantly show how it can be used to augment any existing centralized monitoring infrastructure.
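As a rough illustration of this kind of per-second collection, the sketch below streams collectl output from Python. The -s (subsystems) and -i (interval) flags follow collectl's documentation, but output formats vary by version, so treat the parsing here as an assumption to adapt rather than something definitive.

```python
# Sketch: stream per-second network stats from collectl. The -s/-i flags
# follow collectl's documented options; the output layout varies by
# version, so adapt the parsing to what your collectl actually prints.
import subprocess

proc = subprocess.Popen(
    ["collectl", "-sn", "-i", "1"],  # network subsystem, 1-second samples
    stdout=subprocess.PIPE,
    text=True,
)

for line in proc.stdout:
    line = line.strip()
    if not line or line.startswith("#"):  # skip headers and comments
        continue
    print(line)  # e.g. KB in/out and packets/sec for this second
```

For a cluster-wide, top-like view, colmux plays the same role across many hosts at once, which is how the speaker describes watching all object and proxy servers in real time.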
Speakers
Mark Seger
The document provides agendas and information for OSSEC Con workshops on days 3 and 4, including:
- Day 3 agenda with workshops and lab time from 09:30 to 16:00
- Day 4 agenda with similar schedule and an exam from 16:00 to 16:45
It also includes links for downloading examples and workshop topics covering OSSEC installation, configuration, troubleshooting, file integrity monitoring, and more.
The document discusses benchmarking the performance of compute and database services on the cloud. It outlines procedures to test IOPS, network performance, and CPU usage of a compute instance and database throughput, connections, and queries per second of a DB service. Tests were run varying threads, block sizes, and parallel connections. The compute instance showed optimal IOPS at specific block sizes and higher write than read performance. Database performance increased with more connections but was limited by the server. Issues noted were inability to directly measure IOPS and lack of short-term monitoring data. Finally, a WordPress application was deployed on a compute instance connected to a database service to test performance.
This document provides an overview of monitoring Linux system performance. It discusses how to interpret output from tools like vmstat and top to analyze CPU utilization, context switching, and run queue length. These metrics can help identify performance bottlenecks by indicating whether a system is CPU-bound, I/O-bound, or experiencing high load. The document also defines important CPU terminology like user time, system time, interrupts, and priorities to understand how the Linux scheduler manages threads.
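For a sense of where tools like vmstat and top get these numbers, here is a small sketch reading the underlying counters directly from /proc on Linux (field names as documented in proc(5)).

```python
# Sketch: read the raw counters that vmstat and top summarize, straight
# from /proc (Linux). procs_running approximates the run queue; ctxt is
# the total context-switch count since boot.
import time

def snapshot():
    stats = {}
    with open("/proc/stat") as f:
        for line in f:
            fields = line.split()
            if fields[0] in ("ctxt", "procs_running"):
                stats[fields[0]] = int(fields[1])
    with open("/proc/loadavg") as f:
        stats["load1"] = float(f.read().split()[0])
    return stats

a = snapshot()
time.sleep(1)
b = snapshot()
print("context switches/sec: %d" % (b["ctxt"] - a["ctxt"]))
print("run queue: %d   load1: %.2f" % (b["procs_running"], b["load1"]))
```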
Extreme Linux Performance Monitoring and Tuning (Milind Koyande)
This document provides an introduction to monitoring Linux system performance. It discusses determining the type of application running and establishing a baseline of typical system usage. Key CPU concepts are then outlined such as hardware interrupts, soft interrupts, real-time threads and kernel/user threads. Context switches between threads and the thread scheduling queue are also introduced. The goal is to understand typical system behavior and identify any bottlenecks.
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s... (Anne Nicolas)
Understanding how Linux kernel IO subsystem works is a key to analysis of a wide variety of issues occurring when running a Linux system. This talk is aimed at helping Linux users understand what is going on and how to get more insight into what is happening.
First we present an overview of Linux kernel block layer including different IO schedulers. We also talk about a new block multiqueue implementation that gets used for more and more devices.
After surveying the basic architecture we will be prepared to talk about tools to peek into it. We start with lightweight monitoring like iostat and continue with the heavier blktrace and a variety of tools based on it. We demonstrate the use of these tools in the analysis of real-world issues.
Jan Kara, SUSE
Java Day 2021, WeAreDevelopers, 2021-09-01, online: Moritz Kammerer (@Moritz Kammerer, Expert Software Engineer at QAware).
In this talk, we take a look at how microservices can be developed with Micronaut, and whether it has kept its promises.
How deep is your buffer – Demystifying buffers and application performance (Cumulus Networks)
Packet buffer memory is among the oldest topics in networking, and yet it never seems to fade in popularity. From the days of buffers sized by the bandwidth-delay product to what is now called "buffer bloat", and from the days of 10Mbps to 100Gbps, the discussion around how deep buffers should be never ceases to evoke opinionated responses.
In this webinar we will be joined by JR Rivers, co-founder and CTO of Cumulus Networks, a man who has designed many ultra-successful switching chips, switch products, and compute platforms, to discuss the innards of buffering. This webinar will cover data path theory, tools to evaluate network data path behavior, and the configuration variations that affect application visible outcomes.
This document provides an overview of LX branded zones in SmartOS. It begins with a brief description of container mechanisms and introduces SmartOS zones and branded zones. It then describes LX branded zones, which allow running Linux binaries directly on the SmartOS kernel using ZFS datasets for filesystem isolation and zones for process protection. The document demonstrates how to import Linux images, create LX zones, and use SmartOS tools like DTrace from within the LX zone. It provides references for further information on LX branded zones and Docker setup on SmartOS.
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und... (rschuppe)
Application performance doesn't come easy. How do you find the root cause of performance issues in modern, complex applications when all you have to start with is a complaining user?
In this presentation (mainly in German, but understandable for English speakers) I reprise the fundamentals of troubleshooting and give some new examples of how to tackle issues.
Follow up presentation to "Performance Trouble Shooting 101 - Schweine, Schlangen und Papierschnitte"
The document discusses load testing and why it often fails. It recommends using a tool with the lowest barrier to entry, such as Blitz, which allows load testing on AWS with a web form or API. Blitz produces results but no reports, so the document shows how to modify Blitz's code to output JSON results for reporting purposes. It encourages integrating load testing into existing development workflows rather than treating it separately after deployment.
Trying and evaluating the new features of GlusterFS 3.5 (Keisuke Takahashi)
My presentation in LinuxCon/CloudOpen Japan 2014.
Only a few days have passed since GlusterFS 3.5 was released, so feel free to correct me if you find any mistakes or misunderstandings. Thanks.
Load testing is a continuous process that involves designing tests with a specific purpose in mind, running the tests to saturation using realistic user models, and analyzing the results across different load levels while linking them to production metrics. The goal is to understand how an application performs under various loads and identify any bottlenecks before they impact real users.
IPv6 test plan for OPNFV PoC v2.2 spirent-vctlab (Iben Rodriguez)
This document outlines test plans and requirements for testing IPv6 in an OPNFV PoC v2.0 environment using OpenStack Liberty and ODL Lithium SR2. It details:
(1) Setting up an IPv6 service VM in OpenStack with ODL controller capability for IPv6 routing and address advertisement.
(2) A test design and steps for setting up infrastructure, ODL and OpenStack controllers, and compute nodes.
(3) Positive test cases to validate IPv6 and IPv4 connectivity between VMs, routers and external DNS via ping, traceroute from the VM and service VM.
(4) References for IPv6 configuration and testing in Linux.
This document provides an agenda for the 2017 CENIC Annual Conference held from March 19-22, 2017. The agenda includes sessions on international research and education networking, distributed caching, innovative partnerships between NOAA and CENIC, and enabling new services for California libraries through connectivity to the CalREN network. A panel discussion focuses on diversity and inclusion in technology fields. Breakout sessions cover expanding the High-Performance Wireless Research and Education Network into Orange County and congratulating newly connected public libraries on their CalREN connectivity. The document provides location, speaker, and brief description information for each session in the conference agenda.
This document discusses incident handling at the Naval Postgraduate School (NPS) in a bring your own device (BYOD) environment. It provides details on NPS's network upgrades including a new wireless network and cloud initiatives. It outlines NPS's cybersecurity organization and technologies used like a security information and event management system. The document describes the incident handling process based on the NIST framework and tools used like a JIRA incident tracking template. It emphasizes documenting incidents, leveraging automation, and collaborating across the incident response team.
New Threats, New Approaches in Modern Data Centers (Iben Rodriguez)
New Threats, New Approaches in Modern Data Centers - A Presentation by NPS at CENIC conference 11:00 am - 12:00 pm, Wednesday, March 22, 2017 – in San Diego, California
The standard approach to securing data centers has historically emphasized strong perimeter protection to keep threats on the outside of the network. However, this model is ineffective for handling new types of threats—including advanced persistent threats, insider threats, and coordinated attacks. A better model for data center security is needed: one that assumes threats can be anywhere and probably are everywhere and then, through automation, acts accordingly. Using micro-segmentation, fine-grained network controls enable unit-level trust, and flexible security policies can be applied all the way down to a network interface. In this joint presentation between customer, partner, and VMware, the fundamental tenets of micro-segmentation will be discussed. Presenters will describe how the Naval Postgraduate School has incorporated these principles into the architecture and design of a multi-tenant Cybersecurity Lab environment to deliver security training to national and international government personnel.
Edgar Mendoza, IT Specialist, Information Technology and Communications Services (ITACS) Naval Postgraduate School
Eldor Magat, Computer Specialist, ITACS, Naval Postgraduate School
Mike Monahan, Network Engineer, ITACS, Naval Postgraduate School
Iben Rodriguez, Brocade Resident SDN Delivery Consultant, ITACS, Naval Postgraduate School
Brian Recore, NSX Systems Engineer, VMware, Inc.
https://youtu.be/mYBbIbfKkGU?t=1h7m16s
Copied from the program with corrections - https://adobeindd.com/view/publications/b9fbbdf0-60f1-41dc-8654-3d2141b0bf54/nh4h/publication-web-resources/pdf/Conference_Agenda_2017_v1.pdf
VERIGRAPH: A pre-deployment verification service for Virtualized Network Functions (aka NFV virtual appliances) running on OpenStack with OPNFV.
Formal Methods: rigorous mathematical methods based on mathematical models for analysing (computer-based) systems
Formal Verification: applied to SDN/NFV-based networks
Verify a formal network model satisfies some invariants or network policies (e.g., absence of loops and black holes, reachability, security policies, etc.)
Iben from Spirent talks at the SDN World Congress about the importance of and... (Iben Rodriguez)
@Iben Rodriguez from @Spirent talks at the SDN World Congress about the importance of and issues with NFV VNF and SDN Testing in the cloud.
#Layer123 Dusseldorf Germany 20141016
SDN/NFV is following the same path Linux and the Internet did...
Mentioned during the Open Networking Summit 2014, Santa Clara, March 4th.
Re-engineering Engineering, Vinod Khosla, Kleiner Perkins Caufield & Byers, vkhosla@kpcb.com, Sept 2000
Virtualization can help streamline regulatory compliance efforts by reducing resource and cost requirements, providing unified IT controls, and enabling efficient audit trails that reduce administrative effort. Virtualization allows for isolation of virtual machines, centralized logging of events and changes, quick recovery of virtual machines, and separation of duties through role-based access controls. Some examples given include using virtualization to isolate development and production instances, centrally capturing configuration changes and backups, quickly provisioning new virtual machines from templates, and minimizing disruptions from hardware maintenance. Overall, virtualization can simplify many compliance-related IT tasks compared to traditional physical infrastructure management.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into integrating generative AI test automation with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added as a quality characteristic and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
1. A Swift Benchmarking Tool
Mark Seger
Hewlett Packard
Cloud Services
4/19/2013
2. Problem Statement
• Performance Measurements
– Consistent/standard mechanisms for controlled experiments
– Ability to easily modify test parameters
– Minimal installation, configuration and use
– Easy to compare results of multiple runs
– Easy to clean up when done
• Benchmarking – run performance tests at scale
– Repeat tests while increasing demand for resources
– Parallel tests must be coordinated: start/finish together
3. Getput Suite
• Multiple tools organized in a hierarchy
– getput: actual workhorse, runs tests on single client
– gpmaster: coordinates running getput on multiple clients
– gpsuite: defines suites of tests to minimize switch usage
– yourscript: can call gpsuite multiple times when desired
4. getput.py
• Uses the swiftclient library (see the sketch after this slide)
• Lots of switches, lots of different behaviors
– Standalone
• Basic: creds, cname, oname, size, num/runtime, tests, rep count
• More: processes, container type: shared/byproc/bynode, latency details, operation logging, and still more
– Multi-node (controlled by gpmaster)
• start time, rank
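The following is a minimal sketch of the style of measurement getput performs with the swiftclient library: time each operation, then derive IOPS, MB/S and latency. It is not getput's actual code, and the auth endpoint, credentials and names are placeholders.

import time
from swiftclient import client as swift

# Placeholder v1.0-auth credentials; substitute your own endpoint.
conn = swift.Connection(authurl='https://example.com/auth/v1.0',
                        user='account:user', key='secret')

cname, size, num = 'getput-demo', 10 * 1024, 100   # cname, size, num switches
payload = b'x' * size
conn.put_container(cname)

latencies = []
start = time.time()
for i in range(num):
    t0 = time.time()
    conn.put_object(cname, 'o-%d' % i, contents=payload)   # one timed PUT
    latencies.append(time.time() - t0)
elapsed = time.time() - start

# The same numbers getput reports: IOPS, MB/S, average latency and range.
print('put %d objs: %.2f IOPS %.2f MB/S lat %.3f (%.2f-%.2f)' % (
    num, num / elapsed, num * size / elapsed / 1048576,
    sum(latencies) / num, min(latencies), max(latencies)))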
5. gpmaster.py
• Coordinates running of getput on multiple clients
– Assures all start together and finish approx together
– Summarizes results as a single line
– Unlike getput, only runs 1 test at a time; sequencing tests is gpsuite's job
• More required switches than getput
– Credentials file
– Rank
– Start time
– Hosts file or single client name, may need ssh key too
– And a few more…
• But rarely run by itself!
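The synchronized start gpmaster enforces can be approximated with a shared wall-clock start time: the master picks a moment a few seconds in the future and every client sleeps until then. A sketch of the idea, not gpmaster's implementation:

import time

def wait_for_start(start_time):
    # Sleep until the agreed epoch-seconds start time; if it has already
    # passed, client clocks are probably skewed and the run is suspect.
    delay = start_time - time.time()
    if delay < 0:
        raise RuntimeError('start time already passed; check clock sync')
    time.sleep(delay)

start = time.time() + 5    # master distributes this to every client
wait_for_start(start)      # each getput instance blocks here, then runs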
6. gpsuite.py
• Removes complexity of running gpmaster
• Think of macros: gpsuite --suite full
– Sets of object sizes, eg: 1k, 10k, 100k, etc
– Numbers of threads, eg: 1, 2, 4, 8, etc
• Distributes threads across multiple clients
• Some runs can take hours with a single command
• Cleans up after each run
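Conceptually, a suite is just a sweep over object sizes and thread counts, one run per combination. A sketch of that expansion; the switch names here are illustrative, not gpsuite's exact command lines:

import subprocess

sizes = ['1k', '10k', '100k']   # object sizes in the suite
procs_list = [1, 2, 4, 8]       # numbers of threads/processes
for size in sizes:
    for procs in procs_list:
        # One run per (size, procs) pair; a real suite would go through
        # gpmaster to fan the work out across multiple clients.
        subprocess.call(['getput', '-c', 'c-%s' % size, '-o', 'obj',
                         '-s', size, '-n', '100', '--procs', str(procs)])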
7. Getput Output
Earliest versions
Inst Start End Seconds Tests Num MB/S IOPS Errs
0 13:59:20 13:59:29 8.57 put 100 0.11 11.67 0
0 13:59:29 13:59:33 4.03 get 100 0.24 24.83 0
0 13:59:33 13:59:34 1.80 del 100 0.54 55.68 0
8. Getput Output
Added latency range in later versions
Inst Start End Seconds Tests Num MB/S IOPS Latency LatRange Errs
0 13:59:20 13:59:29 8.57 put 100 0.11 11.67 0.085 0.02-00.22 0
0 13:59:29 13:59:33 4.03 get 100 0.24 24.83 0.040 0.04-00.05 0
0 13:59:33 13:59:34 1.80 del 100 0.54 55.68 0.018 0.01-00.05 0
9. Getput Output
Added CPU and started playing with compression in more recent versions
Inst Start End Seconds Tests Num MB/S IOPS Latency LatRange Errs Procs OSize %CPU Comp
0 13:59:20 13:59:29 8.57 put 100 0.11 11.67 0.085 0.02-00.22 0 1 10k 0.30 no
0 13:59:29 13:59:33 4.03 get 100 0.24 24.83 0.040 0.04-00.05 0 1 10k 0.39 no
0 13:59:33 13:59:34 1.80 del 100 0.54 55.68 0.018 0.01-00.05 0 1 10k 0.58 no
10. Getput Output
Eventually added latency distribution histogram
Latency LatRange Errs Procs OSize 0.0 0.1 0.2 0.3 0.4 0.5
0.106 0.02-00.36 0 10 10k 527 396 67 10 0 0
0.041 0.01-00.07 0 10 10k 1000 0 0 0 0 0
0.031 0.01-00.16 0 10 10k 964 36 0 0 0 0
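The histogram columns are simple fixed-width 0.1-second buckets. A sketch of the bucketing, assuming (the output alone doesn't say) that outliers beyond the last column are clamped into it:

def histogram(latencies, width=0.1, buckets=6):
    counts = [0] * buckets
    for lat in latencies:
        idx = min(int(lat / width), buckets - 1)   # clamp outliers to last bucket
        counts[idx] += 1
    return counts

print(histogram([0.03, 0.05, 0.12, 0.34, 2.1]))    # -> [2, 1, 0, 1, 0, 1]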
11. Observations
• Swift multi-client scaling is excellent
– With multiple clients performance grows close to linearly
– With single client and multiple threads
• Smaller objects scale very well even with lots of threads
• Larger objects hit either a CPU or a network wall!
• Both compression and encryption cost CPU
– Limits large object bandwidth, less so with smaller ones
– Early testing: disabling compression gave up to a 2X boost for large objects
• Similar behavior when using http instead of https
– Only just started looking at changing ciphers
Recommendation: make compression, ssl and cipher choice optional in swiftclient
14. Let’s talk about latency
• Latency metrics originally based on averages
– Like coarse monitoring, great for trends but poor for exceptions
– Soon realized more detail was needed
• Consider the following. What does it really mean?
– Is the only problem that one entry of 0.083?
15. On closer inspection
• The first 4 entries don’t look too bad
• Even the bottom one isn’t that horrible
16. Ranges shed more light
• Even though the first 4 lines have close latencies, look at their max values
• Now we know why line 5 is so bad
• Even line 6 has a very high max
17. But even that’s not enough
• Min/Max doesn’t tell us how many outliers
• Lines 2 and 4 have almost 50 in the 0.5 bucket
• Line 5 has 6 PUTs >4 seconds
• Line 6 is all over the place
18. Example 1: Latency of 0.04 too high!
• When looking at 1k, 10k and 100k GETs, we noticed IOPS for 10k were lower!
– Great reason to look at more than MB/sec
• After much digging we discovered this only applied to object sizes of 7888 -> 22469 bytes
– This could only have been found by running sets of tests and looking very closely at the numbers
• What’s going on here?
19. Example 1: Latency of 0.04 too high!
• What's going on here?
– We run pound on proxies to support multiple connection ports
– Proxy does a fast GET and passes data to pound over the loopback address
– Max segment size for loopback >> network MSS
– Eventlet uses 8192-byte buffers
– Nagle algorithm: transfers of more than 8192 and just under 8192+MSS bytes incur a delayed ACK
• Eventlet needs bigger buffers? Turn off Nagle? (see the sketch below)
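Turning off Nagle means setting TCP_NODELAY on the socket so small trailing segments are sent without waiting for an ACK. A sketch of the socket option itself; where to apply it inside eventlet or the proxy is the open question:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)   # disable Nagle
sock.connect(('127.0.0.1', 8080))   # e.g. the loopback hop between proxy and pound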
20. Example 2: Latency 0.5
• Observed a number of these in small object PUTs
• Caused by a proxy timeout connecting to obj server
• Might be worth looking into ways to reduce these and/or not try to re-contact a non-responsive server
21. Example 3: Latency 6 Secs
• These occur less frequently, but do happen
• Traced back to disk error on object server
• BUT the other 2 object servers responded in < 1sec
• Think about how many IOPS are being lost!
Might it be worth it to return after 2 successes?
Maybe at least ignore writes to that disk?
22. So what’s next for latency?
• Investigate why some ops have even longer latencies
• Added another switch to getput! --logops
– Extended put_object() to return transaction ID
– Writes detailed log records for every operation
– Makes it possible for longer latency transactions to be traced
segerm@az1-nv-compute-0000:~$ more /tmp/getput-p-0-1363878303.log
15:05:03.522 1363878303.521659 1363878303.459080 0.062547 eb4194b73e46f52f774a63fa552755d4 o-0-1-1
15:05:03.574 1363878303.574005 1363878303.521702 0.052291 eb4194b73e46f52f774a63fa552755d4 o-0-1-2
15:05:03.627 1363878303.627218 1363878303.574032 0.053174 eb4194b73e46f52f774a63fa552755d4 o-0-1-3
15:05:03.686 1363878303.686175 1363878303.627244 0.058918 eb4194b73e46f52f774a63fa552755d4 o-0-1-4
15:05:03.747 1363878303.746874 1363878303.686201 0.060661 eb4194b73e46f52f774a63fa552755d4 o-0-1-5
15:05:03.804 1363878303.804106 1363878303.746900 0.057194 eb4194b73e46f52f774a63fa552755d4 o-0-1-6
15:05:03.866 1363878303.866148 1363878303.804133 0.061979 eb4194b73e46f52f774a63fa552755d4 o-0-1-7
15:05:03.932 1363878303.931911 1363878303.866175 0.065724 eb4194b73e46f52f774a63fa552755d4 o-0-1-8
Recommendation: GET, PUT and DEL calls should return transaction IDs
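With per-operation records like the ones above, slow transactions fall out of a few lines of parsing. A sketch, assuming the fields are wall-clock time, end epoch, start epoch, latency, a per-operation identifier and object name, which is how the sample log reads:

def slow_ops(logfile, threshold=0.5):
    with open(logfile) as f:
        for line in f:
            wall, end, start, latency, opid, obj = line.split()
            if float(latency) > threshold:
                # The identifier is what lets you chase this op
                # through the swift server logs.
                print('%s %s took %ss id=%s' % (wall, obj, latency, opid))

slow_ops('/tmp/getput-p-0-1363878303.log')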
23. swcmd: a nifty helper utility
• One challenge of benchmarking can be LOTS of containers and objects needing cleanup
– Can have dozens to 100s of containers
– Can have Ks to 100Ks of objects
– Swift client too slow for deletes!
• Swift client utility could use some more functionality
– How about displaying numbers of objects in containers?
– Container sizes and even dates?
– The same details when listing objects?
– What about parallel or even wildcard listing/deletes?
• Only parallelizes for >1K objects in a container
• Using multiprocessing it can hit 300-400 deletes/sec (see the sketch below)
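A sketch of the parallel-delete approach: list the container once, then fan the deletes out over a multiprocessing pool, one short-lived swiftclient connection per call. Not swcmd's actual code; the credentials and endpoint are placeholders:

from multiprocessing import Pool
from swiftclient import client as swift

AUTH = dict(authurl='https://example.com/auth/v1.0',
            user='account:user', key='secret')

def delete_one(args):
    container, obj = args
    # Each worker process opens its own connection.
    swift.Connection(**AUTH).delete_object(container, obj)

def delete_container(container, procs=16):
    conn = swift.Connection(**AUTH)
    _, objects = conn.get_container(container, full_listing=True)
    with Pool(procs) as pool:
        pool.map(delete_one, [(container, o['name']) for o in objects])
    conn.delete_container(container)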
24. Examples
swcmd ls
63482 61M 2013-03-21 16:19:12 qc-1363882747
49 4G 2013-03-09 13:13:36 vlat-1362834811
0 0 2013-03-20 22:05:06 vlat-1363817101
1 10 2013-03-15 13:58:37 xxx-0-0
1 200M 2013-03-11 12:28:16 xyxxy
2 200M 2013-03-11 12:29:01 xyzzy
2901 702M 2013-02-12 16:34:19 zzz
swcmd -p ls xyz # list containers starting with xyz
swcmd -f rc zzz # force removal of zzz even though not empty
swcmd -p pf x # force removal of ALL containers starting with x
swcmd rm xyzzy/xyzzy # remove specific object
Recommendation: add these types of features to the swift utility