http://adl.tw/~jeremy/slides/presentation2.pptx
Attached: a detailed analysis of CVE-2013-2094 (on x86-32).
Exploiting CVE-2013-2094, with animation.
There were more vulnerabilities in the Linux kernel in 2013 than in the previous decade. In this paper, the research focuses on defending against arbitrary memory overwrites used for privilege escalation.
To prevent malicious users from gaining root authority, the easiest approach is to mark sensitive data structures read-only. However, we cannot be sure that a sensitive data structure will never be modified by legitimate behavior from a normal device driver; we therefore propose a compromise between read-only and writable solutions to enhance compatibility.
The main idea we propose solves not only the problem above but also the more general problem of ensuring that important memory values can only be changed within a safe range, rather than simply being set to read-only.
Keywords: Linux kernel vulnerabilities, exploit, privilege escalation
This talk is all about the Berkeley Packet Filters (BPF) and their uses in Linux.
Agenda:
* What is a BPF and why do we need it?
* Writing custom BPFs
* Notes on BPF implementation in the kernel
* Usage examples: SOCKET_FILTER & seccomp
Speaker:
Kfir Gollan, senior embedded software developer, Linux kernel hacker and software team leader.
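To make the SOCKET_FILTER/seccomp examples concrete, here is a minimal sketch (not from the talk) that assembles a classic BPF program suitable for seccomp in Python. The opcode and return-value constants mirror those in `<linux/bpf_common.h>` and `<linux/seccomp.h>`; the ptrace syscall number assumes x86-64. Actually installing the filter would additionally require `prctl(PR_SET_NO_NEW_PRIVS)` and `prctl(PR_SET_SECCOMP, ...)`, omitted here.

```python
import struct

# One classic BPF instruction: (opcode u16, jt u8, jf u8, k u32) = 8 bytes
def bpf_insn(op, jt, jf, k):
    return struct.pack("HBBI", op, jt, jf, k)

# Opcode constants from <linux/bpf_common.h>
BPF_LD, BPF_W, BPF_ABS = 0x00, 0x00, 0x20
BPF_JMP, BPF_JEQ, BPF_K = 0x05, 0x10, 0x00
BPF_RET = 0x06

# Return values from <linux/seccomp.h>
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_KILL = 0x00000000

NR_PTRACE_X86_64 = 101  # assumption: ptrace(2) syscall number on x86-64

# A 4-instruction filter: load the syscall number (offset 0 of
# struct seccomp_data), kill the process if it is ptrace(2),
# otherwise allow the call.
prog = b"".join([
    bpf_insn(BPF_LD | BPF_W | BPF_ABS, 0, 0, 0),                  # A = seccomp_data.nr
    bpf_insn(BPF_JMP | BPF_JEQ | BPF_K, 0, 1, NR_PTRACE_X86_64),  # if A == ptrace: fall through
    bpf_insn(BPF_RET | BPF_K, 0, 0, SECCOMP_RET_KILL),            # kill the task
    bpf_insn(BPF_RET | BPF_K, 0, 0, SECCOMP_RET_ALLOW),           # allow everything else
])
print(len(prog))  # 32 (4 instructions of 8 bytes each)
```

The same packed bytes, wrapped in a `sock_fprog`, are what the kernel receives for both SO_ATTACH_FILTER socket filters and seccomp mode 2.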
ACM Applicative System Methodology 2016 - Brendan Gregg
Video: https://youtu.be/eO94l0aGLCA?t=3m37s . Talk by Brendan Gregg for ACM Applicative 2016
"System Methodology - Holistic Performance Analysis on Modern Systems
Traditional systems performance engineering makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. For modern systems, we can choose the metrics, and can choose ones we need to support new holistic performance analysis methodologies. These methodologies provide faster, more accurate, and more complete analysis, and can provide a starting point for unfamiliar systems.
Methodologies are especially helpful for modern applications and their workloads, which can pose extremely complex problems with no obvious starting point. There are also continuous deployment environments such as the Netflix cloud, where these problems must be solved in shorter time frames. Fortunately, with advances in system observability and tracers, we have virtually endless custom metrics to aid performance analysis. The problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
System methodologies provide a starting point for analysis, as well as guidance for quickly moving through the metrics to root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. The focus is on single systems (any operating system), including single cloud instances, and quickly locating performance issues or exonerating the system. Many methodologies will be discussed, along with recommendations for their implementation, which may be as documented checklists of tools, or custom dashboards of supporting metrics. In general, you will learn to think differently about your systems, and how to ask better questions."
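As an illustration, the talk's idea of methodologies implemented as "documented checklists of tools" can be captured as a simple data structure. The sketch below (not from the talk) pairs each resource with the three USE-method questions; the tool names are illustrative Linux examples, not prescriptive.

```python
# A minimal USE-method checklist as data: for each resource, record
# which tool answers each of the three questions (Utilization,
# Saturation, Errors). Tool choices here are illustrative.
use_checklist = {
    "CPU":    {"utilization": "mpstat -P ALL 1",
               "saturation":  "vmstat 1 (r column)",
               "errors":      "perf (CPU error events, if exposed)"},
    "Memory": {"utilization": "free -m",
               "saturation":  "vmstat 1 (si/so columns)",
               "errors":      "dmesg (OOM killer messages)"},
    "Disk":   {"utilization": "iostat -xz 1 (%util)",
               "saturation":  "iostat -xz 1 (queue length)",
               "errors":      "smartctl / kernel log"},
}

def render(checklist):
    """Render the checklist as one line per resource/question pair."""
    lines = []
    for resource, checks in checklist.items():
        for question in ("utilization", "saturation", "errors"):
            lines.append(f"{resource:8} {question:12} -> {checks[question]}")
    return "\n".join(lines)

print(render(use_checklist))
```

Walking such a list top to bottom is what gives the method its "starting point for unfamiliar systems" property: no question is skipped just because no dashboard happens to show it.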
Linux 4.x Tracing Tools: Using BPF Superpowers - Brendan Gregg
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formerly the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years are, in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including the Linux Performance Tools talk originally at SCALE. This is an anti-version of that talk, focusing on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
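For reference, the canonical CPU flame graph recipe from the speaker's FlameGraph repository looks like the following (requires root, perf, and the FlameGraph scripts; sampling rate and duration are the commonly suggested defaults):

```
# 1. Sample on-CPU stacks at 99 Hz, system-wide, for 30 seconds
perf record -F 99 -a -g -- sleep 30

# 2. Fold the sampled stacks and render an interactive SVG
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg
```

The resulting SVG is the interactive visualization described above: the x-axis is the sampled population (not time), and width corresponds to how often a code path was on-CPU.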
If you've heard about Burst but aren't sure how to start using it in your games, then this is the talk for you. The first part of this deck is hands-on in Unity and Visual Studio. We'll start with a simple game, then port game code to the C# Job System and speed it up using Burst. Next, the session will explore what you can and can't do with Burst and discuss common pitfalls, such as unsupported C# constructs.
Speaker: Lee Hammerton – Unity
Watch the session on YouTube: https://youtu.be/Tzn-nX9hK1o
Linux Performance Analysis: New Tools and Old Secrets - Brendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
Analyzing OS X Systems Performance with the USE Method - Brendan Gregg
Talk for MacIT 2014. This talk is about systems performance on OS X, and introduces the USE Method to check for common performance bottlenecks and errors. This methodology can be used by beginners and experts alike, and begins by constructing a checklist of the questions we’d like to ask of the system, before reaching for tools to answer them. The focus is resources: CPUs, GPUs, memory capacity, network interfaces, storage devices, controllers, interconnects, as well as some software resources such as mutex locks. These areas are investigated by a wide variety of tools, including vm_stat, iostat, netstat, top, latency, the DTrace scripts in /usr/bin (which were written by Brendan), custom DTrace scripts, Instruments, and more. This is a tour of the tools needed to solve our performance needs, rather than understanding tools just because they exist. This talk will make you aware of many areas of OS X that you can investigate, which will be especially useful for the time when you need to get to the bottom of a performance issue.
We Love Performance! How Tic Toc Games Uses ECS in Mobile Puzzle Games - Unity Technologies
Tic Toc Games recently implemented Unity's Entity Component System (ECS) in their mobile puzzle engine, which brought both great performance improvements and faster iteration time. This intermediate-level session will explain why ECS is able to process data much faster, how they increased iteration speed using ECS, and their experience learning and working with Unity's ECS package.
Garth Smith - Tic Toc Games
USENIX LISA2021 talk by Brendan Gregg (https://www.youtube.com/watch?v=_5Z2AU7QTH4). This talk is a deep dive that describes how BPF (eBPF) works internally on Linux, and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.
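As a concrete example of the kind of program the talk dissects, here is a minimal bpftrace script that aggregates per-event data in a BPF kernel map (run with root privileges, e.g. `bpftrace counts.bt`; the filename is illustrative):

```
// Count system calls by process name. The kernel aggregates the
// counts in a BPF map; bpftrace prints the map when the script exits.
tracepoint:raw_syscalls:sys_enter
{
	@[comm] = count();
}
```

bpftrace compiles this source to LLVM IR and then BPF bytecode, which the kernel verifier checks and (typically) JIT-compiles before attaching it to the tracepoint, exactly the pipeline the talk walks through.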
Optimize your game with the Profile Analyzer - Unite Copenhagen 2019 - Unity Technologies
Have you ever needed to compare the difference in performance between two versions of your project? This session will show you how to use Unity's Profile Analyzer to see the impact of an asset or code change, optimization work, settings modification, or Unity version upgrade to verify enhancements.
Speaker: Lyndon Homewood - Unity
Watch the session on YouTube: https://youtu.be/0lzqdDdE9Tc
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial updates and extends an earlier talk that summarizes the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems — important context that is typically not included in the standard documentation.
SREcon 2016 Performance Checklists for SREs - Brendan Gregg
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
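One widely circulated version of that 60-second checklist, from the speaker's Netflix blog post "Linux Performance Analysis in 60,000 Milliseconds" (tool availability varies by distribution; sar and sysstat utilities may need to be installed):

```
uptime                  # load averages: is load rising or falling?
dmesg | tail            # recent kernel errors (OOM, TCP drops, ...)
vmstat 1                # run queue, memory, swap, system-wide CPU
mpstat -P ALL 1         # per-CPU balance: a single hot CPU?
pidstat 1               # per-process CPU usage
iostat -xz 1            # disk I/O: utilization, queue, latency
free -m                 # memory capacity, including buffers/cache
sar -n DEV 1            # network interface throughput
sar -n TCP,ETCP 1       # TCP connections and retransmits
top                     # overview; catch anything missed above
```

Each line answers one question quickly; the point is coverage under time pressure, not depth.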
Linux kernel tracing superpowers in the cloud - Andrea Righi
The Linux 4.x series introduced a powerful new programmable tracing engine (BPF) that lets you actually look inside the kernel at runtime. This talk will show you how to exploit this engine to debug problems or identify performance bottlenecks in a complex environment like a cloud. It will cover the latest Linux superpowers that let you see what is happening “under the hood” of the Linux kernel at runtime. I will explain how to exploit these “superpowers” to measure and trace complex events at runtime in a cloud environment. For example, we will see how to measure the latency distribution of filesystem I/O, examine details of storage device operations such as individual block I/O request timeouts or TCP buffer allocations, investigate stack traces of certain events, identify memory leaks and performance bottlenecks, and a whole lot more.
O'Reilly Velocity New York 2016 presentation on modern Linux tracing tools and technology. Highlights the available tracing data sources on Linux (ftrace, perf_events, BPF) and demonstrates some tools that can be used to obtain traces, including DebugFS, the perf front-end, and most importantly, the BCC/BPF tool collection.
Agenda:
The Linux kernel has multiple "tracers" built-in, with various degrees of support for aggregation, dynamic probes, parameter processing, filtering, histograms, and other features. Starting from the venerable ftrace, introduced in kernel 2.6, all the way through eBPF, which is still under development, there are many options to choose from when you need to statically instrument your software with probes, or diagnose issues in the field using the system's dynamic probes. Modern tools include SystemTap, Sysdig, ktap, perf, bcc, and others. In this talk, we will begin by reviewing the modern tracing landscape -- ftrace, perf_events, kprobes, uprobes, eBPF -- and what insight into system activity these tools can offer. Then, we will look at specific examples of using tracing tools for diagnostics: tracing a memory leak using low-overhead kmalloc/kfree instrumentation, diagnosing a CPU caching issue using perf stat, probing network and block I/O latency distributions under load, or merely snooping user activities by capturing terminal input and output.
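For instance, the CPU caching diagnosis mentioned above can start with a perf stat one-liner like the following (a generic example, not from the talk; available event names vary by CPU, and `./app` is a placeholder for the workload under test):

```
perf stat -e cache-references,cache-misses,instructions,cycles ./app
```

A high cache-miss ratio combined with low instructions-per-cycle is the usual first signal that the workload is memory-bound rather than compute-bound.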
Speaker:
Sasha is the CTO of Sela Group, a training and consulting company based in Israel that employs over 400 developers world-wide. Most of Sasha's work revolves around performance optimization, production debugging, and low-level system diagnostics, but he also dabbles in mobile application development on iOS and Android. Sasha is the author of two books and three Pluralsight courses, and a contributor to multiple open-source projects. He blogs at http://blog.sashag.net.
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator... - Akihiro Hayashi
Third Workshop on Accelerator Programming Using Directives (WACCPD2016, co-located with SC16)
While GPUs are increasingly popular for high-performance computing, optimizing the performance of GPU programs is a time-consuming and non-trivial process in general. This complexity stems from the low abstraction level of standard GPU programming models such as CUDA and OpenCL: programmers are required to orchestrate low-level operations in order to exploit the full capability of GPUs. In terms of software productivity and portability, a more attractive approach would be to facilitate GPU programming by providing high-level abstractions for expressing parallel algorithms.
OpenMP is a directive-based shared memory parallel programming model and has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran, without exposing too many details of GPU architectures.
However, such high-level parallel programming strategies generally impose additional program optimizations on compilers, which could result in lower performance than fully hand-tuned code written with low-level programming models. To study potential performance improvements from compiling and optimizing high-level GPU programs, in this paper we 1) evaluate a set of OpenMP 4.x benchmarks on an IBM POWER8 and NVIDIA Tesla GPU platform and 2) conduct a comparative performance analysis of hand-written CUDA programs and GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
Combining Phase Identification and Statistic Modeling for Automated Parallel ... - Mingliang Liu
Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPrime, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPrime benchmarks. They retain the original applications' performance characteristics, in particular their relative performance across platforms. Also, the resulting benchmarks, already released online, are much more compact and easy to port compared to the original applications.
http://dl.acm.org/citation.cfm?id=2745876
The Java Virtual Machine (JVM) can deliver significantly better performance through the use of Just In Time compilation. However, each time you start an application it needs to repeat the same process of analysis and compilation. This session discusses Java with Co-ordinated Checkpoint at Restore. This is a way to freeze an application and start it again (potentially many times) from the same checkpoint.
Instrumenting application code is like flossing your teeth. Developers know they ought to be doing it more often. Code instrumentation is an important practice for establishing baseline performance metrics and identifying bottlenecks. Getting the right metrics is core to understanding how much concurrency your application can handle, determining what latency is normal for the application, and indicating when performance is deviating from those norms.
While most developers acknowledge the value of instrumentation, few actually implement it. If Bytecode injection sounds as scary as a root canal, take heart, effective instrumentation doesn't have to be complicated. I've written an open-source instrumentation framework to encourage developers to get the metrics they need to pilot their application safely. We'll examine some strategies for code instrumentation, run some load tests, and make sense of the numbers.
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
The second-generation Intel® Xeon Phi™ processor offers new and enhanced features that provide significant performance gains in modernized code. For this lab, we pair these features with Intel® Software Development Products and methodologies to enable developers to gain insights on application behavior and to find opportunities to optimize parallelism, memory, and vectorization features.
This is the "Deep Dive" talk given at the first Apache Flink Meetup Stockholm. The talk describes three components of the Apache Flink Internals: (a) job life-cycle, (b) the batch optimizer and (c) native iterations.
6. Attack Principle
• A typical exploit using an arbitrary memory overwrite for Privilege Escalation proceeds as follows:
1. Trigger the vulnerability to gain an arbitrary memory overwrite.
[Diagram: a kernel function pointer holds addr1 and points to the normal function.]
7. Attack Principle
2. Overwrite a kernel function pointer so that it points to the payload address (addr2) in user space.
[Diagram: the function pointer now holds addr2 and points to the user-space payload instead of the normal function at addr1.]
8. Attack Principle
3. Because the overwritten kernel function pointer is invoked while serving user requests, an unprivileged attacker gains root access as soon as the modified kernel function is used.
1. A Case Study: CVE-2013-2094
2. CVE-2013-2094 on x86-32
[Diagram: the hijacked pointer (addr2) now runs the user-space payload commit_creds(prepare_kernel_cred(NULL)); instead of the normal function at addr1.]
10. Proposed Solution: Background Knowledge
• The interrupt handler address in an IDT table entry is split across three fields: offset_low, offset_middle, and offset_high.
11. ptmx_fops?
• pty (pseudo-teletype) is a pair of virtual character devices that provide a bidirectional communication channel.
• A pty consists of ptmx (the master side) and pts (the slave side).
• For example, in the output of w:
USER    TTY    FROM              LOGIN@  IDLE    JCPU   PCPU   WHAT
xue     pts/0  61.66.243.96      Mon16   17:24m  0.21s  0.04s  ssh bbs@ptt.cc
jeremy  pts/1  36-231-101-220.d  10:27   0.00s   0.19s  0.01s  w
13. ptmx_fops?
• static struct file_operations ptmx_fops;
• file_operations
– defined in <include/linux/fs.h>
– contains a set of function pointers, e.g.:
  int (*fsync) (struct file *, loff_t, loff_t, int datasync);
– used by the kernel to access a device driver's functions
14. Ways to Find Addresses of Related Kernel Data Structures
• Use the sidt assembly instruction
• Search:
– System.map
– /proc/kallsyms
15. Example
• grep ptmx_fops /boot/System.map-$(uname -r)
• grep ptmx_fops /proc/kallsyms
• /* 56 is the offset of fsync in struct file_operations on x86-32 */
  int target = ptmx_fops + 56; // target is a kernel pointer
16. Hiding the Kernel Function Addresses
• # chmod o-r /boot/System.map-3.0.42-0.7-default
• # sysctl -w kernel.kptr_restrict=1
18. Proposed Solution
• Important value x: enforce 1 ≤ x ≤ 5.
• A write of x = 1 is acceptable; a write of x = 6 must fail.
• The value is kept read-only, and the page fault handler performs the legitimate modification on our behalf (e.g. x = 1).
19. Page Fault Error Code Bits
• The CPU pushes an error code onto the stack before firing a page fault exception.
• The exception handler must analyze the error code to determine how to handle the exception.
• http://wiki.osdev.org/Paging
20. Meaning of "error code = 3"
– bit 0 == 0: no page found; 1: protection fault
– bit 1 == 0: read access; 1: write access
– bit 2 == 0: kernel-mode access; 1: user-mode access
– So error code 3 (binary 011) means a write access in kernel mode hit a protection fault.
21. Flowchart of the Linux Page Fault Handler
[Flowchart: the handler classifies a fault by whether the access targets kernel space, whether it occurred in kernel mode, whether the address lies in a noncontiguous (vmalloc) memory area, and whether the address is a wrong system call parameter; the compatible solution hooks into this path.]
22. Flowchart of Libra for the IDT Table
1. invalid_address = read_cr2();
2. The page fault handler executes no_context().
3. Does the invalid address fall into the IDT table address range?
4. Is the error_code equal to 3?
5. If either check fails, continue executing no_context().
23. Flowchart of Libra for the IDT Table (cont.)
1. Is the new value in kernel space?
2. Clear the read-only attribute of the IDT table.
3. Modify the value.
4. Restore the read-only attribute of the IDT table.
5. Advance the program counter past the faulting write.
6. The page fault handler returns to the program that needs to change the IDT table.
24. Flowchart of Libra for ptmx_fops
1. kallsyms_lookup_name("ptmx_fops");
2. invalid_address = read_cr2();
3. Is the invalid address inside the ptmx_fops structure?
4. Is error_code == 3 and the modified value > 0xffffffff80000000?
5. If either check fails, continue executing no_context().
25. Flowchart of Libra for ptmx_fops (cont.)
1. Clear the read-only attribute of ptmx_fops.
2. Modify the value.
3. Restore the read-only attribute of ptmx_fops.
4. Advance the program counter past the faulting write.
5. The page fault handler returns to the program that needs to change ptmx_fops.
26. Characteristics of Libra
– It is a software solution that requires no extra hardware.
– It is a compromise between read-only solutions and writable solutions, chosen to enhance compatibility.
– It is a response-oriented security solution, so it spends no CPU resources on continuous monitoring.
27. Implementation
• On Ubuntu 13.04 (kernel version 3.10.15)
• On the x86-64 architecture
• Intel(R) Pentium(R) D CPU 3.00GHz, 1 GB RAM
• Compared against "[PATCH] x86: Use a read-only IDT alias on all CPUs" by Kees Cook
28. Evaluation: Compatibility Comparison
• We added a system call for the compatibility comparison:
  idt_table2 = ((gate_desc *) idtr.address);
  idt_table2[i].offset_low = 0xbeef;
  idt_table2[i].offset_middle = 0xdead;
  idt_table2[i].offset_high = 0x12345678;
• Original zeroth entry of the IDT table:
  *0xffffffffff57a000 (offset_low) = 4df0
  *0xffffffffff57a004 (offset_middle) = 816a
  *0xffffffffff57a008 (offset_high) = ffffffff
29. Evaluation: Compatibility Comparison
• Kees Cook's solution yields the following result — the write is refused and the entry remains unchanged:
[131360.581351] pte_write(8000000001e22161):0
[131360.581355] *0xffffffffff57a008 (offset_high) = ffffffff
[131360.581358] *0xffffffffff57a000 (offset_low) = 4df0
[131360.581360] *0xffffffffff57a004 (offset_middle) = 816a
30. Evaluation: Compatibility Comparison
• The Libra solution yields the following result — the legitimate modification succeeds even though the page is read-only:
[11679.083463] pte_write(8000000001e22161):0
[11679.083466] *0xffffffffff57a008 (offset_high) = ffffffff
[11679.083469] *0xffffffffff57a000 (offset_low) = beef
[11679.083472] *0xffffffffff57a004 (offset_middle) = dead
32. Evaluation: Compatibility Comparison
[  146.215132] flush: (null)
[  146.215132] release value:ffffffff813e5880
[  146.215132] fsync value: (null)
[  146.215132] aio_fsync value: (null)
[  146.215132] fasync value:ffffffff813e4430
[  146.215132] Jeremy, cr2 = ffffffffff577060
[  146.215132] Jeremy, page fault handler in kernel space is trigger!!!
[  146.215132] address 0xffffffffff577060 in CR2
[  146.215132] error_code = 3
[  146.215132] page fault run to line 1068
[  146.215132] page fault run to line 772
[  146.215132] Jeremy, *kallsyms_lookup_name(ptmx_fops) : ffffffffff577000 in fault.c
[  146.215132] The modified value at ptmx_fops : ffffffff8100beef
[  146.215132] address : ffffffffff577060
[  146.236822] flush:ffffffff8100beef
33. Evaluation: Performance
• We measure performance with perf, which ships in the Linux kernel source tree.
• perf stat -r 100000 ./test_modify_idt 0
• The performance of Kees Cook's solution:
36. Evaluation: Stability Testing
• "The Linux™ Test Project (LTP) is a joint project started by SGI™ and maintained by IBM®, that has a goal to deliver test suites to the open source community that validate the reliability, robustness, and stability of Linux." — http://linux-test-project.github.io/
37. Evaluation: Stability Testing
• Total tests: 1424

                      Original Ubuntu     Libra Ubuntu
Total Skipped Tests   117                 138
Total Failures        59                  80
Kernel Version        3.8.0-19-generic    3.10.15
42. Appendix A
• A Case Study: CVE-2013-2094
• A Case Study: CVE-2013-2094 on x86-32
43. Integer Issues: Sign Conversion (CVE-2013-2094)
• int fd = syscall(__NR_perf_event_open, ...);
• After perf_swevent_init executes, an attacker can increase the content of any kernel address by 1.
• After close(fd), sw_perf_event_destroy executes, and an attacker can decrease the content of any user address by 1.
44. A Case Study: CVE-2013-2094
• kernel/events/core.c

static int perf_swevent_init(struct perf_event *event)
{
        int event_id = event->attr.config;  /* attr.config is a u64 */

        /* ... */

        if (event_id >= PERF_COUNT_SW_MAX)  /* PERF_COUNT_SW_MAX == 9 */
                return -ENOENT;
        /* the check above misses every negative event_id */

        /* ... */

        atomic_inc(&perf_swevent_enabled[event_id]);
        ...
}
47. Linux 64-bit memory layout
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
• From (Documentation/x86/x86_64/mm.txt)
47
48. Integer Issues: Sign Conversion (CVE-2013-2094)
In perf_swevent_init:
$ grep perf_swevent_enabled /boot/System.map-2.6.32-358.el6.x86_64
ffffffff81f360c0 B perf_swevent_enabled

P.S. Sign extension occurs because event_id has type int:
int event_id == 0xffffffff == -1 == index 0xffffffffffffffff (x86-64), so
&perf_swevent_enabled[-1] == 0xffffffffffffffff * 4 + 0xffffffff81f360c0 == 0xffffffff81f360bc
int event_id == 0xfffffffe == -2 -->
&perf_swevent_enabled[-2] == 0xfffffffffffffffe * 4 + 0xffffffff81f360c0 == 0xffffffff81f360b8
49. Integer Issues: Sign Conversion
In sw_perf_event_destroy():
Assume again that event->attr.config == 0xffffffff.
P.S. Here there is no sign extension: 0xffffffff [32-bit] => 0x00000000ffffffff [64-bit].
• &perf_swevent_enabled[-1] in sw_perf_event_destroy:
– 0x00000000ffffffff * 4 + 0xffffffff81f360c0 == 0x0000000381f360bc
• &perf_swevent_enabled[-1] in perf_swevent_init:
– 0xffffffffffffffff * 4 + 0xffffffff81f360c0 == 0xffffffff81f360bc
Even though the two addresses differ, their low 32 bits (the last 8 hex digits) are identical.
50. Exploit Walkthrough (IDT Hijack)
[Diagram: user/kernel memory layout showing perf_swevent_enabled[], the IDT table, and a user-space NOP sled + shellcode region, with offsets labelled offset1, offset2, and offset3 (0x48); init(-1)/init(-2) increment and destroy(-1)/destroy(-2) decrement the marked words, until the entry at idt.addr + 0x48 is hijacked.]
1. Allocate the red region and the green region (the NOP sled + shellcode in user space).
2. Trigger the vulnerability for measuring: perf_event_open(-1) and perf_event_open(-2); perf_swevent_init() (a.k.a. init()) increments, sw_perf_event_destroy() (a.k.a. destroy()) decrements.
3. perf_event_open(-i + (((idt.addr & 0xffffffff) - 0x80000000) / 4) + 16)
4. asm("int $0x4"); — root obtained!
51. Modify the ptmx_fops
In the exploit:
// 56 is the offset of fsync in struct file_operations
int target = ptmx_fops + 56;
int payload = -((perf_swevent_enabled - target) / 4);
// so that perf_swevent_enabled + (payload * 4) == target

Trigger:
int ptmx = open("/dev/ptmx", O_RDWR);
fsync(ptmx);

Source code: CVE-2013-2094 ported to x86-32
52. Exploit Steps (x86-32 Port)
[Diagram: a payload region followed by a NULL region in user space.]
1. Allocate 0x10000 bytes for the payload.
2. Find perf_swevent_enabled and ptmx_fops via System.map.
3. Offset = (perf_swevent_enabled - fsync) / 4;
4. Create many child processes: 256 processes, each executing perf_event_open(Offset) 256 times (256 * 256 = 65536 = 0x10000).
5. Trigger the vulnerability: fsync(ptmx); — root obtained!
Having a vulnerability does not necessarily mean there is a corresponding exploit; here are statistics on exploits from 2009 to 2013.
These statistics on Linux kernel exploits were collected from exploit-db [21] and packetstorm [22] for 2009 to 2013. They show that Privilege Escalation exploits make up the majority of Linux kernel exploits.

Privilege Escalation   16
DoS                     6
Buffer Overflow         1
Memory Disclosure       2
Total                  25
These kernel functions create a new credential structure (struct cred) with root privileges and then commit it to the current process. After returning to user space, a root shell can be spawned.
64-bit Linux allows up to 128 TB of virtual address space per process and can address approximately 64 TB of physical memory.
Ref. http://en.wikipedia.org/wiki/X86-64
http://code.woboq.org/linux/linux/arch/x86/include/asm/page_64_types.h.html