This document discusses a new tracer for reverse engineering based on record and replay. It aims to make reverse engineering more efficient by overcoming issues with existing instruction tracers like slow speed and large data generation. The proposed tracer is implemented as a virtual machine monitor (VMM) on x64 platforms using binary translation. By classifying elements as deterministic or nondeterministic inputs and interrupts, it can generate small trace logs and have overhead under 100% by leveraging record and replay techniques. It also discusses challenges in modeling x86 elements and implementing lazy evaluation for EFLAGS to further improve efficiency.
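The lazy evaluation of EFLAGS mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tracer's actual implementation: the class name `LazyFlags`, the restriction to 64-bit `add`/`sub`, and the `evaluations` counter are all hypothetical. The idea is that the translator only remembers the last flag-producing operation and its operands, and materialises CF/ZF/SF/OF when an instruction actually reads them.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # assumed 64-bit operand size


@dataclass
class LazyFlags:
    """Hypothetical helper: remember the last operation, compute flags on demand."""
    op: str = "add"
    a: int = 0
    b: int = 0
    result: int = 0
    evaluations: int = 0  # how many times flags were actually computed

    def record(self, op, a, b):
        # Cheap path: store operands and result, do not touch any flag.
        self.op, self.a, self.b = op, a & MASK64, b & MASK64
        raw = self.a + self.b if op == "add" else self.a - self.b
        self.result = raw & MASK64
        return self.result

    def flags(self):
        # Expensive path: only runs when a consumer (e.g. a conditional jump) needs flags.
        self.evaluations += 1
        zf = self.result == 0
        sf = bool(self.result >> 63)
        if self.op == "add":
            cf = self.a + self.b > MASK64
            of = ((self.a ^ self.result) & (self.b ^ self.result)) >> 63 == 1
        else:
            cf = self.a < self.b
            of = ((self.a ^ self.b) & (self.a ^ self.result)) >> 63 == 1
        return {"CF": cf, "ZF": zf, "SF": sf, "OF": of}
```

If several arithmetic instructions run back to back and only the last result is tested, `flags()` is called once rather than once per instruction, which is where the saving would come from.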
Intel Processor Trace (or Intel PT) is a processor extension for Intel 64 and IA-32. The extension captures how a program was executed at the machine-instruction level. All dynamic events, such as branches, calls and interrupts, are recorded. This allows perfect reconstruction of the previous execution by a trace analyzer.
This slide summarizes what data this extension generates.
This document outlines requirements and frameworks for hardware/software co-design verification. These include using a simulator for hardware/software co-simulation with a unit testing framework in Python, implementing a regression framework in Jenkins for running daily jobs and reporting results, and providing verification examples using tools like MyHDL. It also references Python and Perl projects.
Memory management in operating system | Paging | Virtual memory (Shivam Mitra)
This document discusses memory management techniques in operating systems. It begins by covering contiguous memory allocation approaches like fixed and variable partitioning. It then discusses non-contiguous techniques like paging and segmentation. Key concepts covered include logical vs physical addresses, page tables, translation lookaside buffers, demand paging, and virtual memory. The document provides examples and links to detailed video explanations of these important OS memory management topics.
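As a rough illustration of the logical-to-physical translation that paging performs, here is a minimal sketch. The 4 KiB page size, the dictionary-based page table, and the function name `translate` are assumptions made for the example; a real MMU does this in hardware, with a TLB caching recent lookups.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(logical_addr, page_table, page_size=PAGE_SIZE):
    """Map a logical address to a physical one via a page-number -> frame-number table."""
    page, offset = divmod(logical_addr, page_size)
    frame = page_table.get(page)
    if frame is None:
        # With demand paging, this is the point where the OS would load the page.
        raise LookupError(f"page fault: page {page} is not resident")
    return frame * page_size + offset
```

For example, with `page_table = {0: 5, 1: 2}`, logical address 4100 lies in page 1 at offset 4, so it maps to frame 2 at the same offset.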
Luca Abeni - Real-Time Virtual Machines with Linux and kvm (linuxlab_conf)
This talk describes how to use some available technologies (the SCHED_DEADLINE scheduling policy, the PREEMPT_RT patchset, etc…) to execute real-time applications in kvm-based virtual machines while still providing performance guarantees to the virtualized applications.
In recent years, there has been a growing interest in supporting virtualized services even in embedded and real-time systems. However, executing real-time applications (characterized by temporal constraints) in virtual machines is not straightforward and presents some non-trivial challenges. This talk will describe how to use some technologies already available in the Linux kernel (the SCHED_DEADLINE scheduling policy, the PREEMPT_RT patchset, etc…) to execute real-time applications in kvm-based virtual machines while still providing performance guarantees to the virtualized applications. After presenting the problem (and providing a quick summary about real-time scheduling), it will be shown how to configure the host and guest kernels and the virtual machine, and how to schedule the VM threads in order to achieve predictable response times and to provide real-time guarantees.
Timers are an essential topic in Contiki OS. This presentation describes the different types of timers and their APIs.
It follows the same explanation as the Contiki OS wiki.
Dynamic Instrumentation - OpenEBS Golang Meetup July 2017 (OpenEBS)
The slides were presented by Jeffry Molanus, the CTO of OpenEBS, at the Golang Meetup. OpenEBS is open source, cloud native storage. OpenEBS delivers storage and storage services to containerized environments. OpenEBS allows stateful workloads to be managed more like stateless containers. OpenEBS storage services include: per container (or pod) QoS SLAs, tiering and replica policies across AZs and environments, and predictable and scalable performance. Our vision is simple: let storage and storage services for persistent workloads be so fully integrated into the environment, and hence managed automatically, that they almost disappear into the background as just another infrastructure service that works.
Dead Lock Analysis of spin_lock() in Linux Kernel (english) (Sneeker Yeh)
The document discusses spin locks and semaphores in the Linux kernel. It begins with an introduction to the difference between spin locks and semaphores. Spin locks cause threads to continuously loop trying to acquire the lock, while semaphores cause threads to sleep. An example is given of a deadlock scenario that can occur with spin locks. The document then discusses the concept of context in the kernel, including user context, interrupt context, and the control flow during procedure calls and interrupts. Log analysis and examples of double-acquire deadlocks involving spin locks are provided. The document concludes with recommendations for how to prevent deadlocks, such as using spin_lock_irqsave/restore and avoiding semaphores in interrupt context.
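The double-acquire deadlock described above can be imitated in user space. The sketch below is only an analogy, assuming Python's `threading.Lock` in place of a kernel spin lock and a timeout in place of endless spinning; the function name `double_acquire` is hypothetical. In the kernel, the second acquire would typically come from an interrupt handler running on the same CPU, which is why `spin_lock_irqsave` disables local interrupts first.

```python
import threading


def double_acquire(lock, timeout=0.1):
    """Return True if a second acquire by the same thread succeeds while the lock is held."""
    with lock:  # first acquire
        got = lock.acquire(timeout=timeout)  # second acquire on the same lock
        if got:
            lock.release()
        return got
```

With a plain `threading.Lock()` the second acquire cannot succeed, so the call returns `False` after the timeout; without the timeout it would block forever. A re-entrant `threading.RLock()` returns `True`.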
This document discusses operating systems and their core abstractions like uninterrupted computation, infinite memory, and simple I/O. It describes how operating systems provide these abstractions using mechanisms like context switching, virtual memory, and system calls. It also covers different types of operating systems and characteristics of embedded operating systems like real-time capabilities.
Week1 Electronic System-level ESL Design and SystemC Begin (敬倫 林)
This document provides an introduction and overview of electronic system level (ESL) design using SystemC. It begins with background on ESL design basics, system on chip design flows, and SystemC. It then provides 3 examples of SystemC code: a counter, traffic light, and simple bus. The counter example shows a basic module with clocked process. The traffic light demonstrates a finite state machine. The bus example illustrates an interface, master/slave devices, and memory mapped components communicating over a bus. Overall, the document serves as an introductory tutorial for designing and modeling electronic systems using the SystemC language.
The document discusses basic concepts related to exploit development such as vulnerabilities, exploits, fuzzers, memory management, assembly language, and stack-based overflows. It provides definitions and explanations of these key terms, how programs are laid out in memory, basic assembly instructions, register usage, and how to recognize common C language constructs when viewing assembly code.
Contiki introduction II-from what to how (Dingxin Xu)
The document discusses the Contiki operating system framework, including how it uses processes and events for scheduling work, inter-process communication using event posting, and how modules like Rime and the TDMA MAC layer separate protocol logic from header construction and buffer management for flexible networking implementations. Key data structures include a process list and event queue that the kernel uses to schedule work across asynchronous processes.
Slides for a college course at City College San Francisco. Based on "The Shellcoder's Handbook: Discovering and Exploiting Security Holes ", by Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte; ASIN: B004P5O38Q.
Instructor: Sam Bowne
Class website: https://samsclass.info/127/127_F18.shtml
Operating Systems 1 (8/12) - Concurrency (Peter Tröger)
This document provides an introduction to concurrency and parallel programming concepts using POSIX threads. It defines key concurrency terms like race conditions, deadlocks, livelocks and starvation. It discusses how to protect critical sections and shared resources using semaphores and mutexes. Examples like the dining philosophers problem illustrate how deadlocks can occur. The document also outlines POSIX thread functions for thread management, synchronization with mutexes and condition variables, and reading/writing locks and barriers.
CNIT 127 Lecture 7: Intro to 64-Bit Assembler (not in book) (Sam Bowne)
This document discusses 64-bit assembly programming. It covers 64-bit registers used in 64-bit assembly like RIP and RSP. It also discusses limitations of 64-bit addressing in different versions of Windows and how the operating system separates memory used by the OS and user programs. Common opcodes, syscalls, and using sections like .data and .text are described. Examples shown include simple programs, reading and writing data, and encoding text using Caesar cipher and XOR encryption.
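The two text encodings named above are simple enough to sketch. The slides implement them in 64-bit assembly; the Python below is only an illustrative equivalent, and the function names `caesar` and `xor_bytes` and the single-byte XOR key are assumptions for the example.

```python
def caesar(text, shift):
    """Rotate ASCII letters by `shift` positions; leave everything else unchanged."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def xor_bytes(data, key):
    """XOR every byte with a single-byte key (0-255); applying it twice restores the input."""
    return bytes(b ^ key for b in data)
```

Decoding a Caesar-shifted string uses the negated shift, for example `caesar(caesar(s, 3), -3)`, while XOR is its own inverse.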
Slides for a college course at City College San Francisco. Based on "The Shellcoder's Handbook: Discovering and Exploiting Security Holes ", by Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte; ASIN: B004P5O38Q.
Instructor: Sam Bowne
Class website: https://samsclass.info/127/127_F18.shtml
Slides for a college course at City College San Francisco. Based on "The Shellcoder's Handbook: Discovering and Exploiting Security Holes ", by Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte; ASIN: B004P5O38Q.
Instructor: Sam Bowne
Class website: https://samsclass.info/127/127_S17.shtml
The document provides a summary of 15 lectures on operating systems topics:
1. The first few lectures introduce concepts like computer organization, boot process, need for an operating system, and basic OS definitions.
2. Later lectures cover additional OS concepts like multiprogramming, multitasking, multiprocessing, memory protection, and interrupts.
3. The document discusses process management topics like process states, context switching, scheduling, and inter-process communication using pipes.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2J5O3XV.
Howard Chu gives tips and techniques for writing highly efficient and scalable software drawn from decades of experience. The guiding principle is a simple one, and can be applied nearly everywhere. The talk is focused on programming in C. Filmed at qconlondon.com.
Howard Chu founded Symas Corp. with 5 other partners and serves as its CTO. His work has spanned a wide range of computing topics, including most of the GNU utilities, networking protocols and tools, kernel and filesystem drivers, and focused on maximizing the useful work from a system. His current focus is database oriented, covering LDAP, LMDB, and other non-relational database technologies.
Slides for a college course at City College San Francisco. Based on "The Shellcoder's Handbook: Discovering and Exploiting Security Holes ", by Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte; ASIN: B004P5O38Q.
Instructor: Sam Bowne
Class website: https://samsclass.info/127/127_S18.shtml
CNIT 127: Ch 4: Introduction to format string bugs (Sam Bowne)
Slides for a college course at City College San Francisco. Based on "The Shellcoder's Handbook: Discovering and Exploiting Security Holes ", by Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte; ASIN: B004P5O38Q.
Instructor: Sam Bowne
Class website: https://samsclass.info/127/127_F19.shtml
CNIT 126: 10: Kernel Debugging with WinDbg (Sam Bowne)
Slides for a college course at City College San Francisco. Based on "Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software", by Michael Sikorski and Andrew Honig; ISBN-10: 1593272901.
Instructor: Sam Bowne
Class website: https://samsclass.info/126/126_F18.shtml
The document discusses replication techniques used in the online game Warframe. It covers several key aspects of the replication system including: using prioritized state replication to send updated object properties and events to clients; maintaining high and low frequency object lists to optimize updates; using prediction techniques for actions like ball throwing animations; implementing their own congestion control system tailored for games; and supporting dedicated servers by separating game and server code.
1. The timing behavior of the OS must be predictable: every OS service needs an upper bound on its execution time.
2. The OS must manage timing and scheduling. It may have to be aware of task deadlines (unless scheduling is done off-line).
3. The OS must be fast.
Slides for a college course at City College San Francisco. Based on "The Shellcoder's Handbook: Discovering and Exploiting Security Holes ", by Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte; ASIN: B004P5O38Q.
Instructor: Sam Bowne
Class website: https://samsclass.info/127/127_S17.shtml
Steelcon 2014 - Process Injection with Python (infodox)
These are the slides accompanying the talk given by Darren Martyn at the Steelcon security conference in July 2014 about process injection using Python.
Covers using Python to manipulate processes by injecting code on x86, x86_64, and ARMv7l platforms, and writing a stager that automatically detects what platform it is running on and intelligently decides which shellcode to inject, and via which method.
The Proof of Concept code is available at https://github.com/infodox/steelcon-python-injection
CNIT 126 7: Analyzing Malicious Windows Programs (Sam Bowne)
Slides for a college course at City College San Francisco. Based on "Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software", by Michael Sikorski and Andrew Honig; ISBN-10: 1593272901.
Instructor: Sam Bowne
Class website: https://samsclass.info/126/126_S17.shtml
CNIT 127 Ch 4: Introduction to format string bugs (rev. 2-9-17) (Sam Bowne)
Slides for a college course at City College San Francisco. Based on "The Shellcoder's Handbook: Discovering and Exploiting Security Holes ", by Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte; ASIN: B004P5O38Q.
Instructor: Sam Bowne
Class website: https://samsclass.info/127/127_S17.shtml
Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc. (Atollic)
Learn more about advanced debugging of ARM Cortex devices, including how to analyse a crashed system after a hard fault exception, SWV real-time event and data tracing, analysing execution history using ETM instruction tracing, dual-core debugging, kernel-aware RTOS debugging, and more. Also, learn how to avoid introducing bugs in the first place with static source code analysis (such as MISRA-C), code complexity analysis, and source code review meetings (peer review).
Using the big guns: Advanced OS performance tools for troubleshooting databas... (Nikolay Savvinov)
Using OS performance tools and basic alternatives to troubleshoot production database issues
The document discusses using Linux performance tools like pidstat, ps, and tracing tools like perf, systemtap, and dtrace to troubleshoot complex database problems that may involve issues at the operating system, hardware, or network level. It provides examples of using these tools to diagnose specific issues like memory fragmentation, I/O problems, and network congestion and presents a methodology around reproducing issues, analyzing tool output, identifying root causes, and developing solutions.
Unity - Internals: memory and performance (Codemotion)
by Marco Trivellato - In this presentation we will provide in-depth knowledge about the Unity runtime. The first part will focus on memory and how to deal with fragmentation and garbage collection. The second part will cover implementation details and their memory vs cycles tradeoffs in both Unity 4 and the upcoming Unity 5.
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks (Anne Nicolas)
Flash devices introduced a sudden shift in the performance profile of direct attached storage. With IOPS rates orders of magnitude higher than rotating storage, it became clear that Linux needed a redesign of its storage stack to properly support and get the most out of these new devices.
This talk will detail the architecture of blk-mq, the redesign of the core of the Linux storage stack, and the later set of changes made to adapt the SCSI stack to this new queuing model. Early results of running Facebook infrastructure production workloads on top of the new stack will also be shared.
Jens Axboe, Facebook
The document describes the process of implementing SMP support for OpenBSD on an SGI Octane 2 machine. Key steps included restructuring per-processor data, implementing locking primitives, handling hardware aspects like spinning up secondary processors, and debugging challenges like detecting deadlocks. Debugging was made difficult by timing issues but was aided by tools like JTAG, DDB, printfs, and modifying locks to record stuck locations. Interrupts could block inter-processor communication, so the clock handler was modified to re-enable interrupts during locking.
This document discusses the Java Virtual Machine (JVM) memory model and just-in-time (JIT) compilation. It explains that the JVM uses dynamic compilation via a JIT to optimize bytecode at runtime. The JIT profiles code and performs optimizations like inlining, loop unrolling, and escape analysis. It also discusses how the JVM memory model allows for instruction reordering and caching but ensures sequential consistency through happens-before rules and volatile variables. The document provides examples of anomalies that can occur without synchronization and how tools like synchronized, locks, and atomic operations can be used to prevent issues.
The document discusses principles of computer performance and optimization. It explains that according to Amdahl's law, the most important principle is to optimize for common cases over rare cases. Amdahl's law defines speedup from enhancements based on the fraction of time improved and the improvement achieved. The document also discusses factors that affect CPU time like clock rate, instructions per cycle, and instruction count. It covers principles of locality, characteristics of instruction set architectures, and addressing modes in memory.
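Amdahl's law as described above can be written as a one-line function. This is a small sketch with assumed parameter names: `fraction` is the share of execution time that benefits from the enhancement and `factor` is the speedup of that share.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is made `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)
```

Doubling the speed of half the program gives `amdahl_speedup(0.5, 2)`, roughly 1.33 overall, and even an extremely large `factor` cannot push that case past 2. This is why optimizing the common case matters more than optimizing a rare one.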
The document provides an overview of a presentation on kernel auditing research, including:
- Three parts to the presentation covering kernel auditing research, exploitable bugs found, and kernel exploitation.
- Audits were conducted on several open source kernels, finding over 100 vulnerabilities across them.
- A sample of exploitable bugs is then presented from the audited kernels to provide evidence that kernels are not bug-free and vulnerabilities can be relatively simple to find and exploit.
Larson Macaulay apt_malware_past_present_future_out_of_band_techniquesScott K. Larson
The document discusses advanced persistent threats and techniques used by attackers both historically and currently. It covers topics like out-of-band analysis techniques to gain "perfect knowledge" of attackers through reverse engineering, using telemetry and signatures to detect malware, and challenges with scanning techniques due to polymorphism and evasion methods used by attackers.
Performance Benchmarking: Tips, Tricks, and Lessons LearnedTim Callaghan
Presentation covering 25 years worth of lessons learned while performance benchmarking applications and databases. Presented at Percona Live London in November 2014.
Infrastructure as Code (IaC), how to choose the right tool, terraform vs. CDK vs. Pulumi, best practices, Principles, and a lot of the underlying principles are described in this crash course.
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
This document discusses Linux performance analysis tools. It introduces tpoint, a tool for tracing Linux tracepoints. Some example one-liners are provided that demonstrate how to use tpoint to trace disk I/O and see the tasks and processes performing I/O. The document also summarizes ftrace, a Linux kernel tracing tool that can be used to analyze performance issues.
With multicore systems becoming the norm, every programmer is being forced to deal with multi-CPU memory atomicity bugs: data races. Data-race bugs are some of the hardest bugs to find and fix, sometimes taking weeks on end, even for experts. There are very few tools to help here (mostly just academic implementations). The authors of this presentation are at the forefront of multicore Java technology-based systems and daily have to debug data races. They have a lot of hard-won experiences with finding and fixing such bugs, and they share them with you in this presentation.
CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...CanSecWest
This document discusses using Intel Processor Trace (Intel PT) for hardware-based tracing on Windows. It provides an overview of Intel PT capabilities and how it can be used for fuzzing and vulnerability discovery. Specifically, it describes the development of WinAFL IntelPT, which integrates Intel PT tracing with the WinAFL evolutionary fuzzer to enable high-performance, hardware-driven fuzzing on Windows.
The document discusses efficient techniques for detecting shellcode inline. It describes the structure of shellcode and challenges in detecting it. It introduces libscizzle, which uses efficient emulation to identify possible shellcode execution sequences and verifies candidates using sandboxed hardware execution. Libscizzle scans data at gigabit speeds with no false positives and no known false negatives, representing about a 1000x speed improvement over previous tools like libemu.
The JVM memory model describes how threads in the Java eco-system interact through memory. While the memory model impact on developing for the JVM may not be obvious, it is the cause for certain number of "anomalies" that are, well, by design.
In this presentation we will explore the aspects of the memory model, including things like reordering of instructions, volatile members, monitors, atomics and JIT.
Multicore processors are becoming prevalent due to the limitations of increasing single core clock speeds. This presents challenges for software to effectively utilize multiple cores. Functional programming is one option that avoids shared state and parallel access issues, but requires a significant mindset shift. Refactoring existing code using tools is another option to incrementally introduce parallelism. Hybrid approaches combining paradigms may also help transition. Key application areas currently benefiting include servers, scientific computing, and packet processing. However, significant existing code is not easily parallelized and performance gains have yet to be fully realized.
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...srisatish ambati
Top 10 Causes for Java Issues in Production and What to Do When Things Go Wrong
JavaOne 2010.
Abstract: It's Friday evening and you hear the first rumble . . . one java node has become slightly unresponsive. You lookup the process, get a thread dump, and for good measure restart it at 8 p.m. Saturday afternoon is when you realize that other nodes have caught the flu and you get the ugly call from the customer. In a matter of hours, you're on that conference bridge with support groups of different packages and Java vendors and one of your uberarchitects. Yes, production instances are up and down, and restarting like there's no tomorrow. Here's an accumulated compendium of the op 10 things that can cause Java production heartburn and what to do when your Java production is on fire. And yes, please have your tools belt on.
Speaker(s):
Cliff Click, Azul Systems, Distinguished Engineer
SriSatish Ambati, Azul Systems, Performance Engineer
Similar to A New Tracer for Reverse Engineering - PacSec 2010 (20)
A New Tracer for Reverse Engineering - PacSec 2010
1. A New Tracer for Reverse Engineering
Niizh (Section 1b) : Background and Implementation (Work in Progress)
Tsukasa Ooi (@a4lg)
2. I...
• will introduce a way to make reverse engineering
more efficient... possibly.
• Possibly?
– (Nov 2010) General-purpose OSes don't work yet.
• Sorry, no live demo!
– Some predictions are included.
3. Related Topics
• Reverse Engineering
– especially dynamic analysis, debuggers and tracers.
• Intel x86 (32-bit) architecture
• Virtualization / Virtual Machine Monitor (VMM)
– Record and Replay
• Intrusion detection and analysis (e.g. honeypots)
• Bug detection (e.g. fuzzing)
4. Agenda
• Drawbacks of instruction tracers
• New tracing method based on Record and Replay
• Tracing-VMM implementation on x64
• Partial Tests
• (Possible) Practical use of this Tracer
• Challenges
5. Target Platform
• Intel x86 (16/32-bit) architecture
• PC/AT
• General purpose OSes (Windows, Linux etc...)
7. Dynamic Analysis
• Analyze running programs
– e.g. By intercepting operations of the program
• Various tools
– Debuggers
• e.g. OllyDbg, IDA Pro...
– Monitors
• Process Monitor, Wireshark...
– Tracers
• API Monitor, OllyDbg, Process Stalker...
• Today, I will talk about so-called tracers.
8. Tracers (1)
• Capture and save the information
associated with a specific event.
– Various granularities
• instruction, basic block, function, system call...
• Instruction tracing
– REALLY easy to apply automatic analysis to
(like automated unpacking).
– If you can trace the whole internal context
at each instruction, you can acquire any
information you would like.
9. Tracers (2)
• But in early research, I found that most
instruction tracers have some drawbacks:
– Extremely slow
• They hook every instruction executed, which makes
them really slow.
• 10x-1000x slowdown
– Generate a huge amount of data
• Several gigabytes per real-second.
(real-second: 1 sec without emulation)
• They save a lot of information for each instruction.
• Saving the information can also be a bottleneck.
• Can we solve these issues?
• Major Requirements:
– Overhead : <100%
– Size of trace : <5MB/s
• The theme is:
How did I implement (or: how to make) a
VMM-based tracer that satisfies these requirements?
12. Record and Replay (0)
• I thought I had discovered this independently, but:
– I hadn't found any related documents before.
• ReVirt: Enabling Intrusion Analysis through
Virtual-Machine Logging and Replay
– I found that the new method is a variant of
"Record and Replay".
– The two are closely related and difficult to separate.
• So I'm going to describe Record and Replay
together with my method.
13. Record and Replay (1)
• The method goes by several names:
– VMware calls it "Record and Replay"
– "Logging and Replay", "Lockstep"
• Execution in 2 passes (Record/Replay)
• By focusing on characteristics common to
many machine architectures, it makes the
trace output phenomenally small.
– Normally, input from external hardware
is not that frequent.
14. Record and Replay (2)
• Many architectures can be represented by this model:
– Input (can be null)
– Calculation / Process (+Internal Context)
– Output (can be null)
• Assume the output is uniquely determined
by the internal context (by function g below).
• $z_{n+1} = f(z_n, i_n)$
$o_{n+1} = g(z_{n+1})$
[Figure: Input → Calc/Proc (+Context) → Output]
15. Record and Replay (3)
• Saving all information is equivalent to
saving every internal context ($z_n$).
– The output is not required because we assume
it is uniquely determined by the internal context.
• Also save $z_0$ (the initial internal context).
• Function f (equivalent to calculation/process)
must be a mathematical function.
– Same input, same output.
– Not ambiguous.
[Figure: Input → Calc/Proc (+Context) → Output]
16. Record and Replay (4)
• Focusing on dependencies
– Input: depends on nothing.
– Calculation / Process (+Context): depends on the input.
• Now you can see that...
– The internal context depends only on the previous internal
state and the input array. You can recover every internal
context from the initial context and the inputs.
[Figure: Input → Calc/Proc (+Context)]
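The dependency argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_step` is a made-up transition function standing in for f, and `recover_contexts` is a hypothetical helper name, neither comes from the slides.

```python
def recover_contexts(z0, inputs, f):
    """Rebuild every internal context z_0..z_n from z_0 and the input array."""
    contexts = [z0]
    for i in inputs:
        # z_{n+1} = f(z_n, i_n): the next context depends only on the
        # previous context and the current input.
        contexts.append(f(contexts[-1], i))
    return contexts


def toy_step(z, i):
    """Hypothetical deterministic transition on an 8-bit value."""
    return (z * 3 + i) & 0xFF


# An input of 0 plays the role of a "null" input here.
contexts = recover_contexts(1, [5, 0, 7], toy_step)
print(contexts)
```

Because `toy_step` is a pure function, calling `recover_contexts` again with the same initial context and inputs always yields the same list, which is the property the Record and Replay method relies on.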
17. Record and Replay (5)
• Pass 1: Record
– Capture and save the initial context.
– Run the virtual machine.
• It accepts input from external hardware.
– Capture and save all inputs.
• This does not produce a dump of the
internal context, but you can recover
it from this small amount of data.
[Figure: InitState and Input are written to the Trace log while Calc/Proc (+Context) runs]
18. Record and Replay (6)
• Pass 2: Replay
– Recover the initial context from the trace log.
– Run the virtual machine.
• But read the trace log to supply the input data.
• So it does not accept new hardware inputs.
– Read the internal context from the
running virtual machine.
• It is very similar to the Record pass!
[Figure: the Trace log supplies InitState and Input to Calc/Proc (+Context)]
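A minimal sketch of the two passes, assuming a toy machine whose whole context is one 16-bit integer. The names `TraceLog`, `step`, `record` and `replay` are hypothetical and only illustrate the idea; a real VMM records far richer state.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraceLog:
    init_state: int                       # z_0, saved once
    inputs: List[int] = field(default_factory=list)


def step(state, value):
    """Deterministic transition f (toy example)."""
    return (state * 31 + value) & 0xFFFF


def record(init_state, read_hardware, steps):
    """Pass 1: run normally, saving only the initial context and inputs."""
    log = TraceLog(init_state)
    state = init_state
    for _ in range(steps):
        value = read_hardware()           # unpredictable external input
        log.inputs.append(value)
        state = step(state, value)
    return log, state


def replay(log, observe=None):
    """Pass 2: rerun from the log; no new hardware input is accepted."""
    state = log.init_state
    for n, value in enumerate(log.inputs):
        state = step(state, value)
        if observe is not None:
            observe(n + 1, state)         # internal context is readable here
    return state


rng = random.Random()
log, final_recorded = record(0x1234, lambda: rng.randrange(256), 1000)
seen = []
final_replayed = replay(log, lambda n, z: seen.append(z))
print(final_replayed == final_recorded)
```

The Record pass never dumps the intermediate contexts. The Replay pass regenerates all of them from the log, and the `observe` callback is where an analysis would read them.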
19. Cons. (1)
• It looks like just running twice, but:
– You have saved the trace log, so you can run the
Replay pass anytime, anywhere, as often as you want.
• Each Replay pass extracts part of the information.
• If you need more information, you just
run the Replay pass again with a different configuration.
– If you need to, you can run Replay passes in parallel.
• This can shorten automated analysis.
(In practice, you may run into dependency issues.)
• (Cont.)
– The two passes are independent.
• Even if you run a slow analysis, the Record pass
keeps running as before.
• You can use the Replay pass for slow and
verbose analysis that is difficult to apply directly
(such as buffer-overflow detection).
• This method has an affinity for reverse engineering.
– The trace log captures nearly *everything*
happening in the virtual machine!
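As a rough illustration of running several Replay passes over one trace log with different configurations, the sketch below uses Python's `concurrent.futures`. The toy `step` function, the `trace` tuple layout and the two analyses are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def step(state, value):
    """Toy deterministic transition on an 8-bit context."""
    return (state + value) & 0xFF


def replay(trace, analyze):
    """Replay one trace and apply an analysis to every internal context."""
    init_state, inputs = trace
    state = init_state
    results = []
    for value in inputs:
        state = step(state, value)
        results.append(analyze(state))
    return results


# One recorded trace: (initial context, input array).
trace = (0, [10, 250, 3, 200])

# Two independent "configurations" of the Replay pass.
analyses = {
    "parity": lambda z: z & 1,
    "high_bit": lambda z: z >> 7,
}

with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(replay, trace, fn)
               for name, fn in analyses.items()}
    reports = {name: fut.result() for name, fut in futures.items()}

print(reports)
```

Each replay only reads the trace, so the passes do not interfere with one another. Splitting a single replay into parallel segments is a different problem, and that is where the dependency issues mentioned above would appear.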
21. Real World Example (1)
• VMware Workstation (6 or later)
– Record/Replay feature
• Record an execution, then replay it like a
video and/or use it for debugging.
– It is proprietary and not robust enough,
but it is a real implementation of the
Record and Replay method.
– Trace log: normally 1-10MB/s
22. Real World Example (2)
• VMware Workstation (6 or later)
– But...
• It's still "a VMware".
• It does not have a sufficient debug interface.
– If it had a well-equipped debug interface,
you could use it for reverse engineering.
• Other examples:
– ReplayDIRECTOR (Java debugging tool)
– Jockey (http://home.gna.org/jockey/)
• User-mode Recording / Debugging library for Linux
23. Applying to x86 (1)
• All nondeterministic elements could be treated as
one type of input, but that is inefficient.
– Do you want to record lots of null elements?!
• Classify the so-called inputs into types:
– Nondeterministic Input(s)
– Interrupt(s)
• These are just names; they should not be
taken literally.
[Figure: Input, Trace, Calc/Proc (+Context), InitState]
24. Applying to x86 (2.1)
• Nondeterministic Inputs
– The timing at which the internal context becomes
undetermined can itself be determined uniquely
(like the IN instruction on x86).
– But you cannot determine the actual value
or contents without running it.
– Save the actual value or contents,
but don't save the timing.
• We can determine the timing from the
recent internal context and interrupts.
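A small sketch of "save the value, not the timing", assuming a made-up two-opcode interpreter (`"in"` and `"add"`). The log holds only the values read; during replay the instruction stream itself decides when the next value is consumed.

```python
from collections import deque


def run(program, read_port):
    """Toy interpreter: 'in' loads a port value, 'add' adds a constant."""
    acc = 0
    for op, arg in program:
        if op == "in":
            acc = read_port()             # value is unknown until executed
        elif op == "add":
            acc = (acc + arg) & 0xFF
    return acc


program = [("in", None), ("add", 1), ("in", None), ("add", 2)]

# Record pass: forward the device value and append it to the log.
device = iter([0x41, 0x7F])               # stand-in for real hardware
log = []


def recording_port():
    value = next(device)
    log.append(value)                     # value only, no timestamp
    return value


recorded = run(program, recording_port)

# Replay pass: each 'in' simply takes the next logged value.
pending = deque(log)
replayed = run(program, pending.popleft)

print(log, recorded, replayed)
```

No instruction index or timestamp is stored next to the values, yet the replay consumes them at exactly the same points because the program and its context are deterministic.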
25. Applying to x86 (2.2)
• Interrupts
– The timing is not uniquely predictable.
– The actual content can also be nondeterministic.
– In this case, trace the timing. Additionally,
if the content of the interrupt is nondeterministic,
trace it too.
• e.g. interrupt vector number (hardware interrupt)
• The most important thing is:
– Based on this classification, we have to
classify all elements in the virtual machine.
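For interrupts, the timing has to be logged as well. The sketch below assumes the timing is expressed as an executed-instruction count and that the handler is a toy XOR; both are illustrative choices, not details taken from the slides.

```python
import random


def run(n_instructions, poll_interrupt):
    """Toy CPU loop: before each instruction, check for a pending interrupt."""
    acc = 0
    for icount in range(n_instructions):
        vector = poll_interrupt(icount)
        if vector is not None:
            acc ^= vector                 # toy interrupt handler
        acc = (acc + 1) & 0xFF            # the "instruction" itself
    return acc


# Record pass: interrupts arrive at unpredictable points.
rng = random.Random()
log = []


def hardware(icount):
    if rng.random() < 0.05:
        vector = rng.choice([0x20, 0x21, 0x2E])
        log.append((icount, vector))      # timing AND content are traced
        return vector
    return None


recorded = run(500, hardware)

# Replay pass: inject each interrupt at the logged instruction count.
schedule = dict(log)
replayed = run(500, schedule.get)

print(len(log), recorded == replayed)
```

The log grows with the number of interrupts, not with the number of instructions executed, which is why this classification keeps the trace small.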
26. Applying to x86 (3.1)
• Modeling: VM-Internal Disk
– Assume the VM-internal disk is reliable and
record the initial disk image.
– Almost all elements are deterministic,
except the interrupts that the disk generates.
• The content read is equivalent to
the content last written.
• But the timing of the ATA interrupt cannot be
predicted exactly, so we classify it as an Interrupt.
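The disk model can be illustrated with a toy sector store. `ModeledDisk` and its 4-byte sector size are hypothetical; the point is only that read data never needs to be logged when the initial image is saved and writes come from the (deterministic) guest.

```python
class ModeledDisk:
    """Toy disk: a read returns whatever was last written to that sector."""

    SECTOR = 4  # hypothetical tiny sector size, for illustration only

    def __init__(self, initial_image):
        self.data = bytearray(initial_image)

    def write(self, lba, payload):
        if len(payload) != self.SECTOR:
            raise ValueError("payload must be exactly one sector")
        start = lba * self.SECTOR
        self.data[start:start + self.SECTOR] = payload

    def read(self, lba):
        start = lba * self.SECTOR
        return bytes(self.data[start:start + self.SECTOR])


image = bytes(range(16))                  # the recorded initial disk image

record_disk = ModeledDisk(image)          # disk seen during the Record pass
replay_disk = ModeledDisk(image)          # disk rebuilt for the Replay pass

# The guest issues the same writes in both passes because it is deterministic.
for disk in (record_disk, replay_disk):
    disk.write(2, b"ABCD")

print(record_disk.read(2), replay_disk.read(2))
```

Under this model only the arrival time of the disk's completion interrupt would still go into the trace log, as an Interrupt.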
27. Applying to x86 (3.2)
• Modeling: Mouse, Keyboard, Network
– They are unpredictable, external inputs.
– Input from these devices uses both
x86 interrupts and I/O port operations.
– So they are both Interrupts and Nondeterministic Inputs.
– Network packets that the VM sends are recovered
from the internal context.
28. Applying to x86 (3.3)
• Modeling: Time Stamp Counter (CPU)
– The clock count since the computer was reset,
which can be read with the RDTSC instruction.
– Consider it a Nondeterministic Input.
– Even if a value physically resides inside
the CPU, you should treat it as an input when
it produces unpredictable results.
• Even if you could model this as deterministic,
the implementation could be inefficient.
• NOT treating it as deterministic improves
VM emulation efficiency.
29. • Modeling ― CPU exception
– Almost all exceptions are deterministic
including their timing.
• A Page Fault occurs because the CPU has
accessed an invalid memory address.
– So this is not even an input.
• Modeling ― Not determinable behavior of CPU
– After some CPU operations, a part of the internal context
can be nondeterministic. (The value/behavior is left
undefined by the architecture.)
– Consider these Nondeterministic Inputs.
Applying to x86 (3.4)
30. • Modeling ― Inexact Arithmetic Operation
– The results of transcendental instructions such as
FSINCOS and FPATAN are not exactly defined, because
specifying the exact value is very difficult.
– The minimum information that can be used to
recover the original value is considered a
Nondeterministic Input.
• Likewise, we have to model *everything*
– Implementation is relatively difficult.
Applying to x86 (3.5)
31. Applying to x86 (4)
• Considering X nondeterministic?
– Increases the number of hooks.
– The trace log gets bigger, execution gets slower.
– The fewer, the better.
• I thought these nondeterministic events were
much, much fewer than normal instructions, so
there would be no problem.
– But I was wrong.
32. What do you think?
• Is this instruction deterministic?
XOR edx, edx
– As you know, this instruction just
clears the edx register.
– But the answer is No.
• Many normal operations make some part of
the internal context nondeterministic.
– IT IS EFLAGS.
33. The curse of EFLAGS? (1)
• Let's look inside.
– edx IS zero. On the other hand,
EFLAGS.AF is updated to "?".
– Intel's manual says this value is undefined
(can vary.)
                     edx            OF SF ZF AF PF CF
(before)             xxx......xxx    x  x  x  x  x  x
XOR edx, edx
(next instruction)   000......000    0  0  1  ?  1  0
34. The curse of EFLAGS? (2)
• This is not the end!
– These frequently used instructions do it as well.
– According to profiling, 10-15% of instructions
make a part of EFLAGS undefined!
Instructions                              OF SF ZF AF PF CF
AND, OR, XOR, TEST (Logical Arithmetic)    0  M  M  ?  M  0
MUL, IMUL (Multiplication)                 M  ?  ?  ?  ?  M
DIV, IDIV (Division)                       ?  ?  ?  ?  ?  ?
SHL, SHR, SAL, SAR count (Shift)           ?  M  M  ?  M  ?
(0 = cleared, M = modified according to the result, ? = undefined)
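The table above can be written down as data, which makes it easy to ask which flags an instruction leaves undefined. A minimal sketch, using only the rows from this slide; the names EFFECTS, GROUP and undefined_flags are hypothetical.

```python
FLAGS = ("OF", "SF", "ZF", "AF", "PF", "CF")

# One character per flag, in FLAGS order, copied from the slide:
# '0' = cleared, 'M' = modified according to the result, '?' = undefined.
EFFECTS = {
    "logical": "0MM?M0",
    "multiply": "M????M",
    "divide": "??????",
    "shift": "?MM?M?",
}

GROUP = {
    "AND": "logical", "OR": "logical", "XOR": "logical", "TEST": "logical",
    "MUL": "multiply", "IMUL": "multiply",
    "DIV": "divide", "IDIV": "divide",
    "SHL": "shift", "SHR": "shift", "SAL": "shift", "SAR": "shift",
}


def undefined_flags(mnemonic):
    """Return the set of flags the instruction leaves undefined."""
    pattern = EFFECTS[GROUP[mnemonic.upper()]]
    return {flag for flag, effect in zip(FLAGS, pattern) if effect == "?"}
```

For example, undefined_flags("xor") gives only AF, while a division leaves all six flags undefined.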
35. The curse of EFLAGS? (3)
• Not "much, much fewer" at all!
– Even at 10% of instructions, the overhead of hooking
cannot be ignored.
– We could choose not to trace EFLAGS.
For instance, we could update the EFLAGS register to
a deterministic value. But...
• Updating flags (POPF) is extremely slow!
• 24-25 clocks on the Intel Nehalem microarchitecture (Core i7)
– To avoid this problem, we need to
avoid being affected by these values.
36. The implementation problem (1)
• Public Record and Replay implementations
do not handle this case!
– They just restrict the processor model.
If we record the program on processor model A,
we need to replay it on exactly the same model.
– This prevents distributed analysis.
– Normally, programs don't depend on these
undefined (nondeterministic) values.
• But technically, a single nondeterministic bit
can cause chaos.
37. The implementation problem (2)
• What is RIGHT?
– We cannot know exactly which CPU model is right.
– I want to integrate the information into one trace.
No more compatibility/portability problems.
• Such problems are no good for reverse engineering.
– I want robustness!
38. EFLAGS : Lazy Evaluation (1)
• EFLAGS and programs have these characteristics:
– Over 80% of updated flags are just discarded.
• We want to trace *everything*, but it is
pointless to trace a value that is never used.
– Flag updates and flag evaluations are
adjacent in most cases.
• e.g. Compare → Jump Conditionally
• Intel exploits this too! (Macro-Fusion)
– How about lazy evaluation?
• Trace a nondeterministic EFLAGS value
only when it is used.
39. EFLAGS : Lazy Evaluation (2)
• Current Implementation:
– JIT compiling with static evaluation
(to make programs run faster.)
– Evaluate each instruction block
• From the instruction after some jump operation
to the unconditional jump (instruction/exception).
• Scan each block forward.
– Evaluate the propagation of virtual EFLAGS. For each flag:
• Deterministic or not (initial value: No)
• The last instruction that updated the flag value.
• We use heuristics.
40. EFLAGS : Lazy Evaluation (3)
• (cont.)
– If an instruction in the block depends on these
flags and the virtual flag satisfies one of the conditions
below, we just consider the value nondeterministic.
• The value of the virtual flag is nondeterministic.
• The value is deterministic but the updating instruction
is too old (32 bytes / 8 instructions or more before.)
• Currently, this is very effective.
– I found that almost all flag traces happen
during interrupt handling / context switches.
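The forward scan described on these slides can be sketched as follows. This is one reading of the heuristic, not the actual JIT code: the instruction tables are tiny and illustrative, and the names WRITES, READS and flags_to_trace are hypothetical.

```python
FLAGS = ("OF", "SF", "ZF", "AF", "PF", "CF")

# Flag effects in FLAGS order: 'M'/'0' = defined result, '?' = undefined.
WRITES = {"XOR": "0MM?M0", "CMP": "MMMMMM", "SHL": "?MM?M?", "DIV": "??????"}
# Flags each instruction depends on.
READS = {"JZ": ("ZF",), "JC": ("CF",), "JO": ("OF",), "PUSHF": FLAGS}

MAX_BYTES, MAX_INSNS = 32, 8  # "too old" thresholds from the slide


def flags_to_trace(block):
    """Scan one block forward.

    block: list of (mnemonic, length_in_bytes).
    Returns a list of (instruction index, flag) pairs whose value
    must be traced when it is read.
    """
    # Per flag: (deterministic?, index of last writer, offset of last writer).
    state = {flag: (False, 0, 0) for flag in FLAGS}  # initial value: No
    traced, offset = [], 0
    for index, (mnemonic, length) in enumerate(block):
        for flag in READS.get(mnemonic, ()):
            deterministic, w_index, w_offset = state[flag]
            too_old = (index - w_index >= MAX_INSNS
                       or offset - w_offset >= MAX_BYTES)
            if not deterministic or too_old:
                traced.append((index, flag))
        effects = WRITES.get(mnemonic)
        if effects:
            for flag, effect in zip(FLAGS, effects):
                state[flag] = (effect != "?", index, offset)
        offset += length
    return traced
```

A CMP directly followed by JZ needs no trace at all, which is the common case the lazy evaluation relies on.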
41. Record and Replay : Conclusion
• Using Record and Replay, we can decrease
the amount of trace log and the trace overhead.
• Using (my) improved method,
we can acquire a robust trace log on the x86 platform.
43. Implementation
• I implemented a VMM-based tracer.
– To run general purpose OSes.
• But it was not a good idea. Because of its
complexity, I couldn't finish the VMM (Nov 2010.)
– Using binary translation
• Read guest instructions and transform them
to run on the host platform.
– I chose the x64 platform to implement the VMM.
• There are some reasons why x64 is good for
binary translation-based x86 emulation.
44. x86 on x64 (1)
• x64 is a 64-bit extension to x86 architecture.
– AMD, Intel and VIA have x64 extension.
– Very similar instruction format.
– Some extensions:
• Increased general purpose and XMM registers (8→16)
• New addressing modes
(64-bit, RIP [program counter] relative)
• There are many elements that make implementing
a binary translation-based VMM easier.
45. x86 on x64 (2.1)
• Benefit : 32-bit registers and clamp
– The general purpose register format is based on
the original one (registers share their lower bits.)
• e.g. ax (16-bit), eax (32-bit), rax (64-bit)
– If you run an instruction whose destination is a
32-bit register, the upper 32 bits of the corresponding
64-bit register are cleared!
MOV eax, 0x01234567 → eax = 0123 4567
MOV ax, 0x1234 → eax = 0123 1234 (ax = 1234)
46. x86 on x64 (2.1)
MOV rax, 0x0123456789abcdef → rax = 01234567 89abcdef
MOV eax, 0x12345678 → rax = 00000000 12345678 (eax = 12345678)
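The register behavior shown in these two slides can be checked with a small model. A minimal sketch; the function name write_reg is hypothetical, and only 16-, 32- and 64-bit writes are modeled.

```python
MASK64 = (1 << 64) - 1


def write_reg(old, value, width):
    """Return the new 64-bit register value after a write of `width` bits.

    64-bit write: replaces the whole register.
    32-bit write: upper 32 bits are cleared (the "clamp").
    16-bit write: upper 48 bits are preserved.
    """
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFF_FFFF
    if width == 16:
        return (old & ~0xFFFF & MASK64) | (value & 0xFFFF)
    raise ValueError("unsupported width")


rax = write_reg(0, 0x0123456789ABCDEF, 64)
rax = write_reg(rax, 0x12345678, 32)   # MOV eax, 0x12345678
eax = write_reg(0x01234567, 0x1234, 16)  # MOV ax, 0x1234
```

Only the 32-bit write discards the upper half, which is what lets a guest 32-bit register live safely in a host 64-bit register.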
47. x86 on x64 (2.2)
• Benefit : Increased Registers (GPR/XMM)
– 8→16 (16 additional registers including XMM regs.)
– Save the emulator's context without
destroying the existing registers.
rax r8
rcx r9
rdx r10
rbx r11
rsp r12
rbp r13
rsi r14
rdi r15
xmm0 xmm8
xmm1 xmm9
xmm2 xmm10
xmm3 xmm11
xmm4 xmm12
xmm5 xmm13
xmm6 xmm14
xmm7 xmm15
48. x86 on x64 (2.2)
• Actual register mapping table.
For memory/cache optimization,
some registers are relocated.
Host   Use       Host   Use
rax    eax       r8     cs.base
rcx    ecx       r9     es.base
rdx    edx       r10    emuinfo
rbx    ds.base   r11    ebx
rsp    stack     r12    esp
rbp    ebp       r13    tmp2
rsi    esi       r14    ss.base
rdi    tmp1      r15    edi
xmm0   xmm0      xmm8   fs.base
xmm1   xmm1      xmm9   gs.base
xmm2   xmm2      xmm10  tmp3
xmm3   xmm3      xmm11  tmp4
xmm4   xmm4      xmm12  notused
xmm5   xmm5      xmm13  notused
xmm6   xmm6      xmm14  notused
xmm7   xmm7      xmm15  notused
49. x86 on x64 (2.2)
– XMM registers are sometimes difficult to use,
but we can transfer their values to GPRs using the movq instruction.
50. x86 on x64 (2.3.1)
• Benefit : Retained Addressing Format
– Some addressing modes are added but
the addressing format is still x86-based.
– x86 has complex addressing modes:
• Like 2-add, 1-shift : [esi+edx*4+123]
• We can use them to separate memory accesses!
– Address Translation : [segbase+offset]
• All memory accesses are segbase-relative.
(segbase contains the 64-bit address of the segment base.)
– Achieving Memory Isolation
• Like Google Native Client for x64
51. x86 on x64 (2.3.2)
• Benefit : Retained Addressing Format
– (e.g. 1) : inc [ds:ecx] → inc [rbx+rcx]
• rbx : Base address of the DS segment.
• rcx : Guest ECX register.
– Wait a minute, the ecx register is 32-bit but
this uses rcx, which is a 64-bit register!
(Are you sure about that?)
• No problem. As I described before,
results of 32-bit operations are clamped.
• We can guarantee that the value of
rcx is in the 32-bit range (0x0000_0000-0xffff_ffff.)
52. x86 on x64 (2.3.3)
• Benefit : Retained Addressing Format
– (e.g. 2 [wrong]) : inc [ds:ecx+edx] → inc [rbx+rcx+rdx]
• rcx+rdx is computed in 64 bits, so the sum
does not wrap around at 32 bits like ecx+edx.
• Store the intermediate result to a temporary register.
– (e.g. 2 [correct]) : inc [ds:ecx+edx] →
lea edi, [rcx+rdx] ; inc [rbx+rdi]
• edi/rdi : Temporary register
• Almost the same as the first example.
– I use the best encoding x64 has.
• Store a 64-bit address to a 32-bit register!
• This is also a valid encoding. The address is automatically
clamped and the instruction is shortened.
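The difference between the wrong and the correct translation only shows up when ecx+edx overflows 32 bits. A minimal sketch that models both address calculations; the segment base value and the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def wrong_address(seg_base, ecx, edx):
    """inc [rbx+rcx+rdx]: the whole sum is computed in 64 bits."""
    return seg_base + ecx + edx


def correct_address(seg_base, ecx, edx):
    """lea edi, [rcx+rdx] ; inc [rbx+rdi]."""
    edi = (ecx + edx) & MASK32  # writing edi clamps the sum to 32 bits
    return seg_base + edi


DS_BASE = 0x0000_0100_0000_0000  # hypothetical host address of the DS base
ecx, edx = 0xFFFF_FFF0, 0x20     # guest offset wraps around to 0x10

wrong = wrong_address(DS_BASE, ecx, edx)
correct = correct_address(DS_BASE, ecx, edx)
```

Here the guest expects offset 0x10 inside the segment; the wrong form lands 4GB further away instead.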
53. x86 on x64 (2.4.1)
• Benefit : Huge Memory Range
– 64-bit address width
• Valid 48-bit (sign extended) logical address.
• 0x0000_1234_5678 → 0x0000_0000_1234_5678
• 0x8000_1234_5678 → 0xffff_8000_1234_5678
– We can place the data/code that VMM uses
outside the guest accessible region.
• For x86 on x86, address compression was needed
to store host and guest data in the same address space.
• Increases VMM speed.
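The sign extension of 48-bit logical addresses shown above can be sketched as a small helper. The function name canonical is hypothetical.

```python
def canonical(addr48):
    """Sign-extend a 48-bit logical address to its 64-bit canonical form."""
    addr48 &= (1 << 48) - 1
    if addr48 & (1 << 47):        # bit 47 set: upper 16 bits become ones
        addr48 |= 0xFFFF << 48
    return addr48
```

Both examples on the slide follow from this: an address with bit 47 clear keeps zeros on top, and one with bit 47 set gets 0xffff on top.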
54. x86 on x64 (2.4.2)
• Benefit : Huge Memory Range
– But allocating just 4GB is not enough.
The result of an address calculation can overflow or underflow.
– In 32-bit mode on x86, address calculation is
done with 32-bit precision and overflow/underflow
is ignored. This means the lower 32 bits of the result
are the memory address actually accessed.
– So, we modify the page table to satisfy:
same lower 32 bits == same physical address
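The page table rule can be sketched as a function from a host-side offset (relative to the segment base) to the guest page it must alias. A minimal sketch assuming 4KB pages; the names are hypothetical and the real page table setup is not shown in the slides.

```python
GUEST_SPACE = 1 << 32  # the guest sees a 32-bit address space
PAGE_SIZE = 4096       # assumption: 4KB pages


def guest_page(host_offset):
    """Guest page that a (possibly over/underflowed) offset must map to.

    Offsets with the same lower 32 bits map to the same page, so an
    overflowed or underflowed calculation still reaches the memory
    that 32-bit wraparound would have reached.
    """
    return (host_offset % GUEST_SPACE) // PAGE_SIZE
```

An overflowed offset such as 0x1_0000_1000 and a plain 0x1000 resolve to the same page, and a negative offset resolves to the top of the guest space.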
55. x86 on x64 (2.4.3)
• Benefit : Huge Memory Range
– Allocate a virtual memory region.
– Considering address overflow, we allocate
up to a 44.5GB range of virtual memory.
• The red and blue areas point to exactly the same physical region.
• We use the page table to achieve this.
(Figure: the 44.5GB range, divided into 42.25GB and 2.25GB parts.)
56. x86 on x64 (2.4.4)
• Benefit : Huge Memory Range
– Allocate a virtual memory region for each
segment and/or segment access control setting.
• On a segment switch, just change the base address.
(Figure: cs.base, ds.base, es.base and ss.base each point to one of
the regions labeled data3, code0, data3 and code3.)
58. x86 on x64 (2.5.1)
• Benefit : Simplified Architecture
– The x64 architecture is relatively simple,
which makes implementing a Type-2 VMM easier.
• Only two interrupt handler types:
– Interrupt Gate and Trap Gate
• Now segment is a mere façade.
– Flat memory model for CS, DS, ES and SS.
– Replacing IDT (interrupt vector) to
allocate VM-specific context.
• PatchGuard compatible!
• Nearly stealth but cannot hook system calls.
59. x86 on x64 (2.5.2)
• Benefit : Simplified Architecture
– Pass interrupts through
• We can do it safely with IDT switching.
• There is some overhead.
(Figure: interrupt flow between the VM and the OS. Labels: VM Kernel,
VM IntHandler, VM Trampoline, VM Entry, OS IntHandler, OS Kernel,
with an IDT switch in each direction.)
The actual implementation is a bit more complicated;
this shows only the summary.
60. x86 on x64 (3)
• Using these techniques, I implement
binary translation.
– But currently, it is still incomplete.
• To trace the timing, the following
information is required.
– Value of the branch counter
(software implementation is possible.)
– Current program counter (IP, EIP)
– Repeat count (CX, ECX)
• only while a REP instruction is executing.
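The three values above together identify one exact point in the execution. A minimal sketch of such a timing record; the names ExecutionPoint and find_injection_step are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutionPoint:
    branch_count: int                # branches taken so far
    eip: int                         # current program counter
    rep_count: Optional[int] = None  # ECX, only inside a REP instruction


def find_injection_step(visited, logged):
    """Return the index of the first visited point equal to the logged one."""
    for step, point in enumerate(visited):
        if point == logged:
            return step
    return None


# A loop body at the same EIP is visited three times; only the
# branch counter tells the iterations apart.
loop = [ExecutionPoint(branch_count=i, eip=0x1000) for i in range(3)]
step = find_injection_step(loop, ExecutionPoint(branch_count=2, eip=0x1000))
```

The program counter alone would match all three iterations, and inside a REP instruction even the branch counter stays the same, which is why the repeat count is needed too.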
61. Everything into the Ring-0
• Is privilege isolation required?
– Dynamic code is generated safely and is
well isolated, which makes it possible to run
everything in kernel mode (Ring-0.)
• Low-overhead implementation.
• The current implementation does this.
– If this is too dangerous, you can also
run the code in user mode (Ring-3.)
63. Trace size test (1.1)
• Trace log size required
– DLX Linux bundled with Bochs 2.4.5
• From computer reset until the login screen.
• 52,217,403 instructions (no-emulation : 53 sec)
– Specs
• 1 MIPS (1,000,000 instructions/sec)
• 32MB MEM, 10MB HDD
– Use Bochs to generate an instruction/memory trace
and convert it with each method.
64. Trace size test (1.2)
• Trace log size required
– Size of initial context is not included.
– I modeled the devices in the Bochs emulator
and estimated the required trace log size.
– Because the model is simplified, the sizes
are only estimates (not exact values.)
65. Trace size test (1.3)
• Methods (comparison included)
– Raw:
Text-format instruction/memory trace generated by Bochs.
– Verbose:
Normal tracer (like OllyDbg does)
– Dumb:
Record and Replay plus memory monitoring.
– RnR (1):
Record and Replay (tracing EFLAGS)
– PROPOSAL:
Improved Record and Replay method
– RnR (2):
Record and Replay (IGNORING EFLAGS)
66. Trace size test (2.1)
Method     Size (bytes)       Size
Raw        7,178,948,236      6.68GB
Verbose    X > 419,430,400    > 400MB
Dumb       60,713,538         57.90MB
RnR (1)    6,932,542          6.61MB
PROPOSAL   389,013            380KB
RnR (2)    31,788             31KB
This table shows that PROPOSAL generates only about 1/1,000 of the
trace log of the Verbose tracer. The Record and Replay method that
ignores EFLAGS, RnR (2), is smaller than PROPOSAL but has low portability.
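The ratios behind these numbers can be recomputed from the table and the instruction count given earlier (52,217,403). A minimal sketch; the Verbose entry is only the lower bound from the table, and the helper names are hypothetical.

```python
SIZES = {
    "Raw": 7_178_948_236,
    "Verbose": 419_430_400,   # lower bound: the table gives X > 419,430,400
    "Dumb": 60_713_538,
    "RnR (1)": 6_932_542,
    "PROPOSAL": 389_013,
    "RnR (2)": 31_788,
}
INSTRUCTIONS = 52_217_403  # from computer reset until the login screen


def ratio_to_proposal(method):
    """How many times larger a method's log is than PROPOSAL's."""
    return SIZES[method] / SIZES["PROPOSAL"]


def bytes_per_instruction(method):
    """Average log bytes spent per executed guest instruction."""
    return SIZES[method] / INSTRUCTIONS
```

With these figures, Verbose is at least about 1,078 times larger than PROPOSAL, RnR (1) is about 18 times larger, and PROPOSAL costs well under 0.01 bytes per instruction.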
68. Trace size test (2.3)
• Conclusion
– This result didn't come from an actual implementation,
so there are some questionable points.
– Despite this, the proposed method generates
a really small trace log compared to older methods.
71. Possible Practical Uses (1)
• Reverse Engineering (non-Malware)
– Everything that *ran* is *recorded*
• All your program are belong to us!
• The program's behavior is recorded,
including VM detection and/or anti-debugging.
– Of course the program is unpacked/decrypted.
• You can integrate multiple analyses.
72. Possible Practical Uses (2)
• Avoiding Anti-debugging/Anti-VM
– No well-known backdoor.
– But a binary translation based VM can be detected
by running specific code.
• e.g. Self-modifying code is (extremely) slow.
– You can find out how the VM is detected.
At least, you can extract useful information to
avoid VM detection.
• Protection in ordinary programs is not so strong.
73. Possible Practical Uses (3)
• Reverse Engineering (Malware)
– It is DANGEROUS to run malware directly!
– However, if you can take care of these problems,
this tracer can be useful.
– Honeypots?
74. Possible Practical Uses (4)
• Fuzzing / Exploit analysis / Bug discovery
– Imagine that Valgrind is applied to all programs
and you can use the guest program interactively.
– With offline analysis, you can find and track
memory corruption.
– If you can reproduce the issue,
you can extract useful information.
– However, whether it is efficient for fuzzing
depends heavily on the implementation.
75. Possible Practical Uses (5)
• Analysis Support
– Export for other well-known tools.
• e.g. Wireshark
– In this case, you have the program's behavior, so
you can add metadata and/or supplemental info.
• e.g. SSL/TLS auto decryption
• You cannot steal a key from a packet dump, but
remember, you can run the program which uses
the private (common/shared) key!
76. Possible Practical Uses (6)
• <<Place Entry Here>>
– I guess you can use for other purposes.
– I hope that many people will work on
this type of tracer.
78. Challenge : Multicore (1)
• The original Record and Replay is not designed for
multi-processing environments.
– Frequent communication makes the tracer slow.
– Almost all implementations are restricted to
1 CPU/thread. (mine, too)
• But, it doesn't mean this is impossible.
– Time-sharing
– Software emulation of MESI protocol
– Trace memory contents
79. Challenge : Multicore (2)
• Time-sharing
– Only one CPU runs at a time.
– Switch the executing CPU with a timer to
simulate running multiple CPUs.
• Pros.
– Almost no synchronization required.
• Cons.
– More CPUs, less efficiency.
– Difficult to reproduce multi-threading problems
because this is not true multi-processing.
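Time-sharing can be sketched as a round-robin over per-CPU instruction streams. This is a toy model only: a fixed slice length stands in for the timer, and the name time_share is hypothetical.

```python
def time_share(cpus, slice_len):
    """Interleave per-CPU instruction streams, one CPU at a time.

    cpus: list of instruction lists, one per virtual CPU.
    Returns the global execution order as (cpu_id, instruction) pairs.
    """
    positions = [0] * len(cpus)
    order = []
    while any(pos < len(stream) for pos, stream in zip(positions, cpus)):
        for cpu_id, stream in enumerate(cpus):
            start = positions[cpu_id]
            chunk = stream[start:start + slice_len]
            order.extend((cpu_id, insn) for insn in chunk)
            positions[cpu_id] += len(chunk)
    return order


order = time_share([["a", "b", "c"], ["x", "y"]], slice_len=2)
```

Because the result is a single sequential order, the single-CPU record and replay scheme applies unchanged; the cost is that no two CPUs ever truly run in parallel.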
80. Challenge : Multicore (3)
• Software Implementation of MESI protocol
– Memory coherency algorithm
– CPUs use this protocol (or its variants) to
keep memory/cache coherent.
– We can implement this using page-level protection.
– Lock a page before writing to it.
• Pros.
– High efficiency on few shared pages.
• Cons.
– Software implementation is quite slow.
81. Challenge : Multicore (4)
• Trace Memory contents
– Also trace memory contents read for shared pages.
• Pros.
– Can achieve high efficiency... maybe.
• Cons.
– It is not a perfect-information tracer.
(Which CPU has written this value?!)
– Memory trace is slow.
• Bandwidth monster may be required.
82. Challenge : 64-bit / Others
• x64 on x64 is very difficult.
– There are some ways but they are not so efficient.
• SSE2 / Reciprocal, Square root instructions
– These instructions are not required to return exact
values, and they are fast (which is the problem.)
• Hypervisor again?
– Trace without portability and convert the trace to
a portable one (using the same processor model.)
– This is not perfect, but a possible choice.
83. CAUTION : PATENTS
• Some of these techniques are patented!
– Record and Replay
– Optimization for Binary Translation based VMM.
– Difficult/Impossible to avoid these patents.
• However, all patents I have found are
United States patents only, so I guess using this
tracer outside the US is not a problem.
– Be careful.
84. Conclusion
• I described how to build a tracing VMM for
x86 on x64.
• Using the proposed method, the trace log gets smaller
and the overhead gets lower too.
– However, proper tests (validations) are required
to check whether this is useful for reverse engineering.
• Many practical uses!
– Some others?
85. contact me at : li at livegrid dot org
Open Source Project : Niizh
will be available at http://niizh.org/
Thank you!
Any questions?