This document discusses embedded system development. It begins with definitions of embedded systems and some of their common characteristics like limited resources and real-time constraints. It then discusses specific issues like memory alignment, flash and RAM sizes, and performance optimizations. Examples are given of embedded projects like digital video recorders and how to address issues like file sorting, memory usage and stack overflows. The conclusion emphasizes that embedded systems involve knowledge from many technical fields and stresses the importance of experience, observation, and a positive problem-solving attitude.
Evgeniy Krutko, Multithreaded Computing: A Modern Approach. Platonov Sergey
The document discusses parallel computing in modern C++. It introduces native threads, standard threads in C++11, thread pools, std::async, and examples of parallelizing real applications. It also covers potential issues like data races and tools for detecting them like Valgrind and ThreadSanitizer. Finally, it recommends using std::async, std::future and boost::thread for flexibility and OpenMP for ease of use.
Miller Lee discusses C++ Accelerated Massive Parallelism (C++ AMP) which provides a simpler programming model for GPU computing compared to CUDA and OpenCL. C++ AMP models GPU data as C++ containers and kernels as C++ lambdas. The MCW C++ AMP (CLAMP) compiler translates C++ AMP code to OpenCL, generating OpenCL C code for the device and host code for execution. While C++ AMP code is more concise than OpenCL, its performance depends on the compiler and runtime support.
We all make mistakes while programming and spend a lot of time fixing them.
One of the methods which allows for quick detection of defects is source code static analysis.
This sample program demonstrates how to access Meinberg GPS receivers via the binary data protocol. It can work via a serial port or network socket connection depending on the target operating system. It requires several other modules and supports Windows, Linux, QNX, and DOS targets. The program checks return codes from API functions, gets receiver information, status information, position, time zone, and synthesizer settings. It also has the ability to set the synthesizer frequency and phase.
This document discusses common C++ bugs and tools to find them. It describes various types of memory access bugs like buffer overflows on the stack, heap, and globals that can lead to crashes or security vulnerabilities. Threading bugs like data races, deadlocks, and race conditions on object destruction are also covered. Other undefined behaviors like initialization order issues, lack of sequence points, and integer overflows are explained. The document provides examples of each type of bug and quizzes the reader to find bugs in a code sample. It recommends resources for further reading on debugging techniques and on sanitizers that can detect data races.
The document summarizes AA-sort, a sorting algorithm optimized for SIMD and multicore processors. AA-sort works by first sorting blocks of data in parallel using vectorized combsort. It then merges the sorted blocks together. Key steps include sorting 4 elements within each SIMD register, transposing the registers, and performing a vectorized version of combsort without conditional branches. The document provides pseudocode for these steps.
The document discusses different approaches to implementing GPU-like programming on CPUs using C++AMP. It covers using setjmp/longjmp to implement coroutines for "fake threading", using ucontext for coroutine context switching, and how to pass lambda functions and non-integer arguments to makecontext. Implementing barriers on CPUs requires synchronizing threads with an atomic counter instead of GPU shared memory. Overall, the document shows it is possible to run GPU-like programming models on CPUs by simulating the GPU programming model using language features for coroutines and threading.
AddressSanitizer, ThreadSanitizer, and MemorySanitizer are compiler-based tools that detect bugs like buffer overflows, data races, and uninitialized memory reads in C/C++ programs. AddressSanitizer instruments loads and stores to detect out-of-bounds memory accesses. ThreadSanitizer intercepts synchronization calls to detect data races between threads. MemorySanitizer tracks initialized and uninitialized memory using shadow memory to find uses of uninitialized values. The tools have found thousands of bugs with low overhead. Future work includes supporting more platforms and languages and detecting additional bug classes.
HSA enables more efficient compilation of high-level programming interfaces like OpenACC and C++AMP. For OpenACC, HSA provides flexibility in implementing data transfers and optimizing nested parallel loops. For C++AMP, HSA allows efficient compilation from an even higher level interface where GPU data and kernels are modeled as C++ containers and lambdas, without needing to specify data transfers. Overall, HSA aims to reduce boilerplate code for heterogeneous programming and provide better portability across devices.
VC4C: Development of an OpenCL Compiler for VideoCore4. nomaddo
This document discusses the development of an OpenCL compiler called VC4C for the VideoCore IV GPU found in the Raspberry Pi. It provides an overview of the VC4 architecture including its quad processing units, texture and memory lookup unit, uniform cache, and vertex pipe memory. It then introduces VC4C as an open-source project that compiles OpenCL to optimized assembly for the VC4. Several challenges are discussed such as limited registers, cache incoherency, and complex iteration patterns from OpenCL IDs. Optimization techniques explored include constant handling, vectorization, kernel fusion, and software pipelining. In conclusion, VC4C remains a work in progress but provides an opportunity for compiler optimization on an unoptimized
The document discusses various topics related to user-space system programming in Linux, including sending and handling signals, signal sets, masking signals, scheduling, inter-process communication (IPC), and timing functions. It provides examples of how to use signals, set priorities and affinities, use timers, and synchronize processes.
The document discusses various topics related to user-space system programming in Linux, including sending and handling signals, signal sets, masking signals, scheduling, inter-process communication (IPC), and timing functions. It provides examples of how to use signals, set priorities, affinity, timers, and IPC between processes.
The document discusses vectorization techniques on x86 processors. It describes how vectorization can significantly improve performance by processing multiple data elements in parallel using SIMD instructions. Vectorization is most effective when combined with multithreading. The document outlines several techniques for writing vectorized code, including using vector instruction sets like SSE and AVX, compiler auto-vectorization and vectorization directives. It also discusses challenges like memory alignment and dependency analysis that compilers have to handle for effective vectorization.
The document discusses process management in operating systems. It covers process concepts like process states, process control blocks (PCBs), and process scheduling. It also covers operations on processes like creation using fork() and exec(), and inter-process communication mechanisms like pipes, shared memory, message queues, semaphores, signals, and FIFOs. Key process management functions like fork(), exec(), wait(), signal(), and alarm() are explained.
This document discusses the Meltdown and Spectre vulnerabilities that were discovered in modern CPUs. Meltdown allows reading kernel memory from user space by exploiting out-of-order execution and speculative execution. Spectre attacks exploit speculative execution to access sensitive information through side channels. The document explains speculative execution, how Meltdown works by inducing mispredictions and reading memory access times, and the two variants of Spectre that exploit conditional branches and indirect branches. Mitigations like KPTI and inserting blocking instructions are discussed along with the performance trade-offs of addressing these vulnerabilities.
Dynamic memory allocation allows programs to request memory from the operating system at runtime. This memory is allocated on the heap. Functions like malloc(), calloc(), and realloc() are used to allocate and reallocate dynamic memory, while free() releases it. Malloc allocates a single block of uninitialized memory. Calloc allocates multiple blocks of initialized (zeroed) memory. Realloc changes the size of previously allocated memory. Proper use of these functions avoids memory leaks.
Build a fully functional virtual machine from scratch, using Brainfuck as the source language. Covers basic interpreter concepts, optimization techniques, language specialization, and platform-specific tweaks.
This document discusses the Meltdown and Spectre vulnerabilities that were discovered in modern CPUs. Meltdown allows reading kernel memory from user space by exploiting out-of-order execution and speculative execution. Spectre attacks exploit speculative execution to access sensitive information through side channels. The document explains speculative execution, how Meltdown works by reading kernel memory speculatively, and the two variants of Spectre attacks - bounds check bypass and branch target injection. Mitigations like KPTI and inserting speculative-execution-blocking instructions are discussed. The vulnerabilities are considered among the most serious in computer history because they exploit fundamental CPU design choices.
The document contains summaries of code snippets and explanations of technical concepts. It discusses:
1) How a code snippet with post-increment operator i++ would output a garbage value.
2) Why a code snippet multiplying two ints and storing in a long int variable would not give the desired output.
3) Why a code snippet attempting to concatenate a character to a string would not work.
4) How to determine the maximum number of elements an array can hold based on its data type and memory model.
5) How to read data from specific memory locations using the peekb() function in C.
Bridge TensorFlow to run on Intel nGraph backends (v0.5). Mr. Vengineer
The document describes how the nGraph TensorFlow bridge works by rewriting TensorFlow graphs to run on Intel nGraph backends. It discusses how optimization passes are used to modify the graph in several phases: 1) Capturing TensorFlow variables as nGraph variables, 2) Marking/assigning/deassigning nodes to clusters, 3) Encapsulating clusters into nGraphEncapsulateOp nodes to run subgraphs on nGraph. Key classes and files involved are described like NGraphVariableCapturePass, NGraphEncapsulatePass, and how they implement the different rewriting phases to prepare the graph for nGraph execution.
Tiramisu is a code optimization and generation framework that can be integrated into custom compilers. It supports various backends including multi-CPU (using LLVM), GPU (using CUDA), distributed systems (using MPI), and FPGAs (using Xilinx Vivado HLS). Tiramisu uses polyhedral representations to support irregular domains beyond just rectangles. The document provides an overview of Tiramisu and discusses challenges related to supporting different platforms, memory dependencies, efficient code generation, and representations. It also mentions that Tiramisu uses Halide and ISL.
TVM uses Verilator and DPI to connect Verilog/Chisel accelerator models written in SystemVerilog/Chisel to Python code. It initializes the hardware model and controls simulation using methods like SimLaunch, SimWait, SimResume. The Python code loads the accelerator module, allocates memory, runs the accelerator by calling driver functions that interface with the DPI to initialize, launch and wait for completion of the accelerator. This allows accelerators developed in Verilog/Chisel to be tested from Python.
The document summarizes various Python profiling tools. It discusses using the time utility and time module to measure elapsed time. It also covers the profile, cProfile, hotshot, line_profiler, memory_profiler, and objgraph modules for profiling code performance and memory usage. Examples are given showing how each tool can be used and the type of output it provides.
The document discusses implementing multiple-precision arithmetic in WebAssembly. It describes how carry operations are important for multiple-precision addition and multiplication but are not supported natively in WebAssembly. It proposes some strategies for emulating carry operations in WebAssembly, using instructions like add, lt_u, and select to add multi-word operands built from 64-bit elements with carry propagation. Benchmark results show that 32-bit element processing can outperform 64-bit element processing for some operations like multiplication in WebAssembly. Overall, implementing efficient multiple-precision arithmetic in WebAssembly requires emulating carry operations that x64 processors support directly.
The document discusses virtualization and how it works at a high level. It introduces concepts like virtual machines, hypervisors, and how virtualization allows multiple operating systems and applications to run concurrently on the same hardware by dividing the physical resources. It provides examples of how instructions are fetched, decoded and executed for a virtual machine, with the hypervisor supervising and managing access to physical resources.
The document discusses optimization techniques for deep learning frameworks on Intel CPUs and on Fugaku-class Arm architectures. It introduces oneDNN, a performance library for deep learning operations on Intel CPUs. It discusses issues with a plain C++ implementation and how just-in-time assembly generation using Xbyak can address them by generating optimal code depending on the parameters. It also introduces Xbyak_aarch64 for generating optimized code for Fugaku's Scalable Vector Extension instructions.
This document describes techniques for creating rootkits on Linux x86 systems. It discusses obtaining the system call table, hooking system calls through various methods like direct modification of the table, inline hooking of system call code, and patching the system call handler. It also presents the idea of abusing debug registers to generate exceptions and intercept system calls. The goal is to conceal running processes, files, and other system data from detection.
Here is a bpftrace program to measure the latency of ICMP echo requests:
    #!/usr/local/bin/bpftrace
    // Timestamp when the ICMP packet is queued for transmit.
    kprobe:icmp_send {
        @start[tid] = nsecs;
    }
    // Only record a sample if this thread has a pending start timestamp.
    kprobe:__netif_receive_skb_core /@start[tid]/ {
        @diff = hist(nsecs - @start[tid]);
        delete(@start[tid]);
    }
    END {
        print(@diff);
        clear(@diff);
        clear(@start);
    }
This traces the time between the icmp_send kernel function (when the packet is queued for transmit) and the __netif_receive_skb_core function (when the response packet is received).
This document describes techniques for creating rootkits on Linux x86 systems. It discusses obtaining the system call table through the interrupt descriptor table and IDT register. It explains how to hook system calls by modifying the system call table entries or using inline assembly. The document also covers abusing debug registers to generate breakpoints and divert execution to custom handlers without modifying code. Overall, the document provides an overview of common rootkit techniques along with code examples for implementing hooks at the system call level and bypassing detection on Linux.
Each process has a unique process ID and maintains its parent's ID. A process's virtual memory is divided into segments like the stack and heap. When a program runs, its command-line arguments and environment are passed via argc/argv and the environ list. The setjmp() and longjmp() functions allow non-local jumps between functions, but their use should be avoided due to restrictions and compiler optimizations that can affect variable values.
Beyond Breakpoints: A Tour of Dynamic Analysis. Fastly
Despite advances in software design and static analysis techniques, software remains incredibly complicated and difficult to reason about. Understanding highly concurrent, kernel-level, and intentionally obfuscated programs is among the problem domains that spawned the field of dynamic program analysis. More than mere debuggers, the challenge for dynamic analysis tools is to be able to record, analyze, and replay execution without sacrificing performance. This talk will provide an introduction to the dynamic analysis research space and hopefully inspire you to consider integrating these techniques into your own internal tools.
The document discusses UNIX processes and related concepts:
1. A UNIX process consists of text, data, and stack segments in memory, and has a process table entry containing process-specific data like file descriptors and environment variables.
2. Processes are started by a kernel which calls a startup routine before main(). Processes can terminate normally via return, exit(), or _exit(), or abnormally via abort() or signals.
3. Functions like atexit(), setjmp(), longjmp(), getrlimit(), and setrlimit() allow processes to register exit handlers, transfer control between functions, and set resource limits.
This document contains code snippets and outputs from several programming assignments. The assignments involve tasks like displaying logged in users, listing connected devices, modifying process priorities, and measuring system memory. Code examples are provided in C, C++, Python, Java, Shell and Perl to demonstrate the various tasks. The outputs confirm that the programs are working as intended by displaying the expected results.
The document discusses analyzing crashes using WinDbg. It provides tips on reconstructing crashed call stacks and investigating which thread or lock is causing a hang. The debugging commands discussed include !analyze, !locks, .cxr, and kb, used to find the crashing function and the stuck thread.
Here are the values of some pointer expressions using a and p:
p: Points to the first element of a, which is 10
*p: 10 (the value at the address p points to)
p+1: Points to the second element of a, which is 20
*(p+1): 20
&(p+1): Invalid; p+1 is not an lvalue, so its address cannot be taken
p-1: Not valid, as p already points to the first element; forming a pointer before the start of an array is undefined behavior
Exploitation of counter overflows in the Linux kernel. Vitaly Nikolenko
This document summarizes an exploit talk on counter overflows in the Linux kernel. It discusses how counter overflows can be used to exploit vulnerabilities by overflowing reference counters and triggering object deallocation. It provides examples of real counter overflow vulnerabilities in Linux, such as CVE-2014-2851 and CVE-2016-0728, and outlines the general exploitation procedure, including overflowing the counter, triggering object freeing, overwriting data, and executing code. It also discusses challenges like the time needed to overflow a counter, and techniques like using RCU calls to bypass checks.
The document discusses exploring the x64 architecture, covering topics such as the x64 application binary interface, memory layout differences between x86 and x64, API hooking and code injection techniques for x64, and differences in system calls between x86 and x64. It provides an overview of key technical details and concepts for developers working with x64 platforms.
BPFtrace is a high level tracing language for Linux Berkeley Packet Filter (BPF), available in recent Linux kernels. Built on top of LLVM and BCC (https://github.com/iovisor/bcc), BPFtrace provides an easier way of writing BPF programs for interacting with Linux tracing capabilities, such as kprobes, uprobes, kernel tracepoints, USDT and hardware events.
We'll go over Linux BPF itself and how BPFtrace fits in, followed by some demonstrations of new performance and monitoring tools written with BPFtrace.
https://github.com/iovisor/bpftrace
Don't mention TLB (at all?!?), it just confuses people. It was just put in so people were aware that it was being set up for deterministic behaviour (the side channel is exclusively the cache, not the TLB missing).
Don't mention the privilege level arch stuff until *after* Variant 1 has been discussed, rather than prior to Variant 2, and especially 3/Meltdown.
To explain the victim vs. attacker domains better in Variant 1, the example of
two threads in a process should be given, where one thread is the
'parent'/'governor' of the other(s), and has privileged information, e.g., a
valid TLS session key for a bank account login in another thread/tab in a
browser. One thread should not be able to 'see' another's private data.
Items such as the AntiVirus report could easily be omitted...
Thanks,
Kim Phillips
Rust LDN 24/7/19: Oxidising the Command Line. Matt Provost
The document discusses various techniques for building command line utilities in Rust, including handling broken pipes, reading input byte-by-byte or line-by-line, reading from stdin or files, and handling non-UTF8 arguments. It provides code examples of reading from stdin, handling broken pipes gracefully, and collecting command line arguments as OsStrings to support non-UTF8 values. The document concludes by advertising open jobs at Yelp and providing contact information.
The document discusses three sanitizers - AddressSanitizer, ThreadSanitizer, and MemorySanitizer - that detect bugs in C/C++ programs. AddressSanitizer detects memory errors like buffer overflows and use-after-frees. ThreadSanitizer finds data races between threads. MemorySanitizer identifies uses of uninitialized memory. The sanitizers work by instrumenting code at compile-time and providing a run-time library for error detection and reporting. They have found thousands of bugs in major software projects with reasonable overhead. Future work includes supporting more platforms and detecting additional classes of bugs.
Rust: Code Can Be Both Safe and Fast, Stepan Koltsov. Yandex
For the last 15 years, Java and C++ developers have argued over which language is worse, Java or C++. C++ programs are buggy, crash, and leak memory; Java programs are slow and need too much memory.
Rust, a new programming language developed by Mozilla, solves the problems of both Java and C++: programs written in Rust are fast and safe at the same time. Rust is just as low-level and close-to-metal a language as C++, but it has built-in constructs that let the compiler prove that the program will never access uninitialized memory (the borrowed-pointers mechanism). Most of my talk will be devoted to describing this mechanism.
The slides introduce some of the Rust concepts needed to write a kernel, including wrapping CSR operations, locking mutable static variables, the memory allocator, and pointers in Rust.
Please visit the project github to see the source code of the rrxv6 projects:
https://github.com/yodalee/rrxv6
2. Embedded System
• An embedded system is a computer system with a dedicated function within a larger mechanical or electrical system, often with real-time computing constraints.
• It is embedded as part of a complete device, often including hardware and mechanical parts.
• Embedded systems control many devices in common use today; 98 percent of all microprocessors being manufactured are used in embedded systems.
• Definition from Wiki.
3. Embedded System Restrictions
• Limited CPU Speed
• Limited Flash Size (code, constant data)
• Limited RAM Size (data, run-time stack)
• Limited Peripherals (GPIO Simulate?)
• Low Power Consumption
• Low Per-Unit Cost
• Small In Size
• Rugged Operating Ranges?
• Response in Real-Time?
11. Forgotten World Under Program Running – Runtime Stack
• Why can our function calls return to their original addresses?
• How are function parameters passed?
• Where are our local variables stored/located?
• Where is the program runtime stack located?
• How to decide the stack size for each task/process?
• Recursive functions vs. runtime stack (see the sketch below).
• Buffer overflow attacks.
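To make the recursion bullet concrete, here is a small illustrative C program (not from the original slides) in which every call consumes another stack frame until the run-time stack is exhausted:

    #include <stdio.h>

    /* Each call pushes a return address, a saved frame pointer, and a
       128-byte local buffer onto the run-time stack. With no base case,
       the recursion eventually overflows the stack and crashes. */
    void recurse(unsigned long depth)
    {
        char local[128];
        local[0] = (char)depth;                 /* touch the frame */
        fprintf(stderr, "depth=%lu\n", depth);
        recurse(depth + 1);
        local[1] = 1;   /* work after the call prevents tail-call optimization */
    }

    int main(void)
    {
        recurse(0);     /* ends in a stack overflow; the last depth printed
                           hints at the available stack size */
        return 0;
    }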
12. Machine status in “run-time stack”

C source:

    void interrupt (*old_isr)(...);
    void interrupt new_isr(...)
    {
        (*old_isr)();
    }

Compiler output:

    @new_isr$qve proc far
        push ax
        push bx
        push cx
        push dx
        push es
        push ds
        push si
        push di
        push bp
        mov bp,DGROUP
        mov ds,bp
        mov bp,sp
    ; {
    ; (*old_isr)();
        pushf
        call dword ptr DGROUP:_old_isr
    ; }
        pop bp
        pop di
        pop si
        pop ds
        pop es
        pop dx
        pop cx
        pop bx
        pop ax
        iret
    @new_isr$qve endp

[Stack-layout diagram from the slide, offsets n-6 through n+24: the interrupt itself pushes the CPU flags and the far return address; the prologue then pushes ax, bx, cx, dx, es, ds, si, di, and bp, so sp and bp end up pointing at the saved bp.]
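Chaining an interrupt handler like the one in the listing above is normally done through the DOS-era Borland API. The sketch below is illustrative and assumes Turbo C's dos.h with getvect()/setvect(); the BIOS timer-tick vector 0x1C is used purely as an example:

    #include <dos.h>

    /* The 'interrupt' qualifier makes the compiler save/restore all
       registers and return with iret, exactly as in the listing above. */
    static void interrupt (*old_isr)(void);
    static volatile unsigned long ticks = 0;

    static void interrupt new_isr(void)
    {
        ticks++;           /* our extra work */
        (*old_isr)();      /* chain to the original handler (pushf + far call) */
    }

    int main(void)
    {
        old_isr = getvect(0x1C);   /* save the original vector */
        setvect(0x1C, new_isr);    /* install our handler */
        /* ... application runs; ticks now counts timer interrupts ... */
        setvect(0x1C, old_isr);    /* always restore before exiting */
        return 0;
    }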
13. Parameters in “run-time stack”

C source:

    char c='a';
    int i=0x1234;
    long j=0x5678;
    int val;
    int func(char, int, long);

    void main(void)
    {
        val = func(c, i, j);
    }

Compiler output:

    _DATA segment word public 'DATA'
    _c label byte
        db 97                        ; c = 'a'
    _i label word
        db 52,18                     ; i = 0x1234, little-endian
    _j label word
        db 120,86,0,0                ; j = 0x5678
    _DATA ends

    _TEXT segment byte public 'CODE'
    ;
    ; void main(void)
    ;
        assume cs:_TEXT
    _main proc near
        push bp
        mov bp,sp
    ;
    ; {
    ; val = func(c, i, j);
    ;
        push word ptr DGROUP:_j+2    ; high word of j pushed first
        push word ptr DGROUP:_j      ; low word of j
        push word ptr DGROUP:_i
        mov al,byte ptr DGROUP:_c
        push ax                      ; char argument widened to a word
        call near ptr _func
        add sp,8                     ; caller removes the 8 bytes of arguments
        mov word ptr DGROUP:_val,ax  ; return value arrives in ax
    ;
    ; }
    ;
        pop bp
        ret
    _main endp
    _TEXT ends

[Stack-layout diagram from the slide, offsets n-16 through n+4: from the top of the stack upward sit the saved bp (sp and bp point here), the return address, then the arguments c, i, and the low and high words of j.]
14. Local variables in “run-time stack”

C source:

    char c='a';
    int i=0x1234;
    long j=0x5678;
    int val;

    int func(char cc, int ii, long jj)
    {
        int k=0, l=1;
        return ii;
    }

    void main(void)
    {
        val = func(c, i, j);
    }

Compiler output:

    _TEXT segment byte public 'CODE'
    ;
    ; int func(char cc,int ii,long jj)
    ;
        assume cs:_TEXT
    _func proc near
        push bp
        mov bp,sp
        sub sp,4                ; reserve 4 bytes for the locals k and l
                                ; (push bp / mov bp,sp / sub sp,n is what the
                                ; ENTER instruction does in one step)
    ;
    ; {
    ; int k=0, l=1;
    ;
        mov word ptr [bp-2],0   ; k
        mov word ptr [bp-4],1   ; l
    ;
    ; return ii;
    ;
        mov ax,word ptr [bp+6]  ; ii sits at bp+6, above the saved bp and return address
        jmp short @1@58
    @1@58:
    ;
    ; }
    ;
        mov sp,bp               ; mov sp,bp / pop bp is what the LEAVE instruction does
        pop bp
        ret
    _func endp
    ;
    ; void main(void)
    ;
        assume cs:_TEXT
    _main proc near
        ...
    _main endp
    _TEXT ends

[Stack-layout diagram from the slide, offsets n-4 through n+16: from the top of the stack upward sit the locals l and k at bp-4 and bp-2 (initially garbage, shown as "??"), the saved bp acting as the control link (bp points here), func's return address, then the arguments cc, ii, and the low and high words of jj; main's own saved bp and return address lie above them.]
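To watch this frame layout on a modern compiler, a small illustrative program (not from the slides) can print the addresses of the parameters and locals. Exact offsets vary by ABI and optimization level, but on most targets the locals sit below the saved frame pointer and the stack grows toward lower addresses:

    #include <stdio.h>

    int func(char cc, int ii, long jj)
    {
        int k = 0, l = 1;
        /* parameters (or their spill slots) versus locals */
        printf("&cc=%p &ii=%p &jj=%p\n", (void *)&cc, (void *)&ii, (void *)&jj);
        printf("&k =%p &l =%p\n", (void *)&k, (void *)&l);
        return ii + k + l;
    }

    int main(void)
    {
        return func('a', 0x1234, 0x5678L) == 0;
    }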
16. Buffer Overflow Prevention
• Use “snprintf()” instead of “sprintf()” (see the usage sketch below):
    int snprintf(char *str, size_t size, const char * restrict format, ...)
    int sprintf(char * str, const char * format, ...)
  Refer to "用 snprintf / asprintf 取代不安全的 sprintf" (on replacing unsafe sprintf with snprintf/asprintf).
• Use “strncpy()” instead of “strcpy()”:
    char *strncpy(char *dest, const char * src, size_t num)
    char *strcpy(char *dest, const char *src)
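A short usage sketch of the two safer calls (minimal, illustrative C); note that strncpy() does not NUL-terminate when the source is too long, so the terminator must be written explicitly:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[8];

        /* snprintf writes at most sizeof(buf) bytes and always
           NUL-terminates (when size > 0), truncating the rest. */
        snprintf(buf, sizeof(buf), "%s", "a very long input string");
        printf("%s\n", buf);

        /* strncpy copies at most sizeof(buf)-1 bytes here, but leaves
           buf unterminated if the source is longer, hence the last line. */
        strncpy(buf, "another long input", sizeof(buf) - 1);
        buf[sizeof(buf) - 1] = '\0';
        printf("%s\n", buf);

        return 0;
    }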
17. Buffer Overflow Attack

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buff[15];
        int pass = 0;

        printf("\n Enter the password : \n");
        gets(buff);                  /* no bounds check: this is the vulnerability */

        if(strcmp(buff, "thegeekstuff"))
        {
            printf ("\n Wrong Password \n");
        }
        else
        {
            printf ("\n Correct Password \n");
            pass = 1;
        }

        if(pass)
        {
            /* Now Give root or admin rights to user*/
            printf ("\n Root privileges given to user \n");
        }

        return 0;
    }

Run with correct password:

    $ ./bfrovrflw
    Enter the password :
    thegeekstuff
    Correct Password
    Root privileges given to the user

Run with buffer overflow attack:

    $ ./bfrovrflw
    Enter the password :
    hhhhhhhhhhhhhhhhhhhh
    Wrong Password
    Root privileges given to the user

The twenty 'h' characters overflow the 15-byte buff and overwrite the adjacent pass variable with a nonzero value, so the privilege check succeeds even though the password comparison failed.
18. Stack Overrun Example from Howard and LeBlanc

    /*
      StackOverrun.c
      This program shows an example of how a stack-based
      buffer overrun can be used to execute arbitrary code. Its
      objective is to find an input string that executes the function bar.
    */
    #pragma check_stack(off)
    #include <string.h>
    #include <stdio.h>

    void foo(const char* input)
    {
        char buf[10];
        /* printf is deliberately given no arguments after the format
           string, so the %p conversions dump raw values off the stack */
        printf("My stack looks like:\n%p\n%p\n%p\n%p\n%p\n%p\n\n");
        strcpy(buf, input);          /* unbounded copy: the overrun */
        printf("%s\n", buf);
        printf("Now the stack looks like:\n%p\n%p\n%p\n%p\n%p\n%p\n\n");
    }

    void bar(void)
    {
        printf("Augh! I've been hacked!\n");
    }

    int main(int argc, char* argv[])
    {
        //Blatant cheating to make life easier on myself
        printf("Address of foo = %p\n", foo);
        printf("Address of bar = %p\n", bar);
        if (argc != 2)
        {
            printf("Please supply a string as an argument!\n");
            return -1;
        }
        foo(argv[1]);
        return 0;
    }
25. Critical Section Issues

    class Counter
    {
        private int value = 1;      //counter starts at one

        public Counter(int c) {     //constructor initializes counter
            value = c;
        }

        public int inc() {          //increment value & return prior value
            int temp = value;       //start of danger zone
            value = temp+1;         //end of danger zone
            return temp;
        }
    }
26. Critical Section Issues
• The problem occurs if two threads both read
the value field at the line marked “start of
danger zone”, and then both update that field
at the line marked “end of danger zone”.
int temp = value;
value = temp+1;
27. Critical Section Issues

    int temp = value;
    value = temp+1;

One interleaving consistent with the slide's timeline (value starts at 1 and goes 2, then 3, then back to 2):

    Thread A reads 1
    Thread B reads 1
    Thread A writes 2        (value = 2)
    Thread A reads 2
    Thread A writes 3        (value = 3)
    Thread B writes 2        (value = 2)

Three increments were executed, yet the final value is 2: Thread B's late write is based on the stale value it read before Thread A's updates, so it silently overwrites the 3. A locked version of the counter is sketched below.
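The conventional fix is to make the read-modify-write in the danger zone atomic with a lock. The slide's example is Java, so the following POSIX-threads sketch in C is an adaptation, not the original code:

    #include <pthread.h>

    typedef struct {
        int value;
        pthread_mutex_t lock;
    } Counter;

    void counter_init(Counter *c, int initial)
    {
        c->value = initial;
        pthread_mutex_init(&c->lock, NULL);
    }

    /* increment value & return prior value, now atomically */
    int counter_inc(Counter *c)
    {
        pthread_mutex_lock(&c->lock);    /* start of protected danger zone */
        int temp = c->value;
        c->value = temp + 1;
        pthread_mutex_unlock(&c->lock);  /* end of danger zone */
        return temp;
    }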
28. The secret of the “volatile” keyword

    void dummy_loop(int cnt)
    {
        volatile int i;
        for (i=0; i<cnt; i++) {}
    }

    volatile UINT32 *reg = (volatile UINT32 *)0x30000000;
    *reg = 100;
    *reg = 200;
    *reg = 300;

• What’s the result after optimization? With volatile, the busy-wait loop and all three register writes must be performed; without volatile, the compiler may delete the empty loop entirely and collapse the three stores into the single final store *reg = 300.
29. Variable Allocation in C/C++
• Global variables:
    int var;
    static int var;
    const int var = 100;
• Local variables:
    void func(void) {
        int var;
        static int var;
    }
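Where each of these typically lands with a common GCC-style embedded toolchain is sketched in this illustrative fragment (the section names are the usual conventions and may differ on other compilers):

    int global_var;              /* .bss: zero-initialized RAM */
    int init_var = 42;           /* .data: RAM, initial value copied from flash at startup */
    static int file_static;      /* .bss, with internal linkage */
    const int table_const = 100; /* .rodata: usually kept in flash */

    void func(void)
    {
        int local_var = 0;       /* run-time stack (or a register) */
        static int call_count;   /* .bss: one copy, persists across calls */
        call_count += local_var + 1;
    }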
31. Reducing Program Flash Usage
• How to minimize the code size to fit into the limited flash memory?
  – What will be put into the flash memory after the program is compiled/linked? (see the size listing below)
  – A good algorithm reduces code size.
  – Good coding skill reduces code size.
  – Optimization during compiling
    => side effect: timing changes => usage of “volatile”
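One quick way to check what ended up in flash after linking is the standard binutils size tool. The invocation and numbers below are illustrative; the tool prefix depends on your cross toolchain:

    $ arm-none-eabi-size firmware.elf
       text    data     bss     dec     hex filename
      12452     136    2048   14636    392c firmware.elf

text + data is what must fit in flash; data + bss (plus stack and heap) is what must fit in RAM.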
32. Reducing Data Memory Usage
• How to minimize the data size to fit into the limited SRAM?
  – Usually more precious/limited than flash memory.
  – Where will our data be located for each kind of variable? (local vs. global vs. static vs. const)
  – Constant tables put into flash instead of SRAM (see the sketch below).
  – Compact data structure design.
  – Local variables vs. global variables.
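A sketch of the "constant tables into flash" point: marking a lookup table const lets the toolchain keep it in .rodata (flash) instead of copying it into scarce SRAM as initialized data (on some Harvard-architecture parts such as classic AVR, an extra attribute like PROGMEM is also needed). The values shown are the first entries of the standard CRC-8 (polynomial 0x07) table, with the rest elided:

    #include <stdint.h>

    /* const => .rodata, stays in flash; without const these 256 bytes
       would be duplicated into SRAM as initialized .data. */
    static const uint8_t crc8_table[256] = { 0x00, 0x07, 0x0E, 0x09 /* ... */ };

    uint8_t crc8_update(uint8_t crc, uint8_t byte)
    {
        return crc8_table[crc ^ byte];   /* table-driven CRC-8 step */
    }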
33. Conclusion (1/2)
• Embedded system is a mixed domain that covers various technical fields:
  – Computer Programming, Assembly Language
  – Data Structure, Algorithm
  – Operating System, Compiler
  – Computer Organization & Architecture
  – Digital System, Electrical Circuits
  – Microprocessor Systems
  – Digital Signal Processing
  – Specific Industry Domain Knowledge