This document summarizes a presentation given at PyCon TW 2017 about removing the Global Interpreter Lock (GIL) in Python to allow multi-threaded Python programs to take advantage of multi-processor systems. It begins with examples showing how the GIL currently prevents parallel execution across threads. It then explores approaches such as using the dynamic linker and the dlmopen() function to load a separate copy of the Python shared library for each thread, thereby removing the shared GIL. Although this would be an ideal solution, challenges remain in implementing it fully.
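To make the limitation concrete, here is a minimal sketch that is not taken from the talk: the function names and the workload size are assumptions chosen for illustration. It runs the same CPU-bound loop twice sequentially and then in two threads. On a standard CPython build with the GIL, the threaded run is typically no faster than the sequential one, although exact timings depend on the machine and interpreter version.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1
    return n


def timed(fn):
    """Return the wall-clock seconds taken by calling fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_sequential(n):
    """Run the loop twice, one after the other."""
    count_down(n)
    count_down(n)


def run_threaded(n):
    """Run the loop in two threads and wait for both to finish."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    N = 5_000_000  # illustrative workload size
    print(f"sequential:  {timed(lambda: run_sequential(N)):.2f}s")
    print(f"two threads: {timed(lambda: run_threaded(N)):.2f}s")
```

Both variants do the same total amount of work, so any gap between the two timings reflects how the interpreter schedules the threads rather than a difference in workload.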
Global Interpreter Lock: Episode I - Break the Seal, by Tzung-Bi Shih
This PyCon APAC 2015 talk discusses the Global Interpreter Lock (GIL) in CPython and ways to work around it to achieve higher performance on multi-processor systems. It provides examples of using multiprocessing, pp (Parallel Python), and releasing the GIL in C extensions to allow concurrent execution across multiple CPU cores. Releasing the GIL lets processor-intensive tasks take advantage of additional CPUs, while multiprocessing and pp allow running I/O-bound tasks in parallel across multiple processes to improve throughput.
PyCon TW 2017 - PyPy's approach to construct domain-specific language runtime..., by Tsundere Chen
PyCon TW 2017 - PyPy's approach to construct domain-specific language runtime - Part 2
These are the slides for part 2 of the PyCon TW 2017 Day 3 talk "PyPy's approach to construct domain-specific language runtime". Part 1 is jserv's work; refer to his slides.
The document discusses different methods for managing services and daemons at system startup. It begins by explaining that traditional init systems launched processes in a specific order, but that event-driven systems are now more common. It then provides examples of init systems including launchd, upstart, and systemd. Launchd configurations in macOS are defined through XML property list files that can start daemons, sockets, periodic jobs, and monitor directories for changes.
Feldo: Function Event Listing and Dynamic Observing for Detecting and Prevent..., by Tzung-Bi Shih
This document summarizes the OSX.KeRanger ransomware. It begins by describing how the ransomware attaches itself as a disk image and drops a file called General.rtf. It then analyzes how General.rtf is UPX packed and unlinks itself to hide. It explains how the ransomware daemonizes and waits before generating a UUID and communicating with its command and control server to receive an RSA public key and ransom statement. Finally, it details how the ransomware encrypts files based on specific file extensions except for a few file types like README_FOR_DECRYPT.txt.
The document discusses using the Go language to build a distributed computing architecture with multiple machines. It covers using RabbitMQ and NSQ for queues, building a simple queue mechanism for single-machine versions, and rewriting the architecture to address concurrency limits. It also discusses setting up a server-agent system with RPC communication and context cancellation to enable job cancellation.
This document discusses implementing a job queue in Golang. It begins by explaining buffered and unbuffered channels, and shows examples of using channels to coordinate goroutines. It then demonstrates how to build a job queue that uses a channel to enqueue jobs and has worker goroutines process jobs from the channel concurrently. It also discusses ways to gracefully shut down workers using contexts and wait groups. Finally, it covers topics like auto-scaling agents, communicating between servers and agents, and handling job cancellation.
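The worker pattern described above can be sketched in Python as a rough analogue; this is not the Go code from the slides. It assumes a bounded `queue.Queue` standing in for a buffered channel, threads standing in for goroutines, a sentinel object playing the role of closing the channel, and `join()` playing the role of a wait group. The names `worker`, `run_jobs`, and `STOP`, and the squaring "work", are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker to exit


def worker(jobs, results):
    """Take jobs off the queue until the STOP sentinel arrives."""
    while True:
        job = jobs.get()
        if job is STOP:
            break
        results.put(job * job)  # hypothetical "work": square the number


def run_jobs(numbers, num_workers=3):
    """Process numbers with a pool of workers and return sorted results."""
    numbers = list(numbers)
    jobs = queue.Queue(maxsize=8)  # bounded, like a buffered channel
    results = queue.Queue()        # unbounded, so workers never block on it
    workers = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        jobs.put(n)                # blocks while the queue is full
    for _ in workers:
        jobs.put(STOP)             # one sentinel per worker
    for w in workers:
        w.join()                   # graceful shutdown: wait for every worker
    return sorted(results.get() for _ in numbers)


if __name__ == "__main__":
    print(run_jobs(range(10)))
```

Sending one sentinel per worker lets each worker finish the jobs already queued ahead of the sentinel before it exits, which is the property a graceful shutdown needs.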
From this presentation you will learn:
- about most of the tools in Go's arsenal that are intended for performance optimization;
- how and when to use these tools, along with a look at how they work internally;
- about the applicability of the Linux perf utility to optimizing Go programs.
In addition, we will run a short crash course in which we optimize several small Go programs step by step using the tools listed above.
How to make a large C++-code base manageable, by corehard_by
My talk will cover how to work with a large C++ code base professionally: how to write code for debuggability, how to work effectively despite long C++ compilation times, how and why to use the STL algorithms, and how and why to keep interfaces clean. It also covers general convenience techniques, such as writing wrappers that make the code less error prone (for example ranged integers, listeners, concurrent values). Finally, it touches on common architecture patterns to avoid (virtual classes), patterns to encourage (pure functions), and how std::function and lambda functions can be used to make virtual classes copyable.
Deep Dive async/await in Unity with UniTask(EN), by Yoshifumi Kawai
The document discusses asynchronous programming in C# using async/await and Rx. It explains that async/await is not truly asynchronous or multithreaded - it is for asynchronous code that runs on a single thread. UniTask is introduced as an alternative to Task that is optimized for Unity's single-threaded environment by avoiding overhead like ExecutionContext and SynchronizationContext. Async/await with UniTask provides better performance than coroutines or Rx observables for Unity.
Bridge TensorFlow to run on Intel nGraph backends (v0.4), by Mr. Vengineer
This document discusses bridging TensorFlow to run on Intel nGraph backends. It summarizes various optimization passes used in the nGraph-TensorFlow integration, including passes to liberate nodes from placement constraints, confirm placement, cluster the graph, and encapsulate clusters. Key points:
- NGraphLiberatePass and NGraphConfirmPass run during the PRE_PLACEMENT phase to handle nGraph placement
- NGraphClusterPass runs during POST_REWRITE_FOR_EXEC to cluster the graph into subgraphs, similar to XLA partitioning
- NGraphEncapsulatePass encapsulates clusters into NGraphEncapsulateOp nodes, analogous to XLA's use of _XlaLaunchOp
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014, by Fantix King 王川
This document discusses Python asynchronous concurrency frameworks and their future. It compares Twisted, Tornado, Gevent, and the new asyncio module. Twisted and Tornado use callbacks while Gevent uses greenlets. asyncio aims to provide an event loop like Twisted. It also introduces coroutines, tasks, and futures. The document argues that asyncio could serve as a common event loop that existing frameworks adapt to for better interoperability in the future.
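As a minimal sketch of how coroutines, tasks, and futures relate in asyncio (not taken from the slides; the `fetch` coroutine, its names, and the delays are assumptions for illustration), the example below awaits one of each on a single event loop.

```python
import asyncio


async def fetch(name, delay):
    """Coroutine that simulates I/O by sleeping; names are illustrative."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    loop = asyncio.get_running_loop()

    # A Task wraps a coroutine and schedules it on the event loop right away.
    task = asyncio.create_task(fetch("task", 0.02))

    # A Future is a low-level placeholder that other code resolves later.
    future = loop.create_future()
    loop.call_later(0.01, future.set_result, "future done")

    # gather awaits all three concurrently and preserves argument order.
    return await asyncio.gather(fetch("coroutine", 0.03), task, future)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The results come back in the order the awaitables were passed to `gather`, not in the order they completed.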
Hiveminder - Everything but the Secret Sauce, by Jesse Vincent
Ten tools and techniques to help you:
Find bugs faster バグの検出をもっと素早く
Build web apps ウェブアプリの構築
Ship software ソフトのリリース
Get input from users ユーザからの入力を受けつける
Own the Inbox 受信箱を用意する
Today's talk
The Simple Scheduler in Embedded System @ OSDC.TW 2014, by Jian-Hong Pan
The document describes a simple scheduler module implemented in C for embedded systems. It breaks processes into small jobs represented by functions that are scheduled in a first-in, first-out queue without preemption. This allows embedding an operating system concept into simple systems using only functions and a ready queue. Interrupts can add jobs to the queue. The scheduler and example oscilloscope application demonstrate scheduling without process state using only callbacks.
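The idea can be illustrated with a short Python sketch. The original module is written in C, so this is only a conceptual analogue; the class name `SimpleScheduler` and the jobs `sample`, `draw`, and the "blink" lambda are hypothetical. Each job is a plain function that runs to completion, and a longer process is split into stages that enqueue their own continuation.

```python
from collections import deque


class SimpleScheduler:
    """Run-to-completion scheduler: a FIFO ready queue of plain functions."""

    def __init__(self):
        self.ready = deque()

    def add_job(self, job):
        """Enqueue a job; an interrupt handler would call this as well."""
        self.ready.append(job)

    def run(self):
        """Run jobs in arrival order until the queue is empty."""
        while self.ready:
            job = self.ready.popleft()
            job()  # never preempted; a job may enqueue follow-up jobs


log = []
sched = SimpleScheduler()


def sample():
    """Hypothetical first stage of a longer process."""
    log.append("sample")
    sched.add_job(draw)  # continue the process as a new small job


def draw():
    """Hypothetical second stage."""
    log.append("draw")


sched.add_job(sample)
sched.add_job(lambda: log.append("blink"))
sched.run()
print(log)  # ['sample', 'blink', 'draw']
```

Because `draw` is enqueued behind the job that was already waiting, the two processes interleave without any saved process state or preemption.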
This document discusses using UniRx (Reactive Extensions for Unity) to make asynchronous network requests in a reactive and error-handling manner. Key points include:
1. Wrapping ObservableWWW requests in an ObservableClient to handle errors, timeouts, and retries in a consistent way across methods.
2. Using LINQ operators like Select, Catch, and Timeout to process the network response and handle errors in a reactive pipeline.
3. Implementing more complex retry logic by publishing and connecting an observable to retry requests automatically.
4. Combining multiple asynchronous requests using WhenAll to run them in parallel.
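The four points above can be sketched in Python with asyncio. This is an analogue only, not UniRx or C# code: `asyncio.wait_for` stands in for the Timeout operator, a loop stands in for the retry logic, and `asyncio.gather` stands in for WhenAll. The helpers `with_retry` and `flaky_request` and the values "profile" and "inventory" are hypothetical.

```python
import asyncio


async def with_retry(make_request, retries=3, timeout=1.0):
    """Await make_request() with a timeout, retrying on failure."""
    last_error = None
    for _ in range(retries):
        try:
            return await asyncio.wait_for(make_request(), timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc  # remember the failure and try again
    raise last_error


def flaky_request(fail_times, value):
    """Build a hypothetical request that fails fail_times before succeeding."""
    state = {"calls": 0}

    async def request():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ConnectionError("simulated network failure")
        await asyncio.sleep(0.001)  # simulated network latency
        return value

    return request


async def main():
    # Counterpart of WhenAll: run both wrapped requests concurrently.
    return await asyncio.gather(
        with_retry(flaky_request(2, "profile")),
        with_retry(flaky_request(0, "inventory")),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Putting the timeout and retry handling in one wrapper means every request gets the same error policy, which is the consistency the first point describes.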
Droidcon Berlin 2021 - With coroutines being the de facto way of exposing async work and streams of changes for Kotlin on Android, developers are obviously attempting to use the same approaches when moving their code to Multiplatform.
But due to the way the memory model differs between JVM and Kotlin Native, it can be a painful experience.
In this talk, we will take a deep dive into the Coroutine API for Kotlin Multiplatform. You will learn how to expose your API with Coroutines while working with the Kotlin Native memory model instead of against it, and avoid the dragons along the way.
The JVM JIT compiler and deoptimizer are triggered under certain conditions like method invocation counts, changes in program behavior, and hot spots. The JIT initially compiles code to generate fast machine instructions while the deoptimizer reverts back to interpreted execution if needed.
This document summarizes Gavin M. Roy's presentation on concurrency with multiprocessing in Python. It discusses using threads via the threading module, issues with the Global Interpreter Lock (GIL) in Python, and how to use the multiprocessing module to achieve true parallelism across multiple processes. It provides examples of creating threads and processes that run concurrently and examples of how to share objects between processes using connections, queues, pipes, managers and reduction tools.
Memory Management of C# with Unity Native Collections, by Yoshifumi Kawai
This document discusses C# and memory management in Unity. It begins by introducing the author and some of their open-source projects related to C# and Unity, including libraries for serialization and reactive programming. It then discusses using async/await with Unity through the UniTask library. The document also covers differences in memory management between .NET Core and Unity due to using different runtimes (CoreCLR vs Unity runtime) and virtual machines. It presents examples of using unsafe code and pointers to directly manage memory in C# for cases like native collections. It concludes that while C# aims for a safe managed world, optimizations require bypassing the runtime through unsafe code, and being aware of memory can help better understand behavior and use APIs more
A journey through the wonderful world of Node.js C++ addons. This talk was given at the September 8, 2015 NodeMN meetup.
Code: https://github.com/cb1kenobi/nodemn
Multithreading with modern C++ is hard: undefined variables, deadlocks, livelocks, race conditions, spurious wakeups, the double-checked locking pattern, and so on. Underneath it all sits the new memory model, which does not make life any easier. The list of things that can go wrong is very long. In this talk I give you a tour of the things that can go wrong and show how you can avoid them.
Bridge TensorFlow to run on Intel nGraph backends (v0.5), by Mr. Vengineer
The document describes how the nGraph TensorFlow bridge works by rewriting TensorFlow graphs to run on Intel nGraph backends. It discusses how optimization passes are used to modify the graph in several phases: 1) Capturing TensorFlow variables as nGraph variables, 2) Marking/assigning/deassigning nodes to clusters, 3) Encapsulating clusters into nGraphEncapsulateOp nodes to run subgraphs on nGraph. Key classes and files involved are described like NGraphVariableCapturePass, NGraphEncapsulatePass, and how they implement the different rewriting phases to prepare the graph for nGraph execution.
node.js and native code extensions by example, by Philipp Fehre
Over the last few years node.js has evolved into a great platform for building web applications. The reason is not only that it is based on JavaScript, which is already established around "the web", but also that it provides excellent facilities for extensions, both via JavaScript and through integration of native C libraries. Couchbase makes heavy use of this: the Couchbase node.js SDK (Couchnode) is a wrapper around the C library that provides a node.js-like API while leveraging the power of the native C library underneath. So how is this done? What does such a package look like? Let me show you how integration of C in node.js works and how to "read" a package like Couchnode.
Photon Server Deep Dive - View from Implmentation of PhotonWire, Multiplayer ..., by Yoshifumi Kawai
This document discusses PhotonWire, a framework for building networked games and applications. It allows clients and servers to communicate asynchronously using operations and operation requests/responses. Clients can send messages to servers using operations, which are received and handled via a switch statement based on operation code. Servers can then send response messages back to clients. The document also mentions plans to improve serialization performance in PhotonWire by replacing the current serializer.
The WG21 standardization committee recently completed its work, and the C++17 draft document was submitted to the International Organization for Standardization (ISO) for review. From this point on, we can technically consider the C++17 standard to exist. If you have not yet familiarized yourself with the accepted changes, now is the time. The talk gives an overview of the new features and examines the current state of C++17 support in popular compilers.
TensorFlow can be installed and run in a distributed environment using Docker. The document discusses setting up TensorFlow workers and parameter servers in Docker containers using a Docker compose file. It demonstrates building Docker images for each role, and configuring the containers to communicate over gRPC. A Jupyter server container is also built to host notebooks. The distributed TensorFlow environment is deployed locally for demonstration purposes. Future directions discussed include running the distributed setup on a native cluster using tools like Docker Swarm or RancherOS, and testing TensorFlow with GPU support in Docker.
Glow is a compiler and execution engine for neural networks created by Facebook. It takes a high-level graph representation of a neural network and compiles it into efficient machine code for different hardware backends like CPU and OpenCL. The key steps in Glow include loading a model, optimizing the graph, lowering it to a low-level IR, scheduling operations to minimize memory usage, generating instructions for the backend, and performing optimizations specific to the target. Glow aims to provide a portable way to deploy neural networks across different hardware platforms.
This document provides an overview of how to contribute to the CPython source code. It discusses running benchmarks to understand performance differences between loops inside and outside functions. It encourages contributing as a way to improve coding skills and help the open source community. The steps outlined are to clone the CPython source code repository, resolve any dependencies during building, review open issues on bugs.python.org, and work on resolving issues, starting with easier ones. Tips are provided such as commenting when taking ownership of an issue, reproducing bugs before working on them, writing tests for code changes, and updating documentation.
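A benchmark of this kind can be sketched as follows. It is an illustration, not the benchmark from the slides; the loop size and the helper `run_source` are assumptions. In CPython, names assigned inside a function are local variables accessed by index, while names at module level go through dictionary lookups, so the loop inside the function is typically faster. Measured numbers vary by machine and interpreter version.

```python
import time

# The same loop, first as module-level code, then wrapped in a function.
MODULE_LEVEL = """
total = 0
for i in range(200_000):
    total += i
"""

IN_FUNCTION = """
def run():
    total = 0
    for i in range(200_000):
        total += i
    return total

total = run()
"""


def run_source(source):
    """Execute source as module-level code; return (seconds, total)."""
    code = compile(source, "<benchmark>", "exec")
    namespace = {}
    start = time.perf_counter()
    exec(code, namespace)
    return time.perf_counter() - start, namespace["total"]


if __name__ == "__main__":
    for label, src in (("module level", MODULE_LEVEL), ("in function", IN_FUNCTION)):
        seconds, total = run_source(src)
        print(f"{label}: {seconds:.4f}s (total={total})")
```

The sources are compiled and executed directly rather than passed to `timeit` as strings, because `timeit` wraps a statement string in a function of its own, which would turn the module-level variables into locals and hide the difference.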
Linux kernel tracing superpowers in the cloud, by Andrea Righi
The Linux 4.x series introduced a powerful new engine for programmable tracing (BPF) that makes it possible to look inside the kernel at runtime. This talk will show you how to exploit this engine to debug problems or identify performance bottlenecks in a complex environment like a cloud. It will cover the latest Linux superpowers that let you see what is happening “under the hood” of the Linux kernel at runtime. I will explain how to exploit these “superpowers” to measure and trace complex events at runtime in a cloud environment. For example, we will see how to measure the latency distribution of filesystem I/O and details of storage device operations, such as individual block I/O request timeouts or TCP buffer allocations, and how to investigate stack traces of certain events, identify memory leaks and performance bottlenecks, and a whole lot more.
This document provides an overview of using Python and GTK to build graphical user interfaces (GUIs). Some key points:
- GTK is a cross-platform GUI toolkit that can be used with Python to develop applications for Linux, Windows, and Mac.
- The document demonstrates basic GTK widgets like windows, buttons, labels and layout containers. It also covers using event handlers and object-oriented programming with GTK.
- More advanced topics covered include using threads to prevent the GUI from freezing during long operations, loading interfaces from UI files, and building a weather checking application with model-view architecture.
Slides for the Cluj.py meetup where we explored the inner workings of CPython, the reference implementation of Python. Includes examples of writing a C extension to Python, and introduces Cython - ultimately the sanest way of writing C extensions.
Also check out the code samples on GitHub: https://github.com/trustyou/meetups/tree/master/python-c
Brian Bouterse discusses using the GNU Debugger (GDB) to debug hung Python processes. GDB can be used to attach to running Python processes and debug issues that occur in production or with remote/rarely occurring problems. The debugger provides tools like stack traces and examining local variables. Python extensions for GDB provide additional functionality for listing source code, switching threads, and more. Debugging with GDB requires installing debug symbols and dealing with optimized-out code. Alternative tools like strace and rpdb can also provide debugging assistance.
This document summarizes Golang testing techniques including the built-in testing framework, mocks and fakes, monkey patching, helpers like Testify and Ginkgo, and dependency injection. It covers the basics of the built-in framework including table driven tests and code coverage. It discusses various mocking frameworks and issues with monkey patching. It also provides examples of using helpers and implementing dependency injection to make code more testable.
This document discusses Python-GTK and provides information about:
- Installing necessary packages like python-pywapi and glade
- Links to the author Yuren Ju's online profiles
- An assumption that the audience has experience with at least one programming language
- A graph showing Python's popularity based on the TIOBE index
- Comments from others that Python is suitable for beginners and experts alike and is flexible
- Examples of successful projects using Python including the author's first experience four years ago
Runtime Code Generation and Data Management for Heterogeneous Computing in JavaJuan Fumero
This document discusses runtime and data management techniques for heterogeneous computing in Java. It presents an approach that uses three levels of abstraction: parallel skeletons API based on functional programming, a high-level optimizing library that rewrites operations to target specific hardware, and OpenCL code generation and runtime with data management for heterogeneous architectures. It describes how the runtime performs type inference, IR generation, optimizations, and kernel generation to compile Java code into OpenCL kernels. It also discusses how custom array types are used to reduce data marshaling overhead between the Java and OpenCL runtimes.
1) 4Com is a company that provides communication solutions to help customers effectively manage calls, emails, chats, and other messaging. They are described as a "cloud" for their customers like ADAC.
2) Microservices are used as the architecture, with each service having a single responsibility within bounded contexts. Services communicate through defined APIs and documentation.
3) Go is used as the programming language because it is simple, has many built-in features like HTTP support, and produces single binaries. Go encourages testability and concurrency.
Go: Why it goes
by Serhii Pichkurov
In this talk Serhii will talk about Go, also known as Golang – an open source language developed at Google and used in production by companies such as Docker, Dropbox, Facebook and Google itself. Go is now heavily used as a general-purpose programming language that’s a pleasure to use and maintain. This introductory talk contains many live demos of basic language concepts, concurrency model, simple HTTP-based endpoint implementation and, of course, tests using build-in framework. This presentation will be interesting for backend engineers and DevOps to understand why Go had become so popular and how it might help to build robust and maintainable services.
Join this session after which you can start coding using language that has static safe compiler, GC and is as fast as C++ or Java, with even simpler syntax than Python!
Not Your Fathers C - C Application Development In 2016maiktoepfer
- The document discusses different approaches for copying strings in C, including strcpy, strncpy, strlcpy, and strcpy_s.
- strcpy can cause buffer overflows if the destination is too small, while strncpy does not guarantee a properly terminated string.
- strlcpy aims to prevent overflows and ensure valid strings, but may truncate and requires external libraries.
- strcpy_s from C11 solves the problems of previous functions and is part of the standard, but support is limited.
The document discusses different approaches to implementing GPU-like programming on CPUs using C++AMP. It covers using setjmp/longjmp to implement coroutines for "fake threading", using ucontext for coroutine context switching, and how to pass lambda functions and non-integer arguments to makecontext. Implementing barriers on CPUs requires synchronizing threads with an atomic counter instead of GPU shared memory. Overall, the document shows it is possible to run GPU-like programming models on CPUs by simulating the GPU programming model using language features for coroutines and threading.
The document discusses intra-machine parallelism and threaded programming. It introduces key concepts like threads, processes, synchronization constructs (locks and condition variables), and challenges like overhead and Amdahl's law. An example of domain decomposition for parallel rendering is presented to demonstrate how to divide a problem into independent tasks and assign them to threads.
1) Qooxdoo is a JavaScript framework that provides object-oriented programming features to JavaScript. It turns JavaScript into a "grown up OO language" and allows developers to write browser-based applications without needing HTML or CSS knowledge.
2) The document discusses how to get started with a basic "Hello World" Qooxdoo application by installing Python, unpacking Qooxdoo, and generating and running the application files.
3) Key aspects of programming with Qooxdoo include leveraging JavaScript features like anonymous functions, closures, and proper understanding of scoping, as the framework relies heavily on these elements.
This document discusses adding logical replication protocol support to the psycopg2 library to allow Python applications to consume real-time replication streams from PostgreSQL. It provides code examples for connecting to replication slots, consuming change streams and messages, and stopping replication. Docker images are also available to simplify testing logical replication. Physical replication is also supported through a separate connection class. Asynchronous replication connections and keepalive messages are demonstrated as well.
Robust C++ Task Systems Through Compile-time ChecksStoyan Nikolov
Task-based (aka job systems) engine architectures are becoming the de-facto standard for AAA game engines and software solutions. The talk explains how the task system in the Hummingbird game UI engine was designed to both be convenient and to avoid common programmer pitfalls. Advanced C++ techniques are employed to warn and shield the developer from errors at compile time.
This document provides an overview of concurrency in Python using multiprocessing and threading. It begins by introducing the speaker and defining key terms like concurrency, threads, and processes. It then discusses the benefits and use cases of threads versus processes. The document also covers the Global Interpreter Lock (GIL) in Python and how multiprocessing can help avoid it. It provides an example benchmark showing multiprocessing can significantly outperform threading for CPU-bound tasks. Finally, it discusses key aspects of Python's multiprocessing module like Process, Queue, Pool, and Manager classes.
Despite being a slow interpreter, Python is a key component in high-performance computing (HPC). Python is easy to use. C++ is fast. Together they are a beautiful blend. A new tool, pybind11, makes this approach even more attractive to HPC code. It focuses on the niceties C++11 brings in. Beyond the syntactic sugar around the Python C API, it is interesting to see how pybind11 handles the vast difference between the two languages, and what matters to HPC.
Similar to Global Interpreter Lock: Episode III - cat < /dev/zero > GIL; (20)
Global Interpreter Lock: Episode III - cat < /dev/zero > GIL;
1. PyCon TW 2017
Global Interpreter Lock
Episode III - cat < /dev/zero > GIL;
Tzung-Bi Shih
<penvirus@gmail.com>
2. PyCon TW 2017
Preface
• GIL Episode I - Break the Seal[1]
  • PyCon APAC 2015
  • how to get along with the GIL peacefully
• GIL Episode II - project#23345678
  • too boring for most of us
• GIL Episode III - cat < /dev/zero > GIL;
  • how to nullify the GIL
3. PyCon TW 2017
Introduction
• the GIL (innocently) prevents us from utilizing the full power of multiprocessors
• multiprocessing is inevitable for processor-bound Python programs
• if we DO care about performance, we ask the lower level for a favor, e.g., numpy, Cython, CPython extensions
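As a rough illustration of the point above, the following Python 3 sketch runs the same processor-bound loop sequentially and in two threads. The function names and the loop size are made up for this example. On a standard GIL-enabled CPython build the threaded run is usually no faster than the sequential one, although exact timings vary by machine and free-threaded builds behave differently.

```python
import threading
import time


def count_up(n):
    """Pure-Python, processor-bound loop; it holds the GIL while it runs."""
    total = 0
    for _ in range(n):
        total += 1
    return total


def run_sequential(n, workers):
    """Run the loop `workers` times, one after another."""
    return [count_up(n) for _ in range(workers)]


def run_threaded(n, workers):
    """Run the loop in `workers` threads; the GIL still serializes them."""
    results = [None] * workers

    def worker(index):
        results[index] = count_up(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


N, WORKERS = 200_000, 2  # illustrative sizes, not taken from the slides
_, seq_seconds = timed(lambda: run_sequential(N, WORKERS))
_, thr_seconds = timed(lambda: run_threaded(N, WORKERS))
print("sequential: {0:.4f}s, threaded: {1:.4f}s".format(seq_seconds, thr_seconds))
```

Both variants compute the same results; only the wall-clock comparison is of interest here.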
4. PyCon TW 2017
Motivation
• high performance data processing platform
• requires a plugin framework
• coworkers were less familiar with C programming
• theoretically, a 64-bit processor has 16 exbibytes of virtual address space (16*1024*1024 tebibytes)
➡ N processors, N worker threads, good
➡ 1 process, 1 Python runtime, W T Fantasty
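The address-space figure above can be checked with a few lines of arithmetic. This is only a sanity check of the unit conversion; the constant names are chosen for this sketch.

```python
# Binary unit sizes in bytes.
TEBIBYTE = 2 ** 40
EXBIBYTE = 2 ** 60

# A 64-bit pointer can address 2**64 distinct bytes in theory.
address_space = 2 ** 64

exbibytes = address_space // EXBIBYTE
tebibytes = address_space // TEBIBYTE

print("exbibytes:", exbibytes)
print("tebibytes:", tebibytes)
print("matches 16*1024*1024:", tebibytes == 16 * 1024 * 1024)
```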
5. PyCon TW 2017
Problem Statement
To pursue the best performance, a system is built on a multithreaded C program. For some degree of flexibility and extensibility, the designer decides to embed the CPython runtime into the C program.
CPython uses one big global lock to secure and serialize the execution of threads. Only one thread is able to use the Python runtime at a time. The lock results in poor scalability.
Knowing beforehand that the threads' executions in the Python runtime are independent of each other, the designer moves the lock into a private namespace for each thread. As a consequence, threads can now use their own private Python runtimes in parallel.
6. PyCon TW 2017
Example: 1a.c[2]
int main()
{
    int ret = 1;
    pthread_t t1, t2;
    Py_Initialize();
    PyRun_SimpleString("count = 23345678");
    if (pthread_create(&t1, NULL, task1, NULL)) {
        ERR("failed to pthread_create");
        goto leave;
    }
    if (pthread_create(&t2, NULL, task2, NULL)) {
        ERR("failed to pthread_create");
        goto leave;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    ret = 0;
leave:
    Py_Finalize();
    return ret;
}
void *task1(void *arg)
{
    PyRun_SimpleString(
        "import time\n"
        "print 'task1: {0:.6f}'.format(time.time())\n"
        "time.sleep(10)\n"
        "print 'task1: {0:.6f}'.format(time.time())\n"
        "print 'task1: {0}'.format(count)\n"
    );
    return NULL;
}
void *task2(void *arg)
{
    PyRun_SimpleString(
        "import time\n"
        "print 'task2: {0:.6f}'.format(time.time())\n"
        "print 'task2: {0}'.format(count)\n"
        "for i in xrange(23345678):\n"
        "    count += 1\n"
        "print 'task2: {0:.6f}'.format(time.time())\n"
    );
    return NULL;
}
The program crashes if we don't acquire[3] the GIL before using the Python runtime.
$ ./1a
task1: 1481803678.055977
Segmentation fault (core dumped)
7. PyCon TW 2017
GIL
Example: 1b.c[4]
void *task1(void *arg)
{
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
    PyRun_SimpleString(
        "import time\n"
        "print 'task1: {0:.6f}'.format(time.time())\n"
        "time.sleep(10)\n"
        "print 'task1: {0:.6f}'.format(time.time())\n"
        "print 'task1: {0}'.format(count)\n"
    );
    PyGILState_Release(gstate);
    return NULL;
}
int main()
{
    [snip]
    PyThreadState *th_state;
    Py_Initialize();
    PyEval_InitThreads();
    [snip]
    th_state = PyEval_SaveThread();
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    PyEval_RestoreThread(th_state);
    [snip]
}
$ ./1b
task1: 1481804487.332934
task2: 1481804487.333096
task2: 23345678
task2: 1481804488.877374
task1: 1481804497.344352
task1: 46691356
void *task2(void *arg)
{
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
    PyRun_SimpleString(
        "import time\n"
        "print 'task2: {0:.6f}'.format(time.time())\n"
        "print 'task2: {0}'.format(count)\n"
        "for i in xrange(23345678):\n"
        "    count += 1\n"
        "print 'task2: {0:.6f}'.format(time.time())\n"
    );
    PyGILState_Release(gstate);
    return NULL;
}
(Slide diagram: numbered callouts 1 to 17 annotate the code, with a legend of "acquire lock" and "release lock". Callouts marked * (9 and 10) may occur multiple times.)
Our multithreaded program has been serialized into one "effective" thread.
Note: if task2 gets the lock first, the output will be different:
$ ./1b
task2: 1496904735.421629
task2: 23345678
task1: 1496904735.421837
task2: 1496904736.919195
task1: 1496904745.433233
task1: 46691356
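The shared-runtime behaviour of 1b.c can be mimicked in plain Python 3. The sketch below is an analogue, not a translation: both threads live in one interpreter and therefore share the global `count`, so task1 observes task2's increments. Two assumptions are made for determinism and speed: a `threading.Event` stands in for `time.sleep(10)` (both are blocking calls that release the GIL), and the loop runs a hypothetical `INCREMENTS` iterations instead of 23345678.

```python
import threading

count = 23345678
INCREMENTS = 100_000  # assumption: far fewer iterations than the slide uses
task2_done = threading.Event()
seen = {}


def task1():
    # Blocking here releases the GIL, as time.sleep(10) does in 1b.c.
    task2_done.wait()
    seen["task1"] = count


def task2():
    global count
    seen["task2"] = count
    for _ in range(INCREMENTS):
        count += 1
    task2_done.set()


t1 = threading.Thread(target=task1)
t2 = threading.Thread(target=task2)
t1.start()
t2.start()
t1.join()
t2.join()

print("task2 saw {0} before counting".format(seen["task2"]))
print("task1 saw {0} after task2 finished".format(seen["task1"]))
```

Because there is a single runtime, the value read by task1 includes every increment made by task2, which mirrors the 46691356 printed by task1 in the output above.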
8. PyCon TW 2017
Ideal Example: 1c.c[5]
Object-like Programming
int main()
{
    int ret = 1;
    pthread_t t1, t2;
    Python ctx1, ctx2;
    Py_Initialize(&ctx1);
    Py_Initialize(&ctx2);
    PyRun_SimpleString(&ctx1, "count = 23345678");
    PyRun_SimpleString(&ctx2, "count = 23345678");
    if (pthread_create(&t1, NULL, task1, &ctx1)) {
        ERR("failed to pthread_create");
        goto leave;
    }
    if (pthread_create(&t2, NULL, task2, &ctx2)) {
        ERR("failed to pthread_create");
        goto leave;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    ret = 0;
leave:
    Py_Finalize(&ctx1);
    Py_Finalize(&ctx2);
    return ret;
}
void *task1(void *arg)
{
    Python *ctx = arg;
    PyRun_SimpleString(ctx,
        "import time\n"
        "print 'task1: {0:.6f}'.format(time.time())\n"
        "time.sleep(10)\n"
        "print 'task1: {0:.6f}'.format(time.time())\n"
        "print 'task1: {0}'.format(count)\n"
    );
    return NULL;
}
void *task2(void *arg)
{
    Python *ctx = arg;
    PyRun_SimpleString(ctx,
        "import time\n"
        "print 'task2: {0:.6f}'.format(time.time())\n"
        "print 'task2: {0}'.format(count)\n"
        "for i in xrange(23345678):\n"
        "    count += 1\n"
        "print 'task2: {0:.6f}'.format(time.time())\n"
    );
    return NULL;
}
Warning: this example won't compile; it only illustrates the ideal case.
23345678
46691356
11. PyCon TW 2017
Example: 2a.c[6]
void *task1(void *arg)
{
    void *handle = dlopen("/usr/lib/x86_64-linux-gnu/libpython2.7.so", RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        ERR("failed to dlopen: %s", dlerror());
        goto leave;
    }
    void (*_Py_Initialize)() = dlsym(handle, "Py_Initialize");
    if (!_Py_Initialize) {
        ERR("failed to dlsym: %s", dlerror());
        goto leave;
    }
    int (*_PyRun_SimpleString)(const char *) = dlsym(handle, "PyRun_SimpleString");
    if (!_PyRun_SimpleString) {
        ERR("failed to dlsym: %s", dlerror());
        goto leave;
    }
    [snip]
    _Py_Initialize();
    _PyRun_SimpleString("count = 23345678");
    [snip]
}
void *task2(void *arg)
{
    void *handle = dlopen("/usr/lib/x86_64-linux-gnu/libpython2.7.so", RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        ERR("failed to dlopen: %s", dlerror());
        goto leave;
    }
    [snip]
}
$ ./2a
Segmentation fault (core dumped)
crashed
=> the GIL was still shared
12. PyCon TW 2017
Example: 2b.c[7]
void *task1(void *arg)
{
    void *handle = dlopen("./1.so", RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        ERR("failed to dlopen: %s", dlerror());
        goto leave;
    }
    [snip]
}
void *task2(void *arg)
{
    void *handle = dlopen("./2.so", RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        ERR("failed to dlopen: %s", dlerror());
        goto leave;
    }
    [snip]
}
int main()
{
    [snip]
    system("cp /usr/lib/x86_64-linux-gnu/libpython2.7.so 1.so");
    system("cp /usr/lib/x86_64-linux-gnu/libpython2.7.so 2.so");
    [snip]
}
$ ./2b
task1: 1481821866.140683
task2: 1481821866.140725
task2: 23345678
task2: 1481821867.924265
task1: 1481821876.150968
task1: 23345678
• 2 distinct shared objects, i.e., 2 different inodes; symbolic links won't do the trick
• don't forget the "./"; from the dlopen(3) manual[8]: If filename is NULL, then the returned handle is for the main program. If filename contains a slash ("/"), then it is interpreted as a (relative or absolute) pathname. Otherwise, the dynamic linker searches for the object.
It works, but it is too silly.
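The remark about inodes can be checked without loading any library. The Python 3 sketch below compares device and inode numbers for a file, a symbolic link to it, and a copy of it. It rests on the assumption, consistent with the slide's note, that the dynamic linker recognizes an already-loaded object by the identity of the underlying file rather than by its path. The file names are placeholders, and `os.symlink` is assumed to be available (POSIX systems).

```python
import os
import shutil
import tempfile


def same_file(path_a, path_b):
    """True if both paths resolve to the same (device, inode) pair."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)  # os.stat follows symlinks
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "libdemo.so")  # placeholder, not a real library
    with open(original, "wb") as f:
        f.write(b"placeholder")

    link = os.path.join(tmp, "link.so")
    os.symlink(original, link)

    copy = os.path.join(tmp, "copy.so")
    shutil.copy(original, copy)

    link_is_same = same_file(original, link)
    copy_is_same = same_file(original, copy)

print("symlink shares the inode:", link_is_same)
print("copy shares the inode:", copy_is_same)
```

A symbolic link resolves to the original inode, so it would not yield a second, independent copy of the library; a real copy gets its own inode.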
13. PyCon TW 2017
Dynamic Linker
dlopen(3) manual[8]:
The dlmopen() function differs from dlopen() primarily in that it accepts an additional argument, lmid, that specifies the link-map list (also referred to as a namespace) in which the shared object should be loaded.
Possible uses of dlmopen() are plugins where the author of the plugin-loading framework can't trust the plugin authors and does not wish any undefined symbols from the plugin framework to be resolved to plugin symbols. Another use is to load the same object more than once. Without the use of dlmopen(), this would require the creation of distinct copies of the shared object file. Using dlmopen(), this can be achieved by loading the same shared object file into different namespaces.
The glibc implementation supports a maximum of 16 namespaces.
15. PyCon TW 2017
Example: 2c.c[9] (2/2)
$ LD_DEBUG=all LD_DEBUG_OUTPUT=x ./2c
Segmentation fault (core dumped)
56072: calling init: /lib/x86_64-linux-gnu/libc.so.6
56072:
56072: symbol=__vdso_clock_gettime; lookup in file=linux-vdso.so.1 [0]
56072: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_gettime' [LINUX_2.6]
56072: symbol=__vdso_getcpu; lookup in file=linux-vdso.so.1 [0]
56072: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_getcpu' [LINUX_2.6]
56072:
56072: calling init: /lib/x86_64-linux-gnu/libm.so.6
56072:
56072:
56072: calling init: /lib/x86_64-linux-gnu/libutil.so.1
56072:
56072:
56072: calling init: /lib/x86_64-linux-gnu/libdl.so.2
56072:
56072:
56072: calling init: /lib/x86_64-linux-gnu/libz.so.1
56072:
56072:
56072: calling init: /usr/lib/x86_64-linux-gnu/libpython2.7.so
56072:
56072: opening file=/usr/lib/x86_64-linux-gnu/libpython2.7.so [1]; direct_opencount=1
56072: calling init: /lib/x86_64-linux-gnu/libc.so.6
56072:
56072: symbol=__vdso_clock_gettime; lookup in file=linux-vdso.so.1 [0]
56072: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_gettime' [LINUX_2.6]
56072: symbol=__vdso_getcpu; lookup in file=linux-vdso.so.1 [0]
56072: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_getcpu' [LINUX_2.6]
see ld.so(8)[10]
We still don't know why it crashes; there are some clues to trace, but following them takes a lot of patience.
16. PyCon TW 2017
Example: 3a.c[11] 3b.c[12]
int global = 23345678;
void print()
{
    printf("%ld: %d\n", syscall(SYS_gettid), global++);
}
void *task(void *arg)
{
    void *handle = dlmopen(LM_ID_NEWLM, "./3a.so", RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        ERR("failed to dlmopen: %s", dlerror());
        goto leave;
    }
    void (*print)() = dlsym(handle, "print");
    if (!print) {
        ERR("failed to dlsym: %s", dlerror());
        goto leave;
    }
    print(); sched_yield();
    print(); sched_yield();
    print();
leave:
    if (handle)
        dlclose(handle);
    return NULL;
}
dlmopen() truly separates global variables into two namespaces
$ ./3b
55230: 23345678
55231: 23345678
55230: 23345679
55231: 23345679
55230: 23345680
55231: 23345680
$ gcc -shared -fPIC -o 3a.so 3a.c
Exercise: replace dlmopen() with dlopen() and observe the output.
17. PyCon TW 2017
Example: 4a.c[13]
int main()
{
    int ret = 1;
    pthread_t t1;
    if (pthread_create(&t1, NULL, task, NULL)) {
        ERR("failed to pthread_create");
        goto leave;
    }
    pthread_join(t1, NULL);
    if (pthread_create(&t1, NULL, task, NULL)) {
        ERR("failed to pthread_create");
        goto leave;
    }
    pthread_join(t1, NULL);
    ret = 0;
leave:
    return ret;
}
dlmopen() with libpython2.7.so still crashes, even in very basic usage
$ ./4a
Segmentation fault (core dumped)
test combination: glibc 2.12 + python 2.7.12
void *task(void *arg)
{
    void *handle = dlmopen(LM_ID_NEWLM, "/usr/lib/x86_64-linux-gnu/libpython2.7.so", RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        ERR("failed to dlmopen: %s", dlerror());
        goto leave;
    }
    dlclose(handle);
leave:
    return NULL;
}
exercise: further tests and observations on 4b.c[14] and 4c.c[15]
18. PyCon TW 2017
More Complicated Example
Hello, my dad is ______
Note: we intentionally use the Python runtime concurrently as much as possible to show that the GIL has been nullified.
• processor-bound task
  • calculate the entropy of data from /dev/urandom
  • a toggle switch
    • on -> off: print the current entropy
    • off -> on: reset and keep calculating
    • otherwise: do nothing; default value: off
• IO-bound task (async)
  • wait for SIGUSR1, then read the JSON file "./config.json"
  • { "calc": true | false }
• IO-bound task (sync)
  • listen on tcp 127.0.0.1:5884 and execute arbitrary Python code
  • return { "success": true | false }
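The asynchronous IO-bound task described above could look roughly like the following Python 3 sketch. It is not the code from the talk: the function names are hypothetical, and it assumes a POSIX system where `signal.pthread_sigmask`, `signal.sigwait`, and `signal.pthread_kill` are available. The demo sends SIGUSR1 to its own thread so that it does not block waiting for an external signal.

```python
import json
import os
import signal
import tempfile
import threading


def block_sigusr1():
    """Block SIGUSR1 so it stays pending until sigwait() collects it."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})


def wait_for_config(path):
    """Wait for SIGUSR1, then read the "calc" flag from the JSON file."""
    signal.sigwait({signal.SIGUSR1})
    with open(path) as f:
        return bool(json.load(f)["calc"])


def demo():
    """Write a temporary config, signal ourselves, and read the flag back."""
    block_sigusr1()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        with open(path, "w") as f:
            json.dump({"calc": True}, f)
        # The signal is blocked, so it is queued and picked up by sigwait().
        signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
        return wait_for_config(path)


print("calc switch requested:", demo())
```

In a real deployment the signal would come from outside, for example `kill -USR1 <pid>`, after the operator edits "./config.json".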
19. PyCon TW 2017
Example: 6a.py[16]
Entropy Calculation
from collections import defaultdict
from math import log
import time
switch_on = False  # the toggle switch
def dbg(msg):
    print('[DEBUG] {0}'.format(msg))
class Entropy(object):
    def __init__(self):
        self._count = defaultdict(int)
        self._size = 0
    def update(self, data):
        for c in data:
            self._count[ord(c)] += 1
        self._size += len(data)
    def final(self):
        if not self._size:
            return 0.0
        ent = 0.0
        for i, c in self._count.items():
            prob = float(c) / float(self._size)
            ent -= prob * log(prob)
        return ent / log(2.0)
def run():  # the entry point; never returns
    current_on = False
    ent = None
    # ```mknod /dev/urandom c 1 9''' if the device doesn't exist
    with open('/dev/urandom') as rf:
        while True:
            if not switch_on:
                if current_on:
                    print('{0:.4f}'.format(ent.final()))
                    current_on = False
                    ent = None
                    dbg('switch off')
                time.sleep(1)
            else:
                if not current_on:
                    current_on = True
                    ent = Entropy()
                    dbg('switch on')
                data = rf.read(4096)
                ent.update(data)
Note: the code is mainly for python2; it needs some modifications for python3
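One possible set of the Python 3 modifications mentioned in the note is sketched below for the Entropy class only. The main change is that iterating over a `bytes` object in Python 3 already yields integers, so `ord()` is dropped; the device would also have to be opened in binary mode (`'rb'`). This is an adaptation under those assumptions, not the author's own port.

```python
from collections import defaultdict
from math import log


class Entropy:
    """Shannon entropy, in bits per byte, of all data seen so far."""

    def __init__(self):
        self._count = defaultdict(int)
        self._size = 0

    def update(self, data):
        # In Python 3, iterating over bytes yields ints; no ord() needed.
        for byte in data:
            self._count[byte] += 1
        self._size += len(data)

    def final(self):
        if not self._size:
            return 0.0
        ent = 0.0
        for occurrences in self._count.values():
            prob = occurrences / self._size
            ent -= prob * log(prob)
        return ent / log(2.0)


uniform = Entropy()
uniform.update(bytes(range(256)))  # every byte value exactly once
print("uniform bytes: {0:.4f} bits per byte".format(uniform.final()))
```

Data read from /dev/urandom should approach the same upper bound of 8 bits per byte as the sample size grows.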
20. PyCon TW 2017
Example: 6b.c[17] (1/4)
Main Function
20
int main()
{
int ret = 1;
pthread_t calc, config, nihao5884;
struct context ctx = {0};
sigset_t sigmask;
pthread_mutex_init(&ctx.lock, NULL);
pthread_cond_init(&ctx.cond, NULL);
sigemptyset(&sigmask);
sigaddset(&sigmask, SIGUSR1);
if (pthread_sigmask(SIG_BLOCK, &sigmask, NULL)) {
ERR("failed to pthread_sigmask");
goto leave;
}
if (pthread_create(&calc, NULL, calc_task, &ctx)) {
ERR("failed to pthread_create");
goto leave;
}
if (pthread_create(&config, NULL, config_task, &ctx)) {
ERR("failed to pthread_create");
goto leave;
}
if (pthread_create(&nihao5884, NULL, nihao5884_task, &ctx)) {
ERR("failed to pthread_create");
goto leave;
}
pthread_join(calc, NULL);
pthread_join(config, NULL);
pthread_join(nihao5884, NULL);
/* never reachable */
[snip]
struct common_operations {
void (*_Py_InitializeEx)(int);
void (*_Py_Finalize)();
int (*_PyRun_SimpleFileEx)(FILE *, const char *, int);
PyObject *(*_PyModule_New)(const char *);
PyObject *(*_PyModule_GetDict)(PyObject *);
PyObject *(*_PyDict_GetItemString)(PyObject *, const char *);
PyObject *(*_PyImport_ImportModule)(const char *);
PyObject *(*_PyObject_CallObject)(PyObject *, PyObject *);
void (*_Py_IncRef)(PyObject *);
void (*_Py_DecRef)(PyObject *);
void (*_PyEval_InitThreads)();
PyThreadState *(*_PyEval_SaveThread)();
void (*_PyEval_RestoreThread)(PyThreadState *);
void *(*_PyGILState_Ensure)();
void (*_PyGILState_Release)(void *);
int (*_PyRun_SimpleString)(const char *);
long (*_PyLong_AsLong)(PyObject *);
PyObject *(*_PyBool_FromLong)(long);
int (*_PyDict_SetItemString)(PyObject *, const char *, PyObject *);
};
struct context {
pthread_mutex_t lock;
pthread_cond_t cond;
PyObject *calc_main_dict;
struct common_operations *calc_ops;
int cur_switch_on;
};
very simple architecture:
3 worker threads
1 shared context
a few function pointers
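The hand-off of the function-pointer table through the shared context can be sketched in Python as follows. This is only an illustration: the class and function names are hypothetical, a single threading.Condition stands in for the pthread mutex/cond pair, and the waiting side is an assumption, since the slides only show the signalling side.

```python
import threading


class Context:
    """Hypothetical Python stand-in for `struct context` in 6b.c."""

    def __init__(self):
        self.cond = threading.Condition()  # plays the role of lock + cond
        self.calc_ops = None               # published by the calc worker
        self.cur_switch_on = 0


def calc_worker(ctx, ops):
    # Publish the function table, then wake up every waiter.
    with ctx.cond:
        ctx.calc_ops = ops
        ctx.cond.notify_all()


def config_worker(ctx, seen):
    # Assumed waiting side: block until the table has been published.
    with ctx.cond:
        ctx.cond.wait_for(lambda: ctx.calc_ops is not None)
    seen.append(ctx.calc_ops)


def demo():
    ctx, seen = Context(), []
    waiter = threading.Thread(target=config_worker, args=(ctx, seen))
    publisher = threading.Thread(target=calc_worker, args=(ctx, {"name": "calc"}))
    waiter.start()
    publisher.start()
    waiter.join()
    publisher.join()
    return seen
```

Because wait_for() re-checks the predicate before sleeping, the sketch works whichever thread runs first.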
21. PyCon TW 2017
Example: 6b.c[17] (2/4)
Calculation Task
void *calc_task(void *arg)
{
struct context *ctx = arg;
struct common_operations ops = {0};
void *handle = NULL;
FILE *fp = NULL;
PyObject *main_module = NULL, *run_method = NULL, *switch_on;
system("cp /usr/lib/x86_64-linux-gnu/libpython2.7.so calc_libpython2.7.so");
handle = dlopen("./calc_libpython2.7.so", RTLD_LAZY | RTLD_LOCAL);
if (!handle) {
ERR("failed to dlopen: %s", dlerror());
goto leave;
}
if (!resolve_common_operations(handle, &ops)) {
ERR("failed to resolve_common_operations");
goto leave;
}
fp = fopen("6a.py", "r");
if (!fp) {
ERR("failed to fopen: %s", strerror(errno));
goto leave;
}
ops._PyEval_InitThreads();
ops._Py_InitializeEx(0);
ops._PyRun_SimpleFileEx(fp, "6a.py", 1 /* closeit, fp will be closed */);
main_module = ops._PyImport_ImportModule("__main__");
if (!main_module) {
ERR("failed to _PyImport_ImportModule");
goto leave_python;
}
ctx->calc_main_dict = ops._PyModule_GetDict(main_module);
if (!ctx->calc_main_dict) {
ERR("failed to _PyModule_GetDict");
goto leave_python;
}
ops._Py_IncRef(ctx->calc_main_dict);
switch_on = ops._PyDict_GetItemString(ctx->calc_main_dict, "switch_on");
if (!switch_on) {
ERR("failed to _PyDict_GetItemString");
goto leave_python;
}
ctx->cur_switch_on = ops._PyLong_AsLong(switch_on);
run_method = ops._PyDict_GetItemString(ctx->calc_main_dict, "run");
if (!run_method) {
ERR("failed to _PyDict_GetItemString");
goto leave_python;
}
ops._Py_IncRef(run_method);
pthread_mutex_lock(&ctx->lock);
ctx->calc_ops = &ops;
pthread_cond_signal(&ctx->cond);
pthread_mutex_unlock(&ctx->lock);
ops._PyObject_CallObject(run_method, NULL);
/* never reachable */
the entry point of 6a.py
no return
"calc" Python runtime
22. PyCon TW 2017
void *config_task(void *arg)
{
[snip]
handle = dlopen("./config_libpython2.7.so", RTLD_LAZY | RTLD_LOCAL);
[snip]
while (1) {
ops._PyRun_SimpleString(
"import json\n"
"calc = json.load(open('"CONFIG_FILE"'))['calc']\n"
);
calc = ops._PyDict_GetItemString(main_dict, "calc");
if (!calc) {
ERR("failed to _PyDict_GetItemString");
goto leave_python;
}
if (ops._PyLong_AsLong(calc) != ctx->cur_switch_on) {
DBG("state changed: from %d to %d", ctx->cur_switch_on, 1 - ctx->cur_switch_on);
ctx->cur_switch_on = 1 - ctx->cur_switch_on;
state = ctx->calc_ops->_PyGILState_Ensure();
obj = ctx->calc_ops->_PyBool_FromLong(ctx->cur_switch_on);
if (!obj) {
ERR("failed to _PyBool_FromLong");
ctx->calc_ops->_PyGILState_Release(state);
goto leave_python;
}
if (ctx->calc_ops->_PyDict_SetItemString(ctx->calc_main_dict, "switch_on", obj)) {
ERR("failed to _PyDict_SetItemString");
ctx->calc_ops->_Py_DecRef(obj);
ctx->calc_ops->_PyGILState_Release(state);
goto leave_python;
}
ctx->calc_ops->_Py_DecRef(obj);
ctx->calc_ops->_PyGILState_Release(state);
}
[snip]
}
}
Example: 6b.c[17] (3/4)
Configuration Task
parse the JSON configuration
file in "config" Python runtime
acquire the GIL of "calc" Python runtime
release the GIL of "calc" Python runtime
modify the toggle switch in 6a.py
in "calc" Python runtime
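One polling round of the configuration task can be sketched in plain Python as below. This is an illustration, not the author's code: calc_gil and calc_main_dict are hypothetical stand-ins for the "calc" runtime's GIL and for the globals of 6a.py, and the configuration is assumed to look like {"calc": true}.

```python
import json
import threading

calc_gil = threading.Lock()            # stand-in for the "calc" runtime's GIL
calc_main_dict = {"switch_on": False}  # stand-in for the globals of 6a.py


def apply_config(config_text, cur_switch_on):
    """Run one polling round and return the new switch state (0 or 1)."""
    wanted = int(json.loads(config_text)["calc"])
    if wanted == cur_switch_on:
        return cur_switch_on  # nothing changed, leave the other runtime alone
    # Acquire the other runtime's lock only for the short update.
    with calc_gil:
        calc_main_dict["switch_on"] = bool(wanted)
    return wanted
```

The point of the design is that the foreign lock is held only while the toggle is written; parsing the JSON happens entirely in the caller's own runtime.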
23. PyCon TW 2017
Example: 6b.c[17] (4/4)
nihao5884 Task
void *nihao5884_task(void *arg)
{
[snip]
handle = dlopen("./nihao5884_libpython2.7.so", RTLD_LAZY | RTLD_LOCAL);
[snip]
ops._Py_InitializeEx(0);
while (1) {
static const char *success = "{\"success\": true}";
static const char *fail = "{\"success\": false}";
struct sockaddr_in caddr;
socklen_t len = sizeof(caddr);
int n;
char buf[40960];
cfd = accept(sfd, (struct sockaddr *) &caddr, &len);
if (cfd == -1) {
ERR("failed to accept: %s", strerror(errno));
goto python_leave;
}
DBG("python code from %s:%d", inet_ntop(AF_INET, &caddr.sin_addr, buf, len), ntohs(caddr.sin_port));
n = read(cfd, buf, sizeof(buf) - 1); /* leave room for the terminating NUL */
if (n == -1) {
ERR("failed to read: %s", strerror(errno));
goto python_leave;
}
buf[n] = 0;
/* ignore write failures */
if (ops._PyRun_SimpleString(buf) == 0)
write(cfd, success, strlen(success));
else
write(cfd, fail, strlen(fail));
close(cfd);
cfd = -1;
}
[snip]
}
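The read/execute/reply step of the loop above can be mimicked in pure Python as a rough sketch. handle_request and demo are hypothetical names, exec() stands in for PyRun_SimpleString, and a socketpair replaces the listening socket so the example is self-contained. Like the C code, it runs whatever source it receives.

```python
import json
import socket


def handle_request(conn, namespace):
    """Read Python source from conn, run it, and reply with a JSON status."""
    source = conn.recv(40960).decode("utf-8")
    try:
        exec(source, namespace)  # stand-in for PyRun_SimpleString
        ok = True
    except Exception:
        ok = False
    conn.sendall(json.dumps({"success": ok}).encode("utf-8"))


def demo(source):
    # A connected pair of sockets replaces accept() on a listening socket.
    client, server = socket.socketpair()
    with client, server:
        client.sendall(source.encode("utf-8"))
        handle_request(server, {})
        return json.loads(client.recv(1024).decode("utf-8"))
```

Calling demo("x = 1 + 1") exercises the success path, and demo("1/0") the failure path.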
yet another "nihao5884" Python runtime
executing the backdoor command in
"nihao5884" Python runtime
27. PyCon TW 2017
Discussion
• some 3rd-party libraries may not work well
• until now, they have been guaranteed to be the only active instance in a process
• a 64-bit address space is big enough, but is putting them all together a good idea?
• similar to the monolithic kernel vs. microkernel debate
• embedded Python has no efficient way to confine the runtime
• e.g., unexpected blocking I/O
29. PyCon TW 2017
Appendix: Road to Practical (1/7)
• the trick depends on the shared object
• the libpythonX.X.so provided by the distribution may not be applicable
• if so, recompile libpythonX.X.so
we will use 6b.c[17] as an example; you will need to get your hands dirty to make it work
30. PyCon TW 2017
Appendix: Road to Practical (2/7)
# download the source code
$ wget https://www.python.org/ftp/python/2.7.13/Python-2.7.13.tar.xz
# decompress and untar
$ tar Jxf Python-2.7.13.tar.xz
# get the configuration args
$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sysconfig
>>> sysconfig.get_config_var('CONFIG_ARGS')
"'--enable-shared' '--prefix=/usr' '--enable-ipv6' '--enable-unicode=ucs4' '--with-dbmliborder=bdb:gdbm' '--with-system-expat' '--with-computed-gotos' '--with-system-ffi' '--with-fpectl' 'CC=x86_64-linux-gnu-gcc' 'CFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security ' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro'"
# configure with referenced options
$ cd Python-2.7.13
$ CC=x86_64-linux-gnu-gcc \
CFLAGS='-Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security' \
LDFLAGS='-Wl,-Bsymbolic-functions -Wl,-z,relro' \
./configure --prefix=/tmp/AAA --enable-shared --enable-ipv6 --enable-unicode=ucs4 \
--with-dbmliborder=bdb:gdbm --with-system-expat --with-computed-gotos --with-system-ffi --with-fpectl
$ make -j4
$ make install
temporary test folder
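Copying the configure options by hand is error-prone, so the step above can be scripted. The sketch below is only one possible way to do it: configure_command is a hypothetical helper, and it assumes that configure accepts VAR=VALUE settings such as CC and CFLAGS as ordinary arguments.

```python
import shlex


def configure_command(config_args, prefix):
    """Turn a CONFIG_ARGS string into an argument list for ./configure.

    The original --prefix is dropped and replaced by `prefix`; every other
    option, including VAR=VALUE settings, is passed through unchanged.
    """
    args = [arg for arg in shlex.split(config_args)
            if not arg.startswith("--prefix=")]
    return ["./configure", "--prefix=" + prefix] + args
```

For the running interpreter, the input would be sysconfig.get_config_var('CONFIG_ARGS'), and the returned list can be handed to subprocess.run() from inside the source directory.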
31. PyCon TW 2017
Appendix: Road to Practical (3/7)
$ PYTHONHOME=/tmp/AAA PYTHONPATH=/tmp/AAA/lib ./6b
# and you will find that it doesn't work
$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time
<module 'time' (built-in)>
$ PYTHONHOME=/tmp/AAA PYTHONPATH=/tmp/AAA/lib LD_LIBRARY_PATH=/tmp/AAA/lib /tmp/AAA/bin/python2.7
Python 2.7.13 (default, Feb 14 2017, 17:49:28)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time
<module 'time' from '/tmp/AAA/lib/python2.7/lib-dynload/time.so'>
we need the output binaries to be more tightly coupled, i.e., the modules should be built-in
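The same check can be done without an interactive session. In this small sketch module_origin is a hypothetical helper; it relies only on sys.builtin_module_names and the module's __file__ attribute.

```python
import importlib
import sys


def module_origin(name):
    """Return 'built-in' for compiled-in modules, else the file it came from."""
    if name in sys.builtin_module_names:
        return "built-in"
    module = importlib.import_module(name)
    return getattr(module, "__file__", None) or "unknown"
```

Running module_origin('time') under both interpreters shows whether time is compiled into the interpreter or loaded from a separate shared object.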
32. PyCon TW 2017
Appendix: Road to Practical (4/7)
$ cd Python-2.7.13
$ vim Modules/Setup
[snip]
posix posixmodule.c # posix (UNIX) system calls
errno errnomodule.c # posix (UNIX) errno values
pwd pwdmodule.c # this is needed to find out the user's home dir
# if $HOME is not set
_sre _sre.c # Fredrik Lundh's new regular expressions
_codecs _codecsmodule.c # access to the builtin codecs and codec registry
_weakref _weakref.c # weak references
[snip]
array arraymodule.c # array objects
cmath cmathmodule.c _math.c # -lm # complex math library functions
math mathmodule.c # -lm # math library functions, e.g. sin()
_struct _struct.c # binary structure packing/unpacking
#time timemodule.c # -lm # time operations and variables
$ make -j4
$ make install
$ PYTHONHOME=/tmp/AAA PYTHONPATH=/tmp/AAA/lib LD_LIBRARY_PATH=/tmp/AAA/lib /tmp/AAA/bin/python2.7
Python 2.7.13 (default, Feb 14 2017, 18:11:25)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time
<module 'time' (built-in)>
uncomment as much as you can
we got it: the time module is now embedded into libpython2.7.so
instead of being an individual shared object
33. PyCon TW 2017
Appendix: Road to Practical (5/7)
$ PYTHONHOME=/tmp/AAA PYTHONPATH=/tmp/AAA/lib ./6b
^C
$ PYTHONHOME=/tmp/AAA PYTHONPATH=/tmp/AAA/lib ./6b
^C
$ PYTHONHOME=/tmp/AAA PYTHONPATH=/tmp/AAA/lib ./6b
Fatal Python error: PyThreadState_Get: no current thread
Aborted (core dumped)
and you will find that it SOMETIMES doesn't work
$ gdb 6b
(gdb) set environment PYTHONHOME /tmp/AAA
(gdb) set environment PYTHONPATH /tmp/AAA/lib
(gdb) run
[snip]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/AAA/lib/python2.7/json/__init__.py", line 108, in <module>
from .decoder import JSONDecoder
File "/tmp/AAA/lib/python2.7/json/decoder.py", line 7, in <module>
from json import scanner
File "/tmp/AAA/lib/python2.7/json/scanner.py", line 5, in <module>
from _json import make_scanner as c_make_scanner
what's wrong with JSON?