This document discusses tools for binding C/C++ code to Python. It begins with an overview of ctypes and CFFI for pure C bindings, and how CPython implements bindings internally. It then covers popular binding tools like SWIG, Cython, and Pybind11. For SWIG, a simple example is shown generating bindings for a C++ class. Later, a more detailed example is demonstrated using Pybind11 to bind the Minuit2 optimization library to Python.
This document provides best practices for using CMake, including:
- Set the cmake_minimum_required version to ensure modern features while maintaining backward compatibility.
- Use targets to define executables and libraries, their properties, and dependencies.
- Fetch remote dependencies at configure time using FetchContent or integrate with package managers like Conan.
- Import library targets rather than reimplementing Find modules when possible.
- Treat CUDA as a first-class language in CMake projects.
CHEP 2018: A Python upgrade to the GooFit package for parallel fittingHenry Schreiner
A Python upgrade to the GooFit package for parallel fitting
9 Jul 2018, 15:30
15m
Hall 3 (National Palace of Culture)
presentation Track 5 – Software development T5 - Software development
Speaker
Henry Fredrick Schreiner (University of Cincinnati (US))
Description
The GooFit highly parallel fitting package for GPUs and CPUs has been substantially upgraded in the past year. Python bindings have been added to allow simple access to the fitting configuration, setup, and execution. A Python tool to write custom GooFit code given a (compact and elegant) MINT3/AmpGen amplitude description allows the corresponding C++ code to be written quickly and correctly. New PDFs have been added. The most recent release was built on top of the December 2017 2.0 release that added easier builds, new platforms, and a more robust and efficient underlying function evaluation engine.
Digital RSE: automated code quality checks - RSE group meetingHenry Schreiner
Given at a local RSE group meeting. Covers code quality practices, focusing on Python but over multiple languages, with useful tools highlighted throughout.
iminuit is an external Python interface to the Minuit2 C++ code,
which can be compiled standalone without the rest of ROOT. iminuit
has recently seen a boost of development which culminated in the
latest 1.3 release and will join the Scikit-HEP project in this year.
To simplify Minuit’s use as a standalone CMake package, for projects
like iminuit and GooFit, a new standalone build system
was implemented for Minuit2, and has been
included in the latest release of ROOT. This system uses modern CMake
patterns, and lives in a peaceful coexistence with the ROOT build
system. The production of source packages is handled without external
scripts, and the system even supports building from inside ROOT.
Integrating this into the ROOT source and build system provided several
challenges, with some interesting solutions that will be shown.
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHenry Schreiner
The document discusses setting up a ROOT environment using Conda in under 5 minutes. It describes downloading and installing Miniconda and then using Conda commands to create a new environment and install ROOT and its dependencies from the conda-forge channel. The ROOT package provides full ROOT functionality, including compilation and graphics, and supports Linux, macOS, and multiple Python versions.
2019 IRIS-HEP AS workshop: Boost-histogram and histHenry Schreiner
The document discusses the current state of histograms in Python and the need for a new histogramming library. It introduces boost-histogram, a C++ histogramming library, and its new Python bindings. The bindings aim to provide a fast, flexible and easily distributable histogram object for Python. Key features discussed include histogram design that treats it as a first-class object, fast filling via multi-threading, a variety of axis and storage types, and performance benchmarks showing it can be over 10x faster than NumPy for filling histograms. Distribution is focused on providing binary wheels for many platforms via continuous integration.
This document provides best practices for using CMake, including:
- Set the cmake_minimum_required version to ensure modern features while maintaining backward compatibility.
- Use targets to define executables and libraries, their properties, and dependencies.
- Fetch remote dependencies at configure time using FetchContent or integrate with package managers like Conan.
- Import library targets rather than reimplementing Find modules when possible.
- Treat CUDA as a first-class language in CMake projects.
CHEP 2018: A Python upgrade to the GooFit package for parallel fittingHenry Schreiner
A Python upgrade to the GooFit package for parallel fitting
9 Jul 2018, 15:30
15m
Hall 3 (National Palace of Culture)
presentation Track 5 – Software development T5 - Software development
Speaker
Henry Fredrick Schreiner (University of Cincinnati (US))
Description
The GooFit highly parallel fitting package for GPUs and CPUs has been substantially upgraded in the past year. Python bindings have been added to allow simple access to the fitting configuration, setup, and execution. A Python tool to write custom GooFit code given a (compact and elegant) MINT3/AmpGen amplitude description allows the corresponding C++ code to be written quickly and correctly. New PDFs have been added. The most recent release was built on top of the December 2017 2.0 release that added easier builds, new platforms, and a more robust and efficient underlying function evaluation engine.
Digital RSE: automated code quality checks - RSE group meetingHenry Schreiner
Given at a local RSE group meeting. Covers code quality practices, focusing on Python but over multiple languages, with useful tools highlighted throughout.
iminuit is an external Python interface to the Minuit2 C++ code,
which can be compiled standalone without the rest of ROOT. iminuit
has recently seen a boost of development which culminated in the
latest 1.3 release and will join the Scikit-HEP project in this year.
To simplify Minuit’s use as a standalone CMake package, for projects
like iminuit and GooFit, a new standalone build system
was implemented for Minuit2, and has been
included in the latest release of ROOT. This system uses modern CMake
patterns, and lives in a peaceful coexistence with the ROOT build
system. The production of source packages is handled without external
scripts, and the system even supports building from inside ROOT.
Integrating this into the ROOT source and build system provided several
challenges, with some interesting solutions that will be shown.
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHenry Schreiner
The document discusses setting up a ROOT environment using Conda in under 5 minutes. It describes downloading and installing Miniconda and then using Conda commands to create a new environment and install ROOT and its dependencies from the conda-forge channel. The ROOT package provides full ROOT functionality, including compilation and graphics, and supports Linux, macOS, and multiple Python versions.
2019 IRIS-HEP AS workshop: Boost-histogram and histHenry Schreiner
The document discusses the current state of histograms in Python and the need for a new histogramming library. It introduces boost-histogram, a C++ histogramming library, and its new Python bindings. The bindings aim to provide a fast, flexible and easily distributable histogram object for Python. Key features discussed include histogram design that treats it as a first-class object, fast filling via multi-threading, a variety of axis and storage types, and performance benchmarks showing it can be over 10x faster than NumPy for filling histograms. Distribution is focused on providing binary wheels for many platforms via continuous integration.
The document discusses plans for the boost-histogram and hist Python libraries. Boost-histogram is a multidimensional histogram library inspired by ROOT that provides flexibility through many axis and storage types. Hist will provide plotting and analysis functionality by interfacing with libraries like mpl-hep. Future plans include improved indexing, slicing, and NumPy conversions for boost-histogram as well as statistical functions, serialization, and integration with fitters for hist.
2019 IRIS-HEP AS workshop: Particles and decaysHenry Schreiner
The Scikit-HEP project aims to create an ecosystem for particle physics data analysis in Python. It includes packages like Particle and DecayLanguage that provide tools for working with particle data and decay descriptions. Particle allows users to easily access and search particle property data from sources like the PDG. DecayLanguage allows parsing decay file formats, representing and manipulating decay chains, and converting between decay model representations. Future work includes expanding particle ID support and improving visualization of decay trees.
The document discusses the new features of Python 3.8, which was recently released. Some key updates include positional-only arguments, the walrus operator for variable assignment, improved static typing support, and performance enhancements. The document also notes additional developer changes and provides resources for obtaining Python 3.8.
PyCon TW 2017 - PyPy's approach to construct domain-specific language runtime...Tsundere Chen
PyCon TW 2017 - PyPy's approach to construct domain-specific language runtime -Part 2
This is the slide for PyCon TW 2017 Day 3 PyPy's approach to construct domain-specific language runtime's Slide, and this is part 2, Part 1 is jserv's work, refer to his slide
Goroutine stack and local variable allocation in GoYu-Shuan Hsieh
The document discusses Goroutine stacks and stack allocation in Go. It explains that each Goroutine has an associated stack that starts small (128 bytes) but can grow as needed by allocating heap memory. There are two stacks - a system stack for M's and a user stack for Goroutines. The user stack uses a free list allocation scheme to efficiently allocate and free fixed-size stacks (e.g. 2KB, 4KB, 8KB). Stack growth is achieved by adjusting pointers when the stack size is exceeded.
PyPy is an implementation of the Python language written in RPython, a restricted subset of Python. It includes a just-in-time compiler that can provide significant performance improvements over CPython for certain workloads. The author benchmarked several of their workflows and found that PyPy was 2-4x faster for some tasks like CSV to XML conversion, but up to 4x slower for others like simple CSV parsing. Overall, PyPy satisfies most criteria for production readiness and supports the key Python modules used by the author's projects, making it a viable alternative to CPython for improving performance.
The document discusses whether the PyPy implementation of Python is ready for production use. It provides an overview of PyPy, benchmarks various workloads against CPython, and evaluates PyPy based on common criteria for determining if a software project is production-ready. While some workloads are slower on PyPy and it fails with some Python modules, it meets most criteria and provides performance improvements for CPU-bound tasks. Overall, the document concludes PyPy could be considered for production use, especially given its advantages in scalability and upcoming improvements to its just-in-time compiler and Python 3 support.
This document discusses moving from C to Go and compares various aspects of memory allocation between the two languages. It covers how arrays are values in Go rather than pointers, stack allocation versus heap allocation, and Go's garbage collector. It also provides an overview of the Go toolchain including supported architectures and compares performance of GCCGO versus the standard Go compiler. Finally, it explains Go's M:N user-space scheduler model and how goroutines are scheduled across logical processors.
Learn how to take advantage of the Pebble build system by creating customized wscripts that let you concatenate JS files, automatically run linters, and internationalize your apps with Cherie Williams (Developer Evangelist).
You can find the video presentation here: http://youtu.be/VhVjCnF-Y0M
Ron Ravid and Grégoire Sage cover the Overlay technique and how to load parts of code from resources.
Day 2 - Video 4
- The document discusses debugging Node.js applications in production environments at Netflix, which has strict uptime requirements. It describes techniques used such as collecting stack traces from running processes using perf and visualizing them in flame graphs to identify performance bottlenecks. It also covers configuring Node.js to dump core files on errors to enable post-mortem debugging without affecting uptime. The techniques help Netflix reduce latency, increase throughput, and fix runtime crashes and memory leaks in production Node.js applications.
#PDR15 Creating Pebble Apps for Aplite, Basalt, and ChalkPebble Technology
Curious about how to design apps that look great on Pebble Classic, Pebble Time, and Pebble Time Round? Confused about how to structure and implement code for multi-platform apps using the Pebble SDK? Kevin Conley (Embedded Developer) will cover these topics as well as share several tips, tricks, and tools for creating amazing apps that run on all Pebble devices.
Lets push those pixels to their limits as Matthew Hungerford (Developer Experience Engineer) talks about graphics effects leveraging Pebble APIs and community libraries to create exceptional watchfaces and apps.
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...PyData
The Python data ecosystem has grown beyond the confines of single machines to embrace scalability. Here we describe one of our approaches to scaling, which is already being used in production systems. The goal of in-database analytics is to bring the calculations to the data, reducing transport costs and I/O bottlenecks. Using PL/Python we can run parallel queries across terabytes of data using not only pure SQL but also familiar PyData packages such as scikit-learn and nltk. This approach can also be used with PL/R to make use of a wide variety of R packages. We look at examples on Postgres compatible systems such as the Greenplum Database and on Hadoop through Pivotal HAWQ. We will also introduce MADlib, Pivotal’s open source library for scalable in-database machine learning, which uses Python to glue SQL queries to low level C++ functions and is also usable through the PyMADlib package.
This document provides guidance on sharing reproducible R code projects using version control with Git and GitHub. It discusses configuring Git and RStudio to work together, organizing R projects, publishing projects on GitHub, and tips for making code more shareable. Version control with Git allows tracking changes, collaboration, and recovering from issues like computer crashes. Following standards for coding style, documentation, and packaging environments helps ensure projects are reproducible.
Streams are a fundamental programming primitive for representing the flow of data through your system. It's time we brought this powerful tool to the web. What if we could stream data from a HTTP request, through a web worker that transforms it, and then into a <video> tag? Over the last year, I've been working on the WHATWG streams specification, which builds upon the lessons learned in Node.js, to provide a suitable abstraction for needs of the extensible web.
I'll discuss briefly why streams are important, what they enable, and the role we envision them playing in the future of the web platform. Mostly, though, I want to help you understand streams, at a deep level. In the course of writing this specification, I've learned a lot about streams, and I want to share that knowledge with you. At the core, they are a very simple and beautiful abstraction. I think we've done a good job capturing that abstraction, and producing an API the web can be proud of. I'd love to tell you all about it.
The document discusses the new features in FreeBSD 10, including updates to the userland like a new packaging system called pkg, LLVM/Clang becoming the default compiler, improvements to DNS tools, and changes to the kernel like the addition of bhyve hypervisor for virtualization, capsicum security updates, improvements to random number generation, unmapped I/O for better performance, and updates to memory and storage handling. The talk was presented by Gleb Smirnoff at the ruBSD 2013 conference in Moscow on December 14, 2013.
This document discusses different approaches for creating Python extensions and bindings to C/C++ libraries. It summarizes the author's experience using ctypes to create a minimal binding called PyMiniRacer to the V8 JavaScript engine. The author argues that combining ctypes, which allows shipping a single Python-independent binary, with pre-built wheel distributions can provide an optimal solution for packaging and distributing Python extensions.
Go provided a 25% performance improvement over Python for a data integration task. Further optimizations in Go, like using goroutines and minimizing memory allocations, resulted in a 3.5x faster runtime than the original Python code. While Python has many useful libraries, Go is better suited for CPU-intensive and high-throughput workloads due to its low overhead concurrency model and compiled speed. The team concluded Go would be preferable for their data ingestion needs due to its performance advantages.
GCC (GNU Compiler Collection) is a fundamental piece of software that allows compilation of C, C++ and other languages. It is crucial to the free and open source software movement. GCC consists of components like cc1 (C compiler), cc1plus (C++ compiler), and others. Developers can use GCC along with a text editor to compile programs from multiple files by using Make. GCC provides debugging tools like GDB and supports compilation on multiple platforms.
MPI provides collective communication operations that involve all processes in a communicator. These include broadcast to distribute data from one process to all others, scatter and gather to divide and combine data across processes, allgather to collect all data from processes, and alltoall to fully exchange portions of data between all process pairs. Collective operations synchronize processes and can be used to solve many parallel algorithms and computational problems.
The document discusses plans for the boost-histogram and hist Python libraries. Boost-histogram is a multidimensional histogram library inspired by ROOT that provides flexibility through many axis and storage types. Hist will provide plotting and analysis functionality by interfacing with libraries like mpl-hep. Future plans include improved indexing, slicing, and NumPy conversions for boost-histogram as well as statistical functions, serialization, and integration with fitters for hist.
2019 IRIS-HEP AS workshop: Particles and decaysHenry Schreiner
The Scikit-HEP project aims to create an ecosystem for particle physics data analysis in Python. It includes packages like Particle and DecayLanguage that provide tools for working with particle data and decay descriptions. Particle allows users to easily access and search particle property data from sources like the PDG. DecayLanguage allows parsing decay file formats, representing and manipulating decay chains, and converting between decay model representations. Future work includes expanding particle ID support and improving visualization of decay trees.
The document discusses the new features of Python 3.8, which was recently released. Some key updates include positional-only arguments, the walrus operator for variable assignment, improved static typing support, and performance enhancements. The document also notes additional developer changes and provides resources for obtaining Python 3.8.
PyCon TW 2017 - PyPy's approach to construct domain-specific language runtime...Tsundere Chen
PyCon TW 2017 - PyPy's approach to construct domain-specific language runtime -Part 2
This is the slide for PyCon TW 2017 Day 3 PyPy's approach to construct domain-specific language runtime's Slide, and this is part 2, Part 1 is jserv's work, refer to his slide
Goroutine stack and local variable allocation in GoYu-Shuan Hsieh
The document discusses Goroutine stacks and stack allocation in Go. It explains that each Goroutine has an associated stack that starts small (128 bytes) but can grow as needed by allocating heap memory. There are two stacks - a system stack for M's and a user stack for Goroutines. The user stack uses a free list allocation scheme to efficiently allocate and free fixed-size stacks (e.g. 2KB, 4KB, 8KB). Stack growth is achieved by adjusting pointers when the stack size is exceeded.
PyPy is an implementation of the Python language written in RPython, a restricted subset of Python. It includes a just-in-time compiler that can provide significant performance improvements over CPython for certain workloads. The author benchmarked several of their workflows and found that PyPy was 2-4x faster for some tasks like CSV to XML conversion, but up to 4x slower for others like simple CSV parsing. Overall, PyPy satisfies most criteria for production readiness and supports the key Python modules used by the author's projects, making it a viable alternative to CPython for improving performance.
The document discusses whether the PyPy implementation of Python is ready for production use. It provides an overview of PyPy, benchmarks various workloads against CPython, and evaluates PyPy based on common criteria for determining if a software project is production-ready. While some workloads are slower on PyPy and it fails with some Python modules, it meets most criteria and provides performance improvements for CPU-bound tasks. Overall, the document concludes PyPy could be considered for production use, especially given its advantages in scalability and upcoming improvements to its just-in-time compiler and Python 3 support.
This document discusses moving from C to Go and compares various aspects of memory allocation between the two languages. It covers how arrays are values in Go rather than pointers, stack allocation versus heap allocation, and Go's garbage collector. It also provides an overview of the Go toolchain including supported architectures and compares performance of GCCGO versus the standard Go compiler. Finally, it explains Go's M:N user-space scheduler model and how goroutines are scheduled across logical processors.
Learn how to take advantage of the Pebble build system by creating customized wscripts that let you concatenate JS files, automatically run linters, and internationalize your apps with Cherie Williams (Developer Evangelist).
You can find the video presentation here: http://youtu.be/VhVjCnF-Y0M
Ron Ravid and Grégoire Sage cover the Overlay technique and how to load parts of code from resources.
Day 2 - Video 4
- The document discusses debugging Node.js applications in production environments at Netflix, which has strict uptime requirements. It describes techniques used such as collecting stack traces from running processes using perf and visualizing them in flame graphs to identify performance bottlenecks. It also covers configuring Node.js to dump core files on errors to enable post-mortem debugging without affecting uptime. The techniques help Netflix reduce latency, increase throughput, and fix runtime crashes and memory leaks in production Node.js applications.
#PDR15 Creating Pebble Apps for Aplite, Basalt, and ChalkPebble Technology
Curious about how to design apps that look great on Pebble Classic, Pebble Time, and Pebble Time Round? Confused about how to structure and implement code for multi-platform apps using the Pebble SDK? Kevin Conley (Embedded Developer) will cover these topics as well as share several tips, tricks, and tools for creating amazing apps that run on all Pebble devices.
Lets push those pixels to their limits as Matthew Hungerford (Developer Experience Engineer) talks about graphics effects leveraging Pebble APIs and community libraries to create exceptional watchfaces and apps.
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...PyData
The Python data ecosystem has grown beyond the confines of single machines to embrace scalability. Here we describe one of our approaches to scaling, which is already being used in production systems. The goal of in-database analytics is to bring the calculations to the data, reducing transport costs and I/O bottlenecks. Using PL/Python we can run parallel queries across terabytes of data using not only pure SQL but also familiar PyData packages such as scikit-learn and nltk. This approach can also be used with PL/R to make use of a wide variety of R packages. We look at examples on Postgres compatible systems such as the Greenplum Database and on Hadoop through Pivotal HAWQ. We will also introduce MADlib, Pivotal’s open source library for scalable in-database machine learning, which uses Python to glue SQL queries to low level C++ functions and is also usable through the PyMADlib package.
This document provides guidance on sharing reproducible R code projects using version control with Git and GitHub. It discusses configuring Git and RStudio to work together, organizing R projects, publishing projects on GitHub, and tips for making code more shareable. Version control with Git allows tracking changes, collaboration, and recovering from issues like computer crashes. Following standards for coding style, documentation, and packaging environments helps ensure projects are reproducible.
Streams are a fundamental programming primitive for representing the flow of data through your system. It's time we brought this powerful tool to the web. What if we could stream data from a HTTP request, through a web worker that transforms it, and then into a <video> tag? Over the last year, I've been working on the WHATWG streams specification, which builds upon the lessons learned in Node.js, to provide a suitable abstraction for needs of the extensible web.
I'll discuss briefly why streams are important, what they enable, and the role we envision them playing in the future of the web platform. Mostly, though, I want to help you understand streams, at a deep level. In the course of writing this specification, I've learned a lot about streams, and I want to share that knowledge with you. At the core, they are a very simple and beautiful abstraction. I think we've done a good job capturing that abstraction, and producing an API the web can be proud of. I'd love to tell you all about it.
The document discusses the new features in FreeBSD 10, including updates to the userland like a new packaging system called pkg, LLVM/Clang becoming the default compiler, improvements to DNS tools, and changes to the kernel like the addition of bhyve hypervisor for virtualization, capsicum security updates, improvements to random number generation, unmapped I/O for better performance, and updates to memory and storage handling. The talk was presented by Gleb Smirnoff at the ruBSD 2013 conference in Moscow on December 14, 2013.
This document discusses different approaches for creating Python extensions and bindings to C/C++ libraries. It summarizes the author's experience using ctypes to create a minimal binding called PyMiniRacer to the V8 JavaScript engine. The author argues that combining ctypes, which allows shipping a single Python-independent binary, with pre-built wheel distributions can provide an optimal solution for packaging and distributing Python extensions.
Go provided a 25% performance improvement over Python for a data integration task. Further optimizations in Go, like using goroutines and minimizing memory allocations, resulted in a 3.5x faster runtime than the original Python code. While Python has many useful libraries, Go is better suited for CPU-intensive and high-throughput workloads due to its low overhead concurrency model and compiled speed. The team concluded Go would be preferable for their data ingestion needs due to its performance advantages.
GCC (GNU Compiler Collection) is a fundamental piece of software that allows compilation of C, C++ and other languages. It is crucial to the free and open source software movement. GCC consists of components like cc1 (C compiler), cc1plus (C++ compiler), and others. Developers can use GCC along with a text editor to compile programs from multiple files by using Make. GCC provides debugging tools like GDB and supports compilation on multiple platforms.
MPI provides collective communication operations that involve all processes in a communicator. These include broadcast to distribute data from one process to all others, scatter and gather to divide and combine data across processes, allgather to collect all data from processes, and alltoall to fully exchange portions of data between all process pairs. Collective operations synchronize processes and can be used to solve many parallel algorithms and computational problems.
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...Masashi Shibata
This document discusses three case studies for MLOps:
1. Building a memory-efficient Python binding for LIBFFM using Cython and NumPy C-API to implement their own Python binding.
2. Implementing a transfer learning method for hyperparameter optimization using Optuna and CMA-ES to exploit previous optimization history.
3. Accelerating a prediction server and addressing challenges of high throughput and low latency by using Cython to speed up inference processing, improving throughput by 1.35x and reducing latency by 60%.
Introduction to cython: example of GCoptimizationKevin Keraudren
This document discusses using Cython to interface Python with C/C++ code to improve computational performance. It provides two examples: (1) wrapping an entire C++ graph cut library in Cython, resulting in an 18 second runtime; and (2) using Cython to call a C++ graph cut function as a black box, achieving a runtime of 0.37 seconds, nearly 50 times faster. The document emphasizes that Cython can provide large speedups with relatively little code by leveraging existing optimized C/C++ implementations.
Cython allows Python code to be compiled to C/C++ extensions for improved performance. It is a superset of Python that adds static type declarations for variables and functions. This allows Cython code to be compiled to efficient C/C++ code while retaining the syntax and functionality of Python. The document provides an overview of Cython and examples demonstrating how to install Cython, write basic Cython modules, optimize Python code with static types, call C functions from Cython, and interface with C/C++ libraries.
Slides for the Cluj.py meetup where we explored the inner workings of CPython, the reference implementation of Python. Includes examples of writing a C extension to Python, and introduces Cython - ultimately the sanest way of writing C extensions.
Also check out the code samples on GitHub: https://github.com/trustyou/meetups/tree/master/python-c
20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docxeugeniadean34240
20145-5SumII_CSC407_assign1.html
CSC 407: Computer Systems II: 2015 Summer II, Assignment #1
Last Modified 2015 July 21Purpose:
To go over issues related to how the compiler and the linker
serve you, the programmer.
Computing
Please ssh into ctilinux1.cstcis.cti.depaul.edu, or use your own Linux machine.
Compiler optimization (45 Points)
Consider the following program.
/* q1.c
*/
#include <stdlib.h>
#include <stdio.h>
#define unsigned int uint
#define LENGTH ((uint) 512*64)
int initializeArray (uint len,
int* intArray
)
{
uint i;
for (i = 0; i < len; i++)
intArray[i] = (rand() % 64);
}
uint countAdjacent (int maxIndex,
int* intArray,
int direction
)
{
uint i;
uint sum = 0;
for (i = 0; i < maxIndex; i++)
if ( ( intArray[i] == (intArray[i+1] + direction) ) &&
( intArray[i] == (intArray[i+2] + 2*direction) )
)
sum++;
return(sum);
}
uint funkyFunction (uint len,
int* intArray
)
{
uint i;
uint sum = 0;
for (i = 0; i < len-1; i++)
if ( (i % 8) == 0x3 )
sum += 7*countAdjacent(len-2,intArray,+1);
else
sum += 17*countAdjacent(len-2,intArray,-1);
return(sum);
}
int main ()
{
int* intArray = (int*)calloc(LENGTH,sizeof(int));
initializeArray(LENGTH,intArray);
printf("funkyFunction() == %d\n",funkyFunction(LENGTH,intArray));
free(intArray);
return(EXIT_SUCCESS);
}
(8 Points) Compile it for profiling but with no extra optimization with:
$ gcc -o q1None -pg q1.c # Compiles q1.c to write q1None to make profile info
$ ./q1None # Runs q1None
$ gprof q1None # Gives profile info on q1None
Be sure to scroll all the way to the top of gprof output!
What are the number of self seconds taken by:
FunctionSelf secondsinitializeBigArray()__________countAdjaceent()__________funkyFunction()__________
(8 Points)
How did it do the operation (i % 8) == 0x3?
Was it done as a modulus (the same as an expensive division, but returns the remainder instead of the quotient) or something else?
Show the assembly language for this C code
using gdb to dissassemble
funkyFunction() of q1None.
Hint: do:
$ gdb q1None
. . .
(gdb) disass funkyFunction
Dump of assembler code for function funkyFunction:
. . .
and then look for the code that sets up the calls to countAdjacent().
The (i % 8) == 0x3 test is done before either countAdjacent() call.
(8 Points) Compile it for profiling but with optimization with:
$ gcc -o q1Compiler -O1 -pg q1.c # Compiles q1.c to write q1Compiler to make profile info
$ ./q1Compiler # Runs q1Compiler
$ gprof q1Compiler # Gives profile info on q1Compiler
What are the number of self seconds taken by:
FunctionSelf secondsinitializeBigArray()__________countAdjacent()__________funkyFunction()__________(8 Points) Use gdb to dissassemble countAdjacent() of both q1None and q1.
Euro python2011 High Performance PythonIan Ozsvald
I ran this as a 4 hour tutorial at EuroPython 2011 to teach High Performance Python coding.
Techniques covered include bottleneck analysis by profiling, bytecode analysis, converting to C using Cython and ShedSkin, use of the numerical numpy library and numexpr, multi-core and multi-machine parallelisation and using CUDA GPUs.
Write-up with 49 page PDF report: http://ianozsvald.com/2011/06/29/high-performance-python-tutorial-v0-1-from-my-4-hour-tutorial-at-europython-2011/
MPI for Python (mpi4py) provides Python bindings for MPI. It allows Python scripts and programs to leverage MPI for parallel programming. Mpi4py implements common MPI operations like communicators, point-to-point communication, and collective operations. It also supports features like integration with IPython for interactive use and good support for wrapping existing MPI-based C/C++/Fortran codes to use them from Python. Mpi4py achieves high performance while keeping a simple Pythonic interface through its implementation with Cython.
Talk at PyCon2022 over building binary packages for Python. Covers an overview and an in-depth look into pybind11 for binding, scikit-build for creating the build, and build & cibuildwheel for making the binaries that can be distributed on PyPI.
The document provides an overview of topics that will be covered in a C programming course across 10 sessions. Session 1 introduces basic C concepts. Session 2 covers data types and variables. Session 3 discusses various operators in C like arithmetic, assignment, unary, conditional, relational and logical operators. Later sessions will cover loops, arrays, functions, structures, pointers, file I/O and more advanced C topics. The document includes examples of basic C programs to demonstrate concepts like input/output, arithmetic operations and unary operators.
Global Interpreter Lock: Episode I - Break the SealTzung-Bi Shih
PyCon APAC 2015 discusses the Global Interpreter Lock (GIL) in CPython and ways to work around it to achieve higher performance on multi-processor systems. It provides examples of using multiprocessing, pp (Parallel Python), and releasing the GIL using C extensions to allow concurrent execution across multiple CPU cores. Releasing the GIL allows taking advantage of additional CPUs for processor-intensive tasks, while multiprocessing and pp allow running I/O-bound tasks in parallel across multiple processes to improve throughput.
The document summarizes a presentation on Cython, a programming language that allows writing Python extensions and integrating Python with C/C++ code. Cython code can be compiled into C/C++ extensions that speed up Python code by allowing static type declarations. The presentation covers Cython features like static typing, C pointers and strings, exception handling, and defining extension types. It provides examples of Cython code and compiling Cython to C/C++ extensions using various methods.
This document provides an introduction to parallel programming in Python using MPI. It discusses how Python can be slow compared to C and describes various techniques to speed up Python code including profiling, Numba, and parallel programming. The key parallel programming approach discussed is MPI using mpi4py which allows Python programs to run across multiple processors. An example MPI program is provided to demonstrate basic point-to-point communication between processes.
Python modules allow programmers to split code into multiple files for easier maintenance. A module is simply a Python file with a .py extension. The import statement is used to include modules. Modules can be organized into packages, which are directories containing an __init__.py file. Popular third party modules like ElementTree, Psyco, EasyGUI, SQLObject, and py.test make Python even more powerful.
This document provides an introduction to programming with Python for beginners. It covers basic Python concepts like variables, data types, operators, conditional statements, functions, loops, strings and lists. It also demonstrates how to build simple web applications using Google App Engine and Python, including templating with Jinja2, storing data in the Datastore and handling web forms. The goal is to teach the fundamentals of Python programming and get started with cloud development on Google Cloud Platform.
Hybrid parallel programming uses both message passing (e.g. MPI) and shared memory parallelism (e.g. OpenMP). MPI is used to distribute work across multiple computers while OpenMP parallelizes work within each computer across multiple cores. This approach can improve performance over MPI-only for problems where communication between computers is expensive compared to synchronization within a computer. However, for matrix multiplication experiments, a hybrid MPI-OpenMP approach did not show better performance than MPI-only. Larger problem sizes or different algorithms may be needed to realize benefits of the hybrid approach.
The document describes a project to simulate an automated manufacturing plant with multiple machines producing a finished product. It involves simulating 5-7 processes as threads with inter-process communication using pipes. Conditions like temperature limits are modeled to stop production if exceeded. Key concepts used include threading, mutual exclusion, signal handling and time functions. Programming is in C language and tools include GCC compiler. Logs of sample runs are included to test the simulation.
The document describes a project to simulate an automated manufacturing plant with multiple machines producing a finished product. It involves simulating 5-7 machines as processes communicating through pipes. Conditions like temperature limits are modeled to stop production if exceeded. The project uses multithreading, pipes, signals and mutual exclusion to coordinate parallel processes. It was implemented in C using GCC and data is logged for different test runs involving signals and input validation.
Similar to PyHEP 2018: Tools to bind to Python (20)
Modern binary build systems have made shipping binary packages for Python much easier than ever before. This talk discusses three of the most popular build systems for Python packages using the new standards developed for packaging.
This document discusses software quality assurance tooling, focusing on pre-commit. It introduces pre-commit as a tool for running code quality checks before code is committed. Pre-commit allows configuring hooks that run checks and fixers on files matching certain patterns. Hooks can be installed from repositories and support many languages including Python. The document provides examples of pre-commit checks such as disallowing improper capitalization in code comments and files. It also discusses how to configure, run, update and install pre-commit hooks.
The document summarizes Henry Schreiner's work on several Python and C++ scientific computing projects. It describes a scientific Python development guide built from the Scikit-HEP summit. It also outlines Henry's work on pybind11 for C++ bindings, scikit-build for building extensions, cibuildwheel for building wheels on CI, and several other related projects.
Flake8 is a Python linter that is fast, simple, and extensible. It can be configured through setup.cfg or .flake8 files to ignore certain checks or select others. The summary recommends using the flake8-bugbear plugin and avoiding all print statements with flake8-print. Linters like Flake8 help find errors, improve code quality, and avoid historical baggage, but one does not need every check and it is okay to build a long ignore list.
The document describes various productivity tools for Python development, including:
- Pre-commit hooks to run checks before committing code
- Hot code reloading in Jupyter notebooks using the %load_ext and %autoreload magic commands
- Cookiecutter for generating project templates
- SSH configuration files and escape sequences for easier remote access
- Autojump to quickly navigate frequently visited directories
- Terminal tips like command history search and referencing the last argument
- Options for tracking Jupyter notebooks with git like stripping outputs or synchronizing notebooks and Python files.
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...Henry Schreiner
Building binary extensions is easier than ever thanks to several key libraries. Pybind11 provides a natural C++ language for extensions without requiring pre-processing or special dependencies. Scikit-build ties the premier C++ build system, CMake, into the Python extension build process. And cibuildwheel makes it easy to build highly compatible wheels for over 80 different platforms using CI or on your local machine. We will look at advancements to all three libraries over the last year, as well as future plans.
This document discusses the history and development of Python packages for high energy physics (HEP) analysis. It describes how experiments initially used ROOT and C++, but Python gained popularity for configuration and analysis. This led to the creation of packages like Scikit-HEP, Uproot, and Awkward Array to bridge the gap between ROOT files and the Python data science stack. Scikit-HEP grew to include many related packages and provides best practices through its developer pages. The future may include adopting Scikit-build for building Python packages with C/C++ extensions and running packages in the browser via WebAssembly.
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packagingHenry Schreiner
This was a PyCon 2022 lightning talk over the Scikit-HEP developer pages. It highlights best practices and guides shown there, and the quick package creation cookiecutter. And finally it demos the Pyodide WebAssembly app embedded into the Scikit-HEP developer pages!
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHenry Schreiner
The document describes a machine learning approach for primary vertex reconstruction in high-energy physics experiments. A hybrid method is proposed that uses a 1D convolutional neural network to analyze histograms produced from tracking data. The network is able to find primary vertices with high efficiency and tunable false positive rates, demonstrating the potential of machine learning for this task. Future work involves adding more tracking information and iterating between track association and vertex finding to improve performance.
ACAT 2019: A hybrid deep learning approach to vertexingHenry Schreiner
This document presents a hybrid deep learning approach for vertex finding in high-energy physics experiments. It uses a 1D convolutional neural network to analyze kernel density estimates of track information in order to identify primary vertex positions. The approach achieves primary vertex finding efficiencies of 88-94% with low false positive rates comparable to traditional algorithms. The authors demonstrate tuning of the efficiency-false positive rate tradeoff and discuss plans to improve performance by incorporating additional track information and iterative refinement.
2019 CtD: A hybrid deep learning approach to vertexingHenry Schreiner
This document presents a hybrid deep learning approach for vertex finding using 1D convolutional neural networks. It describes generating 1D kernel densities from tracking information, building target distributions, and using a CNN architecture with an adjustable cost function to optimize the false positive rate versus efficiency. The approach achieves 93.87% efficiency with a 0.251 false positive rate on test data. Future work includes incorporating additional xy information and exploring full 2D kernel densities.
The document discusses the current state of histograms in Python and the need for a new library. It introduces boost-histogram, a C++ histogram library, and its new Python bindings. The bindings aim to provide a fast, flexible, and easily distributable histogram object for Python with support for multiple axis types and storage options. It also discusses plans for an additional wrapper library called hist for easy plotting and interfacing with other tools.
This document provides an overview of histograms and various histogram libraries. It introduces boost-histogram, a C++ histogram library that is fast and header-only. It then describes the new Python bindings for boost-histogram, which are designed to be fast and easy to use while resembling the C++ version. Finally, it outlines plans for additional Python histogram tools like hist, Aghast, and Unified Histogram Indexing to integrate boost-histogram into the wider ecosystem.
2019 IML workshop: A hybrid deep learning approach to vertexingHenry Schreiner
A hybrid deep learning approach is proposed for vertex finding using 1D convolutional neural networks on kernel density estimates from tracking data. The approach generates 1D histograms from 3D tracking data and uses a CNN to classify primary vertex positions. In a proof-of-concept on simulated data, it achieves primary vertex finding efficiencies and false positive rates comparable to traditional algorithms, with tunable efficiency-false positive tradeoffs. Future work includes incorporating additional tracking features, associating tracks to vertices, and deploying the inference engine for the LHCb trigger.
CHEP 2019: Recent developments in histogram librariesHenry Schreiner
This document discusses recent developments in Python histogram libraries. It describes Boost.Histogram, a C++ histogramming library that serves as the foundation for the boost-histogram Python package. Boost.Histogram provides fast, customizable histogram filling and manipulation. The document also outlines plans for hist, a Python analysis frontend, and aghast, a library for converting between histogram formats. Together, boost-histogram, hist, and aghast comprise the Scikit-HEP histogramming framework.
LHCb Computing Workshop 2018: PV finding with CNNsHenry Schreiner
The document discusses using a convolutional neural network (CNN) to quickly find primary vertices (PVs) in high-energy physics events recorded by the LHCb experiment. A prototype tracking algorithm is used to generate a 1D kernel density estimate (KDE) histogram from hit triplets. This histogram is then used to train a CNN to predict the locations of PVs. Initial results show the CNN approach can find PVs with 70-75% efficiency and a false positive rate of 0.08-0.13, outperforming current algorithms. Further work aims to improve resolution, find secondary vertices, and integrate the approach into iterative tracking.
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...The Third Creative Media
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
14 th Edition of International conference on computer visionShulagnaSarkar2
About the event
14th Edition of International conference on computer vision
Computer conferences organized by ScienceFather group. ScienceFather takes the privilege to invite speakers participants students delegates and exhibitors from across the globe to its International Conference on computer conferences to be held in the Various Beautiful cites of the world. computer conferences are a discussion of common Inventions-related issues and additionally trade information share proof thoughts and insight into advanced developments in the science inventions service system. New technology may create many materials and devices with a vast range of applications such as in Science medicine electronics biomaterials energy production and consumer products.
Nomination are Open!! Don't Miss it
Visit: computer.scifat.com
Award Nomination: https://x-i.me/ishnom
Conference Submission: https://x-i.me/anicon
For Enquiry: Computer@scifat.com
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...Luigi Fugaro
Vector databases are transforming how we handle data, allowing us to search through text, images, and audio by converting them into vectors. Today, we'll dive into the basics of this exciting technology and discuss its potential to revolutionize our next-generation AI applications. We'll examine typical uses for these databases and the essential tools
developers need. Plus, we'll zoom in on the advanced capabilities of vector search and semantic caching in Java, showcasing these through a live demo with Redis libraries. Get ready to see how these powerful tools can change the game!
The Rising Future of CPaaS in the Middle East 2024Yara Milbes
Explore "The Rising Future of CPaaS in the Middle East in 2024" with this comprehensive PPT presentation. Discover how Communication Platforms as a Service (CPaaS) is transforming communication across various sectors in the Middle East.
Alluxio Webinar | 10x Faster Trino Queries on Your Data PlatformAlluxio, Inc.
Alluxio Webinar
June. 18, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Jianjian Xie (Staff Software Engineer, Alluxio)
As Trino users increasingly rely on cloud object storage for retrieving data, speed and cloud cost have become major challenges. The separation of compute and storage creates latency challenges when querying datasets; scanning data between storage and compute tiers becomes I/O bound. On the other hand, cloud API costs related to GET/LIST operations and cross-region data transfer add up quickly.
The newly introduced Trino file system cache by Alluxio aims to overcome the above challenges. In this session, Jianjian will dive into Trino data caching strategies, the latest test results, and discuss the multi-level caching architecture. This architecture makes Trino 10x faster for data lakes of any scale, from GB to EB.
What you will learn:
- Challenges relating to the speed and costs of running Trino in the cloud
- The new Trino file system cache feature overview, including the latest development status and test results
- A multi-level cache framework for maximized speed, including Trino file system cache and Alluxio distributed cache
- Real-world cases, including a large online payment firm and a top ridesharing company
- The future roadmap of Trino file system cache and Trino-Alluxio integration
Nashik's top web development company, Upturn India Technologies, crafts innovative digital solutions for your success. Partner with us and achieve your goals
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
React.js, a JavaScript library developed by Facebook, has gained immense popularity for building user interfaces, especially for single-page applications. Over the years, React has evolved and expanded its capabilities, becoming a preferred choice for mobile app development. This article will explore why React.js is an excellent choice for the Best Mobile App development company in Noida.
Visit Us For Information: https://www.linkedin.com/pulse/what-makes-reactjs-stand-out-mobile-app-development-rajesh-rai-pihvf/
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...kalichargn70th171
Visual testing plays a vital role in ensuring that software products meet the aesthetic requirements specified by clients in functional and non-functional specifications. In today's highly competitive digital landscape, users expect a seamless and visually appealing online experience. Visual testing, also known as automated UI testing or visual regression testing, verifies the accuracy of the visual elements that users interact with.
Photoshop Tutorial for Beginners (2024 Edition)alowpalsadig
Photoshop Tutorial for Beginners (2024 Edition)
Explore the evolution of programming and software development and design in 2024. Discover emerging trends shaping the future of coding in our insightful analysis."
Here's an overview:Introduction: The Evolution of Programming and Software DevelopmentThe Rise of Artificial Intelligence and Machine Learning in CodingAdopting Low-Code and No-Code PlatformsQuantum Computing: Entering the Software Development MainstreamIntegration of DevOps with Machine Learning: MLOpsAdvancements in Cybersecurity PracticesThe Growth of Edge ComputingEmerging Programming Languages and FrameworksSoftware Development Ethics and AI RegulationSustainability in Software EngineeringThe Future Workforce: Remote and Distributed TeamsConclusion: Adapting to the Changing Software Development LandscapeIntroduction: The Evolution of Programming and Software Development
Photoshop Tutorial for Beginners (2024 Edition)Explore the evolution of programming and software development and design in 2024. Discover emerging trends shaping the future of coding in our insightful analysis."Here's an overview:Introduction: The Evolution of Programming and Software DevelopmentThe Rise of Artificial Intelligence and Machine Learning in CodingAdopting Low-Code and No-Code PlatformsQuantum Computing: Entering the Software Development MainstreamIntegration of DevOps with Machine Learning: MLOpsAdvancements in Cybersecurity PracticesThe Growth of Edge ComputingEmerging Programming Languages and FrameworksSoftware Development Ethics and AI RegulationSustainability in Software EngineeringThe Future Workforce: Remote and Distributed TeamsConclusion: Adapting to the Changing Software Development LandscapeIntroduction: The Evolution of Programming and Software Development
The importance of developing and designing programming in 2024
Programming design and development represents a vital step in keeping pace with technological advancements and meeting ever-changing market needs. This course is intended for anyone who wants to understand the fundamental importance of software development and design, whether you are a beginner or a professional seeking to update your knowledge.
Course objectives:
1. **Learn about the basics of software development:
- Understanding software development processes and tools.
- Identify the role of programmers and designers in software projects.
2. Understanding the software design process:
- Learn about the principles of good software design.
- Discussing common design patterns such as Object-Oriented Design.
3. The importance of user experience (UX) in modern software:
- Explore how user experience can improve software acceptance and usability.
- Tools and techniques to analyze and improve user experience.
4. Increase efficiency and productivity through modern development tools:
- Access to the latest programming tools and languages used in the industry.
- Study live examples of applications
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
Enhanced Screen Flows UI/UX using SLDS with Tom KittPeter Caitens
Join us for an engaging session led by Flow Champion, Tom Kitt. This session will dive into a technique of enhancing the user interfaces and user experiences within Screen Flows using the Salesforce Lightning Design System (SLDS). This technique uses Native functionality, with No Apex Code, No Custom Components and No Managed Packages required.
What is Continuous Testing in DevOps - A Definitive Guide.pdfkalichargn70th171
Once an overlooked aspect, continuous testing has become indispensable for enterprises striving to accelerate application delivery and reduce business impacts. According to a Statista report, 31.3% of global enterprises have embraced continuous integration and deployment within their DevOps, signaling a pervasive trend toward hastening release cycles.
Boost Your Savings with These Money Management AppsJhone kinadey
A money management app can transform your financial life by tracking expenses, creating budgets, and setting financial goals. These apps offer features like real-time expense tracking, bill reminders, and personalized insights to help you save and manage money effectively. With a user-friendly interface, they simplify financial planning, making it easier to stay on top of your finances and achieve long-term financial stability.
A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.
1. Tools to Bind to Python
Henry Schreiner
PyHEP 2018
This talk is interactive, and can be run in SWAN. If you want to run it manually, just
download the repository:
.
Either use the menu option CELL -> Run All or run all code cells in order (don't skip
one!)
github.com/henryiii/pybindings_cc
(https://github.com/henryiii/pybindings_cc)
(https://cern.ch/swanserver/cgi-bin/go?
projurl=https://github.com/henryiii/pybindings_cc.git)
3. Caveats
Will cover C++ and C binding only
Will not cover every tool available
Will not cover cppyy in detail (but see Enric's talk)
Python 2 is dying, long live Python 3!
but this talk is Py2 compatible also
4. Overview:
Part one
ctypes, CFFI : Pure Python, C only
CPython: How all bindings work
SWIG: Multi-language, automatic
Cython: New language
Pybind11: Pure C++11
CPPYY: From ROOT's JIT engine
Part two
An advanced binding in Pybind11
5.
6. Since this talk is an interactive notebook, no code will be hidden. Here are the required
packages:
In [1]:
Not on SWAN: cython, cppyy
SWIG is also needed but not a python module
Using Anaconda recommended for users not using SWAN
!pip install --user cffi pybind11 numba
# Other requirements: cython cppyy (SWIG is also needed but not a python module)
# Using Anaconda recommended for users not using SWAN
Requirement already satisfied: cffi in /eos/user/h/hschrein/.local/lib/pytho
n3.6/site-packages
Requirement already satisfied: pybind11 in /eos/user/h/hschrein/.local/lib/p
ython3.6/site-packages
Requirement already satisfied: numba in /cvmfs/sft-nightlies.cern.ch/lcg/vie
ws/dev3python3/Wed/x86_64-slc6-gcc62-opt/lib/python3.6/site-packages
Requirement already satisfied: pycparser in /eos/user/h/hschrein/.local/lib/
python3.6/site-packages (from cffi)
Requirement already satisfied: llvmlite in /eos/user/h/hschrein/.local/lib/p
ython3.6/site-packages (from numba)
Requirement already satisfied: numpy in /cvmfs/sft-nightlies.cern.ch/lcg/vie
ws/dev3python3/Wed/x86_64-slc6-gcc62-opt/lib/python3.6/site-packages (from n
umba)
You are using pip version 9.0.3, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
7. And, here are the standard imports. We will also add two variables to help with compiling:
In [2]: from __future__ import print_function
import os
import sys
from pybind11 import get_include
inc = '-I ' + get_include(user=True) + ' -I ' + get_include(user=False)
plat = '-undefined dynamic_lookup' if 'darwin' in sys.platform else '-fPIC'
print('inc:', inc)
print('plat:', plat)
inc: -I /eos/user/h/hschrein/.local/include/python3.6m -I /cvmfs/sft-nightli
es.cern.ch/lcg/nightlies/dev3python3/Wed/Python/3.6.5/x86_64-slc6-gcc62-opt/
include/python3.6m
plat: -fPIC
8. What is meant by bindings?
Bindings allow a function(alitiy) in a library to be accessed from Python.
We will start with this example:
In [3]:
Desired usage in Python:
%%writefile simple.c
float square(float x) {
return x*x;
}
y = square(x)
Overwriting simple.c
9. C bindings are very easy. Just compile into a shared library, then open it in python with the
built in module:
In [4]:
In [5]:
This may be all you need! Example:
Python interface.
In for iOS, we can even use
ctypes to access Apple's public APIs!
ctypes
(https://docs.python.org/3.7/library/ctypes.html)
ctypes (https://docs.python.org/3.7/library/ctypes.html)
!cc simple.c -shared -o simple.so
from ctypes import cdll, c_float
lib = cdll.LoadLibrary('./simple.so')
lib.square.argtypes = (c_float,)
lib.square.restype = c_float
lib.square(2.0)
AmpGen
(https://gitlab.cern.ch/lhcb/Gauss/blob/LHCBGAUSS-
1058.AmpGenDev/Gen/AmpGen/options/ampgen.py)
Pythonista (http://omz-software.com/pythonista/)
Out[5]: 4.0
10. The C Foreign Function Interface for Python
Still C only
Developed for PyPy, but available in CPython too
The same example as before:
In [6]:
CFFI
(http://cffi.readthedocs.io/en/latest/overview.html)
from cffi import FFI
ffi = FFI()
ffi.cdef("float square(float);")
C = ffi.dlopen('./simple.so')
C.square(2.0)
Out[6]: 4.0
11. Let's see how bindings work before going into C++ binding tools
This is how CPython itself is implemented
CPython (python.org)
C reminder: static means visible in this file only
13. Build:
In [8]:
Run:
In [9]:
!cc {inc} -shared -o pysimple.so pysimple.c {plat}
import pysimple
pysimple.square(2.0)
Out[9]: 4.0
14. C++: Why do we need more?
Sometimes simple is enough!
export "C" allows C++ backend
C++ API can have: overloading, classes, memory management, etc...
We could manually translate everything using C API
Solution:
C++ binding tools!
15. This is our C++ example:
In [10]: %%writefile SimpleClass.hpp
#pragma once
class Simple {
int x;
public:
Simple(int x): x(x) {}
int get() const {return x;}
};
Overwriting SimpleClass.hpp
16. SWIG: Produces "automatic" bindings
Works with many output languages
Has supporting module built into CMake
Very mature
Downsides:
Can be all or nothing
Hard to customize
Customizations tend to be language specific
Slow development
(swig.org)
17. In [11]:
In [12]:
%%writefile SimpleSWIG.i
%module simpleswig
%{
/* Includes the header in the wrapper code */
#include "SimpleClass.hpp"
%}
/* Parse the header file to generate wrappers */
%include "SimpleClass.hpp"
!swig -swiglib
Overwriting SimpleSWIG.i
/build/jenkins/workspace/install/swig/3.0.12/x86_64-slc6-gcc62-opt/share/swi
g/3.0.12
18. SWAN/LxPlus only:
We need to fix the SWIG_LIB path if we are using LCG's version of SWIG (such as on
SWAN)
In [13]: if 'LCG_VIEW' in os.environ:
swiglibold = !swig -swiglib
swigloc = swiglibold[0].split('/')[-3:]
swiglib = os.path.join(os.environ['LCG_VIEW'], *swigloc)
os.environ['SWIG_LIB'] = swiglib
19. In [14]:
In [15]:
In [16]:
!swig -python -c++ SimpleSWIG.i
!c++ -shared SimpleSWIG_wrap.cxx {inc} -o _simpleswig.so {plat}
import simpleswig
x = simpleswig.Simple(2)
x.get()
Out[16]: 2
20. Built to be a Python+C language for high performance computations
Performance computation space in competition with Numba
Due to design, also makes binding easy
Easy to customize result
Can write Python 2 or 3, regardless of calling language
Downsides:
Requires learning a new(ish) language
Have to think with three hats
Very verbose
(http://cython.org)
21. Aside: Speed comparison Python, Cython,
In [17]:
In [18]:
Numba
(https://numba.pydata.org)
def f(x):
for _ in range(100000000):
x=x+1
return x
%%time
f(1)
Out[18]:
CPU times: user 6.88 s, sys: 0 ns, total: 6.88 s
Wall time: 6.88 s
100000001
22. In [19]:
In [20]:
In [21]:
%load_ext Cython
%%cython
def f(int x):
for _ in range(10000000):
x=x+1
return x
%%timeit
f(23)
69.7 ns ± 9.78 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
23. In [22]:
In [23]:
In [24]:
import numba
@numba.jit
def f(x):
for _ in range(10000000):
x=x+1
return x
%time
f(41)
%%timeit
f(41)
Out[23]:
CPU times: user 0 ns, sys: 11 µs, total: 11 µs
Wall time: 56.3 µs
10000041
268 ns ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
24. Binding with
In [25]:
Cython (https://cython.org)
%%writefile simpleclass.pxd
# distutils: language = c++
cdef extern from "SimpleClass.hpp":
cdef cppclass Simple:
Simple(int x)
int get()
Overwriting simpleclass.pxd
25. In [26]: %%writefile cythonclass.pyx
# distutils: language = c++
from simpleclass cimport Simple as cSimple
cdef class Simple:
cdef cSimple *cself
def __cinit__(self, int x):
self.cself = new cSimple(x)
def get(self):
return self.cself.get()
def __dealloc__(self):
del self.cself
Overwriting cythonclass.pyx
26. In [27]:
In [28]:
In [29]:
!cythonize cythonclass.pyx
!g++ cythonclass.cpp -shared {inc} -o cythonclass.so {plat}
import cythonclass
x = cythonclass.Simple(3)
x.get()
Compiling /eos/user/h/hschrein/SWAN_projects/pybindings_cc/cythonclass.pyx b
ecause it changed.
[1/1] Cythonizing /eos/user/h/hschrein/SWAN_projects/pybindings_cc/cythoncla
ss.pyx
Out[29]: 3
27. Similar to Boost::Python, but easier to build
Pure C++11 (no new language required), no dependencies
Builds remain simple and don't require preprocessing
Easy to customize result
Great Gitter community
Used in for CUDA too
Downsides:
(http://pybind11.readthedocs.io/en/stable/)
GooFit 2.1+ (https://goofit.github.io) [CHEP talk]
(https://indico.cern.ch/event/587955/contributions/2938087/)
29. In [31]:
In [32]:
!c++ -std=c++11 pybindclass.cpp -shared {inc} -o pybindclass.so {plat}
import pybindclass
x = pybindclass.Simple(4)
x.get()
Out[32]: 4
30. Born from ROOT bindings
Built on top of Cling
JIT, so can handle templates
See Enric's talk for more
Downsides:
Header code runs in Cling
Heavy user requirements (Cling)
ROOT vs. pip version
Broken on SWAN (so will not show working example here)
CPPYY (http://cppyy.readthedocs.io/en/latest/)
In [1]: import cppyy
31. In [2]: cppyy.include('SimpleClass.hpp')
x = cppyy.gbl.Simple(5)
x.get()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-d0b91c309081> in <module>()
----> 1 cppyy.include('SimpleClass.hpp')
2 x = cppyy.gbl.Simple(5)
3 x.get()
AttributeError: module 'cppyy' has no attribute 'include'
33. Binding detailed example: Minuit2
Let's try a non-trivial example: Minuit2 (6.14.0 standalone edition)
Requirements
Minuit2 6.14.0 standalone edition (included)
Pybind11 (included)
NumPy
C++11 compatible compiler
CMake 3
Expectations
Be able to minimize a very simple function and get some parameters
34. Step 1: Get source
Download Minuit2 source (provided in minuit2src)
Install Pybind11 or add as submodule (provided in pybind11)
35. Step 2: Plan interface
You should know what the C++ looks like, and know what you want the Python to
look like
For now, let's replicate the C++ experience
For example: a simple minimizer for (should quickly find 0 as minimum):
Define FCN
Setup parameters
Minimize
Print result
Will use print out for illustration (instead of MnPrint::SetLevel)
f (x) = x2
37. In [2]: %%writefile simpleminuit.cpp
#include "SimpleFCN.h"
int main() {
SimpleFCN fcn;
MnUserParameters upar;
upar.Add("x", 1., 0.1);
MnMigrad migrad(fcn, upar);
FunctionMinimum min = migrad();
std::cout << min << std::endl;
}
Overwriting simpleminuit.cpp
38. In [3]: %%writefile CMakeLists.txt
cmake_minimum_required(VERSION 3.4)
project(Minuit2SimpleExamle LANGUAGES CXX)
add_subdirectory(minuit2src)
add_executable(simpleminuit simpleminuit.cpp SimpleFCN.h)
target_link_libraries(simpleminuit PRIVATE Minuit2::Minuit2)
Overwriting CMakeLists.txt
39. Standard CMake configure and build (using Ninja instead of Make for speed)
In [4]: !cmake -GNinja .
!cmake --build .
-- Configuring done
-- Generating done
-- Build files have been written to: /eos/user/h/hschrein/SWAN_projects/pybi
ndings_cc
[2/2] Linking CXX executable simpleminuitinuit.dir/simpleminuit.cpp.o
40. In [5]: !./simpleminuit
val = 1
val = 1.001
val = 0.999
val = 1.0006
val = 0.999402
val = -8.23008e-11
val = 0.000345267
val = -0.000345267
val = -8.23008e-11
val = 0.000345267
val = -0.000345267
val = 6.90533e-05
val = -6.90535e-05
Minuit did successfully converge.
# of function calls: 13
minimum function Value: 6.773427082119e-21
minimum edm: 6.773427081817e-21
minimum internal state vector: LAVector parameters:
-8.230083281546e-11
minimum internal covariance matrix: LASymMatrix parameters:
1
# ext. || Name || type || Value || Error +/-
0 || x || free || -8.230083281546e-11 ||0.7071067811865
41. Step 3: Bind parts we need
subclassable FCNBase
MnUserParameters (constructor and Add(string, double, double))
MnMigrad (constructor and operator())
FunctionMinimum (cout)
42. Recommended structure of a Pybind11 program
main.cpp
Builds module
Avoids imports (fast compile)
include <pybind11/pybind11.h>
namespace py = pybind11;
void init_part1(py::module &);
void init_part2(py::module &);
PYBIND11_MODULE(mymodule, m) {
m.doc() = "Real code would never have such poor documentation...";
init_part1(m);
init_part2(m);
}
44. We will put all headers in a collective header (not a good idea unless you are trying to show
files one per slide).
In [8]: %%writefile pyminuit2/PyHeader.h
#pragma once
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <pybind11/numpy.h>
#include <pybind11/stl.h>
#include <Minuit2/FCNBase.h>
#include <Minuit2/MnMigrad.h>
#include <Minuit2/MnApplication.h>
#include <Minuit2/MnUserParameters.h>
#include <Minuit2/FunctionMinimum.h>
namespace py = pybind11;
using namespace pybind11::literals;
using namespace ROOT::Minuit2;
Overwriting pyminuit2/PyHeader.h
45. Overloads
Pure virtual methods cannot be instantiated in C++
Have to provide "Trampoline class" to provide Python class
In [9]: %%writefile pyminuit2/FCNBase.cpp
#include "PyHeader.h"
class PyFCNBase : public FCNBase {
public:
using FCNBase::FCNBase;
double operator()(const std::vector<double> &v) const override {
PYBIND11_OVERLOAD_PURE_NAME(
double, FCNBase, "__call__", operator(), v);}
double Up() const override {
PYBIND11_OVERLOAD_PURE(double, FCNBase, Up, );}
};
void init_FCNBase(py::module &m) {
py::class_<FCNBase, PyFCNBase>(m, "FCNBase")
.def(py::init<>())
.def("__call__", &FCNBase::operator())
.def("Up", &FCNBase::Up);
}
Overwriting pyminuit2/FCNBase.cpp
49. In [13]: %%writefile CMakeLists.txt
cmake_minimum_required(VERSION 3.4)
project(Minuit2SimpleExamle LANGUAGES CXX)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
add_subdirectory(minuit2src)
add_executable(simpleminuit simpleminuit.cpp SimpleFCN.h)
target_link_libraries(simpleminuit PRIVATE Minuit2::Minuit2)
add_subdirectory(pybind11)
file(GLOB OUTPUT pyminuit2/*.cpp)
pybind11_add_module(minuit2 ${OUTPUT})
target_link_libraries(minuit2 PUBLIC Minuit2::Minuit2)
Overwriting CMakeLists.txt
50. In [14]: !cmake .
!cmake --build .
-- pybind11 v2.2.3
-- Configuring done
-- Generating done
-- Build files have been written to: /eos/user/h/hschrein/SWAN_projects/pybi
ndings_cc
[85/85] Linking CXX shared module minuit2.cpython-36m-x86_64-linux-gnu.so[Ko
51. Usage
We can now use our module! (Built in the current directory by CMake)
In [15]:
In [16]:
import sys
if '.' not in sys.path:
sys.path.append('.')
import minuit2
class SimpleFCN (minuit2.FCNBase):
def Up(self):
return 0.5
def __call__(self, v):
print("val =", v[0])
return v[0]**2;
52. In [17]: fcn = SimpleFCN()
upar = minuit2.MnUserParameters()
upar.Add("x", 1., 0.1)
migrad = minuit2.MnMigrad(fcn, upar)
min = migrad()
val = 1.0
val = 1.001
val = 0.999
val = 1.0005980198587356
val = 0.9994019801412644
val = -8.230083281546285e-11
val = 0.00034526688527999595
val = -0.0003452670498816616
val = -8.230083281546285e-11
val = 0.00034526688527999595
val = -0.0003452670498816616
val = 6.905331121533294e-05
val = -6.905347581699857e-05
53. In [18]: print(min)
Minuit did successfully converge.
# of function calls: 13
minimum function Value: 6.773427082119e-21
minimum edm: 6.773427081817e-21
minimum internal state vector: LAVector parameters:
-8.230083281546e-11
minimum internal covariance matrix: LASymMatrix parameters:
1
# ext. || Name || type || Value || Error +/-
0 || x || free || -8.230083281546e-11 ||0.7071067811865
54. Done
See
for a more
complete example
Pybind11 bindings can talk to each other at the C level!
Overall topics covered:
ctypes, CFFI : Pure Python, C only
CPython: How all bindings work
SWIG: Multi-language, automatic
Cython: New language
Pybind11: Pure C++11
CPPYY: From ROOT's JIT engine
An advanced binding in Pybind11
GooFit's built in Minuit2 bindings
(https://github.com/GooFit/GooFit/tree/master/python/Minuit2)
55. Backup:
This is the setup.py file for the Miniut2 bindings. With this, you can use the standard
Python tools to build! (but slower and more verbose than CMake)
In [19]: %%writefile setup.py
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
import sys
import setuptools
from pathlib import Path # Python 3 or Python 2 backport: pathlib2
import pybind11 # Real code should defer this import
56. sources = set(str(p) for p in Path('Minuit2-6.14.0-Source/src').glob('**/*.cxx')
)
sources.remove('Minuit2-6.14.0-Source/src/TMinuit2TraceObject.cxx')
## Add your sources to `sources`
sources |= set(str(p) for p in Path('pyminuit2').glob('*.cpp'))
ext_modules = [
Extension(
'minuit2',
list(sources),
include_dirs=[
pybind11.get_include(False),
pybind11.get_include(True),
'Minuit2-6.14.0-Source/inc',
],
language='c++',
define_macros=[('WARNINGMSG', None),
('MATH_NO_PLUGIN_MANAGER', None),
('ROOT_Math_VecTypes', None)
],
),
]
class BuildExt(build_ext):
"""A custom build extension for adding compiler-specific options."""
c_opts = {
'msvc': ['/EHsc'],
'unix': [],
}
if sys.platform == 'darwin':
c_opts['unix'] += ['-stdlib=libc++', '-mmacosx-version-min=10.7']
def build_extensions(self):
ct = self.compiler.compiler_type
opts = self.c_opts.get(ct, [])
if ct == 'unix':