Pythran is an ahead-of-time compiler that turns modules written in a large subset of Python into C++ meta-programs, which can in turn be compiled into efficient native modules. It mainly targets compute-intensive parts of the code, so it comes as no surprise that it focuses on scientific applications that make extensive use of NumPy. Under the hood, Pythran analyses the program inter-procedurally and performs high-level optimizations and parallel code generation. Parallelism can be found implicitly in Python intrinsics or NumPy operations, or specified explicitly by the programmer using OpenMP directives placed directly in the Python source code. Either way, the input code remains fully compatible with the Python interpreter. While the idea is similar to Parakeet or Numba, the approach differs significantly: code generation is performed offline rather than at runtime. Pythran generates heavily templated C++11 code that makes use of the NT2 meta-programming library and relies on any standard-compliant compiler to generate the binary code. We propose to walk through some examples and benchmarks, exposing the current state of what Pythran provides as well as the limits of the approach.
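As a taste of the approach, here is a minimal sketch of a Pythran-ready kernel (a hypothetical example, not one from the talk): the `#pythran export` comment declares the signature for the compiler and the `#omp` comment requests OpenMP parallelization, yet both are plain comments, so the module runs unchanged under the Python interpreter.

```python
# Hypothetical Pythran kernel: the export comment and the OpenMP directive
# are meaningful to the Pythran compiler but inert under plain CPython,
# keeping the file fully interpreter-compatible.

#pythran export dot(float list, float list)
def dot(a, b):
    s = 0.0
    #omp parallel for reduction(+:s)
    for i in range(len(a)):
        s += a[i] * b[i]
    return s
```

Compiling the file with `pythran module.py` would then produce a native extension exposing the same `dot` function.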
Swift is proposed as a next-generation platform for TensorFlow that could provide benefits over Python like improved performance, type safety, and enabling automatic differentiation at compile time. However, Python currently dominates the machine learning ecosystem. Swift and Python are intended to have a complementary relationship, with each suited to different use cases. Examples show comparable MNIST implementations in both Swift and Python for TensorFlow.
Tensorflow in practice by Engineer -- Donghwi Cha
- The document is an introduction to the machine learning framework Tensorflow, covering key concepts like computation graphs, operations, sessions, training, replication, and clustering.
- Key aspects discussed include how Tensorflow executes operations as a static computation graph, uses sessions to run graphs and tensors to hold values, and supports data parallelism through replication across devices/workers.
- The document provides examples of building neural network models in Tensorflow and discusses techniques for training models like backpropagation and distributing training using data parallelism.
This document provides an introduction to parallel programming in Python using MPI. It discusses how Python can be slow compared to C and describes various techniques to speed up Python code including profiling, Numba, and parallel programming. The key parallel programming approach discussed is MPI using mpi4py which allows Python programs to run across multiple processors. An example MPI program is provided to demonstrate basic point-to-point communication between processes.
Python for Scientific Computing -- Ricardo Cruz
This document discusses Python for scientific computing. It provides notes on NumPy, the fundamental package for scientific computing in Python. NumPy allows vectorized mathematical operations on multidimensional arrays in a simple and efficient manner. The notes cover common NumPy operations and syntax as compared to MATLAB and R. Pandas is also introduced as a package for data manipulation and analysis based on the concept of data frames from R. Examples are given of generating fake data to demonstrate modeling capabilities in Python.
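The vectorized style that the notes contrast with MATLAB and R can be sketched in a few lines (an illustrative example assuming NumPy is installed; not taken from the slides): the same elementwise computation written as a Python-level loop and as a single array expression.

```python
# Loop vs. vectorized NumPy: the array expression pushes the per-element
# loop into compiled code, which is where NumPy's efficiency comes from.
import numpy as np

x = np.linspace(0.0, 1.0, 5)              # 5 evenly spaced samples in [0, 1]

# Loop version: one Python-level iteration per element.
loop_result = [3 * v * v + 2 * v for v in x]

# Vectorized version: one expression over the whole array,
# analogous to MATLAB's elementwise syntax mentioned in the notes.
vec_result = 3 * x**2 + 2 * x

assert np.allclose(loop_result, vec_result)
```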
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/dec-2019-alliance-vitf-facebook
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Joseph Spisak, Product Manager at Facebook, delivers the presentation "PyTorch Deep Learning Framework: Status and Directions" at the Embedded Vision Alliance's December 2019 Vision Industry and Technology Forum. Spisak gives an update on the Torch deep learning framework and where it’s heading.
PVS-Studio team experience: checking various open source projects, or mistake... -- Andrey Karpov
To let the world know about our product, we check open-source projects. So far we have checked 245 projects. A side effect: we have found 9574 errors and notified the authors about them.
A peek on numerical programming in Perl and Python (E. Christopher Dyken, 2005) -- Jules Krdenas
This document compares the numerical programming capabilities and performance of Perl and Python with and without numerical libraries like NumPy and PDL. It implements a trapezoidal quadrature rule to integrate three different functions in standard C, optimized C, Python, Python with NumPy, Python with numarray, Perl, and Perl with PDL. The results show that plain Python and Perl are much slower than C, but with numerical libraries their performance is comparable to optimized C for problems that can be formulated as element-by-element array operations. NumPy performs worse for simple functions, but the gap decreases for more complex functions that use trigonometric operations. So for numerical problems, Python and Perl with add-on libraries can be viable alternatives to C/C++.
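The benchmarked kernel is easy to restate; here is a plain-Python trapezoidal rule (an illustrative sketch, not the 2005 code, which also covers the C, NumPy, numarray, and PDL variants):

```python
# Composite trapezoidal rule: approximate the integral of f over [a, b]
# with n subintervals, weighting the endpoints by 1/2.
import math

def trapezoid(f, a, b, n):
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        s += f(a + i * h)
    return s * h

print(round(trapezoid(math.sin, 0.0, math.pi, 1000), 4))   # ~2.0 (exact value: 2)
```

The element-by-element formulation of this same loop is what lets NumPy and PDL close the gap with C in the benchmark.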
A Speculative Technique for Auto-Memoization Processor with Multithreading -- Matsuo and Tsumura lab.
Yushi KAMIYA, Tomoaki TSUMURA, Hiroshi MATSUO, Yasuhiko NAKASHIMA:
"A Speculative Technique for Auto-Memoization Processor with Multithreading" (presentation slides)
Proc. 10th Intl. Conf. on Parallel and Distributed Computing, Applications and Technologies (PDCAT'09), Higashi-Hiroshima, Japan, pp.160-166 (Dec. 2009)
This document summarizes two projects that aim to optimize Python performance: the AST optimizer and register-based bytecode. The AST optimizer rewrites Python abstract syntax trees to enable more optimizations than CPython's peephole optimizer. It aims to improve performance by constant folding, loop optimizations, and other transformations. The register-based bytecode project rewrites CPython's stack-based bytecode to use registers, enabling additional optimizations like common subexpression elimination and constant/attribute deduplication. Both projects showed speedups over CPython in benchmarks, but may not translate to all real applications. The document provides details on each project's approach and goals.
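Constant folding, the first optimization mentioned, can be prototyped with the stdlib `ast` module; this toy folder is an illustrative sketch, not the AST optimizer project's actual code.

```python
# Fold binary operations over literals at "compile time" by rewriting the AST
# bottom-up: children are folded first, then a BinOp whose operands are both
# constants is evaluated and replaced with a single Constant node.
import ast

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)            # fold children first (bottom-up)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                value = eval(compile(ast.Expression(node), "<fold>", "eval"))
            except Exception:
                return node                 # e.g. division by zero: leave as-is
            return ast.copy_location(ast.Constant(value), node)
        return node

tree = ast.parse("x = 2 * 3 + 4")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(folded))                  # x = 10
```

CPython's peephole optimizer does a version of this on bytecode; working on the AST, as the project does, exposes more structure to fold.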
Basic concepts of Deep Learning, explaining its structure, the backpropagation method, and autograd in PyTorch (plus data parallelism in PyTorch).
In this talk I introduced Yampa, the AFRP framework in Haskell, and a Quake-like game made with it. The content covers the basics of Functional Reactive Programming, Haskell Arrows, Yampa itself, time-space leaks, etc.
Tiramisu is a code optimization and generation framework that can be integrated into custom compilers. It supports various backends including multi-CPU (using LLVM), GPU (using CUDA), distributed systems (using MPI), and FPGAs (using Xilinx Vivado HLS). Tiramisu uses polyhedral representations to support irregular domains beyond just rectangles. The document provides an overview of Tiramisu and discusses challenges related to supporting different platforms, memory dependencies, efficient code generation, and representations. It also mentions that Tiramisu uses Halide and ISL.
Igor Rivin wrote about his experience learning Julia by implementing algorithms to compute multiplicative functions over integer ranges. His initial implementations in Julia were slow, taking 10 minutes to run. After profiling, he realized Julia is not a functional programming language and rewrote the algorithms to avoid excessive object creation/destruction, achieving a 1500x speedup. The key insight was to first sieve and factor all numbers, then compute functions on the prime powers, avoiding re-factoring. This approach was much faster, beating Mathematica by factorizing once and reusing factors. Rivin concluded Julia freed him from the constraints of shell languages like Mathematica while avoiding low-level languages, and reinforced the importance of keeping algorithms simple.
TCO in Python via bytecode manipulation -- lnikolaeva
The document discusses optimizing tail recursion in Python by manipulating bytecode. It begins by describing the problem of recursion depth limits in Python and the challenge of optimizing recursive function calls to avoid stack overflows. It then provides an introduction to Python bytecode and disassembly and shows how a recursive function's bytecode can be modified by replacing the recursive call instruction with instructions to reset variables and jump back to the start of the function. This optimization technique is implemented in a recursion optimizer tool and shown to improve performance over the original recursive function. Further optimizations are discussed, such as tracking the call stack to properly handle calls to other functions within the recursive function.
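The effect of the bytecode rewrite (replacing the recursive call with instructions that reset the variables and jump back to the start) can be shown at source level on a tail-recursive factorial. This is an illustrative sketch of the transformation, not the optimizer's actual output:

```python
# Tail-recursive factorial and its loop equivalent.  The loop body mirrors
# what the bytecode rewrite produces: rebind the arguments, jump to the top.
import math

def fact_rec(n, acc=1):
    if n <= 1:
        return acc
    return fact_rec(n - 1, acc * n)       # tail call: hits the recursion limit for large n

def fact_loop(n, acc=1):
    while True:                            # "jump back to the start of the function"
        if n <= 1:
            return acc
        n, acc = n - 1, acc * n            # "reset the variables" instead of calling

assert fact_rec(6) == fact_loop(6) == 720
print(fact_loop(3000) == math.factorial(3000))   # True: well past the ~1000-frame default limit
```

The optimizer described in the document performs this rewrite on the bytecode automatically, so the source can stay in the recursive style.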
Egor Bogatov -- .NET Core intrinsics and other micro-optimizations
The document discusses various micro-optimizations and techniques for optimizing .NET code including using Span over Substring, avoiding allocations by using stackalloc or ArrayPool, eliminating bounds checks, using intrinsics and SIMD, and examples of optimizing specific algorithms like reversing an array. It covers potential pitfalls like type casts not being free and differences between runtimes. Intrinsics and SIMD can provide significant performance improvements over naive implementations by leveraging CPU instructions.
Use the Matplotlib, Luke @ PyCon Taiwan 2012 -- Wen-Wei Liao
Matplotlib is a Python 2D plotting library that produces publication-quality figures in both hardcopy and interactive environments across platforms. It provides both object-oriented and MATLAB-style state machine interfaces. Matplotlib can be used to create simple plots with just a few commands or customize plots extensively by manipulating the object properties of figures, axes, and artists.
Presented by Stephen Murtagh, Etsy.com, Inc.
TF-IDF (term frequency, inverse document frequency) is a standard method of weighting query terms for scoring documents, and is the method that is used by default in Solr/Lucene. Unfortunately, TF-IDF is really only a measure of rarity, not quality or usefulness. This means it would give more weight to a useless, rare term, such as a misspelling, than to a more useful, but more common, term.
In this presentation, we will discuss our experiences replacing Lucene's TF-IDF based scoring function with a more useful one using information gain, a standard machine-learning measure that combines frequency and specificity. Information gain is much more expensive to compute, however, so this requires periodically computing the term weights outside of Solr/Lucene and making the results accessible within Solr/Lucene.
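The rarity-vs-usefulness problem is easy to reproduce in a few lines of plain Python (a toy corpus invented for illustration, not Etsy's data or scoring code): a one-off misspelling receives a higher IDF weight than the common, genuinely useful term.

```python
# Inverse document frequency: log(N / df).  Rare terms get large weights
# regardless of whether they are useful or just misspellings.
import math

docs = [
    "handmade silver necklace",
    "handmade leather bag",
    "handmade wool scarf",
    "hanmade silver ring",        # contains the misspelling "hanmade"
]

def idf(term):
    df = sum(term in d.split() for d in docs)
    return math.log(len(docs) / df)

print(idf("hanmade") > idf("handmade"))   # True: the misspelling outweighs the useful term
```

An information-gain weight, by contrast, would also ask how well the term predicts relevance, penalizing terms that are merely rare.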
How to add an optimization for C# to RyuJIT -- Egor Bogatov
This document discusses various ways to optimize C# code by adding custom optimizations to the JIT compiler. It begins by explaining how to morph the intermediate representation (IR) tree during JIT compilation to optimize expressions like dividing by a constant. It then covers implementing range check elimination, loop optimizations like invariant code hoisting, and ideas for optimizations not yet implemented like loop unrolling and deletion. The goal is to explore how to most easily add optimizations by modifying phases in the JIT compiler.
The document summarizes a presentation on Cython, a programming language that allows writing Python extensions and integrating Python with C/C++ code. Cython code can be compiled into C/C++ extensions that speed up Python code by allowing static type declarations. The presentation covers Cython features like static typing, C pointers and strings, exception handling, and defining extension types. It provides examples of Cython code and compiling Cython to C/C++ extensions using various methods.
Recently the .NET ecosystem has been developing very dynamically: new technologies and tools appear constantly, while old ones gain new capabilities. Keeping track of it all is very hard, so in this talk we will take a broad look at the current state of the .NET platform and at what awaits us in the near future. We will talk about the upcoming C# 7, about cross-platform support and native compilation, about the new .NET Core 5 and ASP.NET 5, about new developer tools, and about the latest announcements from Microsoft.
Seeing with Python, presented at PyCon AU 2014 -- Mark Rees
This document discusses the history and current state of computer vision. It begins with definitions of computer vision from the 1980s, focusing on machine vision and automatically analyzing images. It then provides a 2014 definition that emphasizes duplicating human vision abilities through electronic image perception and understanding using models from various fields. The document notes computer vision involves more than just image capture, including image processing, algorithm development, and display control. It also lists and briefly describes several popular Python libraries for computer vision tasks, such as PIL, Scipy ndimage, Mahotas, PCV, SimpleCV, and OpenCV. It concludes with resources for learning more about computer vision and Python.
The document provides an introduction to Objective-C, including background information on its origins and current usage. It discusses key Objective-C concepts like classes, methods, memory management, and the differences between static, stack and heap memory. Code examples are provided to demonstrate how to declare classes, instantiate objects, call methods, and handle memory allocation and release of objects.
A formalization of complex event stream processing -- Sylvain Hallé
Information systems in general, and business processes in particular, generate a wealth of information in the form of event traces or logs. The analysis of these logs, either offline or in real-time, can be put to numerous uses: computation of various statistics, detection of anomalous patterns or compliance violations of some form of contract. However, current solutions for Complex Event Processing (CEP) generally offer only a restricted set of predefined queries on traces, and otherwise require a user to write procedural code to compute custom queries. In this presentation, we present a formal and declarative language for the manipulation of event traces.
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles -- Vissarion Fisikopoulos
The document discusses efficient algorithms for computing volumes and edge skeletons of polytopes defined implicitly by optimization oracles. It presents an algorithm that computes the edge skeleton of a polytope using oracle calls and arithmetic operations. It also describes using geometric random walks and optimization oracles to approximate polytope volume, which is more efficient than exact computation in high dimensions. Experimental results show the approach computes volume within minutes for polytopes up to dimension 12 with less than 2% error.
Effective Numerical Computation in NumPy and SciPy -- Kimikazu Kato
This document provides an overview of effective numerical computation in NumPy and SciPy. It discusses how Python can be used for numerical computation tasks like differential equations, simulations, and machine learning. While Python is initially slower than languages like C, libraries like NumPy and SciPy allow Python code to achieve sufficient speed through techniques like broadcasting, indexing, and using sparse matrix representations. The document provides examples of how to efficiently perform tasks like applying functions element-wise to sparse matrices and calculating norms. It also presents a case study for efficiently computing a formula that appears in a machine learning paper using different sparse matrix representations in SciPy.
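The sparse representations mentioned above can be illustrated with a hand-rolled CSR (compressed sparse row) matrix-vector product. SciPy's `csr_matrix` stores the same three arrays (`data`, `indices`, `indptr`); this sketch is plain Python rather than SciPy code, written only to show the layout.

```python
# CSR matrix-vector product: for each row, the slice indptr[row]:indptr[row+1]
# selects that row's nonzero values (data) and their column positions (indices).
def csr_matvec(data, indices, indptr, x):
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

# The 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form:
data, indices, indptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]
print(csr_matvec(data, indices, indptr, [1.0, 1.0, 1.0]))   # [3.0, 3.0]
```

Only the nonzeros are stored and touched, which is why the right sparse format matters so much for formulas like the one in the case study.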
This document summarizes SciSmalltalk, an open-source library for scientific computing in Smalltalk. The library contains packages for numerical methods like matrices, statistics, polynomials, integration, and differential equations. It has over 600 unit tests and supports Pharo and Squeak. The library is developed and maintained by contributors from various universities and has been in development since 2012. Future plans include improving modularity and packaging, reproducing R examples in Roassal, and welcoming new contributions.
Imagine writing a pure Python library which can achieve the performance of Fortran or C/C++.
To this end we have developed Pyccel, which translates Python code to either Fortran or C, and makes the generated code callable from Python. The generated Fortran or C code is not only fast, but also human-readable; hence it can easily be profiled and optimized for the target machine.
Pyccel has a focus on high-performance computing applications, where the efficient usage of the available hardware resources is fundamental.
To this end it provides type annotations, function decorators, and OpenMP pragmas.
Pyccel is easy to use, is almost completely written in Python, and compares favourably against other Python accelerators.
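A sketch of the workflow (a hypothetical kernel, not one from the abstract): plain Python type annotations give the translator the concrete types it needs, and the very same file keeps running under the ordinary interpreter.

```python
# Hypothetical Pyccel-ready kernel: the annotations fix the argument and
# return types for translation to Fortran or C, while remaining ordinary,
# runnable Python.
def smooth_sum(n: int) -> float:
    """Sum 1/k**2 for k = 1..n (converges to pi**2 / 6)."""
    s = 0.0
    for k in range(1, n + 1):
        s += 1.0 / (k * k)
    return s

print(round(smooth_sum(3), 4))   # 1 + 1/4 + 1/9 = 1.3611
```

Running the Pyccel translator on such a file is then what produces the fast, human-readable Fortran or C plus a wrapper callable from Python.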
Numba: Array-oriented Python Compiler for NumPy -- Travis Oliphant
Numba is a Python compiler that translates Python code into fast machine code using the LLVM compiler infrastructure. It allows Python code that works with NumPy arrays to be just-in-time compiled to native machine instructions, achieving performance comparable to C, C++ and Fortran for numeric work. Numba provides decorators like @jit that can compile functions for improved performance on NumPy array operations. It aims to make Python a compiled and optimized language for scientific computing by leveraging type information from NumPy to generate fast machine code.
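The `@jit` usage described above looks like this in practice (an illustrative sketch: Numba is assumed to be installed for the actual compilation, and a no-op fallback keeps the example runnable as plain Python when it is not).

```python
# A Numba-decorated reduction over a NumPy array.  With Numba installed,
# nopython=True compiles the loop to machine code via LLVM; without it,
# the fallback identity decorator leaves the function as plain Python.
try:
    from numba import jit
except ImportError:
    def jit(*args, **kwargs):
        if args and callable(args[0]):     # used as bare @jit
            return args[0]
        def wrap(f):                       # used as @jit(...)
            return f
        return wrap

import numpy as np

@jit(nopython=True)
def pairwise_sum(a):
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i]
    return total

print(pairwise_sum(np.array([0.5, 1.5, 2.0])))   # 4.0
```

The explicit element-wise loop is exactly the shape of code where Numba's type inference over NumPy arrays pays off.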
A Speculative Technique for Auto-Memoization Processor with MultithreadingMatsuo and Tsumura lab.
Yushi KAMIYA, Tomoaki TSUMURA, Hiroshi MATSUO, Yasuhiko NAKASHIMA:
"A Speculative Technique for Auto-Memoization Processor with Multithreading"(発表資料)
Proc. 10th Intl. Conf. on Parallel and Distributed Computing, Applications and Technologies (PDCAT'09), Higashi-Hiroshima, Japan, pp.160-166 (Dec. 2009)
This document summarizes two projects that aim to optimize Python performance: the AST optimizer and register-based bytecode. The AST optimizer rewrites Python abstract syntax trees to enable more optimizations than CPython's peephole optimizer. It aims to improve performance by constant folding, loop optimizations, and other transformations. The register-based bytecode project rewrites CPython's stack-based bytecode to use registers, enabling additional optimizations like common subexpression elimination and constant/attribute deduplication. Both projects showed speedups over CPython in benchmarks, but may not translate to all real applications. The document provides details on each project's approach and goals.
Basic concept of Deep Learning with explaining its structure and backpropagation method and understanding autograd in PyTorch. (+ Data parallism in PyTorch)
In this talk I introduced Yampa, the AFRP framework in Haskell, and the Quake-like game made by it. The content convers the basic of Functional Reactive Programming, Haskell Arrow, Yampa itself, time-space leak, etc.
Tiramisu is a code optimization and generation framework that can be integrated into custom compilers. It supports various backends including multi-CPU (using LLVM), GPU (using CUDA), distributed systems (using MPI), and FPGAs (using Xilinx Vivado HLS). Tiramisu uses polyhedral representations to support irregular domains beyond just rectangles. The document provides an overview of Tiramisu and discusses challenges related to supporting different platforms, memory dependencies, efficient code generation, and representations. It also mentions that Tiramisu uses Halide and ISL.
Igor Rivin wrote about his experience learning Julia by implementing algorithms to compute multiplicative functions over integer ranges. His initial implementations in Julia were slow, taking 10 minutes to run. After profiling, he realized Julia is not a functional programming language and rewrote the algorithms to avoid excessive object creation/destruction, achieving a 1500x speedup. The key insight was to first sieve and factor all numbers, then compute functions on the prime powers, avoiding re-factoring. This approach was much faster, beating Mathematica by factorizing once and reusing factors. Rivin concluded Julia freed him from the constraints of shell languages like Mathematica while avoiding low-level languages, and reinforced the importance of keeping algorithms simple.
TCO in Python via bytecode manipulation.lnikolaeva
The document discusses optimizing tail recursion in Python by manipulating bytecode. It begins by describing the problem of recursion depth limits in Python and the challenge of optimizing recursive function calls to avoid stack overflows. It then provides an introduction to Python bytecode and disassembly and shows how a recursive function's bytecode can be modified by replacing the recursive call instruction with instructions to reset variables and jump back to the start of the function. This optimization technique is implemented in a recursion optimizer tool and shown to improve performance over the original recursive function. Further optimizations are discussed, such as tracking the call stack to properly handle calls to other functions within the recursive function.
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov
The document discusses various micro-optimizations and techniques for optimizing .NET code including using Span over Substring, avoiding allocations by using stackalloc or ArrayPool, eliminating bounds checks, using intrinsics and SIMD, and examples of optimizing specific algorithms like reversing an array. It covers potential pitfalls like type casts not being free and differences between runtimes. Intrinsics and SIMD can provide significant performance improvements over naive implementations by leveraging CPU instructions.
Use the Matplotlib, Luke @ PyCon Taiwan 2012Wen-Wei Liao
Matplotlib is a Python 2D plotting library that produces publication-quality figures in both hardcopy and interactive environments across platforms. It provides both object-oriented and MATLAB-style state machine interfaces. Matplotlib can be used to create simple plots with just a few commands or customize plots extensively by manipulating the object properties of figures, axes, and artists.
Presented by Stephen Murtagh, Etsy.com, Inc.
TF-IDF (term frequency, inverse document frequency) is a standard method of weighting query terms for scoring documents, and is the method that is used by default in Solr/Lucene. Unfortunately, TF-IDF is really only a measure of rarity, not quality or usefulness. This means it would give more weight to a useless, rare term, such as a misspelling, than to a more useful, but more common, term.
In this presentation, we will discuss our experiences replacing Lucene's TF-IDF based scoring function with a more useful one using information gain, a standard machine-learning measure that combines frequency and specificity. Information gain is much more expensive to compute, however, so this requires periodically computing the term weights outside of Solr/Lucene and making the results accessible within Solr/Lucene.
How to add an optimization for C# to RyuJITEgor Bogatov
This document discusses various ways to optimize C# code by adding custom optimizations to the JIT compiler. It begins by explaining how to morph the intermediate representation (IR) tree during JIT compilation to optimize expressions like dividing by a constant. It then covers implementing range check elimination, loop optimizations like invariant code hoisting, and ideas for optimizations not yet implemented like loop unrolling and deletion. The goal is to explore how to most easily add optimizations by modifying phases in the JIT compiler.
The document summarizes a presentation on Cython, a programming language that allows writing Python extensions and integrating Python with C/C++ code. Cython code can be compiled into C/C++ extensions that speed up Python code by allowing static type declarations. The presentation covers Cython features like static typing, C pointers and strings, exception handling, and defining extension types. It provides examples of Cython code and compiling Cython to C/C++ extensions using various methods.
В последнее время экосистема .NET развивается очень динамично: постоянно появляются новые технологии и инструменты, а старые обзаводятся новыми возможностями. Уследить за всем очень сложно, поэтому в этом докладе мы постараемся обзорно взглянуть на текущее состояние платформы .NET, а также на то, что нас ждёт в ближайшем будущем. Будем говорить про грядущий C#7, про кроссплатформенность и нативную компиляцию, про новый .NET Core 5 и ASP.NET 5, про новые инструменты для разработчиков и последние анонсы от Microsoft.
Seeing with Python presented at PyCon AU 2014Mark Rees
This document discusses the history and current state of computer vision. It begins with definitions of computer vision from the 1980s, focusing on machine vision and automatically analyzing images. It then provides a 2014 definition that emphasizes duplicating human vision abilities through electronic image perception and understanding using models from various fields. The document notes computer vision involves more than just image capture, including image processing, algorithm development, and display control. It also lists and briefly describes several popular Python libraries for computer vision tasks, such as PIL, Scipy ndimage, Mahotas, PCV, SimpleCV, and OpenCV. It concludes with resources for learning more about computer vision and Python.
The document provides an introduction to Objective-C, including background information on its origins and current usage. It discusses key Objective-C concepts like classes, methods, memory management, and the differences between static, stack and heap memory. Code examples are provided to demonstrate how to declare classes, instantiate objects, call methods, and handle memory allocation and release of objects.
A formalization of complex event stream processingSylvain Hallé
Information systems in general, and business processes in particular, generate a wealth of information in the form of event traces or logs. The analysis of these logs, either offline or in real-time, can be put to numerous uses: computation of various statistics, detection of anomalous patterns or compliance violations of some form of contract. However, current solutions for Complex Event Processing (CEP) generally offer only a restricted set of predefined queries on traces, and otherwise require a user to write procedural code to compute custom queries. In this presentation, we present a formal and declarative language for the manipulation of event traces.
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by OraclesVissarion Fisikopoulos
The document discusses efficient algorithms for computing volume and edge skeletons of polytopes defined implicitly by optimization oracles. It presents an algorithm to compute the edge skeleton of a polytope in oracle calls and arithmetic operations. It also describes using geometric random walks and optimization oracles to approximate polytope volume, which is more efficient than exact computation for high dimensions. Experimental results show the approach computes volume within minutes for polytopes up to dimension 12 with less than 2% error.
Effective Numerical Computation in NumPy and SciPyKimikazu Kato
This document provides an overview of effective numerical computation in NumPy and SciPy. It discusses how Python can be used for numerical computation tasks like differential equations, simulations, and machine learning. While Python is initially slower than languages like C, libraries like NumPy and SciPy allow Python code to achieve sufficient speed through techniques like broadcasting, indexing, and using sparse matrix representations. The document provides examples of how to efficiently perform tasks like applying functions element-wise to sparse matrices and calculating norms. It also presents a case study for efficiently computing a formula that appears in a machine learning paper using different sparse matrix representations in SciPy.
This document summarizes SciSmalltalk, an open-source library for scientific computing in Smalltalk. The library contains packages for numerical methods like matrices, statistics, polynomials, integration, and differential equations. It has over 600 unit tests and supports Pharo and Squeak. The library is developed and maintained by contributors from various universities and has been in development since 2012. Future plans include improving modularity and packaging, reproducing R examples in Roassal, and welcoming new contributions.
Imagine writing a pure Python library which can achieve the performance of Fortran or C/C++.
To this end we have developed Pyccel, which translates Python code to either Fortran or C, and makes the generated code callable from Python. The generated Fortran or C code is not only fast, but also human-readable; hence it can easily be profiled and optimized for the target machine.
Pyccel has a focus on high-performance computing applications, where the efficient usage of the available hardware resources is fundamental.
To this end it provides type annotations, function decorators, and OpenMP pragmas.
Pyccel is easy to use, is almost completely written in Python, and compares favourably against other Python accelerators.
Numba: Array-oriented Python Compiler for NumPy - Travis Oliphant
Numba is a Python compiler that translates Python code into fast machine code using the LLVM compiler infrastructure. It allows Python code that works with NumPy arrays to be just-in-time compiled to native machine instructions, achieving performance comparable to C, C++ and Fortran for numeric work. Numba provides decorators like @jit that can compile functions for improved performance on NumPy array operations. It aims to make Python a compiled and optimized language for scientific computing by leveraging type information from NumPy to generate fast machine code.
Profiling in Python concisely summarizes the key profiling tools:
cProfile and line_profiler profile execution time and identify slow lines of code. memory_profiler profiles memory usage with line-by-line or time-based outputs. YEP extends profiling to compiled C/C++ extensions like Cython modules, which are not covered by the standard Python profilers.
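A minimal `cProfile` session from the standard library looks like this (the profiled function is a made-up example):

```python
import cProfile
import io
import pstats

def slow_square_sum(n):
    # Deliberately naive: materializes a full list before summing it.
    return sum([i * i for i in range(n)])

profiler = cProfile.Profile()
profiler.enable()
slow_square_sum(100_000)
profiler.disable()

# Print the top entries sorted by cumulative time; slow_square_sum
# shows up with its call count and per-call cost.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

`line_profiler` and `memory_profiler` refine this picture to individual lines, which `cProfile`'s function-level totals cannot show.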
The document discusses various randomization algorithms used in Python programming including pseudorandom number generators (PRNGs) like the Mersenne Twister and Linear Congruential Generator (LCG). It provides details on how these algorithms work, their statistical properties, and examples of using LCG to simulate coin tosses and compare results to Python's random number generator.
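The LCG coin-toss experiment can be sketched as follows (the constants are the well-known glibc-style parameters and the high-bit trick is an illustrative choice; the document's exact setup may differ):

```python
class LCG:
    """Minimal linear congruential generator: state = (a*state + c) mod m."""
    def __init__(self, seed=42, a=1103515245, c=12345, m=2**31):
        self.state = seed
        self.a, self.c, self.m = a, c, m

    def next(self):
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def coin(self):
        # Use the high bit: the low bits of an LCG with power-of-two
        # modulus have short periods and are badly distributed.
        return (self.next() >> 30) & 1

rng = LCG(seed=42)
tosses = [rng.coin() for _ in range(10_000)]
heads = sum(tosses)
print(f"heads frequency: {heads / len(tosses):.3f}")  # roughly 0.5
```

Comparing this frequency (and longer-run statistics) against Python's Mersenne Twister-based `random` module is exactly the kind of experiment the document describes.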
This document summarizes an advanced Python programming course, covering topics like performance tuning, garbage collection, and extending Python. It discusses profiling Python code to find bottlenecks, using more efficient algorithms and data structures, optimizing code through techniques like reducing temporary objects and inline functions, leveraging faster tools like NumPy, writing extension modules in C, and parallelizing computation across CPUs and clusters. It also explains basic garbage collection algorithms like reference counting and mark-and-sweep used in CPython.
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn - Arnaud Joly
We first present the Python programming language and the NumPy package for scientific computing. Then, we devise a digit recognition system highlighting the scikit-learn package.
This document discusses parallelizing loops to compute pi through numerical integration. It begins by describing how pi can be computed by approximating the integral of 1/(1+x^2) from 0 to 1. It then shows C++ and Fortran code that performs this computation sequentially with increasing numbers of steps to improve the approximation of pi. The document discusses variable scopes in C++ and Fortran and how OpenMP parallelization works by dividing the loop workload across threads. It demonstrates parallelizing the loops in both languages using the reduction clause to correctly sum values across threads. Finally, it discusses considerations for parallelizing loops and challenges like data dependencies.
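The sequential loop described above reads as follows in Python (a sketch; the document's versions are C++ and Fortran, with OpenMP's `reduction` clause handling the accumulation across threads):

```python
def compute_pi(num_steps=1_000_000):
    """Approximate pi by midpoint integration of 4/(1+x^2) over [0, 1]."""
    step = 1.0 / num_steps
    total = 0.0
    for i in range(num_steps):
        x = (i + 0.5) * step          # midpoint of the i-th interval
        total += 4.0 / (1.0 + x * x)  # this sum is the reduction target
    return total * step

print(compute_pi())  # converges toward 3.141592653589793 as steps grow
```

In the OpenMP version each thread accumulates a private partial sum of `total`, and `reduction(+:total)` combines them correctly at the end, avoiding the data race a shared accumulator would cause.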
This document provides an overview of SciPy and demonstrates some of its functionality for interpolation, integration, and optimization. SciPy imports NumPy functions and provides additional tools for domains like signal processing, linear algebra, special functions, and more. Examples show linear, cubic and barycentric interpolation of 1D and 2D functions. Integration tools like quad, dblquad, simps and cumtrapz are applied to functions. However, optimization examples are not shown.
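The interpolation and integration ideas can be sketched with NumPy alone (SciPy's `interp1d`, `quad`, and `simps` named above offer richer variants; `np.interp` is the linear case, and the trapezoid sum is written out by hand):

```python
import numpy as np

# Coarse samples of sin on [0, pi].
x = np.linspace(0.0, np.pi, 50)
y = np.sin(x)

# Linear interpolation at a finer grid (SciPy adds cubic and
# barycentric interpolators on top of this).
x_fine = np.linspace(0.0, np.pi, 1000)
y_fine = np.interp(x_fine, x, y)

# Trapezoidal integration of the interpolant, written out explicitly.
integral = np.sum((y_fine[1:] + y_fine[:-1]) / 2.0 * np.diff(x_fine))
print(integral)  # the exact integral of sin over [0, pi] is 2
```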
Simple, fast, and scalable torch7 tutorial - Jin-Hwa Kim
A tutorial covering the basics of Torch7: installation, simple runnable code, tensor manipulation, a sweep through key packages, and post-hoc audience Q&A.
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: An Algebra for Reactive Pr... - GeeksLab Odessa
SubScript is an extension of the Scala language that adds support for the constructs and syntax of the Algebra of Communicating Processes (ACP). SubScript is a promising extension, applicable both to the development of highly loaded parallel systems and to simple personal applications.
The document summarizes various Python profiling tools. It discusses using the time utility and time module to measure elapsed time. It also covers the profile, cProfile, hotshot, line_profiler, memory_profiler, and objgraph modules for profiling code performance and memory usage. Examples show how each tool is used and the type of output it provides.
TensorFrames: Google TensorFlow on Apache Spark - Databricks
Presentation at Bay Area Spark Meetup by Databricks Software Engineer and Spark committer Tim Hunter.
This presentation covers how you can use TensorFrames with TensorFlow to do distributed computing on GPUs.
This document discusses various techniques for optimizing Python code to improve performance. It begins by explaining that Python is an interpreted language and is generally slower than compiled languages like C/C++. Several methods for speeding up Python code are then presented: using local variables instead of global variables, leveraging built-in functions, list comprehensions, generator expressions, NumPy for numeric computing, Numba for just-in-time compilation, and algorithm/data structure optimization. Specific code examples are provided to demonstrate how these techniques can significantly reduce runtime. The key message is that with the right optimizations, Python code can achieve speeds comparable to lower-level languages while retaining the benefits of a high-level, interpreted language.
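Two of the techniques named above, built-in iteration machinery and list comprehensions, can be compared directly with the standard `timeit` module (an illustrative micro-benchmark, not taken from the document):

```python
import timeit

data = list(range(10_000))

def with_loop():
    # Repeated attribute lookup (result.append) and a Python-level
    # method call on every iteration.
    result = []
    for x in data:
        result.append(x * 2)
    return result

def with_comprehension():
    # The comprehension does the same work in a single optimized
    # bytecode loop with no per-item method-call overhead.
    return [x * 2 for x in data]

t_loop = timeit.timeit(with_loop, number=200)
t_comp = timeit.timeit(with_comprehension, number=200)
print(f"loop: {t_loop:.3f}s  comprehension: {t_comp:.3f}s")
```

On typical CPython builds the comprehension runs noticeably faster; measuring before and after each change is the point of the exercise.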
By Tobias Grosser, Scalable Parallel Computing Laboratory
The COSMO climate and weather model delivers daily forecasts for Switzerland and many other nations. As a traditional HPC application it was developed with SIMD-CPUs in mind and large manual efforts were required to enable the 2016 move to GPU acceleration. As today's high-performance computer systems increasingly rely on accelerators to reach peak performance and manual translation to accelerators is both costly and difficult to maintain, we propose a fully automatic accelerator compiler for the automatic translation of scientific Fortran codes to CUDA GPU accelerated systems. Several challenges had to be overcome to make this reality: 1) improved scalability, 2) automatic data placement using unified memory, 3) loop rescheduling to expose coarse-grained parallelism, 4) inter-procedural loop optimization, and 5) plenty of performance tuning. Our evaluation shows that end-to-end automatic accelerator compilation is possible for non-trivial portions of the COSMO climate model, despite the lack of complete static information. Non-trivial loop optimizations previously implemented manually are performed fully automatically and memory management happens fully transparently using unified memory. Our preliminary results show notable performance improvements over sequential CPU code (40s to 8s reduction in execution time) and we are currently working on closing the remaining gap to hand-tuned GPU code. This talk is a status update on our most recent efforts and also intended to gather feedback on future research plans towards automatically mapping COSMO to FPGAs.
Tobias Grosser Bio
Tobias Grosser is a senior researcher in the Scalable Parallel Computing Laboratory (SPCL) of Torsten Hoefler at the Computer Science Department of ETH Zürich. Supported by a Google PhD Fellowship he received his doctoral degree from Universite Pierre et Marie Curie under the supervision of Albert Cohen. Tobias' research is taking place at the border of low-level compilers and high-level program transformations with the goal of enabling complex - but highly-beneficial - program transformations in a production compiler environment. He develops with the Polly loop optimizer a loop transformation framework which today is a community project supported throught the Polly Labs research laboratory. Tobias also developed advanced tiling schemes for the efficient execution of iterated stencils. Today Tobias leads the heterogeneous compute efforts in the Swiss University funded ComPASC project and is about to start a three year NSF Ambizione project on advancing automatic compilation and heterogenization techniques at ETH Zurich.
For more info on Linaro High Performance Computing (HPC), visit https://www.linaro.org/sig/hpc/
The document discusses properties in Python classes. Properties allow accessing attributes through normal attribute syntax, while allowing custom behavior through getter and setter methods. This avoids directly accessing attributes and allows for validation in setters. Properties are defined using the @property and @setter decorators, providing a cleaner syntax than regular getter/setter methods. They behave like regular attributes but allow underlying method calls.
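The pattern described above looks like this in practice (the `Temperature` class is an illustrative example, not from the document):

```python
class Temperature:
    def __init__(self, celsius=0.0):
        self.celsius = celsius  # routed through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Validation lives in the setter, invisible to callers.
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # A derived, read-only attribute: no setter is defined.
        return self._celsius * 9 / 5 + 32

t = Temperature(100.0)
print(t.fahrenheit)  # 212.0, accessed with plain attribute syntax
t.celsius = 37.0     # calls the setter, which validates the value
```

Callers use ordinary attribute syntax throughout; switching a plain attribute to a property later never breaks them, which is why Python style avoids writing explicit `get_x()`/`set_x()` methods up front.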
This document summarizes key aspects of iteration in Python:
1. Python supports multiple ways of iteration including for loops and generators. For loops are preferred for iteration over finite collections while generators enable infinite iteration.
2. Common iteration patterns include iterating over elements, indices, or both using enumerate(). Numerical iteration can be done with for loops or while loops.
3. Functions are first-class objects in Python and can be passed as arguments or returned as values, enabling functional programming patterns like mapping and filtering.
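The three points above can be sketched together in a few lines (the examples are illustrative):

```python
import itertools

# 2. Iterating over elements and indices at once with enumerate().
colors = ["red", "green", "blue"]
for i, color in enumerate(colors):
    print(i, color)

# 1. A generator enables unbounded iteration: no upper limit is baked
# in, the consumer decides when to stop.
def squares():
    n = 0
    while True:
        yield n * n
        n += 1

first_five = list(itertools.islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# 3. Functions are first-class objects: filter() takes one as an argument.
evens = list(filter(lambda x: x % 2 == 0, first_five))
print(evens)  # [0, 4, 16]
```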
This document discusses various techniques for optimizing Python code, including:
1. Using the right algorithms and data structures to minimize time complexity, such as choosing lists, sets or dictionaries based on needed functionality.
2. Leveraging Python-specific optimizations like string concatenation, lookups, loops and imports.
3. Profiling code with tools like timeit, cProfile and visualizers to identify bottlenecks before optimizing.
4. Optimizing only after validating a performance need and starting with general strategies before rewriting hotspots in Python or other languages. Premature optimization can complicate code.
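Point 1 above — picking the data structure for the operation you need — is easy to demonstrate: membership testing is O(n) on a list but O(1) on average for a set (an illustrative micro-benchmark, not from the document):

```python
import timeit

items = list(range(100_000))
as_list = items
as_set = set(items)

# Searching for the worst-case element: the list must scan all
# 100,000 entries, while the set does a single hash lookup.
t_list = timeit.timeit(lambda: 99_999 in as_list, number=100)
t_set = timeit.timeit(lambda: 99_999 in as_set, number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```

The gap grows linearly with the collection size, which is why profiling first (point 3) matters: it tells you whether such a lookup is actually on the hot path before you restructure anything.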
Although Python is a slow interpreter, it can drive a system that delivers utmost speed if some guidelines are followed. The key is to treat programming languages as syntactic sugar for the machine code. This expedites the workflow of timing, iterative design, automatic testing, and optimization, and helps realize an HPC system that balances time to market and code quality.
Speed is king. 10x-productive developers change business. So does 10x-faster code. Python is 100x slower than C++, but that only matters when you use Python itself to implement number-crunching algorithms. We should not do that, and instead go directly to C++ for speed. This calls for strict discipline in software engineering and code quality, but note that here quality is defined by the runtime and the time to market.
The presentation focuses on the Python side of the development workflow. It is made possible by confining C++ in architecture defined by the Python code, which realizes most of the software engineering. The room for writing fast C++ code is provided by pybind11 and careful design of typed data objects. The data objects hold memory buffers exposed to Python as numpy ndarrays for direct access for speed.
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
1. Pythran – C++ for Snakes
Static Compiler for High Performance
Serge Guelton & Pierrick Brunet & Mehdi Amini
PyData - May 4th 2014
Get the slides: http://goo.gl/6dgra0
2. Disclaimer (1/33)
Timings were performed using OSX 10.9, an i7-3740QM CPU @ 2.70GHz, Python 2.7.6, current Pythran (branch Pydata2014), pypy 2.2.1, numba 0.13.1, gcc 4.8, Clang r207887.
I am not a Pythonista, but I'm interested in performance in general.
Daily job: driver-level C code, assembly, multi-threaded C++, GPU, ...
3. Disclaimer (1/33), continued
Note: I love to troll, so let's have a beer later and talk about how awful Python is! ;-)
4. Disclaimer (1/33), continued
By the way, this talk is written in LaTeX and takes more than 10 seconds to compile.
7. Tools for Scientific Computing in Python (3/33)
FORTRAN + f2py
C + SWIG
8. I Do Not Know Much About Python, But It Does Not Matter, Because... (4/33)
I only care about the white spot.
9. Regular IPython Session (5/33)
>>> import numpy as np
>>> def rosen(x):
...     return sum(100.*(x[1:]-x[:-1]**2.)**2.
...                + (1-x[:-1])**2.)
>>> r = np.random.rand(100000)
>>> %timeit rosen(r)
10 loops, best of 3: 35.1 ms per loop
In mathematical optimization, the Rosenbrock function is a non-convex function used as a performance test problem for optimization algorithms, introduced by Howard H. Rosenbrock in 1960.[1] It is also known as Rosenbrock's valley or Rosenbrock's banana function. (Wikipedia)
10. IPython Session with Pythran (6/33)
>>> %load_ext pythranmagic
>>> %%pythran
import numpy as np
#pythran export rosen(float[])
def rosen(x):
    return sum(100.*(x[1:]-x[:-1]**2.)**2.
               + (1-x[:-1])**2.)
>>> import numpy as np
>>> r = np.random.rand(100000)
>>> %timeit rosen(r)
10000 loops, best of 3: 121 us per loop
11. IPython Session with Pythran (6/33), continued
That's a ×290 speedup!
12. Pythran's Meta-Program Generator (7/33)
Python module [.py] (+ optional OpenMP directives) → Pythran (+ type info) → C++ meta-program (boost::python, pythonic++, NT2) → g++ → native module [.so]
13. Pythran Motto (8/33)
Goals
1. Performance First: sacrifice language features
2. Subset of Python: backward compatibility matters; pythran ≈ python − useless stuff (i.e. class, eval, introspection, polymorphic variables)
3. Modular Compilation: focus on numerical kernels
Means
1. Static Compilation
2. Static Optimizations
3. Vectorization & Parallelization
A Story of Python & C++
def dot(l0, l1):
    return sum(x*y for x, y in zip(l0, l1))

template <class T0, class T1>
auto dot(T0&& l0, T1&& l1)
    -> decltype(/* ... */)
{
    return pythonic::sum(
        pythonic::map(
            operator_::multiply(),
            pythonic::zip(std::forward<T0>(l0),
                          std::forward<T1>(l1))
        ));
}
C++ as a Back-end
A High Level Language
Rich Standard Template Library
Object Oriented Programming
Meta-Programming
Glimpses of type inference C++11
Variadic Templates C++11
A Low Level Language
Few Hidden Costs: "you don't pay for what you don't use"
Direct access to vector instruction units SSE, AVX, . . .
Good for Parallel Programming OpenMP, TBB, . . .
(and the parallel STL is proposed for C++17)
Let’s Dive Into the Backend Runtime... I’m Kidding!
template <class Op, class S0, class... Iters>
auto _map(Op& op, S0 const& seq, Iters... iters)
    -> sequence<decltype(op(*seq.begin(), *iters...))>
{
    decltype(_map(op, seq, iters...)) s;
    auto iter = std::back_inserter(s);
    for (auto& iseq : seq)
        *iter++ = op(iseq, *iters++...);
    return s;
}

template <class Op, class S0, class... SN>
auto map(Op op, S0 const& seq, SN const&... seqs)
    -> decltype(_map(op, seq, seqs.begin()...))
{
    return _map(op, seq, seqs.begin()...);
}
Static Compilation
Buys time for many time-consuming analyses
Points-to, use-def, variable scope, memory effects, function purity. . .
Unleashes powerful C++ optimizations
Lazy Evaluation, Map Parallelizations, Constant Folding
Requires static type inference
#pythran export foo(int list , float)
Only annotate exported functions!
Pythran's Motto 1/3
Gather as much information as possible
(Typing is just one piece of information among others)
Example of Analysis: Points-to
def foo(a, b):
    c = a or b
    return c*2
Where does c point to?
Example of Analysis: Argument Effects
def fib(n):
    return n if n < 2 else fib(n-1) + fib(n-2)
def bar(l):
    return map(fib, l)
def foo(l):
    return map(fib, random.sample(l, 3))
Do fib, bar and foo update their arguments?
Example of Analysis: Pure Functions
def f0(a):
    return a**2
def f1(a):
    b = f0(a)
    print b
    return b
l = list(...)
map(f0, l)
map(f1, l)
Are f0 and f1 pure functions?
Example of Analysis: Use - Def Chains
a = '1'
if cond:
    a = int(a)
else:
    a = 3
print a
a = 4
Which version of a is seen by the print statement?
Pythran's Motto 2/3
Gather as much information as possible
Turn it into Code Optimizations!
Example of Code Optimization: False Polymorphism
a = cos(1)
if cond:
    a = str(a)
else:
    a = None
foo(a)
Is this code snippet statically typable?
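A sketch of how a compiler can recover static types here (a hand-written illustration of the idea, not Pythran's actual transformation): split the falsely-polymorphic variable into one variable per definition, so each keeps a single type, and instantiate the consumer once per type. The names `foo`, `a0`, `a1`, `a2` are illustrative.

```python
from math import cos

def foo(v):
    # Hypothetical consumer; a C++ backend would instantiate it
    # once for str and once for None.
    return v

cond = True
# Original: `a` is float, then str or None -- not statically typable.
# After splitting, every variable has exactly one type:
a0 = cos(1)
if cond:
    a1 = str(a0)
    result = foo(a1)
else:
    a2 = None
    result = foo(a2)
print(result)
```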
Example of Code Optimization: Lazy Iterators
def valid_conversion(n):
    l = map(math.cos, range(n))
    return sum(l)
def invalid_conversion(n):
    l = map(math.cos, range(n))
    l[0] = 1
    return sum(l) + max(l)
Which map can be converted into an imap?
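A minimal Python 3 sketch of the answer (Python 3's map is already lazy; the Python 2 equivalent in the slides is itertools.imap): the first map's result is consumed exactly once, by sum, so it can become an iterator; the second is written by index and traversed twice, so it must stay a list.

```python
import math

def valid_conversion(n):
    # Safe to make lazy: `l` is consumed once, by sum().
    l = (math.cos(x) for x in range(n))
    return sum(l)

def invalid_conversion(n):
    # Must stay a list: `l` is subscripted and read twice.
    l = [math.cos(x) for x in range(n)]
    l[0] = 1
    return sum(l) + max(l)

print(valid_conversion(1))    # cos(0) = 1.0
print(invalid_conversion(1))  # 1 + 1 = 2
```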
Example of Code Optimization: Constant Folding
def esieve(n):
    candidates = range(2, n+1)
    return sorted(
        set(candidates) - set(p*i
                              for p in candidates
                              for i in range(p, n+1))
    )
cache = esieve(100)
Can we evaluate esieve at compile time?
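It can: esieve depends only on its constant argument, so a constant-folding pass may replace the call with its literal result. A runnable Python 3 sketch of the same sieve:

```python
def esieve(n):
    # Primes up to n: drop every product p*i with 2 <= p <= i <= n.
    candidates = range(2, n + 1)
    return sorted(
        set(candidates) - set(p * i
                              for p in candidates
                              for i in range(p, n + 1))
    )

# A compiler could fold this call into the literal list of primes.
cache = esieve(30)
print(cache)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```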
Pythran's Motto 3/3
Gather as much information as possible
Turn it into Code Optimizations
Vectorize! Parallelize!
Explicit Parallelization
def hyantes(xmin, ymin, xmax, ymax, step, range_, range_x, range_y, t):
    pt = [[0]*range_y for _ in range(range_x)]
    #omp parallel for
    for i in xrange(range_x):
        for j in xrange(range_y):
            s = 0
            for k in t:
                tmp = 6368.*math.acos(math.cos(xmin+step*i)*math.cos(k[0]) *
                                      math.cos((ymin+step*j)-k[1]) +
                                      math.sin(xmin+step*i)*math.sin(k[0]))
                if tmp < range_:
                    s += k[2] / (1+tmp)
            pt[i][j] = s
    return pt
Tool CPython Pythran OpenMP
Timing 639.0ms 44.8ms 11.2ms
Speedup ×1 ×14.2 ×57.7
Library Level Optimizations
Numpy is the key
Basic building block of Python Scientific Computing
High-level Array Manipulations
Many common functions implemented
1. This smells like FORTRAN
2. For the compiler guy, FORTRAN smells good
3. Unlock vectorization & parallelization of Numpy code!
Cython is known to be "as fast as C"; the only way we found to beat it is to be "as fast as Fortran", hence:
Pythran
Efficient Numpy Expressions
Expression Templates
1. A classic C++ meta-programming optimization
2. Brings Lazy Evaluation to C++
3. Equivalent to loop fusion
More Optimizations
vectorization through boost::simd and nt2
parallelization through #pragma omp
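What the loop fusion performed by expression templates buys, sketched in plain Python (the function names are illustrative, not Pythran internals): evaluating a + b*c elementwise with temporaries walks the data twice and allocates an intermediate array; the fused form does one pass and allocates nothing extra.

```python
# Unfused: each operator materializes a temporary list (what naive
# operator overloading would do).
def add_mul_unfused(a, b, c):
    tmp = [y * z for y, z in zip(b, c)]     # temporary for b*c
    return [x + t for x, t in zip(a, tmp)]  # second pass for a + tmp

# Fused: one loop, no temporary -- what expression templates generate.
def add_mul_fused(a, b, c):
    return [x + y * z for x, y, z in zip(a, b, c)]

a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(add_mul_fused(a, b, c))  # [29, 42, 57]
```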
Julia Set, a Cython Example
def run_julia(cr, ci, N, bound, lim, cutoff):
    julia = np.empty((N, N), np.uint32)
    grid_x = np.linspace(-bound, bound, N)
    t0 = time()
    #omp parallel for
    for i, x in enumerate(grid_x):
        for j, y in enumerate(grid_x):
            # kernel() is defined elsewhere in the original tutorial
            julia[i, j] = kernel(x, y, cr, ci, lim, cutoff)
    return julia, time() - t0
From Scipy2013 Cython Tutorial.
Tool CPython Cython Pythran +OpenMP
Timing 3630ms 4.3ms 3.71ms 1.52ms
Speedup ×1 ×837 ×970 ×2368
Mandelbrot, a Numba example
@autojit
def mandel(x, y, max_iters):
    """
    Given the real and imaginary parts of a complex number,
    determine if it is a candidate for membership in the Mandelbrot
    set given a fixed number of iterations.
    """
    i = 0
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z**2 + c
        if abs(z)**2 >= 4:
            return i
    return 255

@autojit
def create_fractal(min_x, max_x, min_y, max_y, image, iters):
    height = image.shape[0]
    width = image.shape[1]
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    for x in range(width):
        real = min_x + x * pixel_size_x
        for y in range(height):
            imag = min_y + y * pixel_size_y
            color = mandel(real, imag, iters)
            image[y, x] = color
    return image
Tool CPython Numba Pythran
Timing 8170ms 56ms 47.2ms
Speedup ×1 ×145 ×173
"g++ -Ofast" here
Nqueens, to Show Some Cool Python
# Pure-Python implementation of
# itertools.permutations()
# Why? Because we can :-)
def permutations(iterable, r=None):
    pool = tuple(iterable)
    n = len(pool)
    if r is None:
        r = n
    indices = range(n)
    cycles = range(n-r+1, n+1)[::-1]
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(xrange(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return
#pythran export n_queens(int)
def n_queens(queen_count):
    """N-Queens solver.
    Args:
        queen_count: the number of queens to solve
            for. This is also the board size.
    Yields:
        Solutions to the problem. Each yielded
        value looks like (3, 8, 2, ..., 6)
        where each number is the column position
        for the queen, and the index into the
        tuple indicates the row.
    """
    out = list()
    cols = range(queen_count)
    for vec in permutations(cols, None):
        if queen_count == len(set(vec[i]+i
                                  for i in cols)):
            out.append(vec)
    return out
Solving the N-Queens problem using generators, generator
expressions, list comprehensions, sets. . .
http://code.google.com/p/unladen-swallow/
Tool CPython PyPy Pythran
Timing 2640.6ms 501.1ms 693.3ms
Speedup ×1 ×5.27 ×3.8
Crushing the Head of the Snake
I would have titled it ”What every Pythonista should know about Python!”
(hopefully we’ll get the video soon).
The Compiler Is not the Solution to Keyboard-Chair Interface, Is It?
def total(arr):
    s = 0
    for j in range(len(arr)):
        s += arr[j]
    return s

def varsum(arr):
    vs = 0
    for j in range(len(arr)):
        mean = total(arr) / len(arr)
        vs += (arr[j] - mean) ** 2
    return vs

#pythran export stddev(float64 list list)
def stddev(partitions):
    ddof = 1
    final = 0.0
    for part in partitions:
        m = total(part) / len(part)
        # Find the mean of the entire group.
        gtotal = total([total(p) for p in partitions])
        glength = total([len(p) for p in partitions])
        g = gtotal / glength
        adj = (2 * total(part) * (m - g)) + ((g ** 2 - m ** 2) * len(part))
        final += varsum(part) + adj
    return math.sqrt(final / (glength - ddof))
Version Awful Less Awful OK Differently OK OK Numpy
CPython 127s 150ms 54.3ms 53.8ms 47.6ms 8.2ms
Pythran 1.38s 4.7ms 4.8ms 5.8ms 4.7ms 1.7ms
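For the record, the "OK Numpy" column corresponds to noticing what the routine above actually computes: the ddof=1 standard deviation of all partitions pooled together. A hedged NumPy sketch of that reading (not the talk's exact code; `stddev_np` is an illustrative name):

```python
import numpy as np

def stddev_np(partitions, ddof=1):
    # Pool all partitions and take the sample standard deviation.
    return float(np.std(np.concatenate(partitions), ddof=ddof))

data = [[1.0, 2.0], [3.0, 4.0, 5.0]]
print(stddev_np(data))  # equals math.sqrt(2.5) for this data
```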
Engineering Stuff
Get It
Follow the source: https://github.com/serge-sans-paille/pythran
(we are waiting for your pull requests)
Debian repo: deb http://ridee.enstb.org/debian unstable main
Join us: #pythran on Freenode (very active),
or pythran@freelists.org (not very active)
Available on PyPI, using Python 2.7, +2000 test cases, PEP8 approved, clang++ &
g++ (>= 4.8) friendly, Linux and OSX validated.
Conclusion
Compiling Python means more than typing and translating
What next:
Release the PyData version (oops, we're late... I did not specify which year)
User module imports (pull request already issued)
Set up a daily performance regression test bot (ongoing work with codespeed)
Re-enable vectorization through boost::simd and nt2 (soon)
More Numpy support, start looking into Scipy
Polyhedral transformations (in another life...)
http://pythonhosted.org/pythran/