Keynote talk at PyCon Estonia 2019 where I discuss how to extend CPython and how that has led to a robust ecosystem around Python. I then discuss the need to define and build a Python extension language I later propose as EPython on OpenTeams: https://openteams.com/initiatives/2
Talk given at first OmniSci user conference where I discuss cooperating with open-source communities to ensure you get useful answers quickly from your data. I get a chance to introduce OpenTeams in this talk as well and discuss how it can help companies cooperate with communities.
At my first visit to SciPy in Latin America, I was able to review the history of PyData, SciPy, and NumFOCUS, and discuss how to grow its communities and cooperate in the future. I also introduce OpenTeams as a way for open-source contributors to grow their reputation and build businesses.
A lecture given for Stats 285 at Stanford on October 30, 2017. I discuss how OSS technology developed at Anaconda, Inc. has helped to scale Python to GPUs and Clusters.
Big Data is a new term used in Business Analytics to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data.
In this talk, we will focus on advanced techniques in Big Data mining in real time using evolving data stream techniques: using a small amount of time and memory resources, and being able to adapt to changes. We will discuss a social network application of data stream mining to compute user influence probabilities. And finally, we will present the MOA software framework with classification, regression, and frequent pattern methods, and the SAMOA distributed streaming software that runs on top of Storm, Samza and S4.
Data Science and Deep Learning on Spark with 1/10th of the Code, with Roope As... (Databricks)
Scalability and interactivity make Spark an excellent platform for data scientists who want to analyze very large datasets and build predictive models. However, the productivity of data scientists is hampered by a lack of abstractions for building models for diverse types of data. For example, processing text or image data requires low-level data coercion and transformation steps, which are not easy to compose into complex workflows for production applications. There is also a lack of domain-specific libraries, for example for computer vision and image processing.
We present an open-source Spark library which simplifies common data science tasks such as feature construction and hyperparameter tuning, and allows data scientists to iterate and experiment on their models faster. The library integrates seamlessly with the SparkML pipeline object model, and is installable through spark-packages.
The library brings deep learning and image processing to Spark through CNTK, OpenCV and Tensorflow in a frictionless manner, thus enabling scenarios such as training on GPU-enabled nodes, deep neural net featurization, and transfer learning on large image datasets. We discuss the design and architecture of the library, and show examples of building machine learning models for image classification.
Introduction to Python by Mohamed Hegazy. In these slides you will find some code samples. These slides were first presented at TensorFlow Dev Summit 2017 Extended by GDG Helwan.
Content and talk by Giovani Lanzani (GoDataDriven) at SEA Amsterdam in November 2014: real-time data-driven applications using Python and pandas as the backend.
Develop a fundamental overview of Google TensorFlow, one of the most widely adopted technologies for advanced deep learning and neural network applications. Understand the core concepts of artificial intelligence, deep learning and machine learning and the applications of TensorFlow in these areas.
The deck also introduces the Spotle.ai masterclass in Advanced Deep Learning With Tensorflow and Keras.
Note: Make sure to download the slides to get the high-resolution version!
Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov
Come hear how Deep Learning in H2O is unlocking never before seen performance for prediction!
H2O is a Google-scale open-source machine learning engine for R & Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation, and results from recent developments. Real-world classification & regression use cases from the eBay text dataset, MNIST handwritten digits, and cancer datasets will demonstrate the power of this game-changing technology.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Python array API standardization - current state and benefits (Ralf Gommers)
Talk given at GTC Fall 2021.
The Python array API standard, which was first announced towards the end of 2020, is maturing and becoming available to Python end users. NumPy now has a reference implementation, PyTorch support is close to complete, and other libraries have started to implement support. In this talk we will discuss the current state of implementations, and look at a concrete use case of moving a scientific analysis workflow to using the API standard - thereby gaining access to GPU acceleration.
This is a 1-hour presentation on Neural Networks, Deep Learning, Computer Vision, Recurrent Neural Networks, and Reinforcement Learning. The later talks have links on how to run Neural Networks on
Travis Oliphant, "Python for Speed, Scale, and Science" (Fwdays)
Python is sometimes discounted as slow because of its dynamic typing and interpreted nature, and as not suitable for scale because of the GIL. But in this talk, I will show how, with the help of talented open-source contributors around the world, we have been able to build systems in Python that are fast and scalable to many machines, and how this has helped Python take over Science.
With Anaconda (in particular Numba and Dask) you can scale up your NumPy and Pandas stack to many CPUs and GPUs, as well as scale out to run on clusters of machines, including Hadoop.
EuroPython 2011 High Performance Python (Ian Ozsvald)
I ran this as a 4 hour tutorial at EuroPython 2011 to teach High Performance Python coding.
Techniques covered include bottleneck analysis by profiling, bytecode analysis, converting to C using Cython and ShedSkin, use of the numerical numpy library and numexpr, multi-core and multi-machine parallelisation and using CUDA GPUs.
Write-up with 49 page PDF report: http://ianozsvald.com/2011/06/29/high-performance-python-tutorial-v0-1-from-my-4-hour-tutorial-at-europython-2011/
Python has become the most widely used language for machine learning and data science projects due to its simplicity and versatility.
Furthermore, developers get to put all their effort into solving a machine learning or data science problem instead of focusing on the technical aspects of the language.
For this purpose, Python provides access to great libraries and frameworks for AI and machine learning (ML), flexibility, and platform independence.
In this talk I will present a selection of libraries and frameworks that can help us get started in the machine learning world, and answer the question everyone is asking: what makes Python the best programming language for machine learning?
Large Data Analysis with PyTables.
This presentation has been collected from several other presentations (PyTables presentations).
For more presentations in this field, please refer to this link (http://pytables.org/moin/HowToUse#Presentations).
Amazon EC2 may offer the possibility of high performance computing to programmers on a budget. Instead of building and maintaining a permanent Beowulf cluster, we can launch a cluster on-demand using Python and EC2. This talk will cover the basics involved in getting your own cluster running using Python, demonstrate how to run some large parallel computations using Python MPI wrappers, and show some initial results on cluster performance.
Textbook Solutions refer https://pythonxiisolutions.blogspot.com/
Practical's Solutions refer https://prippython12.blogspot.com/
A library is a collection of modules that together cater to a specific type of need. Modules are the smaller, more manageable units; the standard library ships with many such modules. Modularity reduces complexity to some extent. A package is a directory that contains sub-packages and modules in it, along with an __init__.py file.
Python is dominating the fast-growing data-science landscape. This talk provides a foundational overview of the practice of data science and some of the most popular Python libraries for doing data science. It also provides an overview of how Anaconda brings it all together.
With Dask and Numba, you can write NumPy-like and Pandas-like code and have it run very fast on multi-core systems as well as at scale on many-node clusters.
Making NumPy-style and Pandas-style code faster and able to run in parallel. Continuum has been working on scaled versions of NumPy and Pandas for 4 years. This talk describes how Numba and Dask provide scaled Python today.
Using Anaconda to light up dark data. My talk given to the Berkeley Institute of Data Science describing Anaconda and the Blaze ecosystem for bringing a virtual analytical database to your data.
Conda is a cross-platform package manager that lets you quickly and easily build environments containing complicated software stacks. It was built to manage the NumPy stack in Python but can be used to manage any complex software dependencies.
Blaze: a large-scale, array-oriented infrastructure for Python (Travis Oliphant)
This talk gives a high-level overview of the motivation, design goals, and status of the Blaze project from Continuum Analytics which is a large-scale array object for Python.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our beloved cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could be beneficial for, or limiting to, your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working for real.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
5. Where I started
Started as my graduate-student "procrastination project" (as Multipack) in 1998 and became SciPy in 2001 with the help of colleagues.
108 releases, 766 contributors
Used by: 128,495
Pearu Peterson
Estonia was critical to both SciPy and NumPy.
6. Where it led for me
Gave up my chance at a tenured academic position in 2005-2006 to bring together the diverging array community in Python and unify Numeric and Numarray.
159 releases, 827 contributors
Used by: 254,856
7. What amplified data science
Created by Wes McKinney. AQR also agreed to release the data-frame library he started there (while dozens of other data-frames at hedge funds and investment banks were never open-sourced).
106 releases, 1601 contributors
Used by: 139,133
8. Why Python for ML?
Created by David Cournapeau as a Google Summer of Code project and then quickly added to by hundreds of researchers around the world. Supported by INRIA.
100 releases, 1433 contributors
Used by: 70,287
9. First DL Framework in Python
Built at Université de Montréal by Frédéric Bastien and his students. Many contributors. Forms the foundation for PyMC3 and other libraries.
33 releases, 332 contributors
Used by: 6,194
13. Keys to Python Success
• Modular Extensibility
• New Types and Functions
• Protocol Overloading (i.e. "dunder" methods)
• Interoperability
14. Modular Extensibility
Modules and packages:

>>> import numpy
>>> numpy.__file__
{path-prefix}numpy/__init__.py
>>> numpy.__path__
{path-prefix}numpy
>>> numpy.linalg.__file__
{path-prefix}numpy/linalg/__init__.py

>>> import math
>>> math.__file__
{path}math{platform}.so (or .pyd)
>>> import os
>>> os.__file__
{path}os.py

# my_module.py
a = 3
b = 4
def cross(x, y):
    return a*x + b*y

>>> import my_module
>>> my_module.__file__
{path}my_module.py
>>> ks = my_module.__dict__.keys()
>>> [y for y in ks if not y.startswith('__')]
['a', 'b', 'cross']

subpackages = []
for name in dir(numpy):
    obj = getattr(numpy, name)
    if (hasattr(obj, '__file__') and
            obj.__file__.endswith('__init__.py')):
        subpackages.append(obj.__name__)

>>> print(subpackages)
['numpy.matrixlib', 'numpy.compat', 'numpy.core',
 'numpy.fft', 'numpy.lib', 'numpy.linalg', 'numpy.ma',
 'numpy.polynomial', 'numpy.random', 'numpy.testing']
15. New Types and New Functions

class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.children = []
        if parent is not None:
            parent.children.append(self)

from math import sqrt
def kurtosis(data):
    N = len(data)
    mean = sum(data)/N
    std = sqrt(sum((x-mean)**2 for x in data)/N)
    zi = ((x-mean)/std for x in data)
    return sum(z**4 for z in zi)/N - 3

>>> g = Node("Root")
>>> type(g)
__main__.Node
>>> type(g).__mro__
(__main__.Node, object)
>>> type(Node).__mro__
(type, object)
>>> type(3)
int
>>> type(3).__mro__
(int, object)
>>> type(int).__mro__
(type, object)
>>> type(kurtosis)
function
>>> type(sqrt)
builtin_function_or_method
>>> type(sum)
builtin_function_or_method
>>> import numpy; type(numpy.add)
numpy.ufunc
19. First problem: Efficient Data Input
The first step is to get the data right. "It's Always About the Data"
Reference Counting Essay, Guido van Rossum, May 1998: http://www.python.org/doc/essays/refcnt/
TableIO, Michael A. Miller, April 1998
NumPyIO, June 1998
20. A walk through bitarray
Ilan Schnell (built all the first versions of Anaconda)
bitarray: efficient arrays of booleans
https://github.com/ilanschnell/bitarray
27. Powerful but requires care!
• Reference counting (you have to do this manually)
• Error handling (can be tedious)
• Initialization (can bite you badly if you aren't careful)
• Other run-times (PyPy, RustPython) can't easily use your tool.
• You have access to all the machinery Python itself uses to create all of its own builtins.
• You are literally extending Python with new builtin types and functions.
• Incredible speed: as fast as the machine can work.
29. What should you do today?
My opinionated modern view:
• Just write your code in Python and use existing extensions.
• If more speed is needed:
  • Use Numba
  • Use Cython
  • Use mypy (and eventually mypyc)
• Or, if few existing extensions are being used:
  • Run with PyPy
  • Use Rust and PyO3
30. Numba: A JIT Compiler for Python
• An open-source, function-at-a-time compiler library for Python
• Compiler toolbox for different targets and execution models:
  • single-threaded CPU, multi-threaded CPU, GPU
  • regular functions, "universal functions" (array functions), etc.
• Speedup: 2x (compared to basic NumPy code) to 200x (compared to pure Python)
• Combine ease of writing Python with speeds approaching FORTRAN
• Empowers scientists who make tools for themselves and other scientists
31. 7 things about Numba you may not know
1. Numba is 100% open source
2. Numba + Jupyter = rapid CUDA prototyping
3. Numba can compile for the CPU and the GPU at the same time
4. Numba makes array processing easy with @(gu)vectorize
5. Numba comes with a CUDA simulator
6. You can send Numba functions over the network
7. Numba has typed lists and dictionaries (soon)
32. Numba (compile Python to CPUs and GPUs)
conda install numba
[Diagram: Python source passes through Numba's parsing frontend to an intermediate representation (IR); LLVM's code-generation backend then targets x86, ARM, or PTX.]
33. How does Numba work?
[Diagram: a Python function's bytecode, together with the argument types at the call site, feeds into Bytecode Analysis, producing Numba IR; Type Inference and IR rewrites follow, then Lowering to LLVM IR, which the LLVM/NVVM JIT turns into machine code. Compiled results are cached, then executed.]

@jit
def do_math(a, b):
    …

>>> do_math(x, y)
34. Supported Platforms and Hardware
OS: Windows (7 and later), OS X (10.9 and later), Linux (RHEL 6 and later)
HW: 32- and 64-bit CPUs (incl. Xeon Phi), CUDA & HSA GPUs, some support for ARM and ROCm
SW: Python 2.7, 3.4-3.7; NumPy 1.10 and later
36. Basic Example
[Code figure with callouts: array allocation; looping over ndarray x as an iterator; using numpy math functions; returning a slice of the array; the Numba decorator (nopython=True not required). Result: 2.7x speedup!]
37. Numba Features
• Detects CPU model during compilation and optimizes for that target
• Automatic type inference: no need to give type signatures for functions
• Dispatches to multiple type-specializations for the same function
• Call out to C libraries with CFFI and types
• Special "callback" mode for creating C callbacks to use with external libraries
• Optional caching to disk, and ahead-of-time creation of shared libraries
• Compiler is extensible with new data types and functions
38. Parallel Computing
• Three main technologies for parallelism: SIMD, multi-threading, and distributed computing
39. SIMD: Single Instruction Multiple Data
• Numba's CPU detection will enable LLVM to autovectorize for the appropriate SIMD instruction set: SSE, AVX, AVX2, AVX-512
• Will become even more important as AVX-512 is now available on both Xeon Phi and Skylake Xeon processors
40. Manual Multithreading: Release the GIL
[Chart: speedup ratio (up to ~3.5x) vs. number of threads (1, 2, 4), using Python concurrent.futures with the option to release the GIL.]
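The pattern the chart measures can be sketched as follows (the chunk count and function are illustrative; the fallback runs the same computation serially, holding the GIL, if Numba isn't installed):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

try:
    from numba import jit
except ImportError:
    def jit(*args, **kwargs):  # fallback: still runs, but keeps the GIL
        if args and callable(args[0]):
            return args[0]
        return lambda func: func

@jit(nopython=True, nogil=True)  # nogil=True releases the GIL in compiled code
def partial_sum(arr):
    total = 0.0
    for x in arr:
        total += x
    return total

data = np.arange(400_000, dtype=np.float64)
chunks = np.array_split(data, 4)
# With the GIL released, the four compiled calls run truly in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
```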
41. Universal Functions (Ufuncs)
Ufuncs are a core concept in NumPy for array-oriented computing.
◦ A function with scalar inputs is broadcast across the elements of the input arrays:
  • np.add([1,2,3], 3) == [4, 5, 6]
  • np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
◦ Parallelism is present, by construction. Numba will generate loops and can automatically multi-thread if requested.
◦ Before Numba, creating fast ufuncs required writing C. No longer!
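Writing a ufunc with Numba looks like this (the function itself is an illustrative example; the fallback uses NumPy's slow np.vectorize, which gives the same broadcasting behavior without the compiled speed):

```python
import numpy as np

try:
    from numba import vectorize
except ImportError:
    # Fallback: np.vectorize broadcasts the same way, just without compilation.
    def vectorize(*args, **kwargs):
        return np.vectorize

@vectorize(["float64(float64, float64)"])
def rel_diff(a, b):
    # Written as a scalar function; broadcasting over arrays comes for free.
    return 2.0 * (a - b) / (a + b)

result = rel_diff(np.array([2.0, 4.0]), np.array([1.0, 2.0]))
```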
44. ParallelAccelerator
• ParallelAccelerator is a special compiler pass contributed by Intel Labs
  • Todd A. Anderson, Ehsan Totoni, Paul Liu
• Based on a similar contribution to Julia
• Automatically generates multithreaded code in a Numba-compiled function:
  • Array expressions and reductions
  • Random functions
  • Dot products
  • Explicit loops indicated with a prange() call
49. Basic use
Create a text file with a .pyx extension (helloworld.pyx) along with a setup.py.
Hint: you can use the %%cython magic in notebooks, after %load_ext Cython.
(Borrowed from the Cython documentation: cython.readthedocs.io)
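For reference, the hello-world pair from the Cython documentation is essentially the following (a build sketch; compiling it requires Cython to be installed):

```python
# helloworld.pyx
print("Hello World")

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("helloworld.pyx"))
```

Build with `python setup.py build_ext --inplace`, after which `import helloworld` loads the compiled extension and prints the message.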
54. MyPyC
mypyc is a compiler that compiles mypy-annotated, statically typed
Python modules into CPython C extensions.
https://github.com/python/mypy/tree/master/mypyc
• Most type annotations are enforced at runtime (raising TypeError on mismatch)
• Classes are compiled into extension classes without __dict__ (much, but not quite, like if they used __slots__)
• Monkey patching doesn't work
• Instance attributes won't fall back to class attributes if undefined
• Metaclasses not supported
• Also there are still a bunch of bad bugs and unsupported features :)
Still Experimental!
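The workflow is just ordinary type-annotated Python; the same module runs interpreted under CPython or compiled with mypyc (the module and function names here are hypothetical examples, not from the mypyc docs):

```python
# fib.py — ordinary statically typed Python; compile with: mypyc fib.py
def fib(n: int) -> int:
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

# Interpreted, this is plain CPython. Compiled, mypyc enforces the
# annotations at runtime and runs the same algorithm as a C extension.
result = fib(10)  # 55
```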
60. What do we need?
• A way to extend Python that targets multiple runtimes by default (at least PyPy, CPython, RustPython), with the ability to add new runtimes
• Use a subset of typed Python to do it — i.e. a domain-specific extension language in Python itself
• Need NumPy, Pandas, SciPy, Scikit-Learn, and more to use this approach (this will take time)
62. A Bold Proposal
• Create a Cython-like tool that uses mypy typing
• Borrow heavily from Cython ideas but start a new
project that could be pulled into Python itself.
• At the same time work from below to continue the
clean-up of CPython C-API that has already started.
63. Need a ~$5 million commitment for a 3-year project to start this
• Core team of 5+ devs with 1 lead
• 1/2-time project manager and PSF representative
• 3+ community liaisons and developer evangelists
• Start with a $500k Phase 0 to prove the idea
• Get total funding from at least 20 companies: $25k initial buy-in, at least $250k commitment over 3 years to start the effort.
• Allow up to $100k initial and $1 million commitment.
• Paying participants get project-management attention and early easy-to-use runtimes and binary extensions delivered, with the ability to set priorities (plus marketing and the knowledge they are leading Python forward).
How? Through a Quansight Labs "Cooperative Community Work Order":
• We have the people in our network of collaborators.
• We have a sales and marketing team that will pitch this.
• We are just rolling out the proposal.
Interested? travis@quansight.com
65. OpenTeams — a new platform to help open-source projects and developers thrive professionally and financially.
Sign up to:
• build your open-source portfolio
• show which projects you use
• thank contributors for projects you love
• (soon) get connected to initiatives like the one to make Python universally extensible
https://openteams.com
66. Quansight Labs: Sustaining the Future
Open-source innovation and maintenance around the entire data-science and AI workflow.
• NumPy ecosystem maintenance
• Maintenance and support with the PyData core team
• Improve connection of NumPy to ML frameworks
• GPU support for the NumPy ecosystem
• Improve foundations of array computing
  • uarray — unified array interface and symbolic NumPy
  • xnd — re-factored NumPy (low-level cross-language libraries for N-D (tensor) computing)
• JupyterLab
• Data Catalog standards
• Packaging (conda-forge, PyPA, etc.)
Partnered with NumFOCUS and Ursa Labs (supporting Apache Arrow)