Talk given at the first OmniSci user conference, where I discuss cooperating with open-source communities to ensure you get useful answers quickly from your data. I also introduce OpenTeams and discuss how it can help companies cooperate with communities.
4. Building new solutions
[Diagram labels: "Replaced by", "Spin Out", "Spin Out"; years 2012, 2018, "?"]
Key members of the management team at Continuum created
Quansight. In a real sense NumFOCUS and Anaconda are our first
(spin-out) organizations.
2015
5. Build and Connect
Companies and
Communities to
Solve Challenging
Problems with Data
Continuing my quest to find more
ways to pay developers to work on
open source!
7. An early stage venture
capital firm investing in
startups that build on
open-source technology
and support the
communities they
depend on.
Bradden Blair
supporting
FairOSS
8. LABS
Sustaining the Future
Open-source innovation and
maintenance around the entire data-
science and AI workflow.
• NumPy ecosystem maintenance (PyData Core Team)
• Improve connection of NumPy to ML Frameworks
• GPU Support for NumPy Ecosystem
• Improve foundations of Array computing
• JupyterLab and JupyterHub
• Data Catalog standards
• Packaging (conda-forge, PyPA, etc.)
uarray — unified array interface for SciPy refactor
xnd — re-factored NumPy (low-level cross-language
libraries for N-D (tensor) computing)
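The idea of a unified array interface can be illustrated with a small sketch: one user-facing function whose implementation is chosen by the kind of array it receives. This is not uarray's actual API. It uses the standard library's `functools.singledispatch`, and the names `total` and `RangeArray` are hypothetical, introduced only for illustration.

```python
from functools import singledispatch


class RangeArray:
    """Hypothetical array type representing 0, 1, ..., n-1 without storing it."""

    def __init__(self, n):
        self.n = n


@singledispatch
def total(array):
    """Single user-facing function; the implementation depends on the array type."""
    raise TypeError(f"no implementation registered for {type(array).__name__}")


@total.register(list)
def _total_list(values):
    # Plain Python lists: add the elements one by one.
    return sum(values)


@total.register(RangeArray)
def _total_range(array):
    # Closed form for 0 + 1 + ... + (n - 1); no elements are materialized.
    return array.n * (array.n - 1) // 2


list_total = total([1, 2, 3])
range_total = total(RangeArray(5))
```

Callers only ever see `total`; adding a new array type means registering one more implementation rather than changing the calling code.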
Collaborating with
NumFOCUS and Ursa Labs
(supporting Apache Arrow)
Bokeh
Adapted from Jake VanderPlas
PyCon 2017 Keynote
10. Where I started
Started as my graduate student
“procrastination project” (as Multipack)
in 1998 and became SciPy in 2001 with
the help of Eric Jones, Pearu Peterson,
and others.
108 releases, 766 contributors
Used by: 128,495
11. SciPy
“Distribution of Python Numerical Tools masquerading as one Library”
Name         Description
cluster      KMeans and Vector Quantization
fftpack      Discrete Fourier Transform
integrate    Numerical Integration
interpolate  Interpolation routines
io           Data Input and Output
linalg       Fast Linear algebra
misc         Utilities
ndimage      N-dimensional Image processing
odr          Orthogonal Distance Regression
optimize     Constrained and Unconstrained Optimization
signal       Signal Processing Tools
sparse       Sparse Matrices and Algebra
spatial      Spatial Data Structures and Algorithms
special      Special functions (e.g. Bessel)
stats        Statistical Functions and Distributions
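As a rough illustration of how these subpackages are used side by side, the sketch below calls one routine each from `integrate`, `optimize`, and `linalg`. The specific problems (the integrand, the quadratic, the 2x2 system) are made up for the example and are not from the talk.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# integrate: area under sin(x) on [0, pi]; the exact answer is 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# optimize: minimize the one-dimensional quadratic (x - 3)^2.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

# linalg: solve the linear system A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

print(area, result.x, solution)
```

Each subpackage is imported separately, which fits the "distribution of tools" description above: they share NumPy arrays as a common data type but are otherwise largely independent.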
12. Where it led for me
159 releases, 827 contributors
Used by: 254,856
Standard Array/Tensor Library driving Python
to be the de facto language for Data Science and ML
13. Brief History of NumPy
Person                                      Package        Year
Jim Fulton                                  Matrix Object  1994
Jim Hugunin                                 Numeric        1995
Perry Greenfield, Rick White, Todd Miller   Numarray       2001
Travis Oliphant                             NumPy          2005
14. NumPy was created to unify array objects in
Python and unify the early PyData community
Numeric
Numarray
NumPy
I essentially sacrificed tenure at a University to write NumPy and
unify array objects.
18. Thriving in a Data-Driven
World starts with building on
the Open Source Software
that forms the foundation of
Data Science and Machine
Learning today.
Open Source
Ecosystem
Your Product/
Project
19. With Quansight, you can actually
“influence the direction of the wind”
LABS
Open-source powered development
Community Work Orders let you influence OSS
Cooperative Platform for Community Work Orders
20. An effective case study in connecting with open source communities
(harnessing and influencing the open-source wind)
21. Quansight and OmniSci funded 10+ open source developers for
1¾ years to connect OmniSci with the PyData community
• JupyterLab Extensions
• Ibis SQL Framework (OmniSci backend, geospatial functions)
• Altair & Vega-Lite Visualization (modernized visualization specifications)
• Conda packages
• User Defined Table Functions with Numba
OmniSci Immerse & JupyterLab working interchangeably. Python
Data Scientists and OmniSci users can work in a unified
development environment.
24. OmniSci Engine can be connected directly to JupyterLab components.
The data can be used by the entire PyData ecosystem.
All the open-source deliciousness can be re-used.
As the community tools get better, OmniSci users benefit automatically!
26. Compile (Numerical) Python
to Native code for CPU and GPU
an open source JIT compiler that
translates a subset of Python and
NumPy code into fast machine code.
http://numba.pydata.org
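A minimal sketch of what this looks like in practice, assuming Numba is installed: an explicit Python loop over a NumPy array is decorated with `njit` and compiled to machine code on first call. The function `sum_of_squares` is a made-up example, and the `ImportError` fallback is an addition here so the sketch still runs (without any speed-up) where Numba is unavailable.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for portability: without Numba, run as ordinary Python.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # An explicit loop: slow in plain Python, compiled to native code by Numba.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1_000, dtype=np.float64)
result = sum_of_squares(data)
```

The first call includes compilation time; later calls with the same argument types reuse the compiled code.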
29. Problem
Open Source Teams
• Burned out
• Underrepresented
• Underpaid
Organizations
• Disconnected from the Community
• Lack support and maintenance
There’s no easy way to connect the
community with organizations
30. Solution
A marketplace where companies can cooperatively fund progress and
maintenance for projects and technology that affects them
Organizations
Save money &
Reduce risk
Teams
Improve
project health
Copyright OpenTeams 2019. All rights reserved.
34. Initiatives are progress or
maintenance efforts with an
accountable organization
committed to finalizing the initiative and
following up to get the work done
using open-source developers.
The platform enables easy
signaling and cooperation
between many potential
funders and organized open-
source groups.
Alpha Feature!
35. Several Deep Learning Libraries to Choose From
Built on NumPy/SciPy
Recommended
Recommended
36. Key Features Needed for any ML Library
For Training
• Ability to create chains of functions on n-dimensional arrays
• Ability to compute the derivative of the loss function quickly (Automatic Differentiation)
• Key loss functions implemented
• Cross-validation methods
• An optimization library with several useful methods
• Ability to compute functions on n-dimensional arrays on multiple hardware with highly parallel execution
For Inference
• Ability to create chains of functions on n-dimensional arrays
• Ability to compute functions on n-dimensional arrays on multiple hardware
Missing from NumPy / SciPy and Scikit-Learn
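To make the automatic-differentiation item concrete, here is a minimal sketch of forward-mode differentiation with dual numbers, applied to a scalar squared-error loss. It is an illustration only: ML libraries typically use reverse mode on n-dimensional arrays, and the `Dual` class, `squared_error`, and `derivative` are hypothetical names introduced for this example.

```python
class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _wrap(other):
        # Plain numbers are constants, so their derivative is zero.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._wrap(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._wrap(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual._wrap(other) - self

    def __mul__(self, other):
        other = Dual._wrap(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def squared_error(w, x, y):
    """Loss of the one-parameter model w * x against the target y."""
    residual = w * x - y
    return residual * residual


def derivative(func, at):
    """Evaluate d func / d input at the given point by seeding deriv = 1."""
    return func(Dual(at, 1.0)).deriv


# d/dw (2w - 6)^2 = 4 * (2w - 6), which is -16 at w = 1.
grad = derivative(lambda w: squared_error(w, x=2.0, y=6.0), at=1.0)
```

The loss function is written as ordinary arithmetic; the derivative falls out of operator overloading rather than symbolic manipulation or finite differences.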
37. Most Libraries (other than Chainer) chose
to re-implement NumPy and SciPy as they
needed.
Reasons:
• Started with legacy code in another language
• Had to work with other languages too (Node, Java, C++, Lua, etc.)
• Needed only a subset of functionality of NumPy / SciPy to build ML
• Needed GPU support
• Lacked familiarity with the NumPy / SciPy communities and how to engage with them
Result: Many competing similar choices for Deep Learning
41. We have a “divided” community again!
Numeric
Numarray
NumPy
43. Examples of packages being built on
fragmented APIs
FastAI
skorch
Pyro
Edward
anyrl
Braid
PyMC4
Horovod
MLFlow
But note
44. Real Problem — Funding for Community Devs
[Figure labels: Full-time: 2, Full-time: .5, Full-time: 1, Full-time: 2, Full-time: 0]
Open source is too important to be left to volunteer time alone; the current situation is not working to
sustain millions of users:
• No funding for creators of these libraries to continue their work
• GPU support could have been added to NumPy years ago
• SciPy took 17 years to hit 1.0
• NumPy should already be at 2.0, but not without full-time guidance
48. High Level APIs for Arrays (Tensors),
DataFrames, and DataTypes
LABS
49. OpenTensors
• Community-driven and governed with many companies and
contributors (project managed by Quansight Labs)
• Addition of standardized automatic differentiation, graph-construction
(lazy mode), addition of GPUs, and sparse arrays
• Use for Deep Learning but all the other uses of PyData/NumFOCUS
ecosystem
Provide a community-sponsored and backed future!
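The "graph-construction (lazy mode)" item above can be sketched as follows: arithmetic on array objects records an expression graph, and nothing is computed until the result is explicitly requested. This is an illustrative sketch only, not an OpenTensors design; the `Lazy` class and its methods are hypothetical.

```python
import numpy as np


class Lazy:
    """Hypothetical node in an expression graph over NumPy arrays."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add", or "mul"
        self.inputs = inputs  # child nodes for "add" and "mul"
        self.value = value    # stored array for "const" nodes

    @staticmethod
    def constant(array):
        return Lazy("const", value=np.asarray(array))

    def __add__(self, other):
        # Record the operation instead of computing it.
        return Lazy("add", (self, other))

    def __mul__(self, other):
        return Lazy("mul", (self, other))

    def evaluate(self):
        """Walk the graph and compute the result with NumPy."""
        if self.op == "const":
            return self.value
        left, right = (node.evaluate() for node in self.inputs)
        if self.op == "add":
            return left + right
        if self.op == "mul":
            return left * right
        raise ValueError(f"unknown operation: {self.op}")


a = Lazy.constant([1.0, 2.0, 3.0])
b = Lazy.constant([10.0, 20.0, 30.0])
expr = (a + b) * a        # builds the graph; no arithmetic happens yet
result = expr.evaluate()  # computes (a + b) * a elementwise
```

Because the whole expression is available before execution, a real system could optimize it, differentiate it, or send it to a GPU before any computation takes place.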
50. Join Us!
• Solidifying commitments for at least $6 million over 3 years
($2 million / year), with less than 33% needed from any one company.
• Register support for the initiative on openteams.com
• Email rgommers@quansight.com, travis@quansight.com or
matt@quansight.com
• Tweet to @quansightai or @openteamsinc
• Get in touch to ensure your needs are included in the initial
deliverables
51. How to Thrive In a Data-Driven World?
Open Source Contributors of the Projects you depend on!