© 2017 Continuum Analytics - Confidential & Proprietary
© 2018 Quansight - Confidential & Proprietary
Thriving in a Data Driven World
travis@quansight.com
@quansightai
@teoliphant
Converge 2019
https://www.quansight.com
My Python Data and ML/AI Time-Line
[Timeline figure: milestones at 1991, 1998, 2001, 2003, 2005, 2006, 2008, 2009, 2010, 2012, 2014, 2015, 2016, and 2018; project logos not recoverable]
Starting companies to sustain OSS
renamed
~18 million Anaconda users
Peter Wang
Building new solutions
Replaced by
Spin Out
Spin Out
2012
2018
?
Key members of the management team at Continuum created
Quansight. In a real sense NumFOCUS and Anaconda are our first
(spin-out) organizations.
2015
Build and Connect
Companies and
Communities to
Solve Challenging
Problems with Data
Continuing my quest to find more
ways to pay developers to work on
open source!
Core Business
Quansight Labs Support
Staffing / Mentoring / Training
Custom Data/Viz/ML Consulting
Open Source Consulting
An early stage venture
capital firm investing in
startups that build on
open-source technology
and support the
communities they
depend on.
Bradden Blair
supporting
FairOSS
LABS
Sustaining the Future
Open-source innovation and
maintenance around the entire data-
science and AI workflow.
• NumPy ecosystem maintenance (PyData Core Team)
• Improve connection of NumPy to ML Frameworks
• GPU Support for NumPy Ecosystem
• Improve foundations of Array computing
• JupyterLab and JupyterHub
• Data Catalog standards
• Packaging (conda-forge, PyPA, etc.)
uarray — unified array interface for SciPy refactor
xnd — re-factored NumPy (low-level cross-language
libraries for N-D (tensor) computing)
Collaborating with
NumFOCUS and Ursa Labs
(supporting Apache Arrow)
Bokeh
Adapted from Jake Vanderplas
PyCon 2017 Keynote
My little “side projects” became my life
Where I started
Started as my graduate student
“procrastination project” (as Multipack)
in 1998 and became SciPy in 2001 with
the help of Eric Jones, Pearu Peterson,
and others.
108 releases, 766 contributors
Used by: 128,495
SciPy
“Distribution of Python Numerical Tools masquerading as one Library”
Name          Description
cluster       KMeans and Vector Quantization
fftpack       Discrete Fourier Transform
integrate     Numerical Integration
interpolate   Interpolation routines
io            Data Input and Output
linalg        Fast Linear algebra
misc          Utilities
ndimage       N-dimensional Image processing
odr           Orthogonal Distance Regression
optimize      Constrained and Unconstrained Optimization
signal        Signal Processing Tools
sparse        Sparse Matrices and Algebra
spatial       Spatial Data Structures and Algorithms
special       Special functions (e.g. Bessel)
stats         Statistical Functions and Distributions
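As a rough illustration of how these subpackages sit side by side in one distribution, the sketch below calls two of them, `scipy.integrate` and `scipy.optimize`. The integrand and the `objective` function are made up for the example and do not come from the talk.

```python
import numpy as np
from scipy import integrate, optimize

# integrate: numerically integrate sin(x) from 0 to pi (the exact answer is 2).
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)


# optimize: minimize an illustrative quadratic whose minimum sits at x = 2.
def objective(x):
    return (x - 2.0) ** 2 + 1.0


result = optimize.minimize_scalar(objective)

print(f"integral of sin on [0, pi] ~ {area:.6f}")
print(f"minimum found near x = {result.x:.6f}, f(x) = {result.fun:.6f}")
```

Each subpackage is imported and used independently, which is what the "distribution masquerading as one library" description is getting at.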
Where it led for me
159 releases, 827 contributors
Used by: 254,856
Standard Array/Tensor Library driving Python
to be the de facto language for Data Science and ML
Brief History of NumPy
Person                                      Package         Year
Jim Fulton                                  Matrix Object   1994
Jim Hugunin                                 Numeric         1995
Perry Greenfield, Rick White, Todd Miller   Numarray        2001
Travis Oliphant                             NumPy           2005
NumPy was created to unify array objects in
Python and unify the early PyData community
Numeric
Numarray
NumPy
I essentially sacrificed tenure at a University to write NumPy and
unify array objects.
Python’s Scientific Ecosystem
Bokeh
Jake Vanderplas PyCon 2017 Keynote
Huge Impact (from diverse efforts of 1000s)
LIGO: Gravitational Waves
Higgs Boson Discovery
Black Hole Imaging
[Chart: Google Search Trends for Java, JavaScript, and Python, Jun 2019]
Thriving in a Data-Driven
World starts with building on
the Open Source Software
that forms the foundation of
Data Science and Machine
Learning today.
Open Source
Ecosystem
Your Product/
Project
With Quansight, you can actually
“influence the direction of the wind”
LABS
Open-source powered development
Community Work Orders let you influence OSS
Cooperative Platform for Community Work Orders
An effective case study in connecting with open source communities
(harnessing and influencing the open-source wind)
Quansight and OmniSci funded 10+ open source developers for
1¾ years to connect OmniSci with the PyData community
• JupyterLab Extensions
• Ibis SQL Framework (OmniSci Backend, geospatial function)
• Altair & VegaLite Visualization (Modernized visualization specifications)
• Conda packages
• User Defined Table Functions with Numba
OmniSci Immerse & JupyterLab working interchangeably. Python
Data Scientists and OmniSci users can work in a unified
development environment.
Jupyter Lab Extension
OmniSci Engine can be
Connected directly to
JupyterLab components
The data can be used by
the entire PyData
ecosystem
All the open-source
deliciousness can be re-
used
As the community tools
get better, OmniSci users
benefit automatically!
Compile (Numerical) Python
to Native code for CPU and GPU
an open source JIT compiler that
translates a subset of Python and
NumPy code into fast machine code.
http://numba.pydata.org
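A minimal sketch of what this looks like in practice, assuming Numba's `njit` decorator on the CPU only (the GPU path is not shown). The function `sum_of_squares` and its input are illustrative and not from the talk; the `ImportError` fallback is added only so the sketch still runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # Numba's decorator for nopython-mode compilation
except ImportError:
    # Fallback (an assumption for portability): run as plain Python without Numba.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    """Explicit loop over a 1-D float array; with Numba this is compiled to machine code."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1_000, dtype=np.float64)
print(sum_of_squares(data))        # the first call triggers compilation under Numba
print(float(np.sum(data * data)))  # NumPy reference result for comparison
```

The loop is written the way one would in C, which is the kind of numerical Python subset Numba targets.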
Omnisci Table
User-defined
Table Functions
Result Set
rbc (Remote-Backend Compiler)
SQL Engine
https://github.com/xnd-project/rbc
XND
https://xnd.io
Problem
Open Source Teams
• Burned out
• Underrepresented
• Underpaid
Organizations
• Disconnected from the Community
• Lack support and maintenance
There’s no easy way to connect the
community with organizations
Solution
A marketplace where companies can cooperatively fund progress and
maintenance for projects and technology that affects them
Organizations
Save money &
Reduce risk
Teams
Improve
project health
Copyright OpenTeams 2019. All rights reserved.
Projects develop their roadmaps
Product
Organizations find and fund projects they depend on
Product
Companies hire from the community
Product
Initiatives are Progress or
Maintenance with an
accountable organization
committed to finalizing and
following-up to do the work
using open-source devs.
The platform enables easy
signaling and cooperation
between many potential
funders and organized open-
source groups.
Alpha Feature!
Several Deep Learning Libraries to choose from
Built on NumPy/SciPy
Recommended
Recommended
Key Features Needed for any ML Library
For Training
• Ability to create chains of functions on n-dimensional arrays
• Ability to compute the derivative of the loss function quickly (automatic differentiation)
• Key loss functions implemented
• Cross-validation methods
• An optimization library with several useful methods
• Ability to compute functions on n-dimensional arrays on multiple kinds of hardware with highly parallel execution
For Inference
• Ability to create chains of functions on n-dimensional arrays
• Ability to compute functions on n-dimensional arrays on multiple kinds of hardware
Missing from NumPy / SciPy and Scikit-Learn
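The automatic differentiation item in the list above can be illustrated with a minimal reverse-mode sketch in pure Python. The `Var` class is a hypothetical helper written for this example and works on scalars only; it is not the API of any library named in the talk, and real frameworks apply the same idea to n-dimensional arrays.

```python
class Var:
    """A scalar that records how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative)
        self.grad = 0.0

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = Var._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = Var._wrap(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        """Accumulate d(self)/d(node) into node.grad using the chain rule."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)  # parents are appended before children

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # children first, then their parents
            for parent, local in node.parents:
                parent.grad += local * node.grad


# Squared-error loss for a one-parameter linear model: (w * x + b - y) ** 2
w, b = Var(3.0), Var(1.0)
x, y = 2.0, 5.0
err = w * x + b - y
loss = err * err
loss.backward()
print(loss.value, w.grad, b.grad)
```

The chained operations build the function graph, and `backward` walks it once to produce the gradients an optimizer would use.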
Most libraries (other than Chainer) chose
to re-implement NumPy and SciPy as they
needed.
Reasons:
• Started with legacy code in another language
• Had to work with other languages too (Node, Java, C++, Lua, etc.)
• Needed only a subset of the functionality of NumPy / SciPy to build ML
• Needed GPU support
• Lacked familiarity with the NumPy / SciPy communities and how to engage with them
Result: Many competing similar choices for Deep Learning
https://github.com/josephmisiti/awesome-machine-learning#python-general-purpose
http://deeplearning.net/software_links/
http://scikit-learn.org/stable/related_projects.html
Explosion of ML Frameworks and libraries
TVM/NNVM
Now array-like objects everywhere
Sparse Arrays
Neon
CUDArray
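One existing mechanism that lets NumPy functions work with such array-like objects is NumPy's `__array_function__` protocol (NEP 18). The sketch below is only an illustration of that protocol: the `UnitArray` class is hypothetical, it supports `np.sum` alone, and it is not the uarray or xnd API mentioned earlier.

```python
import numpy as np


class UnitArray:
    """Hypothetical array-like wrapper that tags data with a unit label."""

    def __init__(self, data, unit):
        self.data = np.asarray(data)
        self.unit = unit

    def __array_function__(self, func, types, args, kwargs):
        # Handle only the NumPy functions this class explicitly supports.
        if func is np.sum:
            unwrapped = [a.data if isinstance(a, UnitArray) else a for a in args]
            return UnitArray(np.sum(*unwrapped, **kwargs), self.unit)
        return NotImplemented  # NumPy raises TypeError for anything else


lengths = UnitArray([1.0, 2.0, 3.5], "m")
total = np.sum(lengths)  # NumPy dispatches to UnitArray.__array_function__
print(type(total).__name__, float(total.data), total.unit)
```

The caller writes ordinary `np.sum(...)` and the array-like object decides how to handle it, which is the kind of shared interface a divided set of array libraries lacks.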
We have a “divided” community again!
Numeric
Numarray
NumPy
Examples of packages being built on
fragmented APIs
FastAI
skorch
Pyro
Edward
anyrl
Braid
PyMC4
Horovod
MLFlow
But note
Real Problem — Funding for Community Devs
Open source is too important to be left to volunteer time alone. The current situation is not working to
sustain millions of users:
• No funding for creators of these libraries to continue their work
• GPU support could have been added to NumPy years ago
• SciPy took 17 years to hit 1.0
• NumPy should already be at 2.0, but that will not happen without full-time guidance
[Figure: "Full-time" developer counts shown next to project logos (logos not recoverable): 2, .5, 1, 2, 0]
Solution
A marketplace where companies can cooperatively fund progress and
maintenance for projects and technology that affects them
Organizations
Save money &
Reduce risk
Teams
Improve
project health
Community proposal — gathering support
High Level APIs for Arrays (Tensors),
DataFrames, and DataTypes
LABS
OpenTensors
• Community-driven and governed with many companies and
contributors (project managed by Quansight Labs)
• Addition of standardized automatic differentiation, graph construction
(lazy mode), GPU support, and sparse arrays
• Use for deep learning but also for all the other uses of the PyData/NumFOCUS
ecosystem
Provide a community-sponsored and backed future!
Join Us!
• Solidifying commitments for at least $6 million over 3 years
($2 million / year) (need <33% from any one company).
• Register support for the initiative on openteams.com
• Email rgommers@quansight.com, travis@quansight.com or
matt@quansight.com
• Tweet to @quansightai or @openteamsinc
• Get in touch to ensure your needs are included in the initial
deliverables
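The funding figures in the list above imply a simple piece of arithmetic, sketched here. The three input numbers come from the slide; the variable names and the derived values are illustrative only.

```python
import math

total_usd = 6_000_000  # from the slide: at least $6 million
years = 3              # from the slide: over 3 years
max_share = 0.33       # from the slide: less than 33% from any one company

per_year = total_usd / years
cap_per_company_per_year = per_year * max_share

# Each share is strictly below 33%, so three companies would cover less
# than 99% of the goal; at least one more funder is needed.
min_companies = math.floor(1 / max_share) + 1

print(f"per year: ${per_year:,.0f}")
print(f"cap per company per year: just under ${cap_per_company_per_year:,.0f}")
print(f"minimum number of funding companies: {min_companies}")
```

Under these assumptions the initiative needs at least four funders.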
How to Thrive In a Data-Driven World?
Open Source Contributors of the Projects you depend on!

My Hashitalk Indonesia April 2024 Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Keynote at Converge 2019

  • 1. © 2017 Continuum Analytics - Confidential & Proprietary © 2018 Quansight - Confidential & Proprietary Thriving in a Data Driven World travis@quansight.com @quansightai @teoliphant Converge 2019 https://www.quansight.com
  • 2. My Python Data and ML/AI Time-Line (timeline figure with milestones from 1991 to 2018)
  • 3. Starting companies to sustain OSS renamed ~18 million Anaconda users Peter Wang
  • 4. Building new solutions Replaced by Spin Out Spin Out 2012 2018 ? Key members of the management team at Continuum created Quansight. In a real sense NumFOCUS and Anaconda are our first (spin-out) organizations. 2015
  • 5. Build and Connect Companies and Communities to Solve Challenging Problems with Data Continuing my quest to find more ways to pay developers to work on open source!
  • 6. Core Business Quansight Labs Support Staffing / Mentoring / Training Custom Data/Viz/ML Consulting Open Source Consulting
  • 7. An early stage venture capital firm investing in startups that build on open-source technology and support the communities they depend on. Bradden Blair supporting FairOSS
  • 8. LABS Sustaining the Future Open-source innovation and maintenance around the entire data-science and AI workflow. • NumPy ecosystem maintenance (PyData Core Team) • Improve connection of NumPy to ML Frameworks • GPU Support for NumPy Ecosystem • Improve foundations of Array computing • JupyterLab and JupyterHub • Data Catalog standards • Packaging (conda-forge, PyPA, etc.) uarray: unified array interface for SciPy refactor. xnd: re-factored NumPy (low-level cross-language libraries for N-D (tensor) computing). Collaborating with NumFOCUS and Ursa Labs (supporting Apache Arrow) Bokeh Adapted from Jake Vanderplas PyCon 2017 Keynote
  • 9. My little “side projects” became my life
  • 10. Where I started Started as my graduate student “procrastination project” (as Multipack) in 1998 and became SciPy in 2001 with the help of Eric Jones, Pearu Peterson, and others. 108 releases, 766 contributors Used by: 128,495
  • 11. SciPy “Distribution of Python Numerical Tools masquerading as one Library” Name: Description. cluster: KMeans and Vector Quantization; fftpack: Discrete Fourier Transform; integrate: Numerical Integration; interpolate: Interpolation routines; io: Data Input and Output; linalg: Fast Linear algebra; misc: Utilities; ndimage: N-dimensional Image processing; odr: Orthogonal Distance Regression; optimize: Constrained and Unconstrained Optimization; signal: Signal Processing Tools; sparse: Sparse Matrices and Algebra; spatial: Spatial Data Structures and Algorithms; special: Special functions (e.g. Bessel); stats: Statistical Functions and Distributions
  • 12. Where it led for me 159 releases, 827 contributors Used by: 254,856 Standard Array/Tensor Library driving Python to be de facto language for Data Science and ML
  • 13. Brief History of NumPy (Person: Package, Year) Jim Fulton: Matrix Object, 1994; Jim Hugunin: Numeric, 1995; Perry Greenfield, Rick White, Todd Miller: Numarray, 2001; Travis Oliphant: NumPy, 2005
  • 14. NumPy was created to unify array objects in Python and unify the early PyData community Numeric Numarray NumPy I essentially sacrificed tenure at a University to write NumPy and unify array objects.
  • 15. Python’s Scientific Ecosystem Bokeh Jake Vanderplas PyCon 2017 Keynote
  • 16. Huge Impact (from diverse efforts of 1000s) LIGO: Gravitational Waves Higgs Boson Discovery Black Hole Imaging
  • 18. Thriving in a Data-Driven World starts with building on the Open Source Software that forms the foundation of Data Science and Machine Learning today. Open Source Ecosystem Your Product/ Project
  • 19. With Quansight, you can actually “influence the direction of the wind” LABS Open-source powered development Community Work Orders let you influence OSS Cooperative Platform for Community Work Orders
  • 20. An effective case study in connecting with open source communities (harnessing and influencing the open-source wind)
  • 21. Quansight and OmniSci funded 10+ open source developers for 1¾ years to connect OmniSci with the PyData community • JupyterLab Extensions • Ibis SQL Framework (OmniSci Backend, geospatial function) • Altair & VegaLite Visualization (Modernized visualization specifications) • Conda packages • User Defined Table Functions with Numba OmniSci Immerse & JupyterLab working interchangeably. Python Data Scientists and OmniSci users can work in a unified development environment.
  • 22.
  • 24. OmniSci Engine can be Connected directly to JupyterLab components The data can be used by the entire PyData ecosystem All the open-source deliciousness can be reused As the community tools get better, OmniSci users benefit automatically!
  • 25.
  • 26. Compile (Numerical) Python to Native code for CPU and GPU an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. http://numba.pydata.org
  • 27. OmniSci Table User-defined Table Functions Result Set rbc (Remote-Backend Compiler) SQL Engine https://github.com/xnd-project/rbc XND https://xnd.io
  • 28.
  • 29. Problem Open Source Teams • Burned out • Underrepresented • Underpaid Organizations • Disconnected from the Community • Lack support and maintenance There’s no easy way to connect the community with organizations
  • 30. Solution A marketplace where companies can cooperatively fund progress and maintenance for projects and technology that affects them Organizations Save money & Reduce risk Teams Improve project health Copyright OpenTeams 2019. All rights reserved.
  • 31. Projects develop their roadmaps Product
  • 32. Organizations find and fund projects they depend on Product
  • 33. Companies hire from the community Product
  • 34. Initiatives are Progress or Maintenance with an accountable organization committed to finalizing and following-up to do the work using open-source devs. The platform enables easy signaling and cooperation between many potential funders and organized open-source groups. Alpha Feature!
  • 35. Several Deep Learning Libraries to choose Built on NumPy/SciPy Recommended Recommended
  • 36. Key Features Needed for any ML Library For Training: • Ability to create chains of functions on n-dimensional arrays • Ability to derive the derivative of the Loss-Function quickly (Automatic Differentiation) • Key Loss Functions implemented • Cross-validation methods • An Optimization library with several useful methods • Ability to compute functions on n-dimensional arrays on multiple hardware with highly parallel-execution For Inference: • Ability to create chains of functions on n-dimensional arrays • Ability to compute functions on n-dimensional arrays on multiple hardware Missing from NumPy / SciPy and Scikit-Learn
  • 37. Most Libraries (other than Chainer) chose to re-implement NumPy and SciPy as they needed. Reasons: • Started with legacy code in another language • Had to work with other languages too (Node, Java, C++, Lua, etc.) • Needed only a subset of functionality of NumPy / SciPy to build ML • Needed GPU support • Lacked familiarity with the NumPy / SciPy communities and how to engage with them Result: Many competing similar choices for Deep Learning
  • 39. Now array-like objects everywhere Sparse Arrays Neon CUDArray
  • 40. NumPy was created to unify array objects in Python and unify the early PyData community Numeric Numarray NumPy I essentially sacrificed tenure at a University to write NumPy and unify array objects.
  • 41. We have a “divided” community again! Numeric Numarray NumPy
  • 42. Python’s Scientific Ecosystem Bokeh Jake Vanderplas PyCon 2017 Keynote
  • 43. Examples of packages being built on fragmented APIs FastAI skorch Pyro Eduard anyrl Braid PyMC4 Horovod MLFlow But note
  • 44. Real Problem — Funding for Community Devs Full-time: 2 Full-time: .5 Full-time: 1 Open Source is too important to be just left to volunteer time — current situation is not working to sustain millions of users: • No funding for creators of these libraries to continue their work • GPU support could have been added to NumPy years ago • SciPy took 17 years to hit 1.0 • NumPy should already be at 2.0 — but not without full-time guidance Full-time: 2 Full-time: 0
  • 45. Solution A marketplace where companies can cooperatively fund progress and maintenance for projects and technology that affects them Organizations Save money & Reduce risk Teams Improve project health
  • 46. Initiatives are Progress or Maintenance with an accountable organization committed to finalizing and following-up to do the work using open-source devs. The platform enables easy signaling and cooperation between many potential funders and organized open-source groups.
  • 47. Community proposal — gathering support
  • 48. High Level APIs for Arrays (Tensors), DataFrames, and DataTypes LABS
  • 49. OpenTensors • Community-driven and governed with many companies and contributors (project managed by Quansight Labs) • Addition of standardized automatic differentiation, graph-construction (lazy mode), addition of GPUs, and sparse arrays • Use for Deep Learning but all the other uses of PyData/NumFOCUS ecosystem Provide a community-sponsored and backed future!
  • 50. Join Us! • Solidifying commitments for at least $6 million over 3 years ($2 million / year) (need <33% from any one company). • Register support for the initiative on openteams.com • Email rgommers@quansight.com, travis@quansight.com or matt@quansight.com • Tweet to @quansightai or @openteamsinc • Get in touch to ensure your needs are included in the initial deliverables
  • 51. How to Thrive In a Data-Driven World? Open Source Contributors of the Projects you depend on!
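
Slide 36 names automatic differentiation of a loss function as a key feature missing from NumPy / SciPy. As a rough illustration of what that means, here is a minimal sketch of forward-mode automatic differentiation using dual numbers in plain Python. The `Dual` class, the `derivative` helper, and the toy `loss` function are hypothetical names invented for this example; they are not taken from the talk or from any library it mentions, and real ML frameworks typically use reverse-mode differentiation over n-dimensional arrays instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input with derivative 1."""
    return f(Dual(float(x), 1.0)).deriv


def loss(w):
    # Hypothetical squared-error loss for one data point (x=3, y=6)
    # under the model y ≈ w * x.
    residual = w * 3.0 - 6.0
    return residual * residual


if __name__ == "__main__":
    print(derivative(loss, 1.0))  # slope of the loss at w = 1
    print(derivative(loss, 2.0))  # slope at the minimum, w = 2
```

The derivative is carried along with every arithmetic operation, so no symbolic algebra or finite-difference step size is needed; a training loop could use the returned slope directly for a gradient-descent update.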