© 2017 Anaconda, Inc. - Confidential & Proprietary
GPU Computing with Python and Anaconda: The Next Frontier
Accelerate. Connect. Empower.
Stan Seibert
Director of Community Innovation
GPUs & Python: A Great Combination
• Python is becoming the glue that binds data science
• Rapid integration empowers data scientists to combine new technologies
• This is our goal for Anaconda:
  • Free distribution of Python and R for Win/Mac/Linux
  • Includes GPU-accelerated packages: Caffe, TensorFlow, PyTorch, Theano, Numba, Pyculib...
Deep Learning: An Early Success
• Powerful machine learning technique
• Many great open source options
• Every major package has a Python interface
• Very compute intensive
➡ Perfect for GPU acceleration
Numba: JIT Python Compilation
• Compile numerical Python functions for CPU or GPU
• Based on the LLVM compiler library
• Great for rapid, custom algorithm development
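A minimal sketch of what compiling a numerical function with Numba can look like on the CPU. The function name `sum_of_squares` is made up for illustration, and the fallback decorator is only there so the snippet still runs (uncompiled) where Numba is not installed.

```python
# Optional dependency: fall back to plain Python when Numba is not installed,
# so the sketch still runs (just without JIT compilation).
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in [0, n) with an explicit loop."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


# The first call triggers compilation; later calls reuse the compiled code.
result = sum_of_squares(10)
print(result)
```

Explicit loops like this one are slow in interpreted Python and are the typical target for JIT compilation. Targeting the GPU goes through Numba's separate `numba.cuda` module and needs a CUDA-capable device, so it is not shown here.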
Problem: An Ecosystem of Silos?
[Diagram: ETL/Data Prep, Database, Machine Learning, and Visualization applications each hold their own data on the GPU; data moves between them by CPU transfer]
Why do GPU applications share data through slow CPU memory?
GPU Open Analytics Initiative
Goal: Standardize data exchange between GPU analytics applications
Current Members: MapD, Anaconda, H2O.ai, BlazingDB, Graphistry, Gunrock
http://gpuopenanalytics.com/
Streamlining the Data Science Pipeline
[Diagram: GPU Database → Python Data Transformation → Generalized Linear Model, with GDF, Packed Array, and Apache Arrow as the exchange formats]
All data stays on the GPU
GPU Dataframe (GDF)
• A format for tabular data in GPU memory
• Exchange GDF between different libraries
• Move between processes using CUDA IPC
• Based on Apache Arrow
• Code in separate library
• Work in progress to move functionality into Arrow project
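Since GDF is described as based on Apache Arrow, a rough host-memory sketch of an Arrow-style column may help: one contiguous buffer of values plus a validity bitmap with one bit per row. The `Column` class and the `frame` dictionary are hypothetical illustrations and not GDF's actual code; the real format keeps these buffers in GPU memory.

```python
from array import array


class Column:
    """Hypothetical Arrow-style column: contiguous values plus a validity bitmap."""

    def __init__(self, typecode, values):
        # Nulls (None) get a placeholder 0 in the data buffer.
        self.data = array(typecode, [0 if v is None else v for v in values])
        # One bit per row, least-significant bit first; 1 means "valid".
        self.validity = bytearray((len(values) + 7) // 8)
        for i, v in enumerate(values):
            if v is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_valid(self, i):
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def __getitem__(self, i):
        return self.data[i] if self.is_valid(i) else None

    def __len__(self):
        return len(self.data)


# A "dataframe" is then a set of named columns of equal length.
frame = {
    "id": Column("q", [1, 2, 3]),
    "price": Column("d", [9.5, None, 12.0]),
}
print(frame["price"][1])  # null row
print(frame["price"][2])
```

Because each column is a flat buffer with a fixed layout, two libraries that agree on the layout can share a column by passing a pointer rather than copying or converting the data.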
PyGDF: Python GPU Dataframes
• A Python library for manipulating GPU Dataframes:
  • Create from NumPy arrays and Pandas Dataframes
  • Exchange between processes
  • Math operations
  • Sort, Filter, Join, Group By
• Ideal for data manipulation and feature engineering stages between data source and machine learning
• Not intended to replace dedicated database applications
• Interoperates with our Python compiler for GPU: Numba
PyGDF: Group By Performance
GPU speedup becomes very large above 10 million elements
Aggregation functions are extremely efficient on the GPU
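For readers unfamiliar with the operation being measured, here is a plain-Python sketch of a group-by aggregation (mean per key). It is a CPU reference for the semantics only, not PyGDF code; `group_by_mean` and the sample data are invented for illustration.

```python
from collections import defaultdict


def group_by_mean(keys, values):
    """Return {key: mean of the values that share that key}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


keys = ["a", "b", "a", "c", "b", "a"]
values = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]
means = group_by_mean(keys, values)
print(means)
```

Each row contributes to its group's running sum and count independently of the other rows, which is one plausible reason this kind of reduction maps well onto many parallel GPU threads.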
Dask: Distributed Computing
• Scalable execution of task graphs, from single computers to 1000+ node clusters
• Scheduler is "resource aware" and can direct GPU tasks to nodes with appropriate hardware. Great for heterogeneous clusters!
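The idea of resource-aware placement can be sketched with a toy greedy scheduler. This does not use Dask and is not how Dask's scheduler is implemented; `assign_tasks`, the node names, and the task names are hypothetical. It only shows tasks that declare a `GPU` requirement being routed to a node that advertises one.

```python
def assign_tasks(tasks, nodes):
    """Greedy placement: give each task the first node with enough free resources.

    tasks: list of (name, required), where required is e.g. {"GPU": 1}
    nodes: dict of node name -> available resources, e.g. {"GPU": 2}
    """
    # Copy so the caller's resource counts are not modified.
    free = {name: dict(res) for name, res in nodes.items()}
    placement = {}
    for task, required in tasks:
        for node, avail in free.items():
            if all(avail.get(r, 0) >= n for r, n in required.items()):
                for r, n in required.items():
                    avail[r] -= n
                placement[task] = node
                break
        else:
            # No node has capacity right now; a real scheduler would queue it.
            placement[task] = None
    return placement


nodes = {"cpu-node": {}, "gpu-node": {"GPU": 1}}
tasks = [
    ("load_csv", {}),
    ("train_model", {"GPU": 1}),
    ("score_model", {"GPU": 1}),
]
placement = assign_tasks(tasks, nodes)
print(placement)
```

In this toy run the GPU task lands on the GPU node, the unconstrained task goes to the first available node, and the second GPU task is left unplaced because the only GPU is already claimed.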
The Future
• In flight:
  • Merger of common code into Apache Arrow GPU support
  • Node.js interface to GDF (Graphistry)
  • Dask GDF: Distributed GPU dataframe
• Other potential future projects:
  • Tensor exchange between Python GPU libraries
  • GPU shared memory service (Plasma for GPU)
  • Can we improve the interaction of unified memory and IPC?
• What do you want to see?
Learn More
GPU Open Analytics Website
http://gpuopenanalytics.com
GOAI GitHub Organization
https://github.com/gpuopenanalytics/
GOAI Google Group
https://groups.google.com/forum/#!forum/gpuopenanalytics
