© 2017 Anaconda, Inc. - Confidential & Proprietary
GPU Computing with Python and Anaconda: The Next Frontier
Accelerate. Connect. Empower.
Stan Seibert
Director of Community Innovation

GPUs & Python: A Great Combination
• Python is becoming the glue that binds data science
• Rapid integration empowers data scientists to combine new technologies
• This is our goal for Anaconda:
• Free distribution of Python and R for Win/Mac/Linux
• Includes GPU-accelerated packages: Caffe, TensorFlow, PyTorch, Theano, Numba, Pyculib...

Deep Learning: An Early Success
[Figure: diagram with four blocks labeled ReLU]
• Powerful machine learning technique
• Many great open source options
• Every major package has a Python interface
• Very compute intensive
➡ Perfect for GPU acceleration

Numba: JIT Python Compilation
• Compile numerical Python functions for CPU or GPU
• Based on the LLVM compiler library
• Great for rapid, custom algorithm development
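As a minimal sketch of what this slide describes, the snippet below compiles a small numerical loop with Numba's `@njit` decorator (CPU target). The function `sum_of_squares` is a hypothetical example, not taken from the talk, and the `try`/`except` fallback to plain Python is only an assumption made so the snippet still runs where Numba is not installed. GPU kernels use the separate `numba.cuda.jit` decorator and need a CUDA-capable device, so they are not shown here.

```python
try:
    from numba import njit  # JIT compiler built on LLVM
except ImportError:
    # Assumption for portability: without Numba, run the function uncompiled.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in [0, n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # With Numba, the first call triggers compilation; later calls reuse it.
    print(sum_of_squares(10))
```

Explicit loops like this one are where a JIT compiler tends to help most, since the interpreter overhead per iteration is removed.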

Problem: An Ecosystem of Silos?
[Figure: four GPU applications (ETL/Data Prep, Database, Machine Learning, Visualization), each holding its own data, connected by CPU transfers]
Why do GPU applications share data through slow CPU memory?

GPU Open Analytics Initiative
Goal: Standardize data exchange between GPU analytics applications
Current Members: MapD, Anaconda, H2O.ai, BlazingDB, Graphistry, Gunrock
http://gpuopenanalytics.com/

Streamlining the Data Science Pipeline
[Figure: pipeline with GPU Database, Python Data Transformation, and Generalized Linear Model stages; data exchange labeled GDF, Packed Array, and Apache Arrow]
All data stays on the GPU

GPU Dataframe (GDF)
• A format for tabular data in GPU memory
• Exchange GDF between different libraries
• Move between processes using CUDA IPC
• Based on Apache Arrow
• Code in separate library
• Work in progress to move functionality into the Arrow project
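To make "a format for tabular data" concrete, here is a host-memory sketch of the Arrow-style column layout that GDF is based on: one contiguous buffer of values plus a validity bitmap with one bit per row (a set bit means the value is present, least-significant bit first). The `Column` class and the `table` dictionary are hypothetical names, and this is plain Python on the CPU, not GDF's actual code; in GDF the equivalent buffers would live in GPU memory.

```python
from array import array


class Column:
    """Hypothetical Arrow-style column: packed values plus a validity bitmap."""

    def __init__(self, values, typecode="d"):
        # Null slots still occupy space in the data buffer; store 0 there.
        self.data = array(typecode, (0 if v is None else v for v in values))
        self.length = len(values)
        # One validity bit per row, packed 8 rows per byte, LSB first.
        self.validity = bytearray((self.length + 7) // 8)
        for i, v in enumerate(values):
            if v is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_valid(self, i):
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def to_list(self):
        return [self.data[i] if self.is_valid(i) else None
                for i in range(self.length)]


# A table is then a set of equally long named columns.
table = {
    "price": Column([1.5, None, 3.0]),
    "qty": Column([10, 20, None], typecode="q"),
}
```

Because each column is a pair of flat buffers, a library only has to share buffer addresses, lengths, and types to hand a table to another library, which is what makes a common format practical.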

PyGDF: Python GPU Dataframes
• A Python library for manipulating GPU Dataframes:
• Create from NumPy arrays and Pandas Dataframes
• Exchange between processes
• Math operations
• Sort, Filter, Join, Group By
• Ideal for data manipulation and feature engineering stages between data source and machine learning
• Not intended to replace dedicated database applications
• Interoperates with our Python compiler for GPU: Numba

PyGDF: Group By Performance
GPU speedup becomes very large above 10 million elements
Aggregation functions are extremely efficient on the GPU
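One common way to make group-by aggregation data-parallel is to sort rows by key and then reduce each run of equal keys. The sketch below shows that idea in plain Python on the CPU. It illustrates the technique only and is not PyGDF's implementation; `group_by_sum` is a hypothetical helper.

```python
from itertools import groupby
from operator import itemgetter


def group_by_sum(keys, values):
    """Return {key: sum of values with that key} via sort + segmented reduce."""
    # Step 1: sort (key, value) pairs so equal keys become contiguous.
    pairs = sorted(zip(keys, values), key=itemgetter(0))
    # Step 2: reduce each contiguous segment of equal keys.
    return {key: sum(v for _, v in segment)
            for key, segment in groupby(pairs, key=itemgetter(0))}


keys = ["b", "a", "b", "c", "a"]
values = [1, 2, 3, 4, 5]
print(group_by_sum(keys, values))
```

Both steps map onto well-known parallel primitives (sorting and segmented reduction), which is one reason aggregations suit GPUs.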

Dask: Distributed Computing
• Scalable execution of task graphs, from single computers to 1000+ node clusters
• Scheduler is "resource aware" and can direct GPU tasks to nodes with appropriate hardware. Great for heterogeneous clusters!
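The idea of a resource-aware scheduler can be sketched without a cluster. The toy below walks a task graph in dependency order and places each task on a worker that advertises the resources the task asks for. All names (`WORKERS`, `TASKS`, `place`) are hypothetical, and this is not Dask's scheduler or API; it only illustrates the matching step.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical cluster: worker name -> resources it offers.
WORKERS = {"cpu-node": {"CPU": 8}, "gpu-node": {"CPU": 8, "GPU": 2}}

# Hypothetical task graph: task -> (dependencies, required resources).
TASKS = {
    "load": ((), {"CPU": 1}),
    "transform": (("load",), {"GPU": 1}),
    "train": (("transform",), {"GPU": 1}),
    "report": (("train",), {"CPU": 1}),
}


def place(tasks, workers):
    """Return [(task, worker)] in an order that respects dependencies."""
    order = TopologicalSorter({t: deps for t, (deps, _) in tasks.items()})
    plan = []
    for task in order.static_order():
        needs = tasks[task][1]
        # Pick the first worker that offers every required resource.
        worker = next(w for w, have in workers.items()
                      if all(have.get(r, 0) >= n for r, n in needs.items()))
        plan.append((task, worker))
    return plan


for task, worker in place(TASKS, WORKERS):
    print(f"{task} -> {worker}")
```

A real scheduler also tracks how much of each resource is currently in use and balances load across workers; the toy ignores both to keep the matching logic visible.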

The Future
• In flight:
• Merger of common code into Apache Arrow GPU support
• Node.js interface to GDF (Graphistry)
• Dask GDF: Distributed GPU dataframe
• Other potential future projects:
• Tensor exchange between Python GPU libraries
• GPU shared memory service (Plasma for GPU)
• Can we improve the interaction of unified memory and IPC?
• What do you want to see?

Learn More
GPU Open Analytics Website
http://gpuopenanalytics.com
GOAI Github Organization
https://github.com/gpuopenanalytics/
GOAI Google Group
https://groups.google.com/forum/#!forum/gpuopenanalytics
