SlideShare a Scribd company logo
Accelerate Analytics
with a GPU Data Frame
Aaron Williams
October 18, 2017
MapD: Extreme Analytics
2
100x Faster Queries
MapD Core
The world’s fastest
columnar database, powered
by GPUs
+
Visualization at the Speed of Thought
MapD Immerse
A visualization front end that
leverages the speed &
rendering superiority of GPUs
MapD System Architecture
Accelerating the existing data infrastructure
3
4
MAPD DEMO
MapD Benchmarks
Blogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and
found it to be up to orders-of-magnitude faster than the fastest CPU databases
5
MapD Core: Comparative Query Acceleration*
System Q 1 Q 2 Q 3 Q 4
BrytlytDB & 2-node p2.16xlarge cluster 36x 47x 25x 12x
ClickHouse, Intel Core i5 4670K 49x 58x 32x 25x
Redshift, 6-node ds2.8xlarge cluster 74x 24x 14x 6x
BigQuery 95x 38x 6x 6x
Presto, 50-node n1-standard-4 cluster 190x 75x 61x 41x
Amazon Athena 305x 117x 37x 13x
Elasticsearch (heavily tuned) 386x 343x n/a n/a
Spark 2.1, 11 x m3.xlarge cluster w/ HDFS 485x 153x 119x 169x
Presto, 10-node n1-standard-4 cluster 524x 189x 127x 61x
Vertica, Intel Core i5 4670K 685x 607x 203x 132x
Elasticsearch (lightly tuned) 1,642x 1,194x n/a n/a
Presto, 5-node m3.xlarge cluster w/ HDFS 1,667x 735x 388x 159x
Presto, 50-node m3.xlarge cluster w/ S3 2,048x 849x 164x 86x
PostgreSQL 9.5 & cstore_fdw 7,238x 3,302x 1,424x 722x
Spark 1.6, 5-node m3.xlarge cluster w/ S3 12,571x 5,906x 3,758x 1,884x
*All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark
Source: http://tech.marksblogg.com/benchmarks.html
Query Compilation with LLVM
6
Traditional DBs can be highly inefficient
• each operator in SQL treated as a separate function
• incurs tremendous overhead and prevents vectorization
MapD compiles queries w/LLVM to create one custom function
• Queries run at speeds approaching hand-written functions
• LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc).
• Code can be generated to run query on CPU and GPU simultaneously
10111010101001010110101101010101
00110101101101010101010101011101
LLVM
Keeping Data Close to Compute
MapD maximizes performance by optimizing memory use
7
SSD or NVRAM STORAGE (L3)
250GB to 20TB
1-2 GB/sec
CPU RAM (L2)
32GB to 3TB
70-120 GB/sec
GPU RAM (L1)
24GB to 256GB
1000-6000 GB/sec
Hot Data
Speedup = 1500x to 5000x
Over Cold Data
Warm Data
Speedup = 35x to 120x
Over Cold Data
Cold Data
COMPUTE
LAYER
STORAGE
LAYER
Data Lake/Data Warehouse/System Of Record
SpeedIncreases
SpaceIncreases
The Status Quo: Memory Bottlenecks
8
PCIe
4-16GB/s
The GPU Open Analytics Initiative Model
Standard in-memory format; zero-copy interchange
9
GPU
The GPU Open Analytics Initiative Model
Standard in-memory format; zero-copy interchange
10
Interactive Machine Learning
Empowering the People in the Pipeline
11
Personas in
Analytics Lifecycle
(Illustrative)
Business Analyst
Data Scientist
Data Engineer
IT Systems Admin
Data Scientist / Business Analyst
Data
Preparation
Data
Discovery
& Feature
Engineering
Model &
Validate
Predict
Operationalize
Monitoring &
Refinement
Evaluate
& Decide
GPUsMapD H20.ai MapD
12
GOAI DEMO
Try MapD
It’s free and it’s easy (and @ortelius sez “it’s the new h0t sh1t”)
13
Play with the live demos:
https://www.mapd.com/demos/
Download the Community Edition:
https://www.mapd.com/platform/download-community/
Join our forums:
https://community.mapd.com/
Review these slides:
https://www.slideshare.net/aaronrogerwilliams
Aaron Williams
VP of Global Community
@_arw_
aaron@mapd.com
/in/aaronwilliams/
/williamsaaron
GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame

More Related Content

What's hot

Ndb cluster 80_tpc_h
Ndb cluster 80_tpc_hNdb cluster 80_tpc_h
Ndb cluster 80_tpc_h
mikaelronstrom
 
DynamoDB at HasOffers
DynamoDB at HasOffers DynamoDB at HasOffers
DynamoDB at HasOffers
Amazon Web Services
 
SCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AISCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AI
Hiroshi Tanaka
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
DataStax Academy
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holds
Arnon Shimoni
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
Altinity Ltd
 
Scientific Computing With Amazon Web Services
Scientific Computing With Amazon Web ServicesScientific Computing With Amazon Web Services
Scientific Computing With Amazon Web Services
Jamie Kinney
 
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula Project
 
Measuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesMeasuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS Instances
ScyllaDB
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
Tzach Livyatan
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Databricks
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking Databases
ScyllaDB
 
Introduction to SQream and the IoT environment
Introduction to SQream and the IoT environmentIntroduction to SQream and the IoT environment
Introduction to SQream and the IoT environment
Arnon Shimoni
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
ScyllaDB
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Altinity Ltd
 
Ndb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_diskNdb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_disk
mikaelronstrom
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
Altinity Ltd
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenarios
mictc
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
ScyllaDB
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
inside-BigData.com
 

What's hot (20)

Ndb cluster 80_tpc_h
Ndb cluster 80_tpc_hNdb cluster 80_tpc_h
Ndb cluster 80_tpc_h
 
DynamoDB at HasOffers
DynamoDB at HasOffers DynamoDB at HasOffers
DynamoDB at HasOffers
 
SCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AISCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AI
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holds
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Scientific Computing With Amazon Web Services
Scientific Computing With Amazon Web ServicesScientific Computing With Amazon Web Services
Scientific Computing With Amazon Web Services
 
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
 
Measuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesMeasuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS Instances
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking Databases
 
Introduction to SQream and the IoT environment
Introduction to SQream and the IoT environmentIntroduction to SQream and the IoT environment
Introduction to SQream and the IoT environment
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
 
Ndb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_diskNdb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_disk
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenarios
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 

Similar to GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame

Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Matej Misik
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
John Zedlewski
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
Keith Kraus
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
Keith Kraus
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
inside-BigData.com
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
testSri1
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
Shinnosuke Furuya
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
inside-BigData.com
 
MYSQL
MYSQLMYSQL
MYSQL
gilashikwa
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
Nicolas Poggi
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
DLow6
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
Chester Chen
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
Jax Jargalsaikhan
 
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with UnivaNVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
inside-BigData.com
 
Latest HPC News from NVIDIA
Latest HPC News from NVIDIALatest HPC News from NVIDIA
Latest HPC News from NVIDIA
inside-BigData.com
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
Data Works MD
 
RISC V in Spacer
RISC V in SpacerRISC V in Spacer
RISC V in Spacer
klepsydratechnologie
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red_Hat_Storage
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
Kohei KaiGai
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
NVIDIA Japan
 

Similar to GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame (20)

Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
MYSQL
MYSQLMYSQL
MYSQL
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with UnivaNVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
 
Latest HPC News from NVIDIA
Latest HPC News from NVIDIALatest HPC News from NVIDIA
Latest HPC News from NVIDIA
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
RISC V in Spacer
RISC V in SpacerRISC V in Spacer
RISC V in Spacer
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 

Recently uploaded

How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 

Recently uploaded (20)

How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 

GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame

  • 1. Accelerate Analytics with a GPU Data Frame Aaron Williams October 18, 2017
  • 2. MapD: Extreme Analytics 2 100x Faster Queries MapD Core The world’s fastest columnar database, powered by GPUs + Visualization at the Speed of Thought MapD Immerse A visualization front end that leverages the speed & rendering superiority of GPUs
  • 3. MapD System Architecture Accelerating the existing data infrastructure 3
  • 5. MapD Benchmarks Blogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and found it to be up to orders-of-magnitude faster than the fastest CPU databases 5 MapD Core: Comparative Query Acceleration* System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge cluster 36x 47x 25x 12x ClickHouse, Intel Core i5 4670K 49x 58x 32x 25x Redshift, 6-node ds2.8xlarge cluster 74x 24x 14x 6x BigQuery 95x 38x 6x 6x Presto, 50-node n1-standard-4 cluster 190x 75x 61x 41x Amazon Athena 305x 117x 37x 13x Elasticsearch (heavily tuned) 386x 343x n/a n/a Spark 2.1, 11 x m3.xlarge cluster w/ HDFS 485x 153x 119x 169x Presto, 10-node n1-standard-4 cluster 524x 189x 127x 61x Vertica, Intel Core i5 4670K 685x 607x 203x 132x Elasticsearch (lightly tuned) 1,642x 1,194x n/a n/a Presto, 5-node m3.xlarge cluster w/ HDFS 1,667x 735x 388x 159x Presto, 50-node m3.xlarge cluster w/ S3 2,048x 849x 164x 86x PostgreSQL 9.5 & cstore_fdw 7,238x 3,302x 1,424x 722x Spark 1.6, 5-node m3.xlarge cluster w/ S3 12,571x 5,906x 3,758x 1,884x *All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark Source: http://tech.marksblogg.com/benchmarks.html
  • 6. Query Compilation with LLVM 6 Traditional DBs can be highly inefficient • each operator in SQL treated as a separate function • incurs tremendous overhead and prevents vectorization MapD compiles queries w/LLVM to create one custom function • Queries run at speeds approaching hand-written functions • LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc). • Code can be generated to run query on CPU and GPU simultaneously 10111010101001010110101101010101 00110101101101010101010101011101 LLVM
  • 7. Keeping Data Close to Compute MapD maximizes performance by optimizing memory use 7 SSD or NVRAM STORAGE (L3) 250GB to 20TB 1-2 GB/sec CPU RAM (L2) 32GB to 3TB 70-120 GB/sec GPU RAM (L1) 24GB to 256GB 1000-6000 GB/sec Hot Data Speedup = 1500x to 5000x Over Cold Data Warm Data Speedup = 35x to 120x Over Cold Data Cold Data COMPUTE LAYER STORAGE LAYER Data Lake/Data Warehouse/System Of Record SpeedIncreases SpaceIncreases
  • 8. The Status Quo: Memory Bottlenecks 8 PCIe 4-16GB/s
  • 9. The GPU Open Analytics Initiative Model Standard in-memory format; zero-copy interchange 9 GPU
  • 10. The GPU Open Analytics Initiative Model Standard in-memory format; zero-copy interchange 10
  • 11. Interactive Machine Learning Empowering the People in the Pipeline 11 Personas in Analytics Lifecycle (Illustrative) Business Analyst Data Scientist Data Engineer IT Systems Admin Data Scientist / Business Analyst Data Preparation Data Discovery & Feature Engineering Model & Validate Predict Operationalize Monitoring & Refinement Evaluate & Decide GPUsMapD H20.ai MapD
  • 13. Try MapD It’s free and it’s easy (and @ortelius sez “it’s the new h0t sh1t”) 13 Play with the live demos: https://www.mapd.com/demos/ Download the Community Edition: https://www.mapd.com/platform/download-community/ Join our forums: https://community.mapd.com/ Review these slides: https://www.slideshare.net/aaronrogerwilliams
  • 14. Aaron Williams VP of Global Community @_arw_ aaron@mapd.com /in/aaronwilliams/ /williamsaaron