SlideShare a Scribd company logo
1 of 15
Download to read offline
Accelerate Analytics
with a GPU Data Frame
Aaron Williams
October 18, 2017
MapD: Extreme Analytics
2
100x Faster Queries
MapD Core
The world’s fastest
columnar database, powered
by GPUs
+
Visualization at the Speed of Thought
MapD Immerse
A visualization front end that
leverages the speed &
rendering superiority of GPUs
MapD System Architecture
Accelerating the existing data infrastructure
3
4
MAPD DEMO
MapD Benchmarks
Blogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and
found it to be up to orders-of-magnitude faster than the fastest CPU databases
5
MapD Core: Comparative Query Acceleration*
System Q 1 Q 2 Q 3 Q 4
BrytlytDB & 2-node p2.16xlarge cluster 36x 47x 25x 12x
ClickHouse, Intel Core i5 4670K 49x 58x 32x 25x
Redshift, 6-node ds2.8xlarge cluster 74x 24x 14x 6x
BigQuery 95x 38x 6x 6x
Presto, 50-node n1-standard-4 cluster 190x 75x 61x 41x
Amazon Athena 305x 117x 37x 13x
Elasticsearch (heavily tuned) 386x 343x n/a n/a
Spark 2.1, 11 x m3.xlarge cluster w/ HDFS 485x 153x 119x 169x
Presto, 10-node n1-standard-4 cluster 524x 189x 127x 61x
Vertica, Intel Core i5 4670K 685x 607x 203x 132x
Elasticsearch (lightly tuned) 1,642x 1,194x n/a n/a
Presto, 5-node m3.xlarge cluster w/ HDFS 1,667x 735x 388x 159x
Presto, 50-node m3.xlarge cluster w/ S3 2,048x 849x 164x 86x
PostgreSQL 9.5 & cstore_fdw 7,238x 3,302x 1,424x 722x
Spark 1.6, 5-node m3.xlarge cluster w/ S3 12,571x 5,906x 3,758x 1,884x
*All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark
Source: http://tech.marksblogg.com/benchmarks.html
Query Compilation with LLVM
6
Traditional DBs can be highly inefficient
• each operator in SQL treated as a separate function
• incurs tremendous overhead and prevents vectorization
MapD compiles queries w/LLVM to create one custom function
• Queries run at speeds approaching hand-written functions
• LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc).
• Code can be generated to run query on CPU and GPU simultaneously
10111010101001010110101101010101
00110101101101010101010101011101
LLVM
Keeping Data Close to Compute
MapD maximizes performance by optimizing memory use
7
SSD or NVRAM STORAGE (L3)
250GB to 20TB
1-2 GB/sec
CPU RAM (L2)
32GB to 3TB
70-120 GB/sec
GPU RAM (L1)
24GB to 256GB
1000-6000 GB/sec
Hot Data
Speedup = 1500x to 5000x
Over Cold Data
Warm Data
Speedup = 35x to 120x
Over Cold Data
Cold Data
COMPUTE
LAYER
STORAGE
LAYER
Data Lake/Data Warehouse/System Of Record
SpeedIncreases
SpaceIncreases
The Status Quo: Memory Bottlenecks
8
PCIe
4-16GB/s
The GPU Open Analytics Initiative Model
Standard in-memory format; zero-copy interchange
9
GPU
The GPU Open Analytics Initiative Model
Standard in-memory format; zero-copy interchange
10
Interactive Machine Learning
Empowering the People in the Pipeline
11
Personas in
Analytics Lifecycle
(Illustrative)
Business Analyst
Data Scientist
Data Engineer
IT Systems Admin
Data Scientist / Business Analyst
Data
Preparation
Data
Discovery
& Feature
Engineering
Model &
Validate
Predict
Operationalize
Monitoring &
Refinement
Evaluate
& Decide
GPUsMapD H20.ai MapD
12
GOAI DEMO
Try MapD
It’s free and it’s easy (and @ortelius sez “it’s the new h0t sh1t”)
13
Play with the live demos:
https://www.mapd.com/demos/
Download the Community Edition:
https://www.mapd.com/platform/download-community/
Join our forums:
https://community.mapd.com/
Review these slides:
https://www.slideshare.net/aaronrogerwilliams
Aaron Williams
VP of Global Community
@_arw_
aaron@mapd.com
/in/aaronwilliams/
/williamsaaron
GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame

More Related Content

What's hot

SCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AISCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AIHiroshi Tanaka
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionDataStax Academy
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsArnon Shimoni
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
Scientific Computing With Amazon Web Services
Scientific Computing With Amazon Web ServicesScientific Computing With Amazon Web Services
Scientific Computing With Amazon Web ServicesJamie Kinney
 
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula Project
 
Measuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesMeasuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesScyllaDB
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016Tzach Livyatan
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudDatabricks
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesScyllaDB
 
Introduction to SQream and the IoT environment
Introduction to SQream and the IoT environmentIntroduction to SQream and the IoT environment
Introduction to SQream and the IoT environmentArnon Shimoni
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0ScyllaDB
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevAltinity Ltd
 
Ndb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_diskNdb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_diskmikaelronstrom
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosmictc
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...ScyllaDB
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumbergerinside-BigData.com
 

What's hot (20)

Ndb cluster 80_tpc_h
Ndb cluster 80_tpc_hNdb cluster 80_tpc_h
Ndb cluster 80_tpc_h
 
DynamoDB at HasOffers
DynamoDB at HasOffers DynamoDB at HasOffers
DynamoDB at HasOffers
 
SCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AISCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AI
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holds
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Scientific Computing With Amazon Web Services
Scientific Computing With Amazon Web ServicesScientific Computing With Amazon Web Services
Scientific Computing With Amazon Web Services
 
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
 
Measuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesMeasuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS Instances
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking Databases
 
Introduction to SQream and the IoT environment
Introduction to SQream and the IoT environmentIntroduction to SQream and the IoT environment
Introduction to SQream and the IoT environment
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
 
Ndb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_diskNdb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_disk
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenarios
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 

Similar to GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame

Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Matej Misik
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020John Zedlewski
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringKeith Kraus
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFKeith Kraus
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentationtestSri1
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステムShinnosuke Furuya
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platforminside-BigData.com
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Nicolas Poggi
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfDLow6
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with UnivaNVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univainside-BigData.com
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceData Works MD
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red_Hat_Storage
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_ProcessingKohei KaiGai
 

Similar to GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame (20)

Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
MYSQL
MYSQLMYSQL
MYSQL
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with UnivaNVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
 
Latest HPC News from NVIDIA
Latest HPC News from NVIDIALatest HPC News from NVIDIA
Latest HPC News from NVIDIA
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
RISC V in Spacer
RISC V in SpacerRISC V in Spacer
RISC V in Spacer
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 

Recently uploaded

Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 

Recently uploaded (20)

Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 

GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame

  • 1. Accelerate Analytics with a GPU Data Frame Aaron Williams October 18, 2017
  • 2. MapD: Extreme Analytics 2 100x Faster Queries MapD Core The world’s fastest columnar database, powered by GPUs + Visualization at the Speed of Thought MapD Immerse A visualization front end that leverages the speed & rendering superiority of GPUs
  • 3. MapD System Architecture Accelerating the existing data infrastructure 3
  • 5. MapD Benchmarks Blogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and found it to be up to orders-of-magnitude faster than the fastest CPU databases 5 MapD Core: Comparative Query Acceleration* System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge cluster 36x 47x 25x 12x ClickHouse, Intel Core i5 4670K 49x 58x 32x 25x Redshift, 6-node ds2.8xlarge cluster 74x 24x 14x 6x BigQuery 95x 38x 6x 6x Presto, 50-node n1-standard-4 cluster 190x 75x 61x 41x Amazon Athena 305x 117x 37x 13x Elasticsearch (heavily tuned) 386x 343x n/a n/a Spark 2.1, 11 x m3.xlarge cluster w/ HDFS 485x 153x 119x 169x Presto, 10-node n1-standard-4 cluster 524x 189x 127x 61x Vertica, Intel Core i5 4670K 685x 607x 203x 132x Elasticsearch (lightly tuned) 1,642x 1,194x n/a n/a Presto, 5-node m3.xlarge cluster w/ HDFS 1,667x 735x 388x 159x Presto, 50-node m3.xlarge cluster w/ S3 2,048x 849x 164x 86x PostgreSQL 9.5 & cstore_fdw 7,238x 3,302x 1,424x 722x Spark 1.6, 5-node m3.xlarge cluster w/ S3 12,571x 5,906x 3,758x 1,884x *All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark Source: http://tech.marksblogg.com/benchmarks.html
  • 6. Query Compilation with LLVM 6 Traditional DBs can be highly inefficient • each operator in SQL treated as a separate function • incurs tremendous overhead and prevents vectorization MapD compiles queries w/LLVM to create one custom function • Queries run at speeds approaching hand-written functions • LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc). • Code can be generated to run query on CPU and GPU simultaneously 10111010101001010110101101010101 00110101101101010101010101011101 LLVM
  • 7. Keeping Data Close to Compute MapD maximizes performance by optimizing memory use 7 SSD or NVRAM STORAGE (L3) 250GB to 20TB 1-2 GB/sec CPU RAM (L2) 32GB to 3TB 70-120 GB/sec GPU RAM (L1) 24GB to 256GB 1000-6000 GB/sec Hot Data Speedup = 1500x to 5000x Over Cold Data Warm Data Speedup = 35x to 120x Over Cold Data Cold Data COMPUTE LAYER STORAGE LAYER Data Lake/Data Warehouse/System Of Record SpeedIncreases SpaceIncreases
  • 8. The Status Quo: Memory Bottlenecks 8 PCIe 4-16GB/s
  • 9. The GPU Open Analytics Initiative Model Standard in-memory format; zero-copy interchange 9 GPU
  • 10. The GPU Open Analytics Initiative Model Standard in-memory format; zero-copy interchange 10
  • 11. Interactive Machine Learning Empowering the People in the Pipeline 11 Personas in Analytics Lifecycle (Illustrative) Business Analyst Data Scientist Data Engineer IT Systems Admin Data Scientist / Business Analyst Data Preparation Data Discovery & Feature Engineering Model & Validate Predict Operationalize Monitoring & Refinement Evaluate & Decide GPUsMapD H20.ai MapD
  • 13. Try MapD It’s free and it’s easy (and @ortelius sez “it’s the new h0t sh1t”) 13 Play with the live demos: https://www.mapd.com/demos/ Download the Community Edition: https://www.mapd.com/platform/download-community/ Join our forums: https://community.mapd.com/ Review these slides: https://www.slideshare.net/aaronrogerwilliams
  • 14. Aaron Williams VP of Global Community @_arw_ aaron@mapd.com /in/aaronwilliams/ /williamsaaron