SlideShare a Scribd company logo
Machine Learning & Data Science in
the Age of the GPU:
Smarter, Faster, Better
Aaron Williams @_arw_
October 22, 2017
MapD: Extreme Analytics
2
100x Faster Queries
MapD Core
The world’s fastest
columnar database, powered
by GPUs
+
Visualization at the Speed of Thought
MapD Immerse
A visualization front end that
leverages the speed &
rendering superiority of GPUs
MapD System Architecture
Accelerating the existing data infrastructure
3
4
MAPD DEMO
GPUs Recharge Moore’s Law
5
GPU Processing Power
50% per year
Data Growth
40% per year
CPU Processing Power
20% per year
Core Density Makes a Huge Difference
6
GPU ProcessingCPU Processing
40,000
Cores
20 Cores
*fictitious example
Latency Throughput
CPU
1 ns
per task
(1 task/ns) x (20 cores) =
20 tasks/ns
GPU
10 ns
per task
(0.1 task per ns) x (40,000 cores) =
4,000 task per ns
Latency: Time to do a task. | Throughput: Number of tasks per unit time.
MapD Benchmarks
Blogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and
found it to be up to orders-of-magnitude faster than the fastest CPU databases
7
MapD Core: Comparative Query Acceleration*
System Q 1 Q 2 Q 3 Q 4
BrytlytDB & 2-node p2.16xlarge cluster 36x 47x 25x 12x
ClickHouse, Intel Core i5 4670K 49x 58x 32x 25x
Redshift, 6-node ds2.8xlarge cluster 74x 24x 14x 6x
BigQuery 95x 38x 6x 6x
Presto, 50-node n1-standard-4 cluster 190x 75x 61x 41x
Amazon Athena 305x 117x 37x 13x
Elasticsearch (heavily tuned) 386x 343x n/a n/a
Spark 2.1, 11 x m3.xlarge cluster w/ HDFS 485x 153x 119x 169x
Presto, 10-node n1-standard-4 cluster 524x 189x 127x 61x
Vertica, Intel Core i5 4670K 685x 607x 203x 132x
Elasticsearch (lightly tuned) 1,642x 1,194x n/a n/a
Presto, 5-node m3.xlarge cluster w/ HDFS 1,667x 735x 388x 159x
Presto, 50-node m3.xlarge cluster w/ S3 2,048x 849x 164x 86x
PostgreSQL 9.5 & cstore_fdw 7,238x 3,302x 1,424x 722x
Spark 1.6, 5-node m3.xlarge cluster w/ S3 12,571x 5,906x 3,758x 1,884x
*All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark
Source: http://tech.marksblogg.com/benchmarks.html
Query Compilation with LLVM
8
Traditional DBs can be highly inefficient
• each operator in SQL treated as a separate function
• incurs tremendous overhead and prevents vectorization
MapD compiles queries w/LLVM to create one custom function
• Queries run at speeds approaching hand-written functions
• LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc).
• Code can be generated to run query on CPU and GPU simultaneously
10111010101001010110101101010101
00110101101101010101010101011101
LLVM
Keeping Data Close to Compute
MapD maximizes performance by optimizing memory use
9
SSD or NVRAM STORAGE (L3)
250GB to 20TB
1-2 GB/sec
CPU RAM (L2)
32GB to 3TB
70-120 GB/sec
GPU RAM (L1)
24GB to 256GB
1000-6000 GB/sec
Hot Data
Speedup = 1500x to 5000x
Over Cold Data
Warm Data
Speedup = 35x to 120x
Over Cold Data
Cold Data
COMPUTE
LAYER
STORAGE
LAYER
Data Lake/Data Warehouse/System Of Record
SpeedIncreases
SpaceIncreases
The GPU Open Analytics Initiative Model
Standard in-memory format; zero-copy interchange
10
GPU
11
GOAI DEMO
MapD Immerse
Using a hybrid approach to speed and scale visualization
12
Basic charts are frontend
rendered using D3 and other
related toolkits
Scatterplots, pointmaps +
polygons are backend rendered
using the Iris Rendering Engine on
GPUs
Geo-Viz is composited over a
frontend rendered basemap
Try MapD
It’s free and it’s easy (and @ortelius sez it’s “… the new h0t sh1t”)
13
Play with the live demos:
https://www.mapd.com/demos/
Download the Community Edition:
https://www.mapd.com/platform/download-community/
Join our forums:
https://community.mapd.com/
Review these slides:
https://speakerdeck.com/mapd
Aaron Williams
VP of Global Community
@_arw_
aaron@mapd.com
/in/aaronwilliams/
/williamsaaron
SoCal Data Science Conference: Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better

More Related Content

What's hot

Ndb cluster 80_tpc_h
Ndb cluster 80_tpc_hNdb cluster 80_tpc_h
Ndb cluster 80_tpc_h
mikaelronstrom
 
SCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AISCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AI
Hiroshi Tanaka
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
DataStax Academy
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holds
Arnon Shimoni
 
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula Project
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
Altinity Ltd
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
Tzach Livyatan
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De Boer
Deltares
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenarios
mictc
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Databricks
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
Karan Singh
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
ScyllaDB
 
Ndb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_diskNdb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_disk
mikaelronstrom
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
Altinity Ltd
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
ScyllaDB
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Altinity Ltd
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
inside-BigData.com
 
Workshop actualización SVG CESGA 2012
Workshop actualización SVG CESGA 2012 Workshop actualización SVG CESGA 2012
Workshop actualización SVG CESGA 2012
CESGA Centro de Supercomputación de Galicia
 
What we Learned About Application Resiliency When the Data Center Burned Down
What we Learned About Application Resiliency When the Data Center Burned DownWhat we Learned About Application Resiliency When the Data Center Burned Down
What we Learned About Application Resiliency When the Data Center Burned Down
ScyllaDB
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
Avinash Ramineni
 

What's hot (20)

Ndb cluster 80_tpc_h
Ndb cluster 80_tpc_hNdb cluster 80_tpc_h
Ndb cluster 80_tpc_h
 
SCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AISCasia 2018 MSFT hands on session for Azure Batch AI
SCasia 2018 MSFT hands on session for Azure Batch AI
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holds
 
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebulaOpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
OpenNebula TechDay Boston 2015 - HA HPC with OpenNebula
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De Boer
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenarios
 
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
 
Ndb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_diskNdb cluster 80_ycsb_disk
Ndb cluster 80_ycsb_disk
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 
Workshop actualización SVG CESGA 2012
Workshop actualización SVG CESGA 2012 Workshop actualización SVG CESGA 2012
Workshop actualización SVG CESGA 2012
 
What we Learned About Application Resiliency When the Data Center Burned Down
What we Learned About Application Resiliency When the Data Center Burned DownWhat we Learned About Application Resiliency When the Data Center Burned Down
What we Learned About Application Resiliency When the Data Center Burned Down
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 

Similar to SoCal Data Science Conference: Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better

Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Matej Misik
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
John Zedlewski
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
Sagar Dolas
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
Keith Kraus
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
inside-BigData.com
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
testSri1
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
Sagar Dolas
 
Shark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at ScaleShark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at ScaleDataWorks Summit
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
Data Works MD
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
NVIDIA Japan
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Timescale
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
Keith Kraus
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
Redis Labs
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
Djamel Zouaoui
 
Stream Processing
Stream ProcessingStream Processing
Stream Processing
arnamoy10
 
Postgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPostgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL Cluster
Pavan Deolasee
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Databricks
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
Chester Chen
 
uCluster
uClusteruCluster
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Databricks
 

Similar to SoCal Data Science Conference: Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better (20)

Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Shark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at ScaleShark SQL and Rich Analytics at Scale
Shark SQL and Rich Analytics at Scale
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Stream Processing
Stream ProcessingStream Processing
Stream Processing
 
Postgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPostgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL Cluster
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
uCluster
uClusteruCluster
uCluster
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 

Recently uploaded

Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 

Recently uploaded (20)

Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 

SoCal Data Science Conference: Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better

  • 1. Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better Aaron Williams @_arw_ October 22, 2017
  • 2. MapD: Extreme Analytics 2 100x Faster Queries MapD Core The world’s fastest columnar database, powered by GPUs + Visualization at the Speed of Thought MapD Immerse A visualization front end that leverages the speed & rendering superiority of GPUs
  • 3. MapD System Architecture Accelerating the existing data infrastructure 3
  • 5. GPUs Recharge Moore’s Law 5 GPU Processing Power 50% per year Data Growth 40% per year CPU Processing Power 20% per year
  • 6. Core Density Makes a Huge Difference 6 GPU ProcessingCPU Processing 40,000 Cores 20 Cores *fictitious example Latency Throughput CPU 1 ns per task (1 task/ns) x (20 cores) = 20 tasks/ns GPU 10 ns per task (0.1 task per ns) x (40,000 cores) = 4,000 task per ns Latency: Time to do a task. | Throughput: Number of tasks per unit time.
  • 7. MapD Benchmarks Blogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and found it to be up to orders-of-magnitude faster than the fastest CPU databases 7 MapD Core: Comparative Query Acceleration* System Q 1 Q 2 Q 3 Q 4 BrytlytDB & 2-node p2.16xlarge cluster 36x 47x 25x 12x ClickHouse, Intel Core i5 4670K 49x 58x 32x 25x Redshift, 6-node ds2.8xlarge cluster 74x 24x 14x 6x BigQuery 95x 38x 6x 6x Presto, 50-node n1-standard-4 cluster 190x 75x 61x 41x Amazon Athena 305x 117x 37x 13x Elasticsearch (heavily tuned) 386x 343x n/a n/a Spark 2.1, 11 x m3.xlarge cluster w/ HDFS 485x 153x 119x 169x Presto, 10-node n1-standard-4 cluster 524x 189x 127x 61x Vertica, Intel Core i5 4670K 685x 607x 203x 132x Elasticsearch (lightly tuned) 1,642x 1,194x n/a n/a Presto, 5-node m3.xlarge cluster w/ HDFS 1,667x 735x 388x 159x Presto, 50-node m3.xlarge cluster w/ S3 2,048x 849x 164x 86x PostgreSQL 9.5 & cstore_fdw 7,238x 3,302x 1,424x 722x Spark 1.6, 5-node m3.xlarge cluster w/ S3 12,571x 5,906x 3,758x 1,884x *All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark Source: http://tech.marksblogg.com/benchmarks.html
  • 8. Query Compilation with LLVM 8 Traditional DBs can be highly inefficient • each operator in SQL treated as a separate function • incurs tremendous overhead and prevents vectorization MapD compiles queries w/LLVM to create one custom function • Queries run at speeds approaching hand-written functions • LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc). • Code can be generated to run query on CPU and GPU simultaneously 10111010101001010110101101010101 00110101101101010101010101011101 LLVM
  • 9. Keeping Data Close to Compute MapD maximizes performance by optimizing memory use 9 SSD or NVRAM STORAGE (L3) 250GB to 20TB 1-2 GB/sec CPU RAM (L2) 32GB to 3TB 70-120 GB/sec GPU RAM (L1) 24GB to 256GB 1000-6000 GB/sec Hot Data Speedup = 1500x to 5000x Over Cold Data Warm Data Speedup = 35x to 120x Over Cold Data Cold Data COMPUTE LAYER STORAGE LAYER Data Lake/Data Warehouse/System Of Record SpeedIncreases SpaceIncreases
  • 10. The GPU Open Analytics Initiative Model Standard in-memory format; zero-copy interchange 10 GPU
  • 12. MapD Immerse Using a hybrid approach to speed and scale visualization 12 Basic charts are frontend rendered using D3 and other related toolkits Scatterplots, pointmaps + polygons are backend rendered using the Iris Rendering Engine on GPUs Geo-Viz is composited over a frontend rendered basemap
  • 13. Try MapD It’s free and it’s easy (and @ortelius sez it’s “… the new h0t sh1t”) 13 Play with the live demos: https://www.mapd.com/demos/ Download the Community Edition: https://www.mapd.com/platform/download-community/ Join our forums: https://community.mapd.com/ Review these slides: https://speakerdeck.com/mapd
  • 14. Aaron Williams VP of Global Community @_arw_ aaron@mapd.com /in/aaronwilliams/ /williamsaaron