GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame

Presented October 18, 2017. While rapid innovation is occurring across the GPU software ecosystem, the platforms themselves have remained isolated from each other, until now. Aaron Williams, VP of Global Community at MapD, will demo the first project of the GPU Open Analytics Initiative (GOAI) on stage, the GPU Data Frame (GDF), and explain how this approach enables efficient intra-GPU communication between different processes running on the GPUs.

Accelerate Analytics
with a GPU Data Frame
Aaron Williams
October 18, 2017

MapD: Extreme Analytics
100x Faster Queries
MapD Core: the world’s fastest columnar database, powered by GPUs
+
Visualization at the Speed of Thought
MapD Immerse: a visualization front end that leverages the speed & rendering superiority of GPUs

MapD System Architecture
Accelerating the existing data infrastructure

MapD Demo

MapD Benchmarks
Blogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and found it to be up to several orders of magnitude faster than the fastest CPU databases (6x to 12,571x in the table below).

MapD Core: Comparative Query Acceleration*

System | Q1 | Q2 | Q3 | Q4
BrytlytDB & 2-node p2.16xlarge cluster | 36x | 47x | 25x | 12x
ClickHouse, Intel Core i5 4670K | 49x | 58x | 32x | 25x
Redshift, 6-node ds2.8xlarge cluster | 74x | 24x | 14x | 6x
BigQuery | 95x | 38x | 6x | 6x
Presto, 50-node n1-standard-4 cluster | 190x | 75x | 61x | 41x
Amazon Athena | 305x | 117x | 37x | 13x
Elasticsearch (heavily tuned) | 386x | 343x | n/a | n/a
Spark 2.1, 11 x m3.xlarge cluster w/ HDFS | 485x | 153x | 119x | 169x
Presto, 10-node n1-standard-4 cluster | 524x | 189x | 127x | 61x
Vertica, Intel Core i5 4670K | 685x | 607x | 203x | 132x
Elasticsearch (lightly tuned) | 1,642x | 1,194x | n/a | n/a
Presto, 5-node m3.xlarge cluster w/ HDFS | 1,667x | 735x | 388x | 159x
Presto, 50-node m3.xlarge cluster w/ S3 | 2,048x | 849x | 164x | 86x
PostgreSQL 9.5 & cstore_fdw | 7,238x | 3,302x | 1,424x | 722x
Spark 1.6, 5-node m3.xlarge cluster w/ S3 | 12,571x | 5,906x | 3,758x | 1,884x

*All speed comparisons are relative to the “MapD & 1 Nvidia Pascal DGX-1” benchmark.
Source: http://tech.marksblogg.com/benchmarks.html

Query Compilation with LLVM
Traditional DBs can be highly inefficient
• Each operator in SQL is treated as a separate function
• This incurs tremendous overhead and prevents vectorization
MapD compiles queries with LLVM to create one custom function
• Queries run at speeds approaching hand-written functions
• LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.)
• Code can be generated to run a query on CPU and GPU simultaneously
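
The contrast between operator-at-a-time execution and a single compiled function can be sketched in plain Python. This is only an analogy, not MapD's code generator: it uses Python's `exec` where MapD uses LLVM, and the query, column names, and function names (`run_interpreted`, `compile_query`) are hypothetical.

```python
# Hypothetical query: SELECT SUM(fare) FROM trips WHERE distance > 2.0

# Operator-at-a-time: every operator is its own function call and
# materializes an intermediate list before the next operator runs.
def op_filter(rows, column, threshold):
    return [r for r in rows if r[column] > threshold]

def op_project(rows, column):
    return [r[column] for r in rows]

def op_sum(values):
    return sum(values)

def run_interpreted(rows, filter_col, threshold, sum_col):
    return op_sum(op_project(op_filter(rows, filter_col, threshold), sum_col))

# Query compilation: generate the source of ONE function specialised to
# this query, then compile it. Python's exec stands in for LLVM here.
def compile_query(filter_col, threshold, sum_col):
    source = (
        "def fused(rows):\n"
        "    total = 0.0\n"
        "    for r in rows:\n"
        f"        if r[{filter_col!r}] > {threshold!r}:\n"
        f"            total += r[{sum_col!r}]\n"
        "    return total\n"
    )
    namespace = {}
    exec(source, namespace)
    return namespace["fused"]

trips = [
    {"distance": 1.0, "fare": 5.0},
    {"distance": 3.5, "fare": 12.0},
    {"distance": 7.2, "fare": 20.5},
]

fused = compile_query("distance", 2.0, "fare")
interpreted_total = run_interpreted(trips, "distance", 2.0, "fare")
fused_total = fused(trips)
print(interpreted_total, fused_total)
```

Both paths return the same total; the fused version does it in one pass with no intermediate lists, which is the property the slide attributes to compiled queries.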

Keeping Data Close to Compute
MapD maximizes performance by optimizing memory use

Tier | Capacity | Bandwidth | Data | Speedup over cold data
GPU RAM (L1) | 24GB to 256GB | 1000-6000 GB/sec | Hot | 1500x to 5000x
CPU RAM (L2) | 32GB to 3TB | 70-120 GB/sec | Warm | 35x to 120x
SSD or NVRAM storage (L3) | 250GB to 20TB | 1-2 GB/sec | Cold | n/a

Diagram labels: Compute Layer, Storage Layer, Data Lake/Data Warehouse/System of Record.
Speed increases toward L1; space increases toward L3.
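
To make the bandwidth figures concrete, here is a rough back-of-the-envelope sketch of how long one full scan of a data set would take at each tier. The bandwidth ranges are the ones on the slide; the 120 GB data set size and the function names are hypothetical, and the calculation ignores everything except raw bandwidth.

```python
# Bandwidth ranges (GB/sec) as quoted on the slide.
TIERS_GB_PER_SEC = {
    "GPU RAM (L1)": (1000, 6000),
    "CPU RAM (L2)": (70, 120),
    "SSD or NVRAM (L3)": (1, 2),
}

def scan_seconds(size_gb, bandwidth_gb_per_sec):
    """Time to read size_gb once at the given bandwidth."""
    return size_gb / bandwidth_gb_per_sec

def scan_time_range(size_gb, tier):
    """Return (best case, worst case) scan time in seconds for a tier."""
    low_bw, high_bw = TIERS_GB_PER_SEC[tier]
    return scan_seconds(size_gb, high_bw), scan_seconds(size_gb, low_bw)

DATASET_GB = 120  # hypothetical data set size

for tier in TIERS_GB_PER_SEC:
    best, worst = scan_time_range(DATASET_GB, tier)
    print(f"{tier}: {best:.2f} s to {worst:.2f} s")
```

Under these assumptions the same scan takes about a minute or more from SSD, roughly a second from CPU RAM, and a fraction of a second from GPU RAM.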

The Status Quo: Memory Bottlenecks
PCIe: 4-16 GB/s
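
A rough sketch of why the PCIe link becomes the bottleneck when GPU tools hand data to each other through host memory. It assumes, for illustration only, that each hand-off costs one device-to-host copy plus one host-to-device copy; the 8 GB working set, the two hand-offs, and the function name are hypothetical. Only the 4-16 GB/s range comes from the slide.

```python
PCIE_GB_PER_SEC = (4, 16)  # range quoted on the slide

def handoff_seconds(size_gb, pcie_gb_per_sec, handoffs):
    """Copy time when every hand-off crosses PCIe twice.

    Assumption: one device-to-host copy plus one host-to-device copy
    per hand-off between GPU processes.
    """
    copies_per_handoff = 2
    return handoffs * copies_per_handoff * size_gb / pcie_gb_per_sec

WORKING_SET_GB = 8  # hypothetical
HANDOFFS = 2        # hypothetical: database -> ML library -> visualization

slowest = handoff_seconds(WORKING_SET_GB, PCIE_GB_PER_SEC[0], HANDOFFS)
fastest = handoff_seconds(WORKING_SET_GB, PCIE_GB_PER_SEC[1], HANDOFFS)
print(f"copy overhead: {fastest:.1f} s to {slowest:.1f} s")
```

With these numbers the pipeline spends between 2 and 8 seconds purely on copies, time that disappears if the data never leaves GPU memory.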

The GPU Open Analytics Initiative Model
Standard in-memory format; zero-copy interchange
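
The idea of zero-copy interchange can be illustrated on the CPU with Python's standard library. This is only an analogy and not the GDF API: two consumers receive views onto one buffer in an agreed layout (here, contiguous 64-bit floats), so nothing is serialized or duplicated. The column contents and variable names are hypothetical.

```python
from array import array

# Producer: one column stored once, in an agreed contiguous layout.
fares = array("d", [5.0, 12.0, 20.5])

# Two consumers each receive a view onto the same memory, not a copy.
view_a = memoryview(fares)
view_b = memoryview(fares)

# A write through one view is immediately visible through the other,
# which shows both consumers are looking at the same bytes.
view_a[0] = 6.0
shared = view_b[0] == 6.0 and fares[0] == 6.0

# By contrast, a copy duplicates the bytes and then diverges.
copied = array("d", fares)
copied[1] = 99.0
diverged = fares[1] == 12.0

print(shared, diverged)
```

The GOAI model on this slide applies the same principle to GPU memory: a standard in-memory format lets separate processes share a data frame without converting or copying it.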

Interactive Machine Learning
Empowering the People in the Pipeline
Personas in the Analytics Lifecycle (illustrative)
• Business Analyst
• Data Scientist
• Data Engineer
• IT Systems Admin
• Data Scientist / Business Analyst
Pipeline stages
• Data Preparation
• Data Discovery & Feature Engineering
• Model & Validate
• Predict
• Operationalize
• Monitoring & Refinement
• Evaluate & Decide
GPUs: MapD, H2O.ai, MapD

GOAI Demo

Try MapD
It’s free and it’s easy (and @ortelius sez “it’s the new h0t sh1t”)
Play with the live demos:
https://www.mapd.com/demos/
Download the Community Edition:
https://www.mapd.com/platform/download-community/
Join our forums:
https://community.mapd.com/
Review these slides:
https://www.slideshare.net/aaronrogerwilliams

Aaron Williams
VP of Global Community
@_arw_
aaron@mapd.com
/in/aaronwilliams/
/williamsaaron
