2. CPUs vs GPUs
CPU
• 10s–100s of processing cores
• Pre-defined instruction set & datapath widths
• Optimized for general-purpose computing
GPU
• 1,000s of processing cores
• Pre-defined instruction set & datapath widths
• Highly effective at parallel execution
[Diagram: a CPU die with a large Control unit, large Cache, a few ALUs, and DRAM, versus a GPU die with many rows of small ALUs, minimal Control and Cache per row, and DRAM]
3. How GPU acceleration works
1. Copy data from CPU (host) memory to GPU (device) memory over the PCI bus
2. Process on GPU
3. Copy result data from GPU memory back to CPU memory
[Diagram: CPU with CPU memory, connected over the PCI bus to GPU with GPU memory]
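The three steps above can be sketched with the CUDA runtime API. This is a minimal, hypothetical fragment (not from the original deck): error checking is omitted, and `kernel`, `numBlocks`, and `blockSize` are placeholder names for any `__global__` function and its launch configuration.

```cuda
float *d_x;                                    // device (GPU) pointer
cudaMalloc(&d_x, N * sizeof(float));           // allocate GPU memory

// Step 1: copy input from CPU (host) memory to GPU (device) memory
cudaMemcpy(d_x, x, N * sizeof(float), cudaMemcpyHostToDevice);

// Step 2: process on the GPU; the data already resides in GPU memory,
// so only the launch command crosses the PCI bus here
kernel<<<numBlocks, blockSize>>>(d_x, N);

// Step 3: copy results back from GPU memory to CPU memory
cudaMemcpy(x, d_x, N * sizeof(float), cudaMemcpyDeviceToHost);

cudaFree(d_x);                                 // release GPU memory
```

Because both copies traverse the PCI bus, acceleration only pays off when Step 2 does enough work to outweigh the transfer cost.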
4. GPU Manufacturers & Programming Platforms
• Manufacturers
• NVIDIA
• Intel
• ..
• GPU Programming Platforms
• CUDA (Compute Unified Device Architecture) – NVIDIA's parallel computing platform; specific to NVIDIA GPUs
• OpenCL – an open, cross-vendor programming model
5. NVIDIA GPU Hello World Vector Add in C & CUDA
// The __global__ keyword indicates this function (a kernel) runs on the GPU
__global__
void add(int n, float a, float *x, float *y)
{
    // Compute this thread's global index; each thread handles one element
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) y[i] = a*x[i] + y[i];
}
int main(void)
{
    // Initialize your arrays (CPU & GPU memory)
    // Some code here ……
    // Step 1 - Copy data from CPU (Host) memory to GPU (Device) memory
    cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);
    // Step 2 - Perform add on 1M elements on GPU: (N+255)/256 blocks of 256 threads
    add<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);
    // Step 3 - Copy result data back from GPU (Device) memory to CPU (Host) memory
    cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);
    // Free memory on GPU (Device)
    // ….
}
7. Summary
1. GPU instance ≠ Faster performance
• Program & compile your code to target the GPU
• Operations must be parallelizable & compute intensive
• Need lots of data
• Profile your code to measure CPU vs GPU utilization