SlideShare a Scribd company logo
1 of 9
Download to read offline
Machine Learning Developers –
Know Your GPUs
Aparna Elangovan, AWS Solutions Architect, 16th July 2018
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• 10s-100s of processing
cores
• Pre-defined instruction
set & datapath widths
• Optimized for general-
purpose computing
CPU
CPUs vs GPUs
• 1,000s of processing
cores
• Pre-defined instruction
set and datapath widths
• Highly effective at
parallel execution
GPU
DRAM
Control
ALU
ALU
Cache
DRAM
ALU
ALU
Control
ALU
ALU
Cache
DRAM
ALU
ALU
Control
ALU
ALU
Cache
DRAM
ALU
ALU
Control
ALU
ALU
Cache
DRAM
ALU
ALU
How GPU acceleration works
3. Copy result data from GPU
memory and CPU memory
CPU
CPU memory
GPU
GPU memory
PCI Bus 2. Process
on GPU
1. Copy data from CPU
memory and GPU memory
GPU Manufacturers & Programming platforms
• Manufacturers
• NVIDIA
• Intel
• ..
• GPU Programming Platforms
• CUDA (Compute Unified Device Architecture) parallel computing platform..
Specific to NVIDIA GPUs
• OpenCL - Open platform programming model
NVIDIA GPU Hello World Vector Add in C & CUDA
// Global key word indicates this code runs on GPU
__global__
void add(int n, float a, float *x, float *y)
{
int i = blockIdx.x*blockDim.x + threadIdx.x;
if (i < n) y[i] = a*x[i] + y[i];
}
int main(void)
{
// Initialize your array ( CPU & GPU memory)
// Some code here ……
// Step 1 - Copy data from CPU (Host) memory to GPU (Device) memory
cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);
// Step 2 Perform add on 1M elements on GPU
add<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);
// Step3 Copy result data back from GPU (Device) memory to CPU (Host) memory
cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);
// Free memory on GPU (Device)
// ….
}
Demo - GPU Acceleration Vector
Add & Multiply in Python
Summary
1. GPU instance ≠ Faster performance
• Program & compile your code to target CPU
• Operations must be parallelizable & compute intensive
• Need lots of data
• Profile your code to measure CPU vs GPU Utlization
Scan QR Code for More Information
Thank you

More Related Content

What's hot

SQream on Ibm power9 (english)
SQream on Ibm power9 (english)SQream on Ibm power9 (english)
SQream on Ibm power9 (english)Yutaka Kawai
 
AWS Customer Presentation - Zynga
AWS Customer Presentation - ZyngaAWS Customer Presentation - Zynga
AWS Customer Presentation - ZyngaAmazon Web Services
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Clusterairbots
 
«Oracle on AWS»
«Oracle on AWS»«Oracle on AWS»
«Oracle on AWS»Provectus
 
Evolution and roadmap ibm power_system_onepage
Evolution and roadmap ibm power_system_onepageEvolution and roadmap ibm power_system_onepage
Evolution and roadmap ibm power_system_onepageNengkuan Tu
 
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...Kristofferson A
 
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...PingCAP
 
Icg hpc-user
Icg hpc-userIcg hpc-user
Icg hpc-usergdburton
 
Announcing Amazon EC2 F1 Instances with Custom FPGAs
Announcing Amazon EC2 F1 Instances with Custom FPGAsAnnouncing Amazon EC2 F1 Instances with Custom FPGAs
Announcing Amazon EC2 F1 Instances with Custom FPGAsAmazon Web Services
 
Ceph Day KL - Ceph on ARM
Ceph Day KL - Ceph on ARM Ceph Day KL - Ceph on ARM
Ceph Day KL - Ceph on ARM Ceph Community
 
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...AMD Developer Central
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
Open compute technology
Open compute technologyOpen compute technology
Open compute technologyAMD
 
COSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingCOSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingEric Lin
 

What's hot (17)

SQream on Ibm power9 (english)
SQream on Ibm power9 (english)SQream on Ibm power9 (english)
SQream on Ibm power9 (english)
 
AWS Customer Presentation - Zynga
AWS Customer Presentation - ZyngaAWS Customer Presentation - Zynga
AWS Customer Presentation - Zynga
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Cluster
 
Cheap HPC
Cheap HPCCheap HPC
Cheap HPC
 
«Oracle on AWS»
«Oracle on AWS»«Oracle on AWS»
«Oracle on AWS»
 
Evolution and roadmap ibm power_system_onepage
Evolution and roadmap ibm power_system_onepageEvolution and roadmap ibm power_system_onepage
Evolution and roadmap ibm power_system_onepage
 
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
 
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
 
Semicon2018 dileepb
Semicon2018 dileepbSemicon2018 dileepb
Semicon2018 dileepb
 
Icg hpc-user
Icg hpc-userIcg hpc-user
Icg hpc-user
 
Announcing Amazon EC2 F1 Instances with Custom FPGAs
Announcing Amazon EC2 F1 Instances with Custom FPGAsAnnouncing Amazon EC2 F1 Instances with Custom FPGAs
Announcing Amazon EC2 F1 Instances with Custom FPGAs
 
Ceph Day KL - Ceph on ARM
Ceph Day KL - Ceph on ARM Ceph Day KL - Ceph on ARM
Ceph Day KL - Ceph on ARM
 
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
Open compute technology
Open compute technologyOpen compute technology
Open compute technology
 
CONDOR @ NGCLE@e-Novia 15.11.2017
CONDOR @ NGCLE@e-Novia 15.11.2017CONDOR @ NGCLE@e-Novia 15.11.2017
CONDOR @ NGCLE@e-Novia 15.11.2017
 
COSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingCOSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem porting
 

Similar to Machine Learning Developers - Know your GPUs

Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Linaro
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLinaro
 
Computação acelerada – a era das ap us roberto brandão, ciência
Computação acelerada – a era das ap us   roberto brandão,  ciênciaComputação acelerada – a era das ap us   roberto brandão,  ciência
Computação acelerada – a era das ap us roberto brandão, ciênciaCampus Party Brasil
 
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28Amazon Web Services
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
Part 4 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 4  Maximizing the utilization of GPU resources on-premise and in the cloudPart 4  Maximizing the utilization of GPU resources on-premise and in the cloud
Part 4 Maximizing the utilization of GPU resources on-premise and in the cloudUniva, an Altair Company
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computingRashid Ansari
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...Amazon Web Services
 
Stream Processing
Stream ProcessingStream Processing
Stream Processingarnamoy10
 
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech TalksDeep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech TalksAmazon Web Services
 
Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrjRoberto Brandao
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceOdinot Stanislas
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computingbakers84
 

Similar to Machine Learning Developers - Know your GPUs (20)

Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience Report
 
Computação acelerada – a era das ap us roberto brandão, ciência
Computação acelerada – a era das ap us   roberto brandão,  ciênciaComputação acelerada – a era das ap us   roberto brandão,  ciência
Computação acelerada – a era das ap us roberto brandão, ciência
 
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
 
Pgopencl
PgopenclPgopencl
Pgopencl
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
AMD It's Time to ROC
AMD It's Time to ROCAMD It's Time to ROC
AMD It's Time to ROC
 
Programming Models for Heterogeneous Chips
Programming Models for  Heterogeneous ChipsProgramming Models for  Heterogeneous Chips
Programming Models for Heterogeneous Chips
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
Cuda
CudaCuda
Cuda
 
Part 4 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 4  Maximizing the utilization of GPU resources on-premise and in the cloudPart 4  Maximizing the utilization of GPU resources on-premise and in the cloud
Part 4 Maximizing the utilization of GPU resources on-premise and in the cloud
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computing
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
 
Stream Processing
Stream ProcessingStream Processing
Stream Processing
 
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech TalksDeep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
 
Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrj
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
 
Balancing Power & Performance Webinar
Balancing Power & Performance WebinarBalancing Power & Performance Webinar
Balancing Power & Performance Webinar
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Machine Learning Developers - Know your GPUs

  • 1. Machine Learning Developers – Know Your GPUs Aparna Elangovan, AWS Solutions Architect, 16th July 2018 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. • 10s-100s of processing cores • Pre-defined instruction set & datapath widths • Optimized for general- purpose computing CPU CPUs vs GPUs • 1,000s of processing cores • Pre-defined instruction set and datapath widths • Highly effective at parallel execution GPU DRAM Control ALU ALU Cache DRAM ALU ALU Control ALU ALU Cache DRAM ALU ALU Control ALU ALU Cache DRAM ALU ALU Control ALU ALU Cache DRAM ALU ALU
  • 3. How GPU acceleration works 3. Copy result data from GPU memory and CPU memory CPU CPU memory GPU GPU memory PCI Bus 2. Process on GPU 1. Copy data from CPU memory and GPU memory
  • 4. GPU Manufacturers & Programming platforms • Manufacturers • NVIDIA • Intel • .. • GPU Programming Platforms • CUDA (Compute Unified Device Architecture) parallel computing platform.. Specific to NVIDIA GPUs • OpenCL - Open platform programming model
  • 5. NVIDIA GPU Hello World Vector Add in C & CUDA // Global key word indicates this code runs on GPU __global__ void add(int n, float a, float *x, float *y) { int i = blockIdx.x*blockDim.x + threadIdx.x; if (i < n) y[i] = a*x[i] + y[i]; } int main(void) { // Initialize your array ( CPU & GPU memory) // Some code here …… // Step 1 - Copy data from CPU (Host) memory to GPU (Device) memory cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice); cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice); // Step 2 Perform add on 1M elements on GPU add<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y); // Step3 Copy result data back from GPU (Device) memory to CPU (Host) memory cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost); // Free memory on GPU (Device) // …. }
  • 6. Demo - GPU Acceleration Vector Add & Multiply in Python
  • 7. Summary 1. GPU instance ≠ Faster performance • Program & compile your code to target CPU • Operations must be parallelizable & compute intensive • Need lots of data • Profile your code to measure CPU vs GPU Utlization
  • 8. Scan QR Code for More Information