SlideShare a Scribd company logo
PPoPP 2010, Bangalore, India.
Basic Architecture Concepts
CPU Architecture
4 stages of instruction execution
Too many cycles per instruction (CPI)
Fetch Decode Execute Write
t=1 2 3 4 5
PPoPP 2010, Bangalore, India.
Basic Architecture Concepts
CPU Architecture
4 stages of instruction execution
Too many cycles per instruction (CPI)
To reduce the CPI, introduce  pipelined execution
Fetch Decode Execute Write
Fetch Decode Execute Write
Fetch Decode Execute Write
Fetch Decode Execute Write
t=1 2 3 4 5
PPoPP 2010, Bangalore, India.
Basic Architecture Concepts
CPU Architecture
4 stages of instruction execution
Too many cycles per instruction (CPI)
To reduce the CPI, introduce  pipelined execution
Needs buffers to store results across stages.
A cache to handle slow memory access times
Fetch Decode Execute Write
Fetch Decode Execute Write
Fetch Decode Execute Write
Fetch Decode Execute Write
t=1 2 3 4 5
Cache
PPoPP 2010, Bangalore, India.
Basic Architecture Concepts
Fetch Decode Execute Write
Fetch Decode Execute Write
Fetch Decode Execute Write
Fetch Decode Execute Write
t=1 2 3 4 5
Cache
CPU Architecture
4 stages of instruction execution
Too many cycles per instruction (CPI)
To reduce the CPI, introduce  pipelined execution
Needs buffers to store results across stages.
A cache to handle slow memory access times
Multilevel caches, out­of­order execution, branch prediction, ...
PPoPP 2010, Bangalore, India.
Basic Architecture Concepts
CPU architecture getting too complex.
Not translating to equivalent performance 
benefits
Need a rethink on traditional CPU architectures.
PPoPP 2010, Bangalore, India.
Basic Architecture Concepts
Couple with this the new wisdom in computer 
architectures.
 Memory Wall – memory latencies far higher
 ILP Wall – Reducing benefits from instruction 
level parallelism
 Power Wall – Increase in power consumption 
with increase in clock rates.
Multi­core is the way forward
Ex: GPUs, Cell, Intel Quad core, ...
Predicted that 100+ core computers would be a 
reality soon.
PPoPP 2010, Bangalore, India.
Multicore and Manycore Processors
IBM Cell
NVidia GeForce 8800 includes 128 scalar processors 
and Tesla
Sun T1 and T2
Tilera Tile64
Picochip combines 430 simple RISC cores
Cisco 188
TRIPS 
PPoPP 2010, Bangalore, India.
The Case for the GPUs
GPUs are now common. They also have high computing 
power per dollar, compared to the CPU
Today’s computer system has a CPU and a GPU, with the 
GPU being used primarily for graphics.
GPUs are good at some tasks and not so good at others.
They are especially good at processing large data such as 
images.
Let us use the right processor for the right task.
Goal: Increase the overall throughput of the computer system 
on the given task. Use CPU and GPU synergistically.
PPoPP 2010, Bangalore, India.
Evolution of GPUs
Graphics: a few hundred triangles/vertices map to afewhundred 
thousand pixels
Process pixels in parallel. Do the same thing on a large number of 
different items.
Data parallel model : parallelism provided by the data
Thousands to millions of data elements
Same program/instruction on all of them 
Hardware: 8­16  cores to process vertices and 64­128 to process 
pixels by 2005
Less versatile than CPU cores
SIMD mode of computations. Less hardware for instruction issue
No caching, branch prediction, out of order execution, etc.
‐ ‐
Can pack more cores in same silicon die area
PPoPP 2010, Bangalore, India.
GPUs as a Case Study
GPGPU – General Purpose Programming on 
GPUs 
OpenGL extensions 
Very difficult to program
Recently manufactures started supporting C­like 
 
programming abstraction to program GPUs
CUDA from NVidia
Other benefits of GPGPU
Affordable cost, easy availability, computational 
power
PPoPP 2010, Bangalore, India.
GPUs as a Case Study
GPUs suited for routines with high arithmetic 
intensity.
One feature is high memory latency, depending 
on the nature of access.
Should overlap memory with arithmetic.
PPoPP 2010, Bangalore, India.
CPU Vs GPU
Few powerful cores Vs. lots of small cores

GPUs: For good performance, applications 
need high arithmetic intensity
GPUs : No system managed cache.
PPoPP 2010, Bangalore, India.
GPGPU as a Case Study
Regular algorithms
Map well to data parallel model of GPUs
Each work item operates by itself or with a few 
neighbors
Example settings : image processing.
Threads can share data, e.g., apron pixels in an 
image processing kernel.
PPoPP 2010, Bangalore, India.
GPU as a Case Study
Irregular algorithms
Applications with data accesses that are not regular 
in nature.
Occurs in settings such as graph algorithms, data 
structures building, etc.
Difficult to get high efficiency due to high memory 
latency of accesses. 
PPoPP 2010, Bangalore, India.
GPGPU Tools and APIs
OpenGL
CUDA
OpenCL
Brook

More Related Content

More from Subhajit Sahu

Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
Subhajit Sahu
 
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
Subhajit Sahu
 
Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)
Subhajit Sahu
 
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTESA Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
Subhajit Sahu
 
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTESScalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Subhajit Sahu
 
Application Areas of Community Detection: A Review : NOTES
Application Areas of Community Detection: A Review : NOTESApplication Areas of Community Detection: A Review : NOTES
Application Areas of Community Detection: A Review : NOTES
Subhajit Sahu
 
Community Detection on the GPU : NOTES
Community Detection on the GPU : NOTESCommunity Detection on the GPU : NOTES
Community Detection on the GPU : NOTES
Subhajit Sahu
 
Survey for extra-child-process package : NOTES
Survey for extra-child-process package : NOTESSurvey for extra-child-process package : NOTES
Survey for extra-child-process package : NOTES
Subhajit Sahu
 
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTERDynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Subhajit Sahu
 
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Subhajit Sahu
 
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESFast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Subhajit Sahu
 
Can you fix farming by going back 8000 years : NOTES
Can you fix farming by going back 8000 years : NOTESCan you fix farming by going back 8000 years : NOTES
Can you fix farming by going back 8000 years : NOTES
Subhajit Sahu
 
HITS algorithm : NOTES
HITS algorithm : NOTESHITS algorithm : NOTES
HITS algorithm : NOTES
Subhajit Sahu
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Subhajit Sahu
 
Are Satellites Covered in Gold Foil : NOTES
Are Satellites Covered in Gold Foil : NOTESAre Satellites Covered in Gold Foil : NOTES
Are Satellites Covered in Gold Foil : NOTES
Subhajit Sahu
 
Taxation for Traders < Markets and Taxation : NOTES
Taxation for Traders < Markets and Taxation : NOTESTaxation for Traders < Markets and Taxation : NOTES
Taxation for Traders < Markets and Taxation : NOTES
Subhajit Sahu
 
A Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTESA Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTES
Subhajit Sahu
 
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
Subhajit Sahu
 
Income Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTESIncome Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTES
Subhajit Sahu
 

More from Subhajit Sahu (20)

Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
 
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
 
Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)
 
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTESA Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
 
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTESScalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
 
Application Areas of Community Detection: A Review : NOTES
Application Areas of Community Detection: A Review : NOTESApplication Areas of Community Detection: A Review : NOTES
Application Areas of Community Detection: A Review : NOTES
 
Community Detection on the GPU : NOTES
Community Detection on the GPU : NOTESCommunity Detection on the GPU : NOTES
Community Detection on the GPU : NOTES
 
Survey for extra-child-process package : NOTES
Survey for extra-child-process package : NOTESSurvey for extra-child-process package : NOTES
Survey for extra-child-process package : NOTES
 
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTERDynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
 
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
 
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESFast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTES
 
Can you fix farming by going back 8000 years : NOTES
Can you fix farming by going back 8000 years : NOTESCan you fix farming by going back 8000 years : NOTES
Can you fix farming by going back 8000 years : NOTES
 
HITS algorithm : NOTES
HITS algorithm : NOTESHITS algorithm : NOTES
HITS algorithm : NOTES
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
 
Are Satellites Covered in Gold Foil : NOTES
Are Satellites Covered in Gold Foil : NOTESAre Satellites Covered in Gold Foil : NOTES
Are Satellites Covered in Gold Foil : NOTES
 
Taxation for Traders < Markets and Taxation : NOTES
Taxation for Traders < Markets and Taxation : NOTESTaxation for Traders < Markets and Taxation : NOTES
Taxation for Traders < Markets and Taxation : NOTES
 
A Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTESA Generalization of the PageRank Algorithm : NOTES
A Generalization of the PageRank Algorithm : NOTES
 
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
 
Income Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTESIncome Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTES
 

Basic Computer Architecture and the Case for GPUs : NOTES