Author: Roland Bruggmann, roland.bruggmann@students.bfh.ch
Date: 30. July 2015
Berner Fachhochschule | Haute école spécialisée bernoise | Bern University of Applied Sciences
Multicore and GPU Programming
Module BTI7407 Parallel Computing
Exercises
Contents
1 Introduction
1.1 Taxonomy
1.2 Performance Metrics
1.2.1 Speedup
1.2.2 Efficiency
1.2.3 Scaling Efficiency
1.2.4 Amdahl’s Law
Acronyms
Bibliography
Exercises, Version 0.1
1 Introduction
1.1 Taxonomy
Flynn's taxonomy, introduced by Michael Flynn in 1966, classifies computer architectures by the number of
concurrent instruction streams and data streams (see [Bar15, p. 3]):
Single Instruction, Single Data (SISD): One instruction at a time, operating on a single data item. E.g., each
core of a contemporary multicore CPU can be considered a SISD machine.
Single Instruction, Multiple Data (SIMD): Each instruction is applied to a collection of items. E.g., vector
processors, and GPUs at the level of the Streaming Multiprocessor.
Multiple Instructions, Single Data (MISD): Multiple instructions are applied to the same data item. Used when
fault tolerance is required, e.g., in military or aerospace applications.
Multiple Instructions, Multiple Data (MIMD): Multicore machines, including GPUs, follow this paradigm. GPUs
are made from a collection of SIMD units, each of which can execute its own program; collectively they
behave as a MIMD machine.
1.2 Performance Metrics
1.2.1 Speedup
The improvement in execution time by the use of a parallel solution is defined as (see [Bar15, p. 14]):
\[
\mathrm{speedup} = \frac{t_{seq}}{t_{par}} \tag{1.1}
\]
where t_seq is the execution time of the sequential program and t_par is the execution time of the parallel program
for solving the same instance of a problem. Both are wall-clock times, and as such they are not objective: speedup
can vary with the system as well as with the input data. For this reason, it is customary to report average
figures, or even the average, maximum, and minimum observed. Speedup tells us whether it is feasible to accelerate
the solution of a problem, i.e., whether speedup > 1.
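The reporting convention described above can be sketched in a few lines of Python. This is a minimal illustration, not part of [Bar15]: the function names `speedup` and `summarize_speedup` and the timing values are assumptions made for the example, and pairing run i of the sequential program with run i of the parallel program is only one possible way to aggregate repeated measurements.

```python
import statistics


def speedup(t_seq: float, t_par: float) -> float:
    """Speedup as in Equation 1.1: sequential time divided by parallel time."""
    if t_seq <= 0 or t_par <= 0:
        raise ValueError("execution times must be positive")
    return t_seq / t_par


def summarize_speedup(t_seq_runs, t_par_runs):
    """Pair up repeated runs and report average, minimum, and maximum speedup."""
    values = [speedup(s, p) for s, p in zip(t_seq_runs, t_par_runs)]
    return {
        "avg": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical wall-clock times in seconds (made up for illustration).
seq_times = [8.0, 9.0, 10.0]
par_times = [2.0, 3.0, 2.5]

summary = summarize_speedup(seq_times, par_times)
print(summary)
# Acceleration is feasible only if the speedup exceeds 1.
print("feasible:", summary["min"] > 1.0)
```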
1.2.2 Efficiency
Generic efficiency tells us whether this acceleration can be achieved efficiently, i.e., with a modest amount of
resources (resource utilization, see [Bar15, p. 15]):
\[
\mathrm{efficiency} = \frac{\mathrm{speedup}}{N} = \frac{t_{seq}}{N \cdot t_{par}} \tag{1.2}
\]
where N is the number of CPUs/cores employed for the execution of the parallel program. Normally, speedup is
expected to be < N. When speedup = N, the parallel program exhibits what is called a linear speedup.
There are even situations where speedup > N and efficiency > 1, in what is known as a superlinear speedup
scenario.
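As a small illustration of Equation 1.2 and of the three cases just described, the following Python sketch computes efficiency and labels a measurement. The function names and the label "sublinear" for the ordinary case speedup < N are assumptions introduced for this example; the timing values are made up.

```python
def efficiency(t_seq: float, t_par: float, n: int) -> float:
    """Efficiency as in Equation 1.2: speedup divided by the number of cores N."""
    if n < 1:
        raise ValueError("N must be at least 1")
    if t_seq <= 0 or t_par <= 0:
        raise ValueError("execution times must be positive")
    return t_seq / (n * t_par)


def classify(t_seq: float, t_par: float, n: int) -> str:
    """Label a measurement by comparing its speedup with N."""
    s = t_seq / t_par
    if s > n:
        return "superlinear"  # efficiency > 1
    if s == n:
        return "linear"       # efficiency == 1
    return "sublinear"        # the normally expected case, efficiency < 1


# Hypothetical measurement: 12 s sequentially, 4 s on 4 cores.
print(efficiency(12.0, 4.0, 4), classify(12.0, 4.0, 4))
```

With measured floating-point times, an exact comparison `s == n` will rarely hold; a tolerance would be more practical in real use.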
1.2.3 Scaling Efficiency
In general, scalability is the ability to handle a growing amount of work efficiently. In the context of a parallel
algorithm and/or platform, scalability translates to being able to
• (a) solve bigger problems (weak scaling efficiency) and/or
• (b) incorporate more computing resources (strong scaling efficiency).
Strong Scaling Efficiency is defined by the same equation as the generic efficiency in Equation 1.2 (see [Bar15,
p. 17]):
\[
\mathrm{strongScalingEfficiency}(N) = \frac{t_{seq}}{N \cdot t_{par}} \tag{1.3}
\]
Weak Scaling Efficiency is defined as (see [Bar15, p. 18]):
\[
\mathrm{weakScalingEfficiency}(N) = \frac{t_{seq}}{t_{par}} \tag{1.4}
\]
where t_par is the time to solve a problem that is N times bigger than the one the single machine solves in time
t_seq. There are a number of issues with calculating scaling efficiency when GPU computing resources are involved:
e.g., t_seq for a single CPU versus t_par for a CPU/GPU hybrid including I/O (cf. [Bar15, p. 18]).
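The difference between Equations 1.3 and 1.4 lies in what is measured, not only in the formula. The following Python sketch makes this explicit; the function and parameter names are assumptions for illustration, and the timings are invented.

```python
def strong_scaling_efficiency(t_seq: float, t_par: float, n: int) -> float:
    """Equation 1.3: same problem size, solved on N cores in time t_par."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return t_seq / (n * t_par)


def weak_scaling_efficiency(t_seq: float, t_par_scaled: float) -> float:
    """Equation 1.4: t_par_scaled is the time N cores need for an N-times bigger problem."""
    return t_seq / t_par_scaled


# Hypothetical timings in seconds.
t_seq = 100.0          # one core, base problem
t_par_same = 15.0      # 8 cores, base problem
t_par_bigger = 125.0   # 8 cores, problem 8 times bigger

print(strong_scaling_efficiency(t_seq, t_par_same, 8))
print(weak_scaling_efficiency(t_seq, t_par_bigger))
```

In the weak scaling case N does not appear in the formula because it is already reflected in the problem size used to measure the parallel time.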
1.2.4 Amdahl’s Law
Gene Amdahl, in 1967, assumed (see [Bar15, p. 21]):
• We have a sequential application that requires time T to execute on a single CPU.
• The application consists of a fraction 0 ≤ α ≤ 1 that can be parallelized.
The remaining 1 − α has to be done sequentially.
• Parallel execution incurs no communication overhead, and the parallelizable part can be divided evenly among
any chosen number of CPUs. This assumption suits multicore architectures particularly well, where cores
have access to the same shared memory.
Then the speedup obtained with N nodes is bounded from above by:
\[
\mathrm{speedup} = \frac{t_{seq}}{t_{par}} = \frac{T}{(1 - \alpha) T + \frac{\alpha \cdot T}{N}} = \frac{1}{1 - \alpha + \frac{\alpha}{N}} \tag{1.5}
\]
and by obtaining the limit for N → ∞:
\[
\lim_{N \to \infty} (\mathrm{speedup}) = \frac{1}{1 - \alpha} \tag{1.6}
\]
Amdahl's law answers a difficult question: how much faster can a problem be solved by a parallel program? And it
does so in a completely abstract manner, relying only on a characteristic of the problem, i.e., α.
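To get a feeling for Equations 1.5 and 1.6, the bound can be evaluated numerically. The sketch below is an illustration only: the function names `amdahl_speedup` and `amdahl_limit` and the choice α = 0.95 are assumptions, not taken from [Bar15].

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Upper bound on speedup from Equation 1.5 for parallel fraction alpha on N nodes."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n < 1:
        raise ValueError("N must be at least 1")
    return 1.0 / ((1.0 - alpha) + alpha / n)


def amdahl_limit(alpha: float) -> float:
    """Limit of the bound for N -> infinity, Equation 1.6."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if alpha == 1.0:
        return float("inf")  # a fully parallel program has no finite bound
    return 1.0 / (1.0 - alpha)


# Assumed example: 95 % of the work can be parallelized.
alpha = 0.95
for n in (1, 2, 4, 16, 256, 65536):
    print(n, round(amdahl_speedup(alpha, n), 2))
print("limit:", round(amdahl_limit(alpha), 2))
```

The printed values approach the limit from below: adding nodes helps less and less, because the sequential part 1 − α dominates the running time.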
Figure 1.1: Speedup curves for different values of α, as predicted by Amdahl’s law
Figure 1.2: Efficiency curves for different values of α, as predicted by Amdahl’s law
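The efficiency curves follow from combining Equation 1.2 with Equation 1.5. The short derivation below is a sketch under Amdahl's assumptions; the numeric values α = 0.9 and N = 10 are chosen only for illustration.

```latex
\[
\mathrm{efficiency}
  = \frac{\mathrm{speedup}}{N}
  = \frac{1}{N \left( 1 - \alpha + \frac{\alpha}{N} \right)}
  = \frac{1}{N (1 - \alpha) + \alpha}
\]
% Worked example with assumed values \alpha = 0.9 and N = 10:
\[
\mathrm{speedup} = \frac{1}{0.1 + 0.09} \approx 5.26,
\qquad
\mathrm{efficiency} = \frac{1}{10 \cdot 0.1 + 0.9} = \frac{1}{1.9} \approx 0.53
\]
```

For any α < 1 the denominator N(1 − α) + α grows without bound as N increases, so the efficiency tends to zero even though the speedup keeps approaching its limit.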
Acronyms
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
GPGPU General-Purpose Computing on Graphics Processing Units
GPU Graphics Processing Unit
MIMD Multiple Instructions, Multiple Data
MISD Multiple Instructions, Single Data
MPI Message Passing Interface
OpenCL Open Computing Language
OpenMPI Open Message Passing Interface
PC Program Counter
PCAM Partitioning, Communication, Agglomeration, and Mapping
SIMD Single Instruction, Multiple Data
SIMT Single Instruction Multiple Threads
SISD Single Instruction, Single Data
Bibliography
[Bar15] Gerassimos Barlas. Multicore and GPU Programming – An Integrated Approach. 1st ed. Waltham: Morgan
Kaufmann, 2015. ISBN: 978-0-12-417137-4. URL: http://booksite.elsevier.com/9780124171374/
(visited on 30/07/2015).
Exercises, Version 0.1 4

More Related Content

What's hot

Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel ComputingAkhila Prabhakaran
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesHeman Pathak
 
Parallel Processing
Parallel ProcessingParallel Processing
Parallel ProcessingRTigger
 
Introduction To Parallel Computing
Introduction To Parallel ComputingIntroduction To Parallel Computing
Introduction To Parallel ComputingJörn Dinkla
 
Research Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingResearch Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingShitalkumar Sukhdeve
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systemsknowdiff
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPIJSRED
 
Lecture 1
Lecture 1Lecture 1
Lecture 1Mr SMAK
 
Real-Time Scheduling Algorithms
Real-Time Scheduling AlgorithmsReal-Time Scheduling Algorithms
Real-Time Scheduling AlgorithmsAJAL A J
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computingNiranjana Ambadi
 
INTRODUCTION TO PARALLEL PROCESSING
INTRODUCTION TO PARALLEL PROCESSINGINTRODUCTION TO PARALLEL PROCESSING
INTRODUCTION TO PARALLEL PROCESSINGGS Kosta
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD) Ali Raza
 
Memory allocation for real time operating system
Memory allocation for real time operating systemMemory allocation for real time operating system
Memory allocation for real time operating systemAsma'a Lafi
 

What's hot (20)

Parallel processing
Parallel processingParallel processing
Parallel processing
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming Languages
 
Parallel Processing
Parallel ProcessingParallel Processing
Parallel Processing
 
Introduction To Parallel Computing
Introduction To Parallel ComputingIntroduction To Parallel Computing
Introduction To Parallel Computing
 
Chap1 slides
Chap1 slidesChap1 slides
Chap1 slides
 
Research Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingResearch Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel Programming
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MP
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Solution(1)
Solution(1)Solution(1)
Solution(1)
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
 
Real-Time Scheduling Algorithms
Real-Time Scheduling AlgorithmsReal-Time Scheduling Algorithms
Real-Time Scheduling Algorithms
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
 
INTRODUCTION TO PARALLEL PROCESSING
INTRODUCTION TO PARALLEL PROCESSINGINTRODUCTION TO PARALLEL PROCESSING
INTRODUCTION TO PARALLEL PROCESSING
 
Real time-embedded-system-lec-04
Real time-embedded-system-lec-04Real time-embedded-system-lec-04
Real time-embedded-system-lec-04
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
Real time-embedded-system-lec-05
Real time-embedded-system-lec-05Real time-embedded-system-lec-05
Real time-embedded-system-lec-05
 
Memory allocation for real time operating system
Memory allocation for real time operating systemMemory allocation for real time operating system
Memory allocation for real time operating system
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
 

Similar to Multicore and GPU Programming

Lecture 3
Lecture 3Lecture 3
Lecture 3Mr SMAK
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...EUDAT
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterSudhang Shankar
 
parallel Questions &amp; answers
parallel Questions &amp; answersparallel Questions &amp; answers
parallel Questions &amp; answersMd. Mashiur Rahman
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System AchitectureYashiUpadhyay3
 
DYNAMIC VOLTAGE SCALING FOR POWER CONSUMPTION REDUCTION IN REAL-TIME MIXED TA...
DYNAMIC VOLTAGE SCALING FOR POWER CONSUMPTION REDUCTION IN REAL-TIME MIXED TA...DYNAMIC VOLTAGE SCALING FOR POWER CONSUMPTION REDUCTION IN REAL-TIME MIXED TA...
DYNAMIC VOLTAGE SCALING FOR POWER CONSUMPTION REDUCTION IN REAL-TIME MIXED TA...cscpconf
 
Parallel programming
Parallel programmingParallel programming
Parallel programmingAnshul Sharma
 
Pipelining and vector processing
Pipelining and vector processingPipelining and vector processing
Pipelining and vector processingKamal Acharya
 
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET Journal
 
Design & Analysis of Algorithm course .pptx
Design & Analysis of Algorithm course .pptxDesign & Analysis of Algorithm course .pptx
Design & Analysis of Algorithm course .pptxJeevaMCSEKIOT
 
Performance analysis of sobel edge filter on heterogeneous system using opencl
Performance analysis of sobel edge filter on heterogeneous system using openclPerformance analysis of sobel edge filter on heterogeneous system using opencl
Performance analysis of sobel edge filter on heterogeneous system using opencleSAT Publishing House
 
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxjohnsmith96441
 

Similar to Multicore and GPU Programming (20)

Aca11 bk2 ch9
Aca11 bk2 ch9Aca11 bk2 ch9
Aca11 bk2 ch9
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC cluster
 
Chapter 2 ds
Chapter 2 dsChapter 2 ds
Chapter 2 ds
 
Parallel computation
Parallel computationParallel computation
Parallel computation
 
parallel Questions &amp; answers
parallel Questions &amp; answersparallel Questions &amp; answers
parallel Questions &amp; answers
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System Achitecture
 
DYNAMIC VOLTAGE SCALING FOR POWER CONSUMPTION REDUCTION IN REAL-TIME MIXED TA...
DYNAMIC VOLTAGE SCALING FOR POWER CONSUMPTION REDUCTION IN REAL-TIME MIXED TA...DYNAMIC VOLTAGE SCALING FOR POWER CONSUMPTION REDUCTION IN REAL-TIME MIXED TA...
DYNAMIC VOLTAGE SCALING FOR POWER CONSUMPTION REDUCTION IN REAL-TIME MIXED TA...
 
Matrix multiplication
Matrix multiplicationMatrix multiplication
Matrix multiplication
 
Parallel programming
Parallel programmingParallel programming
Parallel programming
 
Pipelining and vector processing
Pipelining and vector processingPipelining and vector processing
Pipelining and vector processing
 
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CL
 
Design & Analysis of Algorithm course .pptx
Design & Analysis of Algorithm course .pptxDesign & Analysis of Algorithm course .pptx
Design & Analysis of Algorithm course .pptx
 
Lecture1
Lecture1Lecture1
Lecture1
 
Performance analysis of sobel edge filter on heterogeneous system using opencl
Performance analysis of sobel edge filter on heterogeneous system using openclPerformance analysis of sobel edge filter on heterogeneous system using opencl
Performance analysis of sobel edge filter on heterogeneous system using opencl
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
 

More from Roland Bruggmann

Fingerprint Analysis – Preprocessing and Feature Extraction
Fingerprint Analysis – Preprocessing and Feature ExtractionFingerprint Analysis – Preprocessing and Feature Extraction
Fingerprint Analysis – Preprocessing and Feature ExtractionRoland Bruggmann
 
Unreal Engine IoT Project: Heartbeat
Unreal Engine IoT Project: HeartbeatUnreal Engine IoT Project: Heartbeat
Unreal Engine IoT Project: HeartbeatRoland Bruggmann
 
3D Content for Dream-like VR
3D Content for Dream-like VR3D Content for Dream-like VR
3D Content for Dream-like VRRoland Bruggmann
 
OSG Volume Rendering - Presentation
OSG Volume Rendering - PresentationOSG Volume Rendering - Presentation
OSG Volume Rendering - PresentationRoland Bruggmann
 
Swiss National Supercomputing Centre CSCS
Swiss National Supercomputing Centre CSCSSwiss National Supercomputing Centre CSCS
Swiss National Supercomputing Centre CSCSRoland Bruggmann
 
Numerische Methoden: Approximation und Integration
Numerische Methoden: Approximation und IntegrationNumerische Methoden: Approximation und Integration
Numerische Methoden: Approximation und IntegrationRoland Bruggmann
 
Unity® Volume Rendering - Abstract
Unity® Volume Rendering - AbstractUnity® Volume Rendering - Abstract
Unity® Volume Rendering - AbstractRoland Bruggmann
 
Unity® Volume Rendering - Benutzerhandbuch
Unity® Volume Rendering - BenutzerhandbuchUnity® Volume Rendering - Benutzerhandbuch
Unity® Volume Rendering - BenutzerhandbuchRoland Bruggmann
 
Serious Game "Virtual Surgery" - Game Design Document
Serious Game "Virtual Surgery" - Game Design DocumentSerious Game "Virtual Surgery" - Game Design Document
Serious Game "Virtual Surgery" - Game Design DocumentRoland Bruggmann
 
Digitale Kamera und Modulationstransferfunktion
Digitale Kamera und ModulationstransferfunktionDigitale Kamera und Modulationstransferfunktion
Digitale Kamera und ModulationstransferfunktionRoland Bruggmann
 
Visualisierung von Algorithmen und Datenstrukturen
Visualisierung von Algorithmen und DatenstrukturenVisualisierung von Algorithmen und Datenstrukturen
Visualisierung von Algorithmen und DatenstrukturenRoland Bruggmann
 
User-centered Design für Telemedizin-App
User-centered Design für Telemedizin-AppUser-centered Design für Telemedizin-App
User-centered Design für Telemedizin-AppRoland Bruggmann
 
TOGAF Architecture Content Framework
TOGAF Architecture Content FrameworkTOGAF Architecture Content Framework
TOGAF Architecture Content FrameworkRoland Bruggmann
 

More from Roland Bruggmann (20)

Fingerprint Analysis – Preprocessing and Feature Extraction
Fingerprint Analysis – Preprocessing and Feature ExtractionFingerprint Analysis – Preprocessing and Feature Extraction
Fingerprint Analysis – Preprocessing and Feature Extraction
 
Unreal Engine IoT Project: Heartbeat
Unreal Engine IoT Project: HeartbeatUnreal Engine IoT Project: Heartbeat
Unreal Engine IoT Project: Heartbeat
 
3D Content for Dream-like VR
3D Content for Dream-like VR3D Content for Dream-like VR
3D Content for Dream-like VR
 
OSG Volume Rendering - Presentation
OSG Volume Rendering - PresentationOSG Volume Rendering - Presentation
OSG Volume Rendering - Presentation
 
Swiss National Supercomputing Centre CSCS
Swiss National Supercomputing Centre CSCSSwiss National Supercomputing Centre CSCS
Swiss National Supercomputing Centre CSCS
 
Sprechen als Handeln
Sprechen als HandelnSprechen als Handeln
Sprechen als Handeln
 
Numerische Methoden: Approximation und Integration
Numerische Methoden: Approximation und IntegrationNumerische Methoden: Approximation und Integration
Numerische Methoden: Approximation und Integration
 
Unity® Volume Rendering - Abstract
Unity® Volume Rendering - AbstractUnity® Volume Rendering - Abstract
Unity® Volume Rendering - Abstract
 
Unity® Volume Rendering - Benutzerhandbuch
Unity® Volume Rendering - BenutzerhandbuchUnity® Volume Rendering - Benutzerhandbuch
Unity® Volume Rendering - Benutzerhandbuch
 
Serious Game "Virtual Surgery" - Game Design Document
Serious Game "Virtual Surgery" - Game Design DocumentSerious Game "Virtual Surgery" - Game Design Document
Serious Game "Virtual Surgery" - Game Design Document
 
OSG Volume Rendering
OSG Volume RenderingOSG Volume Rendering
OSG Volume Rendering
 
Digitale Kamera und Modulationstransferfunktion
Digitale Kamera und ModulationstransferfunktionDigitale Kamera und Modulationstransferfunktion
Digitale Kamera und Modulationstransferfunktion
 
Quadriken im Raum
Quadriken im RaumQuadriken im Raum
Quadriken im Raum
 
Visualisierung von Algorithmen und Datenstrukturen
Visualisierung von Algorithmen und DatenstrukturenVisualisierung von Algorithmen und Datenstrukturen
Visualisierung von Algorithmen und Datenstrukturen
 
User-centered Design für Telemedizin-App
User-centered Design für Telemedizin-AppUser-centered Design für Telemedizin-App
User-centered Design für Telemedizin-App
 
Ondes stationnaires
Ondes stationnairesOndes stationnaires
Ondes stationnaires
 
Passwords Safe
Passwords SafePasswords Safe
Passwords Safe
 
Stehende Wellen
Stehende WellenStehende Wellen
Stehende Wellen
 
TOGAF Architecture Content Framework
TOGAF Architecture Content FrameworkTOGAF Architecture Content Framework
TOGAF Architecture Content Framework
 
Cultural Dimensions
Cultural DimensionsCultural Dimensions
Cultural Dimensions
 

Recently uploaded

WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 

Recently uploaded (20)

WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid

Multicore and GPU Programming

  • 1. Author: Roland Bruggmann, roland.bruggmann@students.bfh.ch. Date: 30 July 2015. Berner Fachhochschule | Haute école spécialisée bernoise | Bern University of Applied Sciences. Multicore and GPU Programming. Module BTI7407 Parallel Computing. Exercises
  • 2. Contents
     1 Introduction
     1.1 Taxonomy
     1.2 Performance Metrics
     1.2.1 Speedup
     1.2.2 Efficiency
     1.2.3 Scaling Efficiency
     1.2.4 Amdahl's Law
     Acronyms
     Bibliography
     Exercises, Version 0.1
  • 3. 1 Introduction

     1.1 Taxonomy

     By Michael Flynn, in 1966 (see [Bar15, p. 3]):

     Single Instruction, Single Data (SISD): One instruction at a time, operating on a single data item. E.g., each core of a contemporary multicore CPU can be considered a SISD machine.

     Single Instruction, Multiple Data (SIMD): Each instruction is applied to a collection of items. E.g., vector processors, and GPUs at the level of the Streaming Multiprocessor.

     Multiple Instructions, Single Data (MISD): Multiple instructions are applied to the same data item. Used when fault tolerance is required, e.g., in military or aerospace applications.

     Multiple Instructions, Multiple Data (MIMD): Multicore machines, including GPUs, follow this paradigm. GPUs are made from a collection of SIMD units, each of which can execute its own program; collectively they behave as a MIMD machine.

     1.2 Performance Metrics

     1.2.1 Speedup

     The improvement in execution time by the use of a parallel solution is defined as (see [Bar15, p. 14]):

     $$\text{speedup} = \frac{t_{seq}}{t_{par}} \tag{1.1}$$

     where $t_{seq}$ is the execution time of the sequential program, and $t_{par}$ is the execution time of the parallel program for solving the same instance of a problem. Both are wall-clock times, and as such they are not objective: speedup can vary with the system as well as with the input data. For this reason, it is customary to report average figures, or even the average, maximum, and minimum observed. Speedup tells us whether it is feasible to accelerate the solution of a problem, i.e., whether speedup > 1.

     1.2.2 Efficiency

     Generic efficiency tells us whether this can be done efficiently, i.e., with a modest amount of resources (resource utilization, see [Bar15, p. 15]):

     $$\text{efficiency} = \frac{\text{speedup}}{N} = \frac{t_{seq}}{N \cdot t_{par}} \tag{1.2}$$

     where $N$ is the number of CPUs/cores employed for the execution of the parallel program. Normally, speedup is expected to be less than $N$. When speedup = $N$, the corresponding parallel program exhibits what is called a linear speedup. There are even situations where speedup > $N$ and efficiency > 1, in what is known as a superlinear speedup scenario.
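As a minimal sketch of Equations 1.1 and 1.2, the following Python snippet computes speedup and efficiency from wall-clock timings and reports the minimum, average, and maximum speedup over several problem instances. The function names and the timing values are hypothetical and only serve as an illustration; each pair of timings is assumed to belong to the same problem instance.

```python
from statistics import mean


def speedup(t_seq: float, t_par: float) -> float:
    """Equation 1.1: sequential time divided by parallel time."""
    return t_seq / t_par


def efficiency(t_seq: float, t_par: float, n: int) -> float:
    """Equation 1.2: speedup divided by the number of cores N."""
    return speedup(t_seq, t_par) / n


def summarize(t_seq_runs, t_par_runs):
    """Return (min, mean, max) speedup over paired timings.

    Pair i holds the sequential and parallel wall-clock time
    measured for the same problem instance i.
    """
    values = [speedup(s, p) for s, p in zip(t_seq_runs, t_par_runs)]
    return min(values), mean(values), max(values)


if __name__ == "__main__":
    # Hypothetical wall-clock timings in seconds (not measured data).
    t_seq_runs = [8.0, 9.0, 6.0]
    t_par_runs = [2.0, 3.0, 2.0]
    lo, avg, hi = summarize(t_seq_runs, t_par_runs)
    print(f"speedup: min={lo:.2f} avg={avg:.2f} max={hi:.2f}")
    print(f"efficiency on 4 cores: {efficiency(8.0, 2.0, 4):.2f}")
```

Reporting all three figures rather than a single number reflects the point made above that wall-clock speedup depends on both the system and the input data.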
  • 4. 1.2.3 Scaling Efficiency

     In general, scalability is the ability to handle a growing amount of work efficiently. In the context of a parallel algorithm and/or platform, scalability translates to being able to

       • (a) solve bigger problems (weak scaling efficiency) and/or
       • (b) incorporate more computing resources (strong scaling efficiency).

     Strong Scaling Efficiency is defined by the same equation as the generic efficiency in Equation 1.2 (see [Bar15, p. 17]):

     $$\text{strongScalingEfficiency}(N) = \frac{t_{seq}}{N \cdot t_{par}} \tag{1.3}$$

     Weak Scaling Efficiency is defined as (see [Bar15, p. 18]):

     $$\text{weakScalingEfficiency}(N) = \frac{t_{seq}}{t_{par}} \tag{1.4}$$

     where $t_{par}$ here denotes the time to solve a problem that is $N$ times bigger than the one the single machine solves in time $t_{seq}$.

     There are a number of issues with calculating scaling efficiency when GPU computing resources are involved: e.g., $t_{seq}$ for a single CPU versus $t_{par}$ for a CPU/GPU hybrid including I/O (cf. [Bar15, p. 18]).

     1.2.4 Amdahl's Law

     Gene Amdahl, in 1967, assumed (see [Bar15, p. 21]):

       • We have a sequential application that requires time $T$ to execute on a single CPU.
       • The application consists of a part $0 \le \alpha \le 1$ that can be parallelized. The remaining $1 - \alpha$ has to be done sequentially.
       • Parallel execution incurs no communication overhead, and the parallelizable part can be divided evenly among any chosen number of CPUs. This assumption suits multicore architectures particularly well, where cores have access to the same shared memory.

     Then the speedup obtained with $N$ nodes is upper-bounded by:

     $$\text{speedup} = \frac{t_{seq}}{t_{par}} = \frac{T}{(1 - \alpha)\,T + \frac{\alpha \cdot T}{N}} = \frac{1}{1 - \alpha + \frac{\alpha}{N}} \tag{1.5}$$

     and, by taking the limit for $N \to \infty$:

     $$\lim_{N \to \infty} (\text{speedup}) = \frac{1}{1 - \alpha} \tag{1.6}$$

     Amdahl's law answers a difficult question: how much faster can a problem be solved by a parallel program? It does so in a completely abstract manner, relying only on a characteristic of the problem, i.e., $\alpha$.
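The two scaling-efficiency definitions of Section 1.2.3 (Equations 1.3 and 1.4) differ only in what the parallel timing refers to, which the following Python sketch tries to make explicit. The function and parameter names, as well as the timings, are hypothetical; `t_par_scaled` stands for the time needed to solve a problem that is N times bigger than the one solved in `t_seq`.

```python
def strong_scaling_efficiency(t_seq: float, t_par: float, n: int) -> float:
    """Equation 1.3: same problem size, n times the resources."""
    return t_seq / (n * t_par)


def weak_scaling_efficiency(t_seq: float, t_par_scaled: float) -> float:
    """Equation 1.4: problem size grows with the resources.

    t_par_scaled is the parallel time for a problem N times bigger
    than the one a single machine solves in t_seq.
    """
    return t_seq / t_par_scaled


if __name__ == "__main__":
    # Hypothetical wall-clock timings in seconds (not measured data).
    t_seq = 10.0
    strong = strong_scaling_efficiency(t_seq, t_par=3.125, n=4)
    weak = weak_scaling_efficiency(t_seq, t_par_scaled=12.5)
    print(f"strong scaling efficiency (N=4): {strong:.2f}")
    print(f"weak scaling efficiency   (N=4): {weak:.2f}")
```

Keeping the two timings as separate parameters is a deliberate choice in this sketch: it avoids accidentally plugging a same-size parallel timing into the weak-scaling formula.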
  • 5. Figure 1.1: Speedup curves for different values of α, as predicted by Amdahl's law

     Figure 1.2: Efficiency curves for different values of α, as predicted by Amdahl's law
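The data behind curves like those in Figures 1.1 and 1.2 can be regenerated from Equations 1.5, 1.6, and 1.2. The Python sketch below tabulates the predicted speedup and efficiency; the chosen α values and node counts are illustrative assumptions and are not taken from the figures, and the function names are hypothetical.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Equation 1.5: upper bound on the speedup with n nodes."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


def amdahl_limit(alpha: float) -> float:
    """Equation 1.6: speedup bound for n -> infinity."""
    if alpha >= 1.0:
        # A fully parallelizable program has no finite bound.
        return float("inf")
    return 1.0 / (1.0 - alpha)


def curves(alphas, node_counts):
    """Map each alpha to a list of (n, speedup, efficiency) rows."""
    table = {}
    for alpha in alphas:
        rows = []
        for n in node_counts:
            s = amdahl_speedup(alpha, n)
            rows.append((n, s, s / n))  # efficiency = speedup / N (Eq. 1.2)
        table[alpha] = rows
    return table


if __name__ == "__main__":
    # Illustrative alpha values and node counts (assumptions).
    for alpha, rows in curves([0.5, 0.9, 0.95], [1, 2, 4, 8, 16]).items():
        print(f"alpha={alpha}: limit={amdahl_limit(alpha):.1f}")
        for n, s, e in rows:
            print(f"  N={n:2d}  speedup={s:5.2f}  efficiency={e:4.2f}")
```

The table shows the two effects the figures illustrate: for a fixed α the speedup flattens towards the limit of Equation 1.6, while the efficiency keeps dropping as N grows.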
  • 6. Acronyms

     CPU: Central Processing Unit
     CUDA: Compute Unified Device Architecture
     GPGPU: General-Purpose computing on Graphics Processing Units
     GPU: Graphics Processing Unit
     MIMD: Multiple Instructions, Multiple Data
     MISD: Multiple Instructions, Single Data
     MPI: Message Passing Interface
     OpenCL: Open Computing Language
     OpenMPI: Open Message Passing Interface
     PC: Program Counter
     PCAM: Partitioning, Communication, Agglomeration, and Mapping
     SIMD: Single Instruction, Multiple Data
     SIMT: Single Instruction, Multiple Threads
     SISD: Single Instruction, Single Data

     Bibliography

     [Bar15] Gerassimos Barlas. Multicore and GPU Programming: An Integrated Approach. 1st ed. Waltham: Morgan Kaufmann, 2015. ISBN: 978-0-12-417137-4. URL: http://booksite.elsevier.com/9780124171374/ (visited on 30/07/2015).