The document summarizes a meetup group for the application of high-performance and GPU supercomputing technology to business problems. The group started in 2011 with locations in several US cities and Tokyo, and has reached over 1000 members. It is non-profit and hosted on Meetup.com. The group provides a forum for professionals to discuss challenges and solutions for applying advanced computing technologies in business.
Squeezing Deep Learning Into Mobile Phones by Anirudh Koul
A practical talk by Anirudh Koul on running Deep Neural Networks on memory- and energy-constrained devices such as smartphones. Highlights some frameworks and best practices.
This presentation focuses on Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks. You'll also learn how to incorporate Deep Learning into Android applications. Basic knowledge of matrices is helpful for this session, which is targeted primarily at beginners.
Presentation of a few recent papers on Deep Learning, in particular "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" by Song Han, Huizi Mao, and William J. Dally, International Conference on Learning Representations (ICLR) 2016.
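The paper above chains three stages: magnitude pruning, shared-weight quantization, and Huffman coding. A minimal pure-Python sketch of the first two stages (Huffman coding omitted; the threshold and centroid values here are illustrative, not from the paper):

```python
# Toy illustration of two stages of Deep Compression:
# magnitude pruning followed by shared-weight quantization.

def prune(weights, threshold):
    """Zero out weights whose magnitude falls below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize(weights, centroids):
    """Snap each surviving weight to its nearest shared centroid."""
    return [0.0 if w == 0.0 else min(centroids, key=lambda c: abs(c - w))
            for w in weights]

weights = [0.02, -0.71, 0.48, -0.03, 0.95, -0.52]
pruned = prune(weights, threshold=0.1)          # small weights become 0.0
compressed = quantize(pruned, centroids=[-0.5, 0.5, 1.0])

print(pruned)      # [0.0, -0.71, 0.48, 0.0, 0.95, -0.52]
print(compressed)  # [0.0, -0.5, 0.5, 0.0, 1.0, -0.5]
```

After these two stages only the centroid indices and a sparse mask need storing, which is what makes the subsequent Huffman coding stage effective.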
Deep Learning Frameworks Using Spark on YARN by Vartika Singh, Data Con LA
Abstract: Traditional machine learning and feature engineering algorithms are not efficient enough to extract the complex and nonlinear patterns that are hallmarks of big data. Deep learning, on the other hand, helps translate the scale and complexity of the data into solutions such as molecular interaction in drug design, the search for subatomic particles, and automatic parsing of microscopic images. Co-locating a data processing pipeline with a deep learning framework makes data exploration and algorithm and model evolution much simpler, while streamlining data governance and lineage tracking into a more focused effort. In this talk, we will discuss and compare the different deep learning frameworks on Spark in distributed mode, their ease of integration with the Hadoop ecosystem, and their relative feature parity.
Improving Hardware Efficiency for DNN Applications by Chester Chen
Speaker: Dr. Hai (Helen) Li is the Clare Boothe Luce Associate Professor of Electrical and Computer Engineering and Co-director of the Duke Center for Evolutionary Intelligence at Duke University
In this talk, I will introduce a few recent research spotlights from the Duke Center for Evolutionary Intelligence. The talk will start with the structured sparsity learning (SSL) method, which attempts to learn a compact structure from a bigger DNN to reduce computation cost. It generates a regularized structure with high execution efficiency. Our experiments on CPU, GPU, and FPGA platforms show, on average, a 3-5x speedup of the convolutional-layer computation of AlexNet. Then the implementation and acceleration of DNN applications on mobile computing systems will be introduced: MoDNN is a local distributed system which partitions DNN models across several mobile devices to accelerate computation, and ApesNet is an efficient pixel-wise segmentation network which understands road scenes in real time and has achieved promising accuracy. Our prospects on the adoption of emerging technology will be given at the end of the talk, offering the audience an alternative way of thinking about the future evolution and revolution of modern computing systems.
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop by Josh Patterson
As the data world undergoes its Cambrian explosion phase, our data tools need to become more advanced to keep pace. Deep Learning has emerged as a key tool in the non-linear arms race of machine learning. In this session we will look at how we parallelize Deep Belief Networks on Hadoop's next-generation YARN framework with Iterative Reduce. We'll also look at some real-world examples of processing data with Deep Learning, such as image classification and natural language processing.
Accelerate Machine Learning Software on Intel Architecture by Intel® Software
This session presents performance data for deep learning training for image recognition, achieving a greater than 24x speedup on a single Intel® Xeon Phi™ processor 7250 when compared to Caffe*. In addition, we present performance data showing that training time is further reduced, reaching a 40x speedup, on a 128-node Intel® Xeon Phi™ processor cluster over Intel® Omni-Path Architecture (Intel® OPA).
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures by Intel® Software
This session discusses the implementation and performance of the k-nearest neighbor (KNN) computation on a distributed architecture using the Intel® Xeon Phi™ processor.
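The core shape of a distributed KNN search is independent of the hardware: shard the reference set, take each shard's local top-k, then merge. A toy sketch under that assumption (the partitioning here is simulated in-process, not actual multi-node code):

```python
# Brute-force k-nearest-neighbor search, partitioned the way a distributed
# implementation shards the reference set: each partition ("node") returns
# its local top-k candidates, and the results are merged globally.
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def local_knn(query, partition, k):
    """Top-k nearest points within one partition."""
    return heapq.nsmallest(k, partition, key=lambda p: sq_dist(query, p))

def distributed_knn(query, partitions, k):
    """Merge each partition's local candidates, then take the global top-k."""
    candidates = [p for part in partitions for p in local_knn(query, part, k)]
    return heapq.nsmallest(k, candidates, key=lambda p: sq_dist(query, p))

points = [(0, 0), (1, 1), (5, 5), (2, 2), (9, 9), (1, 0)]
partitions = [points[:3], points[3:]]            # two simulated nodes
print(distributed_knn((0, 0), partitions, k=2))  # [(0, 0), (1, 0)]
```

The merge step is correct because any global top-k point is necessarily in its own partition's local top-k, so only k candidates per node ever cross the network.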
Synthetic dialogue generation with Deep Learning by S N
A walkthrough of a Deep Learning technique that generates TV scripts using a Recurrent Neural Network. After being trained on a dataset, the model generates a completely new TV script for a scene. You will learn the concepts around RNNs, NLP, and various deep learning techniques.
Technologies to be used:
Python 3, Jupyter, TensorFlow
Source code: https://github.com/syednasar/talks/tree/master/synthetic-dialog
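The recurrence at the heart of such a script generator can be sketched without any framework. A pure-Python toy of a single RNN step (the real talk trains the weights with TensorFlow; the tiny fixed weights and 2-dim "characters" here are only illustrative):

```python
# One RNN recurrence: the hidden state carries context from one
# character of the script to the next, h' = tanh(Wxh@x + Whh@h + b).
import math

def rnn_step(x, h, Wxh, Whh, b):
    """Advance the hidden state by one input vector."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(Wxh[i], x)) +
                      sum(wh * hi for wh, hi in zip(Whh[i], h)) + b[i])
            for i in range(len(h))]

# 2-dim one-hot "characters", 2-dim hidden state, illustrative weights.
Wxh = [[1.0, -1.0], [0.5, 0.5]]
Whh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

h = [0.0, 0.0]
for x in [[1, 0], [0, 1], [1, 0]]:   # a tiny input sequence
    h = rnn_step(x, h, Wxh, Whh, b)
print(h)  # final hidden state summarizes the whole sequence
```

Generation then repeatedly feeds the network's own sampled output back in as the next input; training learns weights so those samples resemble the script corpus.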
Deploying deep learning models with Docker and Kubernetes by Petteri Teikari, PhD
A short introduction to platform-agnostic production deployment, with some medical examples.
Alternative download: https://www.dropbox.com/s/qlml5k5h113trat/deep_cloudArchitecture.pdf?dl=0
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/wavecomp/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-nicol
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Chris Nicol, CTO at Wave Computing, presents the "New Dataflow Architecture for Machine Learning" tutorial at the May 2017 Embedded Vision Summit.
Data scientists have made tremendous advances in the use of deep neural networks (DNNs) to enhance business models and service offerings. But training DNNs can take a week or more using traditional hardware solutions that rely on legacy architectures that are limited in performance and scalability. New innovations that can reduce training time for both image-centric and text-centric deep neural networks will lead to an explosion of new applications. Dr. Chris Nicol, Wave Computing’s Chief Technology Officer, examines the performance challenge faced by data scientists today. Nicol outlines the technical factors underlying this bottleneck for systems relying on CPUs, GPUs, FPGAs and ASICs, and introduces a new dataflow-centric approach to DNN training.
by Vikram Madan, Sr. Product Manager, AWS Deep Learning
In this workshop, we will cover deep learning fundamentals and focus on the powerful and scalable Apache MXNet open source deep learning framework. At the end of this tutorial you'll be able to train your own deep neural network and fine-tune existing state-of-the-art models for image and object recognition. We'll also take a deep dive into setting up your deep learning infrastructure on AWS and model deployment on AWS Lambda.
Josh Patterson, Advisor, Skymind – Deep Learning for Industry at MLconf ATL 2016, MLconf
DL4J and DataVec for Enterprise Deep Learning Workflows: applications in NLP, sensor processing (IoT), image processing, and audio processing have all emerged as prime deep learning applications. In this session we take a practical look at building secure Deep Learning workflows in the enterprise. We'll see how DL4J's DataVec tool enables scalable ETL and vectorization pipelines to be created for a single machine or scaled out to Spark on Hadoop. We'll also see how deep networks such as Recurrent Neural Networks can leverage DataVec to process data for modeling more quickly.
Deep learning on mobile - 2019 Practitioner's Guide by Anirudh Koul
The 2019 Guide to Deep Learning on Mobile, from Inference to Training on iOS and Android smartphones. Featuring CoreML, Tensorflow Lite, MLKit, Fritz, AutoML Approaches (Hardware Aware Neural Architecture Search) to make models more efficient, and lots of videos. Presented by Anirudh Koul, Siddha Ganju and Meher Anand Kasam. More details at PracticalDL.ai in the upcoming O'Reilly Book 'Practical Deep Learning on Cloud & Mobile'
An analysis of TensorFlow
- What is TensorFlow?
- Background
- DistBelief
- Tutorial - Logistic regression
- TensorFlow - internals
- Tutorial - CNN, RNN
- Benchmarks
- Other open-source projects
- If you are considering TensorFlow
- Installation
- References
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/altera/embedded-vision-training/videos/pages/may-2015-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Deshanand Singh, Director of Software Engineering at Altera, presents the "Efficient Implementation of Convolutional Neural Networks using OpenCL on FPGAs" tutorial at the May 2015 Embedded Vision Summit.
Convolutional neural networks (CNNs) are becoming increasingly popular in embedded applications such as vision processing and automotive driver assistance systems. The structure of CNN systems is characterized by cascades of FIR filters and transcendental functions. FPGA technology offers a very efficient way of implementing these structures by allowing designers to build custom hardware datapaths that implement the CNN structure. One challenge of using FPGAs is the design flow, which has traditionally been centered on tedious hardware description languages.
In this talk, Deshanand gives a detailed explanation of how CNN algorithms can be expressed in OpenCL and compiled directly to FPGA hardware. He details code optimizations and compares the efficiency with hand-coded implementations.
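The "cascade of FIR filters" the abstract describes is just repeated 2-D convolution, the core datapath an FPGA kernel implements. A plain-Python sketch of one valid-mode convolution (as the operation is conventionally written in deep learning, i.e. without kernel flipping; the example image and filter are illustrative):

```python
# 2-D "valid" convolution: slide the kernel over the image and take a
# multiply-accumulate at each position - the per-pixel work an FPGA
# datapath would pipeline in hardware.

def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
edge = [[1, -1]]                  # horizontal difference filter
print(conv2d_valid(image, edge))  # [[-1, -1], [-1, -1], [-1, -1]]
```

A CNN layer applies many such filters and follows each with a nonlinearity; on an FPGA the multiply-accumulate loop above becomes a fixed pipeline fed by line buffers.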
On-device machine learning: TensorFlow on Android by Yufeng Guo
Machine learning has traditionally been performed solely on servers and high-performance machines. But there is great value in having on-device machine learning on mobile devices. Doing ML inference on mobile devices has huge potential and is still in its early stages. However, it's already more powerful than most realize.
In this demo-oriented talk, you will see some examples of deep learning models used for local prediction on mobile devices. Learn how to use TensorFlow to implement a machine learning model that is tailored to a custom dataset, and start making delightful experiences today!
This is a 1-hour presentation on Neural Networks, Deep Learning, Computer Vision, Recurrent Neural Networks, and Reinforcement Learning. The later talks include links on how to run Neural Networks on mobile devices.
A practical talk by Anirudh Koul on running Deep Neural Networks on memory- and energy-constrained devices like smartphones. Highlights some frameworks and best practices.
Using Deep Learning to do Real-Time Scoring in Practical Applications by Greg Makowski
http://www.meetup.com/SF-Bay-ACM/events/227480571/
(see also YouTube for a recording of the presentation)
The talk will cover a brief review of neural network basics and the following types of neural network deep learning:
* autocorrelational - unsupervised learning for extracting features; he will describe how additional layers build complexity in the feature extraction
* convolutional - how to detect shift-invariant patterns in various data sources; horizontal shift-invariant detection applies to signals like speech recognition or IoT data, while horizontal and vertical shift invariance applies to images or videos, for faces or self-driving cars
* details of applying deep net systems for continuous or real-time scoring
* reinforcement learning, or Q-learning - such as learning how to play Atari video games
* continuous-space word models - such as word2vec, skip-gram training, NLP understanding, and translation
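The skip-gram training mentioned in the last bullet starts from (center, context) word pairs. A minimal sketch of that pair-generation step (the window size and sentence are illustrative; the embedding training itself is omitted):

```python
# Skip-gram pair generation, the first step of word2vec-style training:
# each word becomes a center that predicts every word inside a window
# around it.

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:4])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

Training then fits embeddings so that a center word's vector scores its observed context words higher than random words.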
Step-by-step instructions for installing Caffe on Ubuntu 14.04.
To receive the DVDs of this workshop (including the workshop video and the required files), email the address below:
pouya.ahmadvand@gmail.com
Slides for the hands-on PyData workshop.
Covers three main topics:
- Current state of NLP models at Walmart
- Steps we took to optimize serving BERT
- How we serve models with Facebook's TorchServe
Corresponding repo with notebooks for the hands-on session:
https://bit.ly/pytorch-workshop-2021
Encryption algorithms and their use in .Net applications for data protection, by Pavel Tsukanov
The talk gives a brief overview of existing encryption algorithms and their implementations for the .net platform. Besides encryption, other approaches to data protection will also be considered.
ELEMENTS OF ARTIFICIAL INTELLIGENCE IN PROGRAMMING (http://tuladev.net/e... by Pavel Tsukanov
Video at http://tuladev.net/events/128
I will talk about neural networks, genetic algorithms, machine vision, and fuzzy logic, all with real examples. I will also debate what AI actually is (how could we skip that :) ). If you want to hear about anything else, leave a comment. The topic is really broad and there is a lot to cover; the main thing is to start.
Basics of "mobile" development using the iOS (iPhone) platform as an example, by Pavel Tsukanov
A light overview lecture on the iOS platform. We will look at the specifics of developing for mobile platforms, the development tools, the Objective-C language, and the concepts used in iOS development. I will walk through the steps needed to create your first mobile application.
SIGNALR - REAL-TIME MESSAGING, by Pavel Tsukanov
We will talk about a relatively new library, developed by David Fowler and Damian Edwards, whose main purpose is instant message exchange in Web applications built on the .Net platform.
We are surrounded by a world of networks, mobile devices, websites, and clouds. An incredible number of technologies and programming languages have been invented to work with this world. Is there a place among them for C/C++? Is it worth spending time learning these languages, and worth using them in your projects? Is it time for them to retire? Andrey Karpov, an active member of the C++ programmer community, will discuss these questions in his talk. Looking ahead, we can say that C/C++ are more alive than ever. Andrey will talk about the evolution of the language and the new features introduced in C++11, many of which significantly ease the programmer's work and reduce the amount of code.
SOFTWARE DEVELOPMENT USING FINITE STATE MACHINES, by Pavel Tsukanov
We will explain what a finite state machine (FSM) is and how to use it in software development. We will share our experience, show how an FSM can improve the design of a program or its individual parts, and look at some FSM implementations.
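One common FSM implementation the talk's theme suggests is transitions as plain data plus a step function. A minimal sketch (the turnstile states and events are illustrative, not from the talk):

```python
# A minimal finite state machine: states and transitions as a lookup
# table, with a step function that rejects undefined transitions.

TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

def step(state, event):
    """Return the next state, or raise on an undefined transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}")

state = "locked"
for event in ["coin", "push", "push"]:
    state = step(state, event)
print(state)  # locked
```

Keeping transitions as data rather than nested if-statements is what makes the design easy to review, test, and extend, which is the usual argument for FSMs in program design.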
TDD (Test-driven Development) as a development style, by Pavel Tsukanov
How do you turn the routine writing of unit tests into an engaging process? How do you overcome the fear that the system will not work as intended? How do you confidently tackle the hardest problems you face? I will explain how TDD helps answer these and other questions.
Our website: http://www.tuladev.net
Implementing REST and SOAP services with WCF, by Pavel Tsukanov
Today, (web) services are one of the most important areas of software development. Services make it possible to build large distributed systems, and there are at least two approaches to building them: SOAP and REST. In this talk I will show how to implement both with WCF.
We will look at the application domain, architecture, and key features of the well-known Android operating system. We will also describe the process of creating the TulaDev mobile application, the problems we ran into, and how we solved them. You can find the application for Android on Google Play.
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... by Databricks
What we call the public cloud was developed primarily to manage and deploy web servers. The target audience for these products is DevOps. While this is a massive and exciting market, the world of Data Science and Deep Learning is very different — and possibly even bigger. Unfortunately, the tools available today are not designed for this new audience, and the cloud needs to evolve. This talk covers what the next 10 years of cloud computing will look like.
Building and deploying LLM applications with Apache Airflow by Kaxil Naik
Behind the growing interest in Generative AI and LLM-based enterprise applications lies an expanded set of requirements for data integration and ML orchestration. Enterprises want to use proprietary data to power LLM-based applications that create new business value, but they face challenges in moving beyond experimentation. The pipelines that power these models need to run reliably at scale, bringing together data from many sources and reacting continuously to changing conditions.
This talk focuses on the design patterns for using Apache Airflow to support LLM applications created using private enterprise data. We’ll go through a real-world example of what this looks like, as well as a proposal to improve Airflow and to add additional Airflow Providers to make it easier to interact with LLMs such as the ones from OpenAI (such as GPT4) and the ones on HuggingFace, while working with both structured and unstructured data.
In short, this shows how these Airflow patterns enable reliable, traceable, and scalable LLM applications within the enterprise.
https://airflowsummit.org/sessions/2023/keynote-llm/
This contains the agenda of the Spark Meetup I organised in Bangalore on Friday, the 23rd of Jan 2014. It carries the slides for the talk I gave on distributed deep learning over Spark.
Models for Parallel, Concurrent and Distributed Processing for Bioinformatics Software
Novartis Institute for BioMedical Research (NIBR) Geek Speak - Dec 4, 2014
Covers the basics of artificial neural networks and the motivation for deep learning, and explains certain deep learning networks, including deep belief networks and autoencoders. It also details the challenges of implementing a deep learning network at scale and explains how we have implemented a distributed deep learning network over Spark.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
The Future of Computing is Distributed
Professor Ion Stoica, UC Berkeley RISELab
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Architecting Solutions for the Manycore Future by Talbott Crowell
This talk will focus solution architects on thinking about parallelism when designing applications and solutions, specifically Threads vs. Tasks in the TPL, LINQ vs. PLINQ, and object-oriented versus functional programming techniques. It will also compare programming languages, how they differ when dealing with manycore programming, and the different advantages of these languages. Demonstrations include C#, VB, and F# features for functional programming, LINQ, and the TPL. A demonstration of the Concurrency Visualizer in Visual Studio 2010 will also be included.
Towards high performance computing (HPC) through parallel programming paradigm... by ijpla
Nowadays, we need to find solutions to huge computing problems very rapidly. This brings the idea of parallel computing, in which several machines or processors work cooperatively on computational tasks. Over the past decades, perceptions of the importance of parallelism in computing machines have varied considerably, and it has been observed that parallel computing is a superior solution to many computing limitations, such as speed and density; non-recurring high cost; and power consumption and heat dissipation. Commercial multiprocessors have emerged at lower prices than mainframe machines and supercomputers. This article discusses high performance computing (HPC) through parallel programming paradigms (PPPs), with their constructs and design approaches.
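The split-compute-merge shape common to the paradigms the article surveys can be shown with Python's standard library alone. A small sketch (a thread pool is used here for portability; in CPython a process pool or a message-passing framework like MPI would give true CPU parallelism):

```python
# Data-parallel sum of squares: split the input into chunks, compute
# partial results in a worker pool, then merge. The same shape appears
# in shared-memory, message-passing, and map-reduce paradigms.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done by one worker on its chunk."""
    return sum(x * x for x in chunk)

def parallel_sum_squares(data, workers=4):
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))   # merge step

print(parallel_sum_squares(list(range(1000))))  # 332833500
```

The decomposition is correct because addition is associative, so partial sums can be merged in any order; choosing such a merge operator is the central design step in all of these paradigms.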
Тема доклада является логическим продолжением выступления Александра Бакулина в области робототехники и посвящена актуальной на сегодняшний момент проблеме технического зрения
CONTINUOUS INTEGRATION ДЛЯ ЧАЙНИКОВ ВМЕСТЕ С TEAMCITYPavel Tsukanov
то такое "Непрерывная Интеграция", зачем она нужна и с чем ее едят? Правда ли, что она нужна только для тестировщиков? На все эти вопросы мы постараемся найти ответы в ходе выступления Щербакова Ильи на нашей следующей юзер-группе.
По мотивам хабра ( http://habrahabr.ru/post/168645/ ), автор рассмотрит вопрос создания роботов в домашних условиях. Ожидается демонстрация робота в живую, в реальных боевых условиях!!!
Осуществим вводный экскурс в Node.JS. Действительно это что-то новое и гениальное? Что оно может, а что нет? Кому будет полезен? В каких случаях применять, а в каких нет? На все эти вопросы я постараюсь ответить в своём докладе.
Будет проведён сравнительный анализ возможностей создания анимаций как в Flash так и в HTML5. Неужели и правда HTML5 способен полностью заменить Flash?
Мы коснёмся вопросов теории и практики безопасности компьютеров. Вы думаете, что вы знаете об этом всё? Я попробую вас в этом переубедить. Поговорим о социальной инженерии, расскажу о бот сетях (так сказать из первых уст), что есть есть правда, а что есть миф в рассказах о хакерах. В конечном итоге этот доклад будет интересен людям, находящимся по обоим сторонам баррикады. Почему? Потому, что я был на обоих её сторонах...
Будет раскрыта животрепещущая тема о заработке в Интеренете. Где в интернете есть деньги, как их заработать и, в конечном итоге, получить и обналичить. Александр расскажет что делать, если идея уже есть, а понимания как из нее извлечь деньги еще нет. И, главное, денег на начальном этапе тоже нет. Будут также затронуты вопросы организации платежей через сайты, мобильные телефоны, мобильные приложения.
ORM технологии в .NET (Nhibernate, Linq To SQL, Entity Framework)Pavel Tsukanov
Расскажу зачем они вообще нужны. Пройдемся по технологиям и промоем им косточки. Рассмотрим достоинства и недостатки, а также где и когда лучше всего применять ту или иную ORM.
В данном докладе мы рассмотрим пять основных принципов дизайна классов в объектно-ориентированном проектировании, которые известны, как принципы SOLID. А также как обеспечить достаточный уровень гибкости, связанности, управляемости, стабильности и понятности кода.
Андрей Карпов
Вы узнаете, что такое статический анализ кода и историю его развития. Узнаете, как эффективно применять инструменты статического анализа в своей работе, увидите практические примеры использования этой методологии. Доклад ориентирован на программистов, использующих языки Си/Си++, но будет полезен всем
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
2. HPC & GPU Supercomputing Groups
Non-profit, free-to-join groups hosted on www.meetup.com.
A group for the application of cutting-edge HPC & GPU supercomputing technology to cutting-edge business problems.
Started in January 2011 with the New York group; today over 1,000 members across groups in Boston, Silicon Valley, Chicago, New Mexico, Denver, Seattle, Austin, Washington D.C., South Florida, and Tokyo.
Please visit www.SupercomputingGroup.com for the South Florida group.
4. Many thanks to Andrew Sheppard for providing supporting content for this presentation.
Andrew is the organizer of the New York meetup group and a financial consultant with extensive experience in quantitative financial analysis, trading-desk software development, and technical management. Andrew is also the author of the forthcoming book "Programming GPUs", to be published by O'Reilly (www.oreilly.com).
5. "Thinking in Parallel"
is the term for making the conceptual leap that takes a developer from writing programs that run on hardware with little real parallelism to writing programs that execute efficiently on massively parallel hardware, with hundreds and thousands of cores, leading to very substantial speedups (10x, 100x and beyond).
6. "[A]nd in this precious phial is the power to think twice as fast, move twice as quickly, do twice as much work in a given time as you could otherwise do."
—H. G. Wells, "The New Accelerator" (1901)
7. Serial programs are traditional (most programs are serial), sequential (just a sequence of tasks), and relatively easy to reason about.
For example: Prefix Sum (Scan), where the binary associative operator is summation.
Input:   5    1      8        11
Steps:   5   5+1   5+1+8   5+1+8+11
Output:  5    6     14       25

data[] = {5, 1, 8, 11}
forall i from 1 to n-1 do
    data[i] = data[i - 1] + data[i]
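The pseudocode above translates directly; a minimal sketch in Python (the function name `prefix_sum` is our own):

```python
def prefix_sum(data):
    """In-place inclusive prefix sum (scan), mirroring the pseudocode above."""
    for i in range(1, len(data)):
        data[i] = data[i - 1] + data[i]
    return data

print(prefix_sum([5, 1, 8, 11]))  # [5, 6, 14, 25]
```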
8. But sequential thinking is about to change: annual serial performance improvement has slowed from roughly 50% to 20% since 2002, and we cannot expect huge gains in serial performance anymore.
Therefore, programming is going parallel.
9. Multi- and many-core computing is hitting the mainstream.
Today we have 4-12 cores, in a few years 32 cores, and Intel is predicting that by 2015 we will have 100 cores.
AMD Opteron (12), IBM Power 7 (8), Intel Xeon (12), Sun UltraSPARC T3 (16), Cell (9), NVIDIA GeForce (1024), Adapteva (4096), Tilera Tile-Gx (100)
There is a lot of effort towards developing good runtimes, compilers, debuggers and OS support:
MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brooks, Shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust, etc.
More than one hundred parallel programming languages in 2008 (http://perilsofparallel.blogspot.com/2008/09/101-parallel-languages-part-1.html or http://tinyurl.com/3p4a8to)
10. What are some problems moving into a multi-core world?
A lot of companies have a huge code base developed with little or no parallelism. Converting those great products to multi-core will take time.
We haven't been teaching much about parallelism for many years. Most students we educated in the last 10 years know very little about parallelism.
Engineers need to understand parallelism, and all the issues of parallelism, to utilize all these cores.
Parallel thinking is not the latest API, library or hardware. Parallel thinking is a set of core ideas we have to identify and teach our students and workforce.
11. Writing good serial software was hard; writing good parallel software is harder: it requires new tools, new techniques, and a new "Thinking in Parallel" mindset.
[Chart: performance and competitive advantage of serial applications over time, flattening after 2004, when multi-core arrived on the desktop.]
12. Parallel Prefix Sum
[Diagram: step-by-step partial sums of a parallel scan.]
See "Parallel Prefix Sum (Scan) with CUDA" (NVIDIA) at http://tinyurl.com/3s9as2j
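As a rough illustration of how the scan parallelizes, here is a serial simulation of the naive log-step (Hillis-Steele) scan: each pass of the outer loop is what all threads would do in one simultaneous step on a GPU. This is a sketch of the idea only, not the work-efficient CUDA implementation discussed in the linked article.

```python
def hillis_steele_scan(data):
    """Inclusive scan in O(log n) parallel steps (Hillis-Steele).
    The inner loop simulates, serially, what every GPU thread
    would do at once in a single step."""
    n = len(data)
    out = list(data)
    step = 1
    while step < n:
        prev = list(out)              # double-buffer: read old values, write new
        for i in range(step, n):      # one "parallel step" across all elements
            out[i] = prev[i - step] + prev[i]
        step *= 2                     # log2(n) steps in total
    return out

print(hillis_steele_scan([5, 1, 8, 11]))  # [5, 6, 14, 25]
```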
14. Concurrency vs. Parallelism

Concurrency:
- A programming issue
- Single processor
- Goal: running multiple interleaved threads
- Only one thread executes at any given time

Parallelism:
- A property of the machine
- Multi-processor
- Goal: speedup
- Threads are executed simultaneously

[Diagram: timelines of tasks A and B — threads 1 and 2 interleaving on one processor (concurrency) vs. running at the same time on separate processors (parallelism).]
15. Flynn's Taxonomy of Architectures
- Single Instruction / Single Data (SISD)
- Single Instruction / Multiple Data (SIMD)
- Multiple Instruction / Single Data (MISD)
- Multiple Instruction / Multiple Data (MIMD)
18. Analyzing Parallelism
Amdahl's Law helps to predict the theoretical maximum speedup on a fixed problem size:
S = 1 / (rs + rp / n)
Gustafson's Law proposes that larger problems can be solved by scaling the parallel computing power:
S = rs + n · rp
where rs and rp are the serial and parallel fractions of the program (rs + rp = 1) and n is the number of processors.
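The two laws can be turned into small helper functions (function names are our own; `r_p` is the parallel fraction, `n` the processor count), which makes their very different predictions easy to see:

```python
def amdahl_speedup(r_p, n):
    """Amdahl: maximum speedup on a fixed-size problem."""
    r_s = 1.0 - r_p
    return 1.0 / (r_s + r_p / n)

def gustafson_speedup(r_p, n):
    """Gustafson: scaled speedup when the problem grows with n; S = rs + n*rp."""
    r_s = 1.0 - r_p
    return r_s + n * r_p

# A 95%-parallel program on 100 cores:
print(amdahl_speedup(0.95, 100))     # ~16.8x — the serial 5% dominates
print(gustafson_speedup(0.95, 100))  # 95.05x — a scaled problem keeps cores busy
```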
21. Design Patterns
- Finding Concurrency Design Space
- Algorithm Structure Design Space
- Supporting Structures Design Space
- Implementation Mechanism Design Space
22. Finding Concurrency Design Space
- Decomposition: Task Decomposition, Data Decomposition, Data-Flow Decomposition
- Dependency Analysis: Group Tasks, Order Tasks, Data Sharing
- Design Evaluation
Algorithm Structure Design Space
- Organize by Tasks: Task Parallelism, Divide and Conquer
- Organize by Data Decomposition: Geometric Decomposition, Recursive Data
- Organize by Flow of Data: Pipeline, Event-Based Coordination
Supporting Structures Design Space
- Program Structures: SPMD, Loop Parallelism, Master/Worker, Fork/Join
- Data Structures: Shared Data, Shared Queue, Distributed Array
Implementation Mechanism Design Space
23. 8 Rules for "Thinking in Parallel"
1. Identify truly independent computations.
2. Implement parallelism at the highest level possible.
3. Plan early for scalability to take advantage of increasing numbers of cores.
4. Hide parallelization in libraries.
5. Use the right parallel programming model.
6. Never assume a particular order of execution.
7. Use non-shared storage whenever possible.
8. Dare to change the algorithm for a better chance of parallelism.
Bonus: Be creative and pragmatic.
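A minimal sketch of rules 1, 2 and 6 in practice (a hypothetical example using Python's standard library): the per-chunk sums are truly independent, parallelism sits at the highest level (one task per chunk), and the associative combine step never assumes a completion order.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(chunk):
    # Rule 1: each chunk's sum depends on nothing outside the chunk.
    return sum(chunk)

data = list(range(1_000))
# Rule 2: parallelize at the top level — one task per contiguous chunk.
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(chunk_sum, chunks))

# Rule 6: "+" is associative, so the result is the same whatever order
# the tasks happened to finish in.
total = sum(partials)
print(total)  # 499500
```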
24. Pragmatic Parallelization
Programming, in practice, is pragmatic. Most people prefer a practical "good enough" solution over an "ideal" solution.
[Diagram: a spectrum of the importance of rules, running from chaotic through pragmatic to bureaucratic.]
25. Parallel Programming Support
CPU: MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, etc.
GPU: NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brooks, Shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust, etc.
26. Links and References
- Patterns for Parallel Programming. Mattson, Timothy G.; Sanders, Beverly A.; Massingill, Berna L. (2004). Addison-Wesley Professional.
- An Introduction to Parallel Programming. Pacheco, Peter (2011). Morgan Kaufmann.
- The Art of Concurrency. Breshears, Clay (2009). O'Reilly Media.
- Wikipedia
- http://newsroom.intel.com/community/intel_newsroom/blog/2011/09/15/the-future-accelerated-multi-core-goes-mainstream-computing-pushed-to-extremes
- http://perilsofparallel.blogspot.com/2011/09/conversation-with-intels-james-reinders.html
Traditionally, much of computer programming has been serial in nature. A program begins at a well-defined entry point and works through a sequence of tasks in succession. Designing serial programs is relatively easy because you can think sequentially for the most part. This is the simplified programming model that most programmers today have learned and use. But it's about to change, because programming is going parallel. And given how hard it seems to be to write good serial software (judging by how many software projects struggle even in the serial world), the new challenges of parallel programming will require new tools, new techniques, and a new "Thinking in Parallel" mindset. This short talk focuses on some of the issues relating to "Thinking in Parallel". It's written from the perspective of a practitioner in the trenches and not from the perspective of a computer scientist. Nor is it a comprehensive overview of the topic. It's merely a starting point. My hope is that even though the meetup group ranges from newbies to experts, everyone will come away with at least one idea to think about. The journey begins …
A large percentage of people who provide applications are going to have to care about parallelism in order to match the capabilities of their competitors.
Before we begin, let's clear up a common misunderstanding: what's the difference between concurrent and parallel? They share some common concepts and difficulties, but they are different. Concurrent execution of two or more programs (or multiple tasks within a single program) means that only a single thread executes at any given time, but switching between threads is so rapid that it appears as though all tasks proceed concurrently. Parallel execution means that two or more threads are actually running simultaneously in hardware. It is not that the tasks appear to proceed in parallel; they really are running in parallel. The difference can be illustrated simply with a timeline diagram. Of course, multicore CPUs, GPUs, and clusters of the same are all about running in parallel. In the case of individual GPUs and large clusters of CPUs, they run in a massively parallel way.
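A minimal sketch of the concurrent case, assuming CPython: two threads within one interpreter make progress together, but only one executes Python code at any instant; true parallelism would require separate cores running separate processes.

```python
import threading
import time

def wait_task(name, log):
    time.sleep(0.01)       # simulated I/O: the thread yields while waiting
    log.append(name)       # list.append is thread-safe in CPython

log = []
threads = [threading.Thread(target=wait_task, args=(n, log)) for n in "AB"]
for t in threads:
    t.start()              # both tasks are now in flight, interleaved
for t in threads:
    t.join()

print(sorted(log))  # ['A', 'B'] — both completed, but never truly in parallel
```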
Synchronization: Whenever two or more tasks need the same resources, there is the possibility of contention. For multi-threaded applications this is often solved with mutexes, semaphores, critical sections and the like. On GPUs there are likewise synchronization objects.
Race Condition: A major obstacle to efficient parallel execution is resource contention, whether for memory or I/O (though one could argue all data access is I/O of one form or another). Resource contention is particularly prevalent for MISD and MIMD execution models. But the potential is always present when two executing tasks need to share the same resource which itself is not parallelizable.
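The classic race is a shared counter; a hedged sketch (hypothetical example) of guarding its read-modify-write with a mutex, as described above:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:          # critical section: only one thread at a time
            counter += 1    # read-modify-write is now atomic w.r.t. other threads

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 with the lock; without it, updates can be lost
```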
There are dozens of different parallel architectures, among them networks of workstations, clusters of off-the-shelf PCs, massively parallel supercomputers, tightly coupled symmetric multiprocessors, and multiprocessor workstations. Flynn's taxonomy categorizes all computers according to the number of instruction streams and data streams they have, where a stream is a sequence of instructions or data on which a computer operates.
SISD: Single instruction, single data means a single thread or core operating on a single piece of data.
SIMD: Single instruction, multiple data means the same code running on multiple threads or cores, but operating on different parts of the data set. This is the execution model for GPUs.
MISD: Multiple instructions, single data means different programs running in multiple threads or cores operating on the same data.
MIMD: Multiple instructions, multiple data means different programs running in multiple threads or cores operating on different parts of the data set.
Since the GoF (Gang of Four: Gamma, Helm, Johnson and Vlissides) wrote "Design Patterns: Elements of Reusable Object-Oriented Software", programming patterns have gained in popularity. They are now used in one way or another by most mainstream programmers. Not surprisingly, design patterns for parallel programming have emerged and attempt, to one degree or another, to map a problem onto an underlying execution model.
Task Decomposition: Task decomposition, as its name implies, breaks the problem into parts so that each can be independently assigned to a different computational resource to run in parallel.
Data Decomposition: Data decomposition is breaking your data set into smaller parts so that each can be processed separately by different compute resources. For data decomposition, what is needed is not just a partitioning of the data, but rather a data plan that includes such things as how the data will be encoded and moved around. Both can affect performance considerably, because some encodings are more efficient and compact than others, and leaving data in situ and doing multiple operations on it is clearly better than the converse.
Group Tasks: If a group shares a temporal constraint (for example, waiting on one group to finish filling a file before another group can begin reading it), we can satisfy that constraint once for the whole group. If a group of tasks must work together on a shared data structure, the required synchronization can be worked out once for the whole group. If a set of tasks is independent, combining them into a single group and scheduling them for execution as a single large group can simplify the design and increase the available concurrency (thereby letting the solution scale to more PEs).
Order Tasks: Given a way of decomposing a problem into tasks and a way of collecting these tasks into logically related groups, how must these groups of tasks be ordered to satisfy constraints among tasks?
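The decomposition ideas above can be sketched as a partition-then-reduce pipeline (function names are our own; in practice each chunk would be handed to a separate worker rather than processed in a loop):

```python
def partition(data, parts):
    """The data-decomposition step: split data into contiguous chunks.
    Contiguous chunks are a simple 'data plan' — each worker's data
    stays in one place in memory."""
    size = (len(data) + parts - 1) // parts   # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def reduce_max(chunks):
    """Each chunk is an independent task; max is associative, so the
    partial results can be combined in any order."""
    partials = [max(c) for c in chunks]       # run these in parallel in practice
    return max(partials)

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(reduce_max(partition(data, 4)))  # 9
```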
These rules are from “The Art of Concurrency” by Clay Breshears. I’ve modified them slightly so they are equally applicable to “Thinking in Parallel”:
Traditional languages gain support for parallel programming through libraries that seek to hide as much of the underlying hardware and parallelism as possible. Functional languages have, in recent times, gained in popularity because of their strong support for parallelism; even if parallelism isn't built into the language, it is often well supported through the "functional" style. Some functional languages, notably Erlang, are intrinsically parallel, having been designed for parallel execution from the start. Scala, for example, is a language that supports a mix of programming models and styles, from OOP to functional, and it also supports parallel programming quite well. Given the large number of languages out there, you will no doubt find a parallel programming language to your taste!