Thinking in parallel ab tuladev


CPU, GPU, Thinking in Parallel

Published in: Technology, Education

  2. HPC & GPU Supercomputing Groups
      • Non-profit, free-to-join groups hosted on
      • A group for the application of cutting-edge HPC & GPU supercomputing technology to cutting-edge business problems.
      • Started in January 2011 with the New York group; today all groups together have reached 1,000 members, with groups in Boston, Silicon Valley, Chicago, New Mexico, Denver, Seattle, Austin, Washington D.C., South Florida, and Tokyo.
      • Please visit for South Florida group.
  3. Krasnoarmeysky Prospekt 25, November 19, 2011
  4. Many thanks to Andrew Sheppard for providing supporting content for this presentation. Andrew is the organizer of the New York meetup group and a financial consultant with extensive experience in quantitative financial analysis, trading-desk software development, and technical management. Andrew is also the author of the forthcoming book "Programming GPUs", to be published by O'Reilly.
  5. "Thinking in Parallel" is the term for making the conceptual leap that takes a developer from writing programs that run on hardware with little real parallelism to writing programs that execute efficiently on massively parallel hardware with hundreds or thousands of cores, leading to very substantial speedups (10x, 100x, and beyond).
  6. "[A]nd in this precious phial is the power to think twice as fast, move twice as quickly, do twice as much work in a given time as you could otherwise do." (H. G. Wells, "The New Accelerator", 1901)
  7. Serial programs are traditional (most programs are serial), sequential (just a sequence of tasks), and relatively easy to follow. For example, Prefix Sum (Scan), where the binary associative operator is summation:
      input:   5   1     8       11
      steps:   5   5+1   5+1+8   5+1+8+11
      output:  5   6     14      25
      data[] = {5, 1, 8, 11, 4}
      for i from 1 to n - 1 do data[i] = data[i - 1] + data[i]
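The serial scan on this slide can be sketched in Python as follows. This is a minimal illustration, not code from the presentation; the function name `inclusive_scan_serial` is a hypothetical choice.

```python
def inclusive_scan_serial(data):
    """Return the inclusive prefix sums of data, computed left to right."""
    result = list(data)  # copy so the caller's list is left untouched
    for i in range(1, len(result)):
        # Each element depends on the one just before it,
        # so the iterations must run in order.
        result[i] = result[i - 1] + result[i]
    return result


print(inclusive_scan_serial([5, 1, 8, 11, 4]))  # [5, 6, 14, 25, 29]
```

The loop-carried dependency (`result[i]` needs `result[i - 1]`) is what makes this formulation inherently sequential.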
  8. But sequential thinking is about to change, because the annual improvement in serial performance has slowed from about 50% to about 20% since 2002, and we cannot expect huge improvements in serial performance anymore. Therefore, programming is going parallel.
  9. Multi- and many-core computing hitting the mainstream
      • Today we have 4-12 cores; in a few years we will have 32 cores, and Intel predicts 100 cores by 2015.
      • Examples (cores in parentheses): AMD Opteron (12), IBM Power 7 (8), Intel Xeon (12), Sun UltraSparc T3 (16), Cell (9), NVIDIA GeForce (1024), Adapteva (4096), Tilera Tile-Gx (100).
      • A lot of effort is going into good runtimes, compilers, debuggers, and OS support: MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brook, Shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust, etc.
      • More than one hundred parallel programming languages in 2008 ( 1.html or )
  10. What are some problems moving into a multi-core world?
      • A lot of companies have a huge code base developed with little or no parallelism. Converting those great products to multi-core will take time.
      • We haven't been teaching much about parallelism for many years. Most students we educated in the last 10 years know very little about parallelism.
      • Engineers need to understand parallelism, and all the issues that come with it, to utilize all these cores.
      • Parallel thinking is not the latest API, library, or hardware. Parallel thinking is a set of core ideas we have to identify and teach our students and workforce.
  11. Writing good serial software was hard; writing good parallel software is harder. It requires new tools, new techniques, and a new "Thinking in Parallel" mindset.
      [Chart: performance over time, with labels "Serial Applications" and "Competitive Advantage"; 2004: multi-core is on the desktop]
  12. Parallel Prefix Sum
      Source: Parallel Prefix Sum (Scan) with CUDA (NVIDIA)
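The slide's figure is not recoverable from the transcript, so the following is only a sketch of one well-known parallel scan, the step-doubling (Hillis-Steele) formulation, simulated serially in Python. Which variant the slide actually showed is an assumption, and the function name is hypothetical.

```python
def inclusive_scan_hillis_steele(data):
    """Inclusive prefix sum in about log2(n) steps (serial simulation)."""
    values = list(data)
    n = len(values)
    stride = 1
    while stride < n:
        # Every element of the new array reads only the previous array,
        # so on parallel hardware all n updates of one step could run at once.
        values = [
            values[i] + values[i - stride] if i >= stride else values[i]
            for i in range(n)
        ]
        stride *= 2
    return values


print(inclusive_scan_hillis_steele([5, 1, 8, 11, 4]))  # [5, 6, 14, 25, 29]
```

This version does more additions in total than the serial loop, but it needs only a logarithmic number of dependent steps, which is the trade-off that makes it attractive on many-core hardware.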
  13. Where to start?
  14. Concurrency vs. Parallelism
      • Concurrency: a programming issue; single processor; goal: running multiple interleaved threads; only one thread executes at any given time.
      • Parallelism: a property of the machine; multi-processor; goal: speedup; threads are executed simultaneously.
      [Figure: two timelines showing Task A and Task B scheduled on Thread 1 and Thread 2]
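To make the "interleaved threads on a single processor" idea concrete, here is a small sketch that models concurrency with Python generators and a round-robin scheduler. It is an illustration only: the names `task` and `run_interleaved` are hypothetical, and real threads are scheduled by the operating system rather than by a loop like this.

```python
from collections import deque


def task(name, steps):
    """A task that yields control back to the scheduler after every step."""
    for i in range(1, steps + 1):
        yield f"{name}:{i}"


def run_interleaved(tasks):
    """Round-robin scheduler: only one task makes progress at any time."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # run one step of this task
        except StopIteration:
            continue                     # task finished, drop it
        ready.append(current)            # otherwise requeue it
    return trace


print(run_interleaved([task("A", 3), task("B", 2)]))
# ['A:1', 'B:1', 'A:2', 'B:2', 'A:3']
```

Both tasks make progress over the same period, yet nothing runs simultaneously; parallelism would require hardware that executes the steps of A and B at the same instant.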
  15. Flynn's Taxonomy of Architectures
      • SISD: Single Instruction / Single Data
      • SIMD: Single Instruction / Multiple Data
      • MISD: Multiple Instruction / Single Data
      • MIMD: Multiple Instruction / Multiple Data
  16. SISD vs. SIMD
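The difference can be modeled in a few lines of Python. This is a conceptual sketch only: Python does not expose SIMD instructions directly, the `lanes` parameter and both function names are hypothetical, and the "steps" counter stands in for the number of instructions issued.

```python
def sisd_add(a, b):
    """SISD model: each step applies one instruction to one pair of elements."""
    out = []
    steps = 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1
    return out, steps


def simd_add(a, b, lanes=4):
    """SIMD model: each step applies one instruction to up to `lanes` pairs."""
    out = []
    steps = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return out, steps


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(sisd_add(a, b)[1], simd_add(a, b)[1])  # 8 2
```

Both functions return the same sums; only the modeled step count differs.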
  17. Parallel Programming Methodology
      [Diagram with stages: Proceed, Measure, Test, Analyze, Design, Code]
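  18. Analyzing Parallelism
      • Amdahl's Law helps to predict the theoretical maximum speedup for a fixed problem size: S = 1 / (rs + rp / n)
      • Gustafson's Law proposes that larger problems can be solved by scaling the parallel computing power: S = rs + n · rp
      Here rs and rp are the serial and parallel fractions of the run time (rs + rp = 1), and n is the number of processors.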
  19. Amdahl's Law
  20. Gustafson's Law
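As a worked example, both laws can be evaluated for the same serial fraction. The sketch below assumes the usual reading of the slide's symbols: rs is the serial fraction, rp = 1 - rs is the parallel fraction, and n is the number of processors. Amdahl's Law is taken in its standard form S = 1 / (rs + rp / n); the function names are hypothetical.

```python
def amdahl_speedup(serial_fraction, n):
    """Fixed problem size: S = 1 / (rs + rp / n)."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def gustafson_speedup(serial_fraction, n):
    """Problem size scaled with the machine: S = rs + n * rp."""
    parallel_fraction = 1.0 - serial_fraction
    return serial_fraction + n * parallel_fraction


# 10% serial work on 100 processors.
print(f"{amdahl_speedup(0.1, 100):.2f}")     # 9.17
print(f"{gustafson_speedup(0.1, 100):.2f}")  # 90.10
```

With a fixed problem size, the 10% serial part caps the speedup below 10 no matter how many processors are added; when the problem grows with the machine, the same serial fraction still allows a speedup of about 90.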
  21. Design Patterns
      • Finding Concurrency Design Space
      • Algorithm Structure Design Space
      • Supporting Structures Design Space
      • Implementation Mechanism Design Space
  22. Design spaces in detail
      • Finding Concurrency Design Space
          Decomposition: Task Decomposition, Data Decomposition, Data-Flow Decomposition
          Dependency Analysis: Group Tasks, Order Tasks, Data Sharing
          Design Evaluation
      • Algorithm Structure Design Space
          Organize by Tasks: Task Parallelism, Divide and Conquer
          Organize by Data Decomposition: Geometric Decomposition, Recursive Data
          Organize by Flow of Data: Pipeline, Event-Based Coordination
      • Supporting Structures Design Space
          Program Structures: SPMD, Master / Worker, Loop Parallelism, Fork / Join
          Data Structures: Shared Data, Shared Queue, Distributed Array
      • Implementation Mechanism Design Space
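Two of the supporting structures named above, Master / Worker and Shared Queue, can be combined in a short Python sketch. This is an illustration under assumptions, not the presentation's code: `run_master_worker` and its parameters are hypothetical names, and in standard CPython the threads show the structure of the pattern rather than delivering a speedup for CPU-bound work.

```python
import queue
import threading


def run_master_worker(items, work, num_workers=4):
    """Master puts tasks on a shared queue; workers pull them until told to stop."""
    items = list(items)
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work for this worker
                break
            index, value = item
            outcome = work(value)     # computed without touching shared state
            with lock:                # only the result store is shared
                results[index] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for pair in enumerate(items):     # master hands out the tasks
        tasks.put(pair)
    for _ in threads:                 # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    # Reassemble by index, because completion order is not guaranteed.
    return [results[i] for i in range(len(items))]


print(run_master_worker([1, 2, 3, 4, 5], lambda x: x * x))  # [1, 4, 9, 16, 25]
```

Tagging each task with its index lets the master restore the original order without assuming anything about which worker finishes first.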
  23. 8 Rules for "Thinking in Parallel"
      1. Identify truly independent computations.
      2. Implement parallelism at the highest level possible.
      3. Plan early for scalability to take advantage of increasing numbers of cores.
      4. Hide parallelization in libraries.
      5. Use the right parallel programming model.
      6. Never assume a particular order of execution.
      7. Use non-shared storage whenever possible.
      8. Dare to change the algorithm for a better chance of parallelism.
      9. Be creative and pragmatic.
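Rules 1, 6, and 7 can be illustrated with a chunked sum. In this sketch each worker sums its own slice and returns a private partial result, so no storage is shared and the result does not depend on execution order. The helper names `chunked` and `parallel_sum` are hypothetical, and the thread pool is used only to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, num_chunks):
    """Split data into at most num_chunks contiguous slices."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data, num_workers=4):
    """Sum independent chunks, then combine the partial results."""
    chunks = chunked(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each call to sum() works on its own chunk and returns its own
        # partial result; nothing is written to shared storage.
        partials = list(pool.map(sum, chunks))
    return sum(partials)


print(parallel_sum(list(range(1, 101))))  # 5050
```

Because addition is associative, the partial sums can be combined in any grouping, which is what makes the chunks truly independent computations.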
  24. Pragmatic Parallelization. Programming, in practice, is pragmatic. Most people prefer a practical "good enough" solution over an "ideal" solution.
      [Figure: scale labeled "Importance of Rules" with positions Chaotic, Pragmatic, Bureaucratic]
  25. Parallel Programming Support
      • CPU: MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, etc.
      • GPU: NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brook, Shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust, etc.
  26. Links and References
      • Mattson, Timothy G.; Sanders, Beverly A.; Massingill, Berna L. Patterns for Parallel Programming. Addison-Wesley Professional, 2004-09-15.
      • Pacheco, Peter. An Introduction to Parallel Programming. Morgan Kaufmann, 2011-01-07.
      • Breshears, Clay. The Art of Concurrency. O'Reilly Media, 2009-05-07.
      • Wikipedia
      • 9/15/the-future-accelerated-multi-core-goes-mainstream- computing-pushed-to-extremes intels-james-reinders.html
  27. Q&A