Thinking in parallel ab tuladev
CPU, GPU, Thinking in Parallel

  • Traditionally, much of computer programming has been serial in nature: a program begins at a well-defined entry point and works through a sequence of tasks in succession. Designing serial programs is relatively easy because you can, for the most part, think sequentially. This is the simplified programming model that most programmers today have learned and use. But that is about to change, because programming is going parallel. And given how hard it seems to be to write good serial software (judging by how many software projects struggle even in the serial world), the new challenges of parallel programming will require new tools, new techniques, and a new “Thinking in Parallel” mindset. This short talk focuses on some of the issues relating to “Thinking in Parallel”. It is written from the perspective of a practitioner in the trenches, not from the perspective of a computer scientist, and it is not a comprehensive overview of the topic; it is merely a starting point. My hope is that, even though the meetup group ranges from newbies to experts, everyone will come away with at least one idea to think about. The journey begins… A large percentage of people who build applications are going to have to care about parallelism in order to match the capabilities of their competitors.
  • Before we begin, let’s clear up a common misunderstanding: what is the difference between concurrent and parallel? They share some common concepts and difficulties, but they are different. Concurrent execution of two or more programs (or of multiple tasks within a single program) means that only a single thread executes at any given time, but switching between threads is so rapid that it appears as though all tasks proceed at once. Parallel execution means that two or more threads are actually running simultaneously in hardware; it is not that the tasks merely appear to proceed in parallel, they really are running in parallel. The difference between the two can be illustrated simply as follows. [diagram] Of course, multicore CPUs, GPUs and clusters of the same are all about running in parallel. In the case of individual GPUs and large clusters of CPUs, they run in a massively parallel way.
  • Synchronization: whenever two or more tasks need the same resources, there is the possibility of contention. For multi-threaded applications this is often solved with mutexes, semaphores, critical sections and the like; on GPUs there are likewise synchronization objects. Race conditions and contention: a major obstacle to efficient parallel execution is resource contention, whether for memory or I/O (though one could argue that all data access is I/O of one form or another). Resource contention is particularly prevalent in the MISD and MIMD execution models, but the potential is always present whenever two executing tasks need to share a resource that is not itself parallelizable. (A minimal synchronization sketch in C++ appears after these notes.)
  • There are dozens of different parallel architectures, among them networks of workstations, clusters of off-the-shelf PCs, massively parallel supercomputers, tightly coupled symmetric multiprocessors, and multiprocessor workstations. Flynn’s taxonomy categorizes all computers according to the number of instruction streams and data streams they have, where a stream is a sequence of instructions or data on which a computer operates. SISD: single instruction, single data means a single thread or core operating on a single piece of data. SIMD: single instruction, multiple data means the same code running on multiple threads or cores, but operating on different parts of the data set; this is the execution model for GPUs. MISD: multiple instructions, single data means different programs running in multiple threads or cores operating on the same data. MIMD: multiple instructions, multiple data means different programs running in multiple threads or cores operating on different parts of the data set.
  • Since the GoF (Gang of Four: Gamma, Helm, Johnson and Vlissides) wrote “Design Patterns: Elements of Reusable Object-Oriented Software”, programming patterns have gained in popularity; they are now used in one way or another by most mainstream programmers. Not surprisingly, design patterns for parallel programming have emerged, and they attempt, to one degree or another, to map a problem onto an underlying execution model.
  • Task decomposition, as its name implies, breaks the problem into parts so that each part can be independently assigned to a different computational resource and run in parallel. Data decomposition breaks your data set into smaller parts so that each part can be processed separately by different compute resources; what is needed here is not just a partitioning of the data but a data plan that includes such things as how the data will be encoded and moved around. Both can affect performance considerably, because some encodings are more efficient and compact than others, and leaving data in situ and doing multiple operations on it is clearly better than the converse. Group tasks: if a group shares a temporal constraint (for example, waiting for one group to finish filling a file before another group can begin reading it), we can satisfy that constraint once for the whole group; if a group of tasks must work together on a shared data structure, the required synchronization can be worked out once for the whole group; and if a set of tasks is independent, combining them into a single group and scheduling them for execution as one large group can simplify the design and increase the available concurrency (thereby letting the solution scale to more PEs). Order tasks: given a way of decomposing a problem into tasks and a way of collecting those tasks into logically related groups, how must the groups be ordered to satisfy the constraints among tasks? (A data-decomposition sketch appears after these notes.)
  • These rules are from “The Art of Concurrency” by Clay Breshears. I’ve modified them slightly so they are equally applicable to “Thinking in Parallel” (see slide 23).
  • Traditional languages gain support for parallel programming through libraries that seek to hide as much of the underlying hardware and parallelism as possible. Functional languages have, in recent times, gained in popularity because of their strong support for parallelism; even when parallelism isn’t built into the language, it is often well supported through the “functional” style. Some functional languages, notably Erlang, are intrinsically parallel, having been intended for parallel execution by design. Other languages, Scala for example, support a mix of programming models and styles, from OOP to functional, and also support parallel programming quite well. Given the large number of languages out there, you will no doubt find a parallel programming language to your taste!
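
The following is a minimal sketch of the synchronization point made in the notes above, written in C++ with std::thread and std::mutex; the thread count, iteration count and variable names are illustrative choices, not part of the original talk. Several threads increment one shared counter, and the mutex serializes access so the final value is deterministic.

    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        long counter = 0;                 // the shared resource from the notes
        std::mutex counter_mutex;         // serializes access to counter

        auto worker = [&](int increments) {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);  // acquire, auto-release
                ++counter;                // without the lock this line is a data race
            }
        };

        std::vector<std::thread> threads;
        for (int t = 0; t < 4; ++t) threads.emplace_back(worker, 100000);
        for (auto& t : threads) t.join();

        std::cout << "counter = " << counter << '\n';  // always 400000 with the lock
        return 0;
    }

Removing the lock_guard line turns this into a textbook data race, which is exactly the contention problem the note describes.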
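
And here is a minimal sketch of data decomposition as described above, under the same assumptions (plain C++ threads; chunk sizes and names chosen only for illustration): the input is split into contiguous chunks, each chunk is summed by its own thread, and the per-chunk results are combined at the end.

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> data(1000000, 1);                  // the data set to decompose
        const unsigned num_chunks =
            std::max(1u, std::thread::hardware_concurrency());
        std::vector<long long> partial(num_chunks, 0);      // one result slot per chunk
        std::vector<std::thread> threads;

        const std::size_t chunk = data.size() / num_chunks;
        for (unsigned c = 0; c < num_chunks; ++c) {
            std::size_t begin = c * chunk;
            std::size_t end = (c + 1 == num_chunks) ? data.size() : begin + chunk;
            // Each thread reads only its own slice and writes only its own slot,
            // so no locking is needed until the final combine step.
            threads.emplace_back([&, c, begin, end] {
                partial[c] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0LL);
            });
        }
        for (auto& t : threads) t.join();

        long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
        std::cout << "total = " << total << '\n';           // prints: total = 1000000
        return 0;
    }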

Thinking in parallel ab tuladev Presentation Transcript

  • 1. SPONSORED BY
  • 2. HPC & GPU Supercomputing Groups. Non-profit, free-to-join groups hosted on www.meetup.com. A group for the application of cutting-edge HPC & GPU supercomputing technology to cutting-edge business problems. Started in January 2011 with the New York group and has now reached 1000 members across groups in Boston, Silicon Valley, Chicago, New Mexico, Denver, Seattle, Austin, Washington D.C., South Florida and Tokyo. Please visit www.SupercomputingGroup.com for the South Florida group.
  • 3. Krasnoarmeysky Prospekt 25, November 19, 2011
  • 4. Many thanks to Andrew Sheppard for providing supporting content for this presentation. Andrew is the organizer of the New York meetup group and a financial consultant with extensive experience in quantitative financial analysis, trading-desk software development, and technical management. Andrew is also the author of the forthcoming book “Programming GPUs”, to be published by O’Reilly (www.oreilly.com).
  • 5. “Thinking in Parallel” is the term for making the conceptual leap that takes a developer from writing programs that run on hardware with little real parallelism to writing programs that execute efficiently on massively parallel hardware, with hundreds and thousands of cores, leading to very substantial speedups (x10, x100 and beyond).
  • 6. “[A]nd in this precious phial is the power to think twice as fast, move twice as quickly, do twice as much work in a given time as you could otherwise do.” —H. G. Wells, “The New Accelerator” (1901)
  • 7. Serial programs are traditional (most programs are serial), sequential (just a sequence of tasks) and relatively easy to reason about, because the flow is a single sequence. For example, the prefix sum (scan), where the binary associative operator is addition: the input 5 1 8 11 becomes 5, 5+1, 5+1+8, 5+1+8+11, i.e. 5 6 14 25. Pseudocode: data[] = {5, 1, 8, 11, 4}; forall i from 1 to n do data[i] = data[i - 1] + data[i]. (A runnable C++ version appears after the transcript.)
  • 8. But sequential thinking is about to change: serial performance improvement has slowed from 50% to 20% per year since 2002, and we cannot expect huge improvements in serial performance anymore. Therefore, programming is going parallel.
  • 9. Multi- and many-core computing is hitting the mainstream. Today we have 4-12 cores, in a few years 32 cores, and Intel predicts that by 2015 we will have 100 cores. Examples: AMD Opteron (12), IBM Power 7 (8), Intel Xeon (12), Sun UltraSPARC T3 (16), Cell (9), NVIDIA GeForce (1024), Adapteva (4096), Tilera Tile-Gx (100). There is a lot of effort going into runtimes, compilers, debuggers and OS support: MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brook, shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust, etc. More than one hundred parallel programming languages existed as of 2008 (http://perilsofparallel.blogspot.com/2008/09/101-parallel-languages-part-1.html or http://tinyurl.com/3p4a8to).
  • 10. What are some problems in moving to a multi-core world? A lot of companies have huge code bases developed with little or no parallelism, and converting those products to multi-core will take time. We haven’t been teaching much about parallelism for many years; most students educated in the last 10 years know very little about it. Engineers need to understand parallelism, and all of its issues, to utilize all these cores. Parallel thinking is not the latest API, library or hardware; it is a set of core ideas we have to identify and teach our students and workforce.
  • 11. Writing good serial software was hard; writing good parallel software is harder: it requires new tools, new techniques, and a new “Thinking in Parallel” mindset. [Chart: performance and competitive advantage of parallel versus serial applications over time, with multi-core arriving on the desktop in 2004.]
  • 12. Parallel Prefix Sum. [Diagram: Parallel Prefix Sum (Scan) with CUDA (NVIDIA), http://tinyurl.com/3s9as2j] (A sketch of the step-doubling scan structure appears after the transcript.)
  • 13. Where to start?
  • 14. Concurrency vs. parallelism. Concurrency: a programming issue; single processor; goal: running multiple interleaved threads; only one thread executes at any given time. Parallelism: a property of the machine; multi-processor; goal: speedup; threads are executed simultaneously. [Timeline diagram: Task A and Task B threads interleaved on one processor versus running at the same time on multiple processors.]
  • 15. Flynn’s Taxonomy of Architectures: Single Instruction/Single Data, Single Instruction/Multiple Data, Multiple Instruction/Single Data, Multiple Instruction/Multiple Data.
  • 16. SISD vs. SIMD
  • 17. Parallel Programming Methodology. [Cycle diagram with the stages: Analyze, Design, Code, Test, Measure, and Proceed.]
  • 18. Analyzing Parallelism. Amdahl’s law helps predict the theoretical maximum speedup for a fixed problem size: S = 1 / ((1 - p) + p / n), where p is the parallel fraction of the work and n is the number of processors. Gustafson’s law proposes that larger problems can be solved by scaling up the parallel computing power: S = rs + n * rp, where rs and rp are the serial and parallel fractions (rs + rp = 1). (A worked example appears after the transcript.)
  • 19. Amdahl’s Law
  • 20. Gustafson’s Law
  • 21. Design Patterns: Finding Concurrency Design Space, Algorithm Structure Design Space, Supporting Structures Design Space, Implementation Mechanism Design Space.
  • 22. Finding Concurrency Design Space: Decomposition (Task Decomposition, Data Decomposition, Data-Flow Decomposition), Dependency Analysis (Group Tasks, Order Tasks, Data Sharing), Design Evaluation. Algorithm Structure Design Space: Organize by Tasks (Task Parallelism, Divide and Conquer), Organize by Data Decomposition (Geometric Decomposition, Recursive Data), Organize by Flow of Data (Pipeline, Event-Based Coordination). Supporting Structures Design Space: Program Structures (SPMD, Loop Parallelism, Master/Worker, Fork/Join), Data Structures (Shared Data, Distributed Array, Shared Queue). Implementation Mechanism Design Space.
  • 23. 8 Rules for “Thinking in Parallel”: 1. Identify truly independent computations. 2. Implement parallelism at the highest level possible. 3. Plan early for scalability to take advantage of increasing numbers of cores. 4. Hide parallelization in libraries. 5. Use the right parallel programming model. 6. Never assume a particular order of execution. 7. Use non-shared storage whenever possible. 8. Dare to change the algorithm for a better chance of parallelism. 9. Be creative and pragmatic.
  • 24. Pragmatic Parallelization. Programming, in practice, is pragmatic: most people prefer a practical “good enough” solution over an “ideal” solution. [Diagram: a spectrum from Chaotic to Pragmatic to Bureaucratic as the importance of rules increases.]
  • 25. Parallel Programming Support. CPU: MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, etc. GPU: NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brook, shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust, etc.
  • 26. Links and References. Mattson, Timothy G.; Sanders, Beverly A.; Massingill, Berna L. (2004-09-15). Patterns for Parallel Programming. Addison-Wesley Professional. Pacheco, Peter (2011-01-07). An Introduction to Parallel Programming. Morgan Kaufmann. Breshears, Clay (2009-05-07). The Art of Concurrency. O’Reilly Media. Wikipedia. http://newsroom.intel.com/community/intel_newsroom/blog/2011/09/15/the-future-accelerated-multi-core-goes-mainstream-computing-pushed-to-extremes http://perilsofparallel.blogspot.com/2011/09/conversation-with-intels-james-reinders.html
  • 27. Q&A
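
The sketches referenced above follow. First, a runnable version of the serial prefix sum from slide 7, written here in C++ (the slide itself uses pseudocode); the data values come from the slide.

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> data = {5, 1, 8, 11, 4};     // values from slide 7
        for (std::size_t i = 1; i < data.size(); ++i)
            data[i] = data[i - 1] + data[i];          // each step needs the previous result

        for (int v : data) std::cout << v << ' ';     // prints: 5 6 14 25 29
        std::cout << '\n';
        return 0;
    }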
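
Second, a sketch of the step-doubling (Hillis-Steele) structure behind the GPU scan referenced on slide 12. This is a CPU-side illustration of the idea, not the NVIDIA CUDA implementation: each pass reads from one buffer and writes to another, and every iteration of the inner loop is independent, which is the work a GPU would spread across threads.

    #include <iostream>
    #include <vector>

    // One pass per power-of-two offset; within a pass, all elements can be
    // updated independently, so the loop body maps naturally onto GPU threads.
    std::vector<int> inclusive_scan_doubling(std::vector<int> in) {
        std::vector<int> out(in.size());
        for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
            for (std::size_t i = 0; i < in.size(); ++i)
                out[i] = (i >= offset) ? in[i - offset] + in[i] : in[i];
            in.swap(out);              // this pass's output feeds the next pass
        }
        return in;
    }

    int main() {
        std::vector<int> data = {5, 1, 8, 11, 4};
        for (int v : inclusive_scan_doubling(data)) std::cout << v << ' ';
        std::cout << '\n';             // prints: 5 6 14 25 29
        return 0;
    }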
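
Finally, a small worked example for the speedup laws on slides 18-20. The 90% parallel fraction and 16 processors are illustrative numbers chosen here, not figures from the slides.

    #include <cstdio>

    int main() {
        const double p = 0.90;   // parallelizable fraction of the work
        const int    n = 16;     // number of processors

        double amdahl    = 1.0 / ((1.0 - p) + p / n);   // fixed-size speedup
        double gustafson = (1.0 - p) + n * p;           // scaled-size speedup

        std::printf("Amdahl's law:    S = %.2f\n", amdahl);     // ~6.40
        std::printf("Gustafson's law: S = %.2f\n", gustafson);  // 14.50
        return 0;
    }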