Parallel Programming Primer 1


Transcript

  • 1. Lecture 1 – Parallel Programming Primer
    CPE 458 – Parallel Programming, Spring 2009
    Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License: http://creativecommons.org/licenses/by/2.5
  • 2. Outline
    - Introduction to Parallel Computing
    - Parallel Architectures
    - Parallel Algorithms
    - Common Parallel Programming Models
      - Message-Passing
      - Shared Address Space
  • 3. Introduction to Parallel Computing
    - Moore's Law
      - The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months.
      - Self-fulfilling prophecy
        - Computer architect goal
        - Software developer assumption
  • 4. Introduction to Parallel Computing
    - Impediments to Moore's Law
      - Theoretical limit
      - What to do with all that die space?
      - Design complexity
      - How do you meet the expected performance increase?
  • 5. Introduction to Parallel Computing
    - Parallelism
      - Continue to increase performance via parallelism.
  • 6. Introduction to Parallel Computing
    - From a software point of view, we need to solve demanding problems
      - Engineering Simulations
      - Scientific Applications
      - Commercial Applications
    - These need the performance and resource gains afforded by parallelism
  • 7. Introduction to Parallel Computing
    - Engineering Simulations
      - Aerodynamics
      - Engine efficiency
  • 8. Introduction to Parallel Computing
    - Scientific Applications
      - Bioinformatics
      - Thermonuclear processes
      - Weather modeling
  • 9. Introduction to Parallel Computing
    - Commercial Applications
      - Financial transaction processing
      - Data mining
      - Web indexing
  • 10. Introduction to Parallel Computing
    - Unfortunately, parallelism greatly increases coding complexity
      - Coordinating concurrent tasks
      - Parallelizing algorithms
      - Lack of standard environments and support
  • 11. Introduction to Parallel Computing
    - The challenge
      - Provide the abstractions, programming paradigms, and algorithms needed to effectively design, implement, and maintain applications that exploit the parallelism provided by the underlying hardware in order to solve modern problems.
  • 12. Parallel Architectures
    - Standard sequential architecture
      (diagram: a single CPU connected to RAM over a bus, with the bottlenecks highlighted)
  • 13. Parallel Architectures
    - Use multiple
      - Datapaths
      - Memory units
      - Processing units
  • 14. Parallel Architectures
    - SIMD
      - Single instruction stream, multiple data stream
      (diagram: one control unit driving several processing units over an interconnect)
  • 15. Parallel Architectures
    - SIMD
      - Advantages
        - Performs vector/matrix operations well
          - Ex: Intel's MMX chip
      - Disadvantages
        - Too dependent on the type of computation
          - Ex: Graphics
        - Performance/resource utilization suffers if computations aren't "embarrassingly parallel".
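To make the SIMD idea concrete, here is a small sketch (not from the slides; the function name and values are illustrative) of the kind of regular, element-wise loop that SIMD hardware such as MMX/SSE, or an auto-vectorizing compiler, handles well: one instruction is applied to many data elements at once.

```c
#include <stdio.h>
#include <stddef.h>

/* The same operation is applied to every element -- exactly the regular,
 * "embarrassingly parallel" pattern that SIMD units execute efficiently. */
static void vector_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

int main(void)
{
    float a[4] = { 1, 2, 3, 4 }, b[4] = { 10, 20, 30, 40 }, c[4];
    vector_add(a, b, c, 4);
    printf("%.0f %.0f %.0f %.0f\n", c[0], c[1], c[2], c[3]);  /* 11 22 33 44 */
    return 0;
}
```

By contrast, loops with heavy data-dependent branching vectorize poorly, which is the disadvantage the slide notes.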
  • 16. Parallel Architectures
    - MIMD
      - Multiple instruction stream, multiple data stream
      (diagram: several combined processing/control units connected by an interconnect)
  • 17. Parallel Architectures
    - MIMD
      - Advantages
        - Can be built with off-the-shelf components
        - Better suited to irregular data access patterns
      - Disadvantages
        - Requires more hardware (control units are not shared)
        - Program/OS must be stored at each processor
    - Ex: the typical commodity SMP machines we see today.
  • 18. Parallel Architectures
    - Task Communication
      - Shared address space
        - Use common memory to exchange data
        - Communication and replication are implicit
      - Message passing
        - Use send()/receive() primitives to exchange data
        - Communication and replication are explicit
  • 19. Parallel Architectures
    - Shared address space
      - Uniform memory access (UMA)
        - Access time to a memory location is independent of which processing unit makes the request.
      - Non-uniform memory access (NUMA)
        - Access time to a memory location depends on the location of the processing unit relative to the memory accessed.
  • 20. Parallel Architectures
    - Message passing
      - Each processing unit has its own private memory
      - Exchange of messages used to pass data
      - APIs
        - Message Passing Interface (MPI)
        - Parallel Virtual Machine (PVM)
  • 21. Parallel Algorithms
    - Algorithm
      - A finite sequence of instructions, often used for calculation and data processing.
    - Parallel Algorithm
      - An algorithm that can be executed a piece at a time on many different processing devices, with the pieces put back together at the end to get the correct result.
  • 22. Parallel Algorithms
    - Challenges
      - Identifying work that can be done concurrently
      - Mapping work to processing units
      - Distributing the work
      - Managing access to shared data
      - Synchronizing various stages of execution
  • 23. Parallel Algorithms
    - Models
      - A way to structure a parallel algorithm by selecting decomposition and mapping techniques in a manner that minimizes interactions.
  • 24. Parallel Algorithms
    - Models
      - Data-parallel
      - Task graph
      - Work pool
      - Master-slave
      - Pipeline
      - Hybrid
  • 25. Parallel Algorithms
    - Data-parallel
      - Mapping of Work
        - Static
        - Tasks -> Processes
      - Mapping of Data
        - Independent data items assigned to processes (data parallelism)
  • 26. Parallel Algorithms
    - Data-parallel
      - Computation
        - Tasks process data, synchronize to get new data or exchange results, and continue until all data is processed
      - Load Balancing
        - Uniform partitioning of data
      - Synchronization
        - Minimal; a barrier may be needed at the end of a phase
      - Ex: Ray tracing
  • 27. Parallel Algorithms
    - Data-parallel
      (diagram: each data item D is assigned to its own process P, which produces an output O)
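A minimal sketch of the data-parallel model from slides 25–27, using POSIX threads (the names process_chunk, NUM_THREADS, and N are illustrative, not from the deck): the data is partitioned uniformly, each thread works independently on its own slice, and joining the threads serves as the end-of-phase barrier.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4
#define N 1000

static double data[N];

struct chunk { int start, end; };

static void *process_chunk(void *arg)
{
    struct chunk *c = (struct chunk *)arg;
    for (int i = c->start; i < c->end; i++)
        data[i] = data[i] * 2.0;              /* independent per-element work */
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    struct chunk chunks[NUM_THREADS];

    for (int t = 0; t < NUM_THREADS; t++) {   /* uniform partitioning of data */
        chunks[t].start = t * (N / NUM_THREADS);
        chunks[t].end   = (t + 1) * (N / NUM_THREADS);
        pthread_create(&threads[t], NULL, process_chunk, &chunks[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++)     /* end-of-phase barrier */
        pthread_join(threads[t], NULL);

    printf("done\n");
    return 0;
}
```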
  • 28. Parallel Algorithms
    - Task graph
      - Mapping of Work
        - Static
        - Tasks are mapped to nodes in a task dependency graph (task parallelism)
      - Mapping of Data
        - Data moves through the graph (source to sink)
  • 29. Parallel Algorithms
    - Task graph
      - Computation
        - Each node processes input from the previous node(s) and sends output to the next node(s) in the graph
      - Load Balancing
        - Assign more processes to a given task
        - Eliminate graph bottlenecks
      - Synchronization
        - Node data exchange
      - Ex: Parallel quicksort, divide-and-conquer approaches
  • 30. Parallel Algorithms
    - Task graph
      (diagram: data items D enter at the source nodes, flow through a graph of processes P, and outputs O emerge at the sinks)
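As a rough illustration of the divide-and-conquer / task-graph style the slides mention (parallel quicksort), the sketch below spawns a POSIX thread for one branch of each partition step and handles the other branch in the current thread; the name psort, the depth limit, and the sample data are all assumptions for illustration, not code from the deck.

```c
#include <pthread.h>
#include <stdio.h>

struct job { int *a; int lo, hi, depth; };

static void psort(int *a, int lo, int hi, int depth);

static void *psort_thread(void *arg)
{
    struct job *j = (struct job *)arg;
    psort(j->a, j->lo, j->hi, j->depth);
    return NULL;
}

static void psort(int *a, int lo, int hi, int depth)
{
    if (lo >= hi)
        return;

    int pivot = a[hi], i = lo;                 /* simple Lomuto partition */
    for (int k = lo; k < hi; k++)
        if (a[k] < pivot) { int t = a[k]; a[k] = a[i]; a[i] = t; i++; }
    int t = a[hi]; a[hi] = a[i]; a[i] = t;

    if (depth > 0) {
        pthread_t tid;                         /* divide: spawn a task for the left branch */
        struct job left = { a, lo, i - 1, depth - 1 };
        pthread_create(&tid, NULL, psort_thread, &left);
        psort(a, i + 1, hi, depth - 1);        /* keep the right branch ourselves */
        pthread_join(tid, NULL);               /* synchronize where the branches meet */
    } else {
        psort(a, lo, i - 1, 0);
        psort(a, i + 1, hi, 0);
    }
}

int main(void)
{
    int a[] = { 5, 2, 9, 1, 7, 3, 8, 6, 4, 0 };
    psort(a, 0, 9, 2);                         /* spawn threads two levels deep */
    for (int i = 0; i < 10; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```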
  • 31. Parallel Algorithms
    - Work pool
      - Mapping of Work/Data
        - No pre-mapping is desired
        - Any task can be performed by any process
      - Computation
        - Processes work as data becomes available (or requests arrive)
  • 32. Parallel Algorithms
    - Work pool
      - Load Balancing
        - Dynamic mapping of tasks to processes
      - Synchronization
        - Adding/removing work from the input queue
      - Ex: Web server
  • 33. Parallel Algorithms
    - Work pool
      (diagram: processes P take tasks from a shared work pool fed by an input queue and write results to an output queue)
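A minimal work-pool sketch with POSIX threads (worker, next_item, and the constants are illustrative assumptions): the "input queue" is reduced to a shared counter protected by a mutex, and each worker dynamically grabs the next available item until the pool is empty, giving the dynamic task-to-process mapping described above.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4
#define NUM_ITEMS   100

static int next_item = 0;                      /* head of the input queue */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&pool_lock);        /* synchronize access to the pool */
        int item = (next_item < NUM_ITEMS) ? next_item++ : -1;
        pthread_mutex_unlock(&pool_lock);
        if (item < 0)
            break;                             /* pool is empty */
        /* ... process item here ... */
    }
    return NULL;
}

int main(void)
{
    pthread_t w[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&w[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(w[i], NULL);
    printf("all work consumed\n");
    return 0;
}
```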
  • 34. Parallel Algorithms
    - Master-slave
      - Modification to the work pool model
        - One or more master processes generate and assign work to worker processes
      - Load Balancing
        - A master process can better distribute load to worker processes
  • 35. Parallel Algorithms
    - Pipeline
      - Mapping of Work
        - Processes are assigned tasks that correspond to stages in the pipeline
        - Static
      - Mapping of Data
        - Data processed in FIFO order
          - Stream parallelism
  • 36. Parallel Algorithms
    - Pipeline
      - Computation
        - Data is passed through a succession of processes, each of which performs some task on it
      - Load Balancing
        - Ensure all stages of the pipeline are balanced (contain the same amount of work)
      - Synchronization
        - Producer/consumer buffers between stages
      - Ex: Processor pipeline, graphics pipeline
  • 37. Parallel Algorithms
    - Pipeline
      (diagram: an input queue feeds a chain of processes P connected by buffers, ending in an output queue)
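A hedged two-stage pipeline sketch with POSIX threads (the stage names, buffer size, and item count are illustrative): stage 1 produces items into a bounded buffer and stage 2 consumes them in FIFO order, with a mutex and condition variables playing the role of the producer/consumer buffer between stages.

```c
#include <pthread.h>
#include <stdio.h>

#define BUF_SIZE 8
#define N_ITEMS  32

static int buf[BUF_SIZE];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *stage1(void *arg)                 /* producer stage */
{
    (void)arg;
    for (int i = 0; i < N_ITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (count == BUF_SIZE)              /* wait if the buffer is full */
            pthread_cond_wait(&not_full, &lock);
        buf[tail] = i; tail = (tail + 1) % BUF_SIZE; count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *stage2(void *arg)                 /* consumer stage */
{
    (void)arg;
    for (int i = 0; i < N_ITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (count == 0)                     /* wait if the buffer is empty */
            pthread_cond_wait(&not_empty, &lock);
        int item = buf[head]; head = (head + 1) % BUF_SIZE; count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        printf("stage 2 got %d\n", item);      /* further processing goes here */
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, stage1, NULL);
    pthread_create(&c, NULL, stage2, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```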
  • 38. Common Parallel Programming Models
    - Message-Passing
    - Shared Address Space
  • 39. Common Parallel Programming Models
    - Message-Passing
      - Most widely used for programming parallel computers (clusters of workstations)
      - Key attributes:
        - Partitioned address space
        - Explicit parallelization
      - Process interactions
        - Send and receive data
  • 40. Common Parallel Programming Models
    - Message-Passing
      - Communications
        - Sending and receiving messages
        - Primitives
          - send(buff, size, destination)
          - receive(buff, size, source)
          - Blocking vs. non-blocking
          - Buffered vs. non-buffered
        - Message Passing Interface (MPI)
          - Popular message passing library
          - ~125 functions
  • 41. Common Parallel Programming Models
    - Message-Passing
      (diagram: processes P1–P4 on separate workstations exchange data explicitly, e.g. P1 calls send(buff1, 1024, p3) and P3 calls receive(buff3, 1024, p1))
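The exchange in the diagram above, expressed with real MPI calls, might look like the sketch below (a hedged illustration of slides 40–41, not code from the deck; the tag and buffer contents are arbitrary): rank 1 explicitly sends a 1024-element buffer to rank 3, which explicitly receives it. It assumes the program is launched with at least four processes, e.g. mpirun -np 4.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    int buff[1024] = { 0 };

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1)                        /* p1: explicit send */
        MPI_Send(buff, 1024, MPI_INT, 3, 0, MPI_COMM_WORLD);
    else if (rank == 3)                   /* p3: explicit receive */
        MPI_Recv(buff, 1024, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```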
  • 42. Common Parallel Programming Models
    - Shared Address Space
      - Mostly used for programming SMP machines (multicore chips)
      - Key attributes
        - Shared address space
          - Threads
          - shmget/shmat UNIX operations
        - Implicit parallelization
      - Process/Thread communication
        - Memory reads/stores
  • 43. Common Parallel Programming Models
    - Shared Address Space
      - Communication
        - Read/write memory
          - Ex: x++;
      - POSIX Thread API
        - Popular thread API
        - Operations
          - Creation/deletion of threads
          - Synchronization (mutexes, semaphores)
          - Thread management
  • 44. Common Parallel Programming Models
    - Shared Address Space
      (diagram: threads T1–T4 on one SMP workstation communicate through shared data in RAM)
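A minimal sketch of the shared-address-space model with the POSIX thread API (the iteration count and names are illustrative): two threads communicate simply by reading and writing the same variable x, and a mutex makes the x++ from slide 43 safe against races.

```c
#include <pthread.h>
#include <stdio.h>

static int x = 0;                              /* shared state in the common address space */
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&x_lock);           /* synchronization */
        x++;                                   /* communication is just a memory write */
        pthread_mutex_unlock(&x_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("x = %d\n", x);                     /* 200000 */
    return 0;
}
```

Without the mutex the two x++ operations could interleave and lose updates, which is exactly why the slide lists synchronization primitives alongside plain memory reads/stores.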
  • 45. Parallel Programming Pitfalls
    - Synchronization
      - Deadlock
      - Livelock
      - Fairness
    - Efficiency
      - Maximize parallelism
    - Reliability
      - Correctness
      - Debugging
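To make the deadlock pitfall concrete, here is a small illustrative sketch (not from the slides): two threads acquire the same two mutexes in opposite order, so each may end up holding one lock while waiting forever for the other. The usual fix is to always acquire locks in a single global order.

```c
#include <pthread.h>

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

static void *thread1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&a);       /* holds a ... */
    pthread_mutex_lock(&b);       /* ... and waits for b */
    pthread_mutex_unlock(&b);
    pthread_mutex_unlock(&a);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&b);       /* holds b ... */
    pthread_mutex_lock(&a);       /* ... and waits for a -- possible deadlock */
    pthread_mutex_unlock(&a);
    pthread_mutex_unlock(&b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);       /* may never return if the deadlock occurs */
    pthread_join(t2, NULL);
    return 0;
}
```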
  • 46. Prelude to MapReduce
    - MapReduce is a paradigm designed by Google for making a subset (albeit a large one) of distributed problems easier to code
    - Automates data distribution & result aggregation
    - Restricts the ways data can interact to eliminate locks (no shared state = no locks!)
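As a very rough, purely sequential preview of that idea (all names below are illustrative and are not Google's actual API): a map step turns each input into independent (key, value) pairs with no shared state, and a separate reduce step aggregates the values per key afterwards; the real framework runs these steps across many machines and handles the distribution itself.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PAIRS 64

struct pair { const char *key; int value; };

static struct pair pairs[MAX_PAIRS];
static int npairs = 0;

static void map_word(const char *word)          /* map: emit (word, 1) */
{
    pairs[npairs].key = word;
    pairs[npairs].value = 1;
    npairs++;
}

static int reduce_count(const char *key)        /* reduce: sum the values for one key */
{
    int total = 0;
    for (int i = 0; i < npairs; i++)
        if (strcmp(pairs[i].key, key) == 0)
            total += pairs[i].value;
    return total;
}

int main(void)
{
    const char *doc[] = { "the", "cat", "sat", "on", "the", "mat" };
    for (int i = 0; i < 6; i++)
        map_word(doc[i]);                       /* map phase: independent per word */
    printf("the -> %d\n", reduce_count("the")); /* reduce phase: prints 2 */
    return 0;
}
```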
  • 47. Prelude to MapReduce
    - Next time…
      - The MapReduce parallel programming paradigm
  • 48. References
    - Introduction to Parallel Computing, Grama et al., Pearson Education, 2003
    - Distributed Systems: http://code.google.com/edu/parallel/index.html
    - Distributed Computing: Principles and Applications, M.L. Liu, Pearson Education, 2004
