Os Madsen Block

  1. © 2007 Rachael L Madsen and Beverly T Block
  2. Rachael Madsen, Multiple Solutions Computing, Portland, Oregon; rachael@multi-sol.com; www.multi-sol.com
  3. Threads vs. processes; multi-processor hardware; software choices for process management
  4. The use of multiple computers or processors to solve a problem or perform a function
  5. Threads and Processes Are Not the Same
  6. A single process achieves parallelism by creating separate threads for subtasks
     - A thread shares context (such as its address space and open files) with its parent process
     - On a single processor, parallelism is an illusion created by interleaving the threads' execution
  7. Each process has its own context
     - Overheads for creation, communication and context switching are higher than for threads
     - Processes allow true concurrent computing, even on separate systems
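The contrast drawn in these two slides can be seen in a minimal Python sketch using the standard `threading` and `multiprocessing` modules: a thread's change to a module-level list is visible to its parent, while a child process only changes its own copy. The names `shared_log`, `append_item`, `run_in_thread` and `run_in_process` are illustrative and not from the talk.

```python
import multiprocessing
import threading

# Module-level state. Threads in this process all see the same list;
# a child process gets its own copy (fork) or a fresh import (spawn).
shared_log = []


def append_item(item):
    """Append to shared_log in whichever process runs this function."""
    shared_log.append(item)


def run_in_thread(item):
    """Run append_item in a new thread that shares this process's context."""
    worker = threading.Thread(target=append_item, args=(item,))
    worker.start()
    worker.join()


def run_in_process(item):
    """Run append_item in a new process that has its own context."""
    worker = multiprocessing.Process(target=append_item, args=(item,))
    worker.start()
    worker.join()


if __name__ == "__main__":
    run_in_thread("from thread")
    run_in_process("from process")
    # Only the thread's append is visible in the parent process.
    print(shared_log)  # ['from thread']
```

The `__main__` guard is needed on platforms where child processes are started by re-importing the main module.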
  8. Threads are generally faster
     - Very dependent on hardware and operating system
     - Difficult to generate metrics
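Because the relative cost depends so heavily on hardware, operating system and (in Python) the process start method, a dependable figure has to be measured on the target machine. A rough micro-benchmark sketch, assuming that timing an empty task with `time.perf_counter` is a fair proxy for creation, start-up and teardown cost; `noop` and `time_start_join` are hypothetical helpers.

```python
import multiprocessing
import threading
import time


def noop():
    """Empty task, so the measurement is dominated by start-up and teardown."""


def time_start_join(factory, n=20):
    """Average seconds to create, start and join one worker made by `factory`.

    `factory` is threading.Thread or multiprocessing.Process; both accept the
    same `target=` keyword, so they are measured with identical code."""
    start = time.perf_counter()
    for _ in range(n):
        worker = factory(target=noop)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    thread_secs = time_start_join(threading.Thread)
    process_secs = time_start_join(multiprocessing.Process)
    print(f"thread:  {thread_secs * 1e6:10.1f} microseconds")
    print(f"process: {process_secs * 1e6:10.1f} microseconds")
```

Expect the numbers to shift between runs and machines; repeating the measurement a few times gives a better sense of the spread.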
  9. Intel/AMD vs. Cell
  10. Currently available
      - Intel: 2 to 4 cores
      - AMD: 2 cores (4 soon)
      - Cell processors: 9 cores (one PPE plus eight SPEs)
      - Graphics Processing Units (GPUs)
  11. [Diagram: Intel/AMD multi-core layout. Four cores, each with its own architectural state, execution engine and local APIC, plus two second-level caches and two bus interfaces connected to the system bus.]
  12. [Diagram: Cell Broadband Engine. The PPE (PPU with PXU, L1 and L2) and eight SPEs (each an SPU containing an SXU and LS, plus an MFC) are connected, together with the MIC and BIC, by the Element Interconnect Bus (EIB, up to 96 B/cycle).]
      - SPE: Synergistic Processing Element
      - SPU: Synergistic Processor Unit
      - SXU: Synergistic Execution Unit
      - MFC: Memory Flow Controller
      - PPE: PowerPC Processor Element
      - LS: Local Storage
      - PPU: PowerPC Processing Unit
      - PXU: PowerPC Execution Unit
      - MIC: Memory Interface Controller
      - L1, L2: level-1 and level-2 caches
      - BIC: Broadband Interface Controller
  13. [Diagram: the PPE, showing the PowerPC Processing Unit, the PowerPC Extension Unit, Local Storage 1 and Local Storage 2.]
  14. [Diagram: an SPE, showing the Synergistic Processor Unit, Synergistic Execution Unit, Local Storage and Memory Flow Controller.]
  15. Three levels of threads: user-level threads, kernel-level threads, hardware threads
  16. Defining and preparing threads is performed by the programming environment and compiler; operating threads is performed by the OS using processes; executing threads is performed by the processors
  17. Hardware threads
      - Intel/AMD: sophisticated firmware on chip to handle process execution
      - Cell: minimal firmware on chip to handle execution
  18. Kernel-level threads
      - Intel/AMD: process management of threads by the operating system
      - Cell: multiple-process management written by the user: total control!
  19. User-level threads
      - Intel/AMD: use a threading package (OpenMP, Pthreads, TBB, etc.) to manage threads
      - Cell: the user manages threads directly or by adapting a threading package
  20. Intel/AMD vs. Cell
      - Intel/AMD: completely controlled by the OS and the chip
      - Cell: controlled by the user
      - For execution to be fast, the execution block (code and data) must be kept in cache as much as possible
  21. Global Interpreter Lock (GIL); cache management; data management; program flow; thread design
  22. In CPython, only one thread at a time can execute Python bytecode within a single interpreter process
      - True parallelism within one process is available only by calling lower-level (C/C++/Fortran, etc.) routines that release the GIL
      - This is as it should be! The Python interpreter itself should not be parallelized
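In CPython the GIL is held per interpreter process, so pure-Python, CPU-bound work gains little from threads, while separate processes each run their own interpreter with its own GIL. A sketch comparing the two with the standard `concurrent.futures` executors; `count_primes` is a hypothetical stand-in workload, and the timings will vary by machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Hypothetical CPU-bound workload: count primes below `limit` by trial
    division. Pure Python, so a thread running it holds the GIL throughout."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_map(executor_cls, jobs, workers=4):
    """Run count_primes over `jobs` with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_primes, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [50_000] * 4
    thread_results, thread_secs = timed_map(ThreadPoolExecutor, jobs)
    proc_results, proc_secs = timed_map(ProcessPoolExecutor, jobs)
    assert thread_results == proc_results  # same answers either way
    # On a multi-core machine the process pool is typically faster here,
    # because each worker process has its own interpreter and GIL.
    print(f"threads:   {thread_secs:.2f} s")
    print(f"processes: {proc_secs:.2f} s")
```

Process start-up and argument pickling add overhead, so for small jobs the thread pool may still win.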
  23. Significant factors (Intel/AMD)
      - Available memory
      - Number of other processes running
      - How the OS's handling of threads and the hardware's handling of threads interact
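Two of these factors, the number of CPUs available and the load from other processes, can be folded into how many workers to start. A minimal sketch of such a heuristic; `usable_cpus` and `pick_worker_count` are hypothetical helpers, and subtracting the one-minute load average is an assumption for illustration, not a rule from the talk. Available memory is not checked here.

```python
import os


def usable_cpus():
    """Number of CPUs this process may run on."""
    if hasattr(os, "sched_getaffinity"):  # Linux: respects CPU affinity masks
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1            # cpu_count() may return None


def pick_worker_count(reserve=0):
    """Hypothetical heuristic: usable CPUs, minus `reserve` CPUs kept free,
    minus (where the OS reports it) the 1-minute load average."""
    busy = 0
    if hasattr(os, "getloadavg"):         # Unix only
        try:
            busy = int(os.getloadavg()[0])
        except OSError:                   # load average unavailable
            busy = 0
    return max(1, usable_cpus() - reserve - busy)


if __name__ == "__main__":
    print(usable_cpus(), pick_worker_count(reserve=1))
```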
  24. Strategies (Intel/AMD)
      - Design data structures so that data can be sliced into small chunks
      - Start with a small program and small data structures, then increase them slowly, watching for performance degradation
      - Optimize code in called processes
      - There is not enough control to do much else!
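The first two strategies can be sketched together: slice the input into fixed-size chunks, then run the same pipeline on progressively larger inputs and watch the cost per item. `chunks`, `process_chunk` and `scaling_probe` are hypothetical names, and the sum-of-squares workload is only a stand-in.

```python
import time


def chunks(data, size):
    """Yield successive slices of `data`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def process_chunk(chunk):
    """Stand-in workload (hypothetical): sum of squares of one chunk."""
    return sum(x * x for x in chunk)


def scaling_probe(sizes, chunk_size=1_000):
    """Run the same chunked pipeline on growing inputs.

    Returns (n, total, seconds_per_item) for each size; a jump in
    seconds_per_item as n grows signals performance degradation."""
    report = []
    for n in sizes:
        data = list(range(n))
        start = time.perf_counter()
        total = sum(process_chunk(c) for c in chunks(data, chunk_size))
        elapsed = time.perf_counter() - start
        report.append((n, total, elapsed / n))
    return report


if __name__ == "__main__":
    for n, _, per_item in scaling_probe([10_000, 100_000, 1_000_000]):
        print(f"{n:>9} items: {per_item * 1e9:8.1f} ns/item")
```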
  25. Significant factors (Cell)
      - Available memory on the PowerPC (PPE)
      - Whether there are other users on the Cell
      - Progressive computation on one set of data vs. separate computation on separate data
  26. Strategies (Cell)
      - The program plus its data for each SPE must fit within the SPE's 256 KB local store
      - Optimize code running on the SPEs; try different options for your specific application
      - Divide the tasks the PPE dispatches into chunks that will fit into the SPEs
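The 256 KB budget turns into simple arithmetic when planning chunk sizes on the PPE side. A sketch, assuming the caller can estimate the SPE program's code and stack sizes; it can optionally reserve room for two buffers, as is common with double-buffered DMA on the Cell. Real SPE code would be written in C against the Cell SDK; this Python only illustrates the sizing, and `items_per_chunk` and `split_for_spes` are hypothetical names.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local store per SPE on the Cell BE


def items_per_chunk(code_bytes, stack_bytes, item_bytes, buffers=1):
    """Largest chunk (in items) such that the SPE program, its stack and
    `buffers` data buffers all fit in the local store.

    code_bytes and stack_bytes are caller-supplied estimates (assumptions);
    buffers=2 reserves space for double-buffered DMA."""
    free = LOCAL_STORE_BYTES - code_bytes - stack_bytes
    if free <= 0:
        raise ValueError("code and stack alone exceed the local store")
    return free // (item_bytes * buffers)


def split_for_spes(items, per_chunk):
    """Divide the PPE's work list into chunks small enough for one SPE each."""
    if per_chunk < 1:
        raise ValueError("per_chunk must be at least 1")
    return [items[i:i + per_chunk] for i in range(0, len(items), per_chunk)]


if __name__ == "__main__":
    # Hypothetical sizes: 64 KB of code, 16 KB of stack, 4-byte floats.
    n = items_per_chunk(64 * 1024, 16 * 1024, 4, buffers=2)
    print(n)  # 22528
    print(len(split_for_spes(list(range(100_000)), n)))  # 5
```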
  27. Different data is put through the same process
      [Diagram: a data stream split into Data 1 to Data 4; each passes through the same Process, giving Result 1 to Result 4.]
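On a multi-core Intel/AMD machine, the pattern in this diagram (one operation applied to many independent inputs) maps directly onto `ProcessPoolExecutor.map` from the standard library. A sketch with a hypothetical `process` function standing in for the real computation:

```python
from concurrent.futures import ProcessPoolExecutor


def process(data):
    """Hypothetical stand-in for the single operation: mean of one dataset."""
    return sum(data) / len(data)


def data_parallel(datasets, workers=4):
    """Different data, same process: map one function over independent inputs.

    Executor.map returns results in input order, so Result i matches Data i."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, datasets))


if __name__ == "__main__":
    datasets = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    print(data_parallel(datasets))  # [2.0, 5.0, 8.0, 11.0]
```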
  28. The same data is put through different processes
      [Diagram: the same data from a data stream passes through Process 1 to Process 4, giving Result 1 to Result 4.]
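The opposite pattern, several different operations applied to the same input, can be sketched by submitting each function separately to one pool. Here the built-in `min` and `max` and two `statistics` functions stand in for Process 1 to Process 4; `task_parallel` is a hypothetical helper.

```python
import statistics
from concurrent.futures import ProcessPoolExecutor


def task_parallel(data, operations, workers=4):
    """Same data, different processes: run every operation on one input.

    Each operation must be picklable (a module-level or built-in function)
    so it can be sent to a worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(op, data) for op in operations]
        return [f.result() for f in futures]


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    operations = [min, max, statistics.mean, statistics.pstdev]  # Process 1 to 4
    print(task_parallel(data, operations))
```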
  29. Avoiding deadlocks; avoiding race conditions; scaling
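A minimal sketch of the first two items using `threading.Lock`: holding the locks across each read-modify-write prevents race conditions, and always acquiring locks in one global order prevents the classic two-lock deadlock. `Account`, `transfer`, `repeat_transfer` and `run_demo` are hypothetical names, and scaling is not addressed here.

```python
import threading


class Account:
    """Hypothetical shared resource protected by its own lock."""

    def __init__(self, name, balance):
        self.name = name          # must be unique: used for lock ordering
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` from src to dst.

    Race conditions: both balances are read and written only while both
    locks are held. Deadlocks: locks are always taken in the same global
    order (by account name), so two opposite transfers cannot each hold
    one lock while waiting for the other."""
    first, second = sorted((src, dst), key=lambda acct: acct.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def repeat_transfer(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)


def run_demo(times=10_000):
    """Two threads transfer in opposite directions; balances must end unchanged."""
    a, b = Account("a", 1_000), Account("b", 1_000)
    workers = [
        threading.Thread(target=repeat_transfer, args=(a, b, times)),
        threading.Thread(target=repeat_transfer, args=(b, a, times)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_demo())  # (1000, 1000)
```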
  30. NetWorkSpaces; RapidMind; Qt threads
  31. NetWorkSpaces
      - Written in Python, with a Python-like interface
      - Written up in the August 2007 issue of Dr. Dobb’s Journal (copies available on the literature table)
      - Can work with other languages
      - Works on multiple processors as well as multi-core
      - Handles appropriate breakdown of data
  32. RapidMind
      - Uses C++-like syntax to specify work to be done in parallel
      - Otherwise similar in functionality to NetWorkSpaces
      - Claims to be highly efficient
      - Currently in commercial use
      - Free for development; requires a license for released products
  33. Qt threads
      - Originally intended to support GUI interfaces across multiple systems (cross-platform)
      - Covers the mechanics of interfacing with processors
      - Does not handle data
      - QtPy is a Python implementation
  34. IBM Cell documentation: http://www-128.ibm.com/developerworks/power/cell/docs_documentation.html
      - Introduction to the Cell Multiprocessor
      - Cell Broadband Engine Programming Tutorial
      - Cell Broadband Engine Programming Handbook
      - Programming High-Performance Applications on the Cell BE Processor
      - Maximizing the Power of the CBE Processor
  35. Dr. Dobb’s Journal article about depth-first search: http://www.ddj.com/dept/64bit/197801624
      Software Development Kit: http://www-128.ibm.com/developerworks/power/cell
      Programming the Cell Broadband Engine: http://www.embedded.com/showArticle.jhtml?articleID=188101999
