2. Introduction
• What is parallel processing?
It is the ability to process more than one job simultaneously.
• Why go parallel?
• A great deal of data has to be processed
• Evaluating an engineering equation can take a long time
• Jobs need to be done faster
Isfahan University of Technology, Dep. Electronic and Computer
Engineering
3. Technologies
• What technologies are used for parallel processing?
• Network-based parallel processing
• Utilizes free CPU time and power
• In fact, most CPU time and power is wasted
• Jobs are broken down and run on the available resources
• Local parallelism on multicore/multiprocessor systems
• Utilizes the concept of multithreading
• Utilizes the concept of shared memory
• Can be run on either GPU or CPU
4. Tools and Techniques
• What tools are used for parallel processing?
• Network-based parallel processing
• Grid-based parallel computing
• Cloud-based parallelism and cloud computing
• Local parallelism on multicore/multiprocessor systems
• NVIDIA® CUDA™
• MPI
• POSIX Threads
• OpenMP
5. What is OpenMP
• OpenMP
• In simple words, it runs a user program in parallel.
• It uses two main concepts for parallelism
• Multithreading
• Shared Memory
• It takes the user application, breaks it down into a group of threads, and
runs them on a shared memory foundation
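The two concepts above can be sketched in a few lines of C++. This is only an illustrative sketch: the helper name collect_thread_ids and the serial stand-in functions are assumptions made for this example, not part of the original slides.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins (an assumption of this sketch) so it builds without OpenMP.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper: runs one parallel region and returns, for each thread
// in the team, the ID that thread reported.
std::vector<int> collect_thread_ids(int requested_threads) {
    std::vector<int> ids;  // one vector in shared memory, visible to every thread
    #pragma omp parallel num_threads(requested_threads)
    {
        // Exactly one thread sizes the vector; the implicit barrier at the
        // end of 'single' makes the others wait until that is done.
        #pragma omp single
        ids.assign(omp_get_num_threads(), -1);

        // Multithreading: every thread runs this line and writes its own slot.
        ids[omp_get_thread_num()] = omp_get_thread_num();
    }
    return ids;
}
```

Calling collect_thread_ids(4) from main typically returns the IDs 0 to 3, although the runtime may provide fewer threads than requested. With GCC or Clang, OpenMP is enabled with the -fopenmp flag; without it the pragmas are ignored and the region runs on a single thread.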
6. Why use OpenMP
• It is simple to use
• Most of the time, the existing program code needs little or no change
• It uses compiler directives to mark parallel regions
• It is cross-platform
• It is supported in Fortran and C/C++
7. Programming Model
• Shared Memory
• Parallelism by threading
• Fork-Join model
• Explicit Parallelism
• Nested Parallelism
• Dynamic Threads
• Input / Output
• Memory model
8. Shared Memory
• What is shared memory?
• Why use shared memory?
• Shared Memory in OpenMP
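As a rough illustration of shared memory in OpenMP, the sketch below contrasts a shared variable (one copy seen by all threads) with a private one (one copy per thread). The function name squares is hypothetical and chosen only for this example.

```cpp
#include <vector>

// Hypothetical helper: fills a shared output vector with i * i.
std::vector<long> squares(int n) {
    std::vector<long> out(n);  // shared: a single copy that all threads write into
    #pragma omp parallel for shared(out)
    for (int i = 0; i < n; ++i) {
        // 'i' and 'tmp' are private: each thread works on its own copies.
        long tmp = static_cast<long>(i) * i;
        out[i] = tmp;  // each iteration touches a distinct element, so no data race
    }
    return out;
}
```

Because every iteration writes to a different element of out, the threads can share the vector without any locking. Two threads updating the same element would need synchronization or a reduction.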
9. Shared Memory (Cont.)
• The following systems can be used for shared memory access
• a single-core chip (older PCs, sequential execution)
• a multicore chip (such as your laptop?)
• multiple single core chips in a NUMA system
• multiple multicore chips in a NUMA system (VT SGI system)
10. UMA vs. NUMA
• Uniform Memory Access (UMA)
11. UMA vs. NUMA (Cont.)
• Non-Uniform Memory Access (NUMA)
12. Multithreading
• What is multithreading?
• What is Intel Hyper-Threading?
• Why use multithreading?
• Multithreading in OpenMP
13. Fork – Join Model
• What is Fork?
• What is Join?
• How multithreading works in OpenMP
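The fork-join behaviour can be observed directly by asking the runtime how many threads are active before, inside, and after a parallel region. This is a minimal sketch; the struct TeamSizes and the function observe_fork_join are hypothetical names introduced only for illustration.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-in (an assumption of this sketch) so it builds without OpenMP.
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical record of the team size at three points in the program.
struct TeamSizes { int before; int inside; int after; };

TeamSizes observe_fork_join() {
    TeamSizes t{0, 0, 0};
    t.before = omp_get_num_threads();      // serial part: only the master thread, so 1

    #pragma omp parallel                   // fork: the master creates a team of threads
    {
        #pragma omp single
        t.inside = omp_get_num_threads();  // one thread records the team size
    }                                      // join: threads synchronize, the master continues

    t.after = omp_get_num_threads();       // serial again, so 1
    return t;
}
```

On a multicore machine with OpenMP enabled, inside is usually the number of available cores, while before and after are both 1.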
14. Fork – Join Model (Cont.)
[Figure: fork-join diagram with the labels F (fork), J (join), Master Thread, and Thread]
15. OpenMP Elements
• Compiler Directives
• Runtime Library Routines
• Environment Variables
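The three elements can be seen together in one small sketch: the OMP_NUM_THREADS environment variable, the omp_get_max_threads runtime routine, and a parallel directive. The struct OpenMPSetup and the function inspect_openmp are hypothetical names used only for this illustration.

```cpp
#include <cstdlib>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-in (an assumption of this sketch) so it builds without OpenMP.
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical summary of what each OpenMP element reports.
struct OpenMPSetup {
    std::string env_num_threads;  // value of OMP_NUM_THREADS, empty if unset
    int max_threads;              // upper bound reported by the runtime library
    int counted;                  // threads actually counted in a parallel region
};

OpenMPSetup inspect_openmp() {
    OpenMPSetup s;

    // Environment variable: read by the OpenMP runtime at program start.
    const char* env = std::getenv("OMP_NUM_THREADS");
    s.env_num_threads = env ? env : "";

    // Runtime library routine.
    s.max_threads = omp_get_max_threads();

    // Compiler directive: each thread in the team adds 1 to its private copy,
    // and the reduction sums the copies at the end of the region.
    int counted = 0;
    #pragma omp parallel reduction(+:counted)
    {
        counted += 1;
    }
    s.counted = counted;
    return s;
}
```

Setting OMP_NUM_THREADS=2 in the shell before starting the program would normally make max_threads report 2, with no change to the source code.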
16. How to use OpenMP
• OpenMP is implemented for C/C++ and Fortran
• In C/C++, we use compiler directives
• We only need to specify the parallel region
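As a sketch of what specifying the parallel region looks like in practice, the function below is an ordinary serial loop with a single directive added in front of it. The name add_vectors is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: element-wise sum of two vectors (up to the shorter length).
std::vector<double> add_vectors(const std::vector<double>& a,
                                const std::vector<double>& b) {
    const long n = static_cast<long>(a.size() < b.size() ? a.size() : b.size());
    std::vector<double> c(n);

    // The only OpenMP-specific line: iterations are divided among the threads.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}
```

If the compiler is run without OpenMP support, the directive is ignored and the same code runs sequentially, which is one reason the existing program rarely has to be restructured.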
17. How to use OpenMP (Cont.)
• With non-Microsoft compilers, OpenMP is enabled by a compiler flag (for example, -fopenmp with GCC)
18. How to use OpenMP (Cont.)
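• In Visual Studio, OpenMP is enabled in the project properties (C/C++, Language, OpenMP Support), which corresponds to the /openmp compiler option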
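19. Real Experiment
#include <windows.h>   // QueryPerformanceCounter, QueryPerformanceFrequency
#include <omp.h>       // omp_set_num_threads
#include <iostream>
using std::cout;

int main()
{
    omp_set_num_threads(6);       // request 6 threads for parallel regions
    LARGE_INTEGER frequency;      // ticks per second
    LARGE_INTEGER t1, t2;         // start and stop ticks
    double elapsedTime;
    // get ticks per second
    QueryPerformanceFrequency(&frequency);
    // start timer
    QueryPerformanceCounter(&t1);
    // iterations of the outer loop are divided among the threads
    #pragma omp parallel for
    for (int i = 0; i < 999999; i++)
        for (int j = 0; j < 1000; j++)
            ;  // empty body: busy work only; an optimizing build may remove it
    // stop timer
    QueryPerformanceCounter(&t2);
    // convert ticks to milliseconds
    elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
    cout << elapsedTime << " ms.\n";
    return 0;
}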
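The experiment above depends on the Windows performance counter. A portable variant of the same measurement might look like the following sketch, which swaps in std::chrono for timing and gives the loop a small amount of real work so that the compiler cannot discard it. The names TimedResult and timed_sum, and the choice of workload, are assumptions made for this example and are not the code used to obtain the reported results.

```cpp
#include <chrono>

// Hypothetical result type: the computed value and the wall-clock time taken.
struct TimedResult { long long sum; double elapsed_ms; };

// Hypothetical workload: sums (i % 7) for i in [0, n) inside a timed parallel loop.
TimedResult timed_sum(long n) {
    const auto start = std::chrono::steady_clock::now();  // start timer

    long long sum = 0;
    // Each thread accumulates a private partial sum; the reduction combines them.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += i % 7;
    }

    const auto stop = std::chrono::steady_clock::now();   // stop timer
    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return {sum, ms};
}
```

Building the same source once with and once without OpenMP enabled gives a sequential and a parallel timing for comparison, since the directive is ignored when OpenMP is off.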
20. Experiment Result - Sequential
• The sequential run took 3347.68 milliseconds
21. Experiment Result - Parallel
• The parallel run took 983.576 milliseconds
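The two timings can be combined into a speedup and a parallel efficiency. The worked calculation below assumes that both runs executed the same loop and that the parallel run really used the p = 6 threads requested with omp_set_num_threads(6); the slides do not state either point explicitly.

```latex
S = \frac{T_{\text{seq}}}{T_{\text{par}}}
  = \frac{3347.68\ \text{ms}}{983.576\ \text{ms}} \approx 3.40
\qquad
E = \frac{S}{p} \approx \frac{3.40}{6} \approx 0.57
```

Under those assumptions, the parallel version is roughly 3.4 times faster, which is about 57% of the ideal six-fold speedup.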