Instrumenting PARSEC's raytrace

(Check my blog @ http://www.marioalmeida.eu/ )

This presentation covers the performance metrics and results of running the raytrace application from the PARSEC benchmark on UPC's Boada server.

  1. Instrumenting a benchmark application
     Tools and Measurements Techniques
     Project by Mário Almeida (EMDC)
     Barcelona, 25 April 2012
  2. Index (1/2)
     Tools and configuration
     ● Parsec
       ○ Overview
       ○ Benchmark programs
     ● Extrae
     ● Paraver
     ● Configuration
  3. Index (2/2)
     Measurements
     ● Raytrace
       ○ Overview
       ○ Code
       ○ Inputs
       ○ Traces
       ○ Load balancing
       ○ Cache misses and instructions
       ○ Execution time
       ○ Configuration comparisons
       ○ Extrae overhead
     Conclusions
  4. Tools and configuration
  5. Parsec: Overview
     ● Benchmark suite with the following characteristics:
       ○ Multithreaded
       ○ Emerging workloads
       ○ Diverse
       ○ Not HPC-focused
       ○ Research
  6. Parsec: Benchmark programs
     ● blackscholes
     ● bodytrack
     ● canneal
     ● dedup
     ● facesim
     ● ferret
     ● fluidanimate
     ● freqmine
     ● raytrace
     ● ...
  7. Extrae
     ● Instrumentation package for tracing programs that use the shared-memory and message-passing programming models.
  8. Paraver
     ● Detailed quantitative analysis of a program's performance.
     ● Concurrent comparative analysis of several traces.
     ● Support for mixed message passing and shared memory.
     ● Building of derived metrics.
  9. Configuration (1/4)
     Boada server:
     ● Dual CPU, six cores each, with Hyperthreading.
     ● Kills applications after a few minutes.
     ● 24 GB of RAM.
     Boada server:
     ● Used cpulimit to limit the CPU usage to at most four cores.
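The slides do not show how cpulimit was invoked. As a minimal sketch, assuming the usual cpulimit convention that the limit is a percentage where 100 means one full core, a four-core cap on an already running process could be built like this. The helper name `cpulimit_command` and the PID are hypothetical.

```python
def cpulimit_command(pid: int, cores: int) -> list[str]:
    """Build a cpulimit invocation capping process `pid` at `cores` full cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Assumption: cpulimit takes the limit as a percentage, 100 per full core.
    limit_percent = 100 * cores
    # -p selects the target process by PID, -l sets the CPU percentage limit.
    return ["cpulimit", "-p", str(pid), "-l", str(limit_percent)]


if __name__ == "__main__":
    # 12345 is a placeholder PID, not a value from the slides.
    print(" ".join(cpulimit_command(pid=12345, cores=4)))
```

Returning the command as a list keeps it usable with `subprocess.run` without shell quoting.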
  10. Configuration (2/4)
      Installed and/or configured:
      ● Parsec 2.1 with raytrace package only.
      ● Extrae 2.2.1.
      ● Paraver 4.3.0 (on my laptop).
      ● CpuLimit
      ● Minor configuration changes in .bashrc.
      ● Multiple scripts to clean, build and run.
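The run scripts themselves are not included in the slides. The following is a sketch of what a thread-count sweep could look like, assuming PARSEC's `parsecmgmt` front end with its `-a`, `-p`, `-c`, `-i` and `-n` options. The build configuration `gcc-pthreads`, the chosen thread counts and the helper names are assumptions for illustration.

```python
def parsec_run_command(threads: int, input_set: str = "simsmall",
                       config: str = "gcc-pthreads") -> list[str]:
    """Build a parsecmgmt command that runs raytrace with `threads` threads."""
    return ["parsecmgmt", "-a", "run", "-p", "raytrace",
            "-c", config, "-i", input_set, "-n", str(threads)]


def sweep(thread_counts, input_set="simsmall"):
    """Return one run command per thread count, in the order given."""
    return [parsec_run_command(n, input_set) for n in thread_counts]


if __name__ == "__main__":
    # Thread counts are illustrative; the slides only mention runs up to 64.
    for command in sweep([1, 2, 4, 8, 16, 32, 64], input_set="simlarge"):
        print(" ".join(command))
```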
  11. Configuration (3/4)
  12. Configuration (4/4)
  13. Measurements
  14. Raytrace: Overview
      ● Physical simulation for visualization
      ● Computer animation
      ● Input is a complex object of many triangles.
  15. Raytrace: Code
      for every pixel in the image
          calculate trajectory of ray striking pixel
          find closest intersection point of ray with scene geometry
          calculate contribution of all lights at intersection point
          recursively trace specularly reflected ray
      end for
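The loop on the Code slide can be sketched as a tiny runnable ray tracer. This is not PARSEC's implementation: it swaps the triangle mesh for spheres, uses diffuse lighting without shadows, and returns one brightness value per pixel. All names and the scene are assumptions for illustration.

```python
import math


def add(a, b): return tuple(x + y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))


def hit_sphere(origin, direction, center, radius):
    """Return the nearest ray parameter t > 0 for a unit `direction`, or None."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c          # quadratic with a == 1 (unit direction)
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None  # ignore hits behind or at the origin


def trace(origin, direction, spheres, lights, depth=2):
    # Find the closest intersection point of the ray with the scene geometry.
    closest = None
    for center, radius, reflectivity in spheres:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (closest is None or t < closest[0]):
            closest = (t, center, reflectivity)
    if closest is None:
        return 0.0
    t, center, reflectivity = closest
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, center))
    # Calculate the contribution of all lights at the intersection point.
    brightness = sum(max(0.0, dot(normal, normalize(sub(light, point))))
                     for light in lights)
    # Recursively trace the specularly reflected ray.
    if depth > 0 and reflectivity > 0:
        reflected = sub(direction, scale(normal, 2.0 * dot(direction, normal)))
        brightness += reflectivity * trace(point, reflected, spheres, lights, depth - 1)
    return brightness


def render(width, height, spheres, lights):
    image = []
    for y in range(height):          # for every pixel in the image
        row = []
        for x in range(width):
            # Trajectory of the ray through this pixel, camera at the origin.
            direction = normalize(((x + 0.5) / width - 0.5,
                                   0.5 - (y + 0.5) / height, -1.0))
            row.append(trace((0.0, 0.0, 0.0), direction, spheres, lights))
        image.append(row)
    return image
```

Each pixel is computed independently of the others, which is why the rendering loop lends itself to being split across threads.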
  16. Raytrace: Inputs
      ● simsmall - 1 million polygons (480x270)
      ● simmedium - 1 million polygons (960x540)
      ● simlarge - 1 million polygons (1920x1080)
      ● native - 10 million polygons (1920x1080)
  17. Raytrace: Trace (1/2)
      Only 10% of the execution time is parallel!
      Trace legend: Not created, Running
  18. Raytrace: Trace (2/2)
      Render time is proportional to the number of frames!
      Trace phases: Init and adding object, Build Context, Render
  19. Raytrace: Load balancing (1/2)
      Trace legend: Not created, Create Threads, Task, Barrier, Wait for all threads
  20. Raytrace: Load balancing (2/2)
      Good load balancing between the slave threads.
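The slides judge load balance visually from the Paraver trace. One common way to put a number on it, shown here as a sketch and not as the method used in the presentation, is the ratio of the busiest thread's time to the mean time per thread. The sample times are hypothetical.

```python
def load_imbalance(busy_times):
    """Return max/mean of per-thread busy time; 1.0 means perfectly balanced."""
    if not busy_times:
        raise ValueError("need at least one thread")
    mean = sum(busy_times) / len(busy_times)
    return max(busy_times) / mean


if __name__ == "__main__":
    # Hypothetical per-thread busy times in seconds, not measured values.
    print(load_imbalance([1.02, 0.98, 1.00, 1.01]))
```

A value close to 1.0 means no thread waits long at the barrier for the others.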
  21. Raytrace: Cache and instructions
      Figure annotations: High number of cache misses, Very low number of cache misses
      There were no significant differences in IPC between threads.
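As a sketch of how the IPC comparison between threads can be expressed numerically, assuming per-thread instruction and cycle counts exported from the trace: IPC is instructions divided by cycles, and the relative spread across threads summarises how much the threads differ. The counter values and function names are hypothetical.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle for one thread."""
    return instructions / cycles


def relative_ipc_spread(counters):
    """Return (max IPC - min IPC) / mean IPC over (instructions, cycles) pairs."""
    values = [ipc(instructions, cycles) for instructions, cycles in counters]
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


if __name__ == "__main__":
    # Hypothetical (instructions, cycles) per thread, not measured values.
    counters = [(1_000_000, 800_000), (990_000, 800_000), (1_010_000, 800_000)]
    print(relative_ipc_spread(counters))
```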
  22. Raytrace: Execution time (1/3)
      These are average times from multiple executions of the parallel code only and without Extrae overhead.
      There was a high average deviation of 0.3 seconds in the experiments.
      Bigger inputs were more accurate.
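The slides do not define "average deviation". Assuming it means the mean absolute deviation from the average time, it can be computed as in this sketch. The sample timings are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of run times."""
    return sum(values) / len(values)


def average_deviation(values):
    """Mean absolute deviation of the run times from their mean."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


if __name__ == "__main__":
    # Hypothetical run times in seconds for repeated executions.
    runs = [2.1, 2.6, 1.9, 2.4]
    print(mean(runs), average_deviation(runs))
```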
  23. Raytrace: Execution time (2/3)
      There was a smaller average deviation of 0.03 seconds.
      With 64 threads it runs almost three times faster!
  24. Raytrace: Execution time (3/3)
      There was an even smaller average deviation of 0.02 seconds.
      With 64 threads it runs almost three times faster!
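The reported times cover the parallel code only, and the Trace slide notes that about 10% of the execution time is parallel. A sketch of what those two figures would imply for the whole program, using Amdahl's law, follows. Combining them is my arithmetic and not a result stated in the slides, and it assumes the 10% figure applies to the same input.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """Speedup of a run relative to the baseline run."""
    return baseline_seconds / parallel_seconds


def whole_program_speedup(parallel_fraction: float, region_speedup: float) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` gets faster."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / region_speedup)


if __name__ == "__main__":
    # 0.10 and 3.0 come from the slides' "10% parallel" and "three times faster".
    print(f"{whole_program_speedup(0.10, 3.0):.2f}")  # prints 1.07
```

Under these assumptions, a threefold gain in the parallel region shortens the whole run by only about 7%.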
  25. Raytrace: Configuration comparison
      In the case of the limited configuration, although performance doesn't seem to degrade, the execution time seems to stabilize for more than 8 threads.
  26. Raytrace: Extrae overhead
  27. Conclusions
  28. Conclusions (1/3)
      ● The system seemed to perform worse when the number of threads was a multiple of the total number of physical cores.
      ● The program has good load balancing.
      ● Fine-grained parallelism.
  29. Conclusions (2/3)
      ● Although it wasn't possible to verify, increasing the input should cause more cache misses, because of the big working sets that won't fit in memory.
      ● Memory bandwidth should be the main issue for good speedups.
      ● Boada killed almost all the native input executions.
  30. Conclusions (3/3)
      ● Paraver simplifies the process of analyzing an application's performance.
      ● Better knowledge of the system's architecture would be needed in order to further analyse the performance of the application.
  31. Questions
