
M&t presentation

Benchmark Instrumentation, Paraver

Published in: Technology, Art & Photos

  1. Benchmark Instrumentation
     Umit Cavus BUYUKSAHIN
     Measurement Tools & Techniques, Spring '12, 4/17/2012
  2. Outline
     • NAS Benchmark Suite
     • Experiments
     • Paraver Visualization
       • Code View
       • Communication
       • Disk I/O
       • Load Balancing
       • LD1 Cache Miss
       • Cycles per Instruction (CPI)
     • Execution Time
     • Benchmarking Time
     • Conclusion
  3. NAS Benchmark Suite
     • The NAS Parallel Benchmarks are a set of benchmarks that evaluate the performance of highly parallel supercomputers.
     • They are developed and maintained by the NASA Advanced Supercomputing (NAS) Division.
  4. NAS Benchmark Suite
     • NAS Kernel Applications
       • IS - Integer Sort
       • EP - Embarrassingly Parallel
       • CG - Conjugate Gradient
       • MG - Multi-Grid
       • FT - discrete 3D fast Fourier Transform
     • Problem Sizes
       • S : small size
       • W : workstation size
       • A, B, C : standard test sizes; each ~4X larger than the previous one
       • D, E, F : large test sizes; each ~16X larger than the previous one
  6. Experiments
     • NAS Parallel Benchmarks version 3.2.1
     • IS kernel application
       • sorts N keys in parallel
       • tests integer computation speed and communication performance
     • S problem size
       • small, for quick test purposes
       • has 2^16 (65,536) keys
  7. Experiments
     • IS Benchmarking Procedure (generally)
       1. Generate a sequence of N keys.
       2. Load the N keys into the memory system.
       3. Time begins.
       4. Loop: sorting and partial verification.
       5. Time ends.
       6. Full verification.
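The six steps above can be sketched in a few lines of Python. This is a serial illustration only, not the NAS implementation: the function names, the counting-sort ranking, the sampled indices, and the default parameter values are all assumptions chosen for the example.

```python
import random
import time


def generate_keys(n, max_key, seed=0):
    """Steps 1-2: build a reproducible list of n integer keys in [0, max_key)."""
    rng = random.Random(seed)
    return [rng.randrange(max_key) for _ in range(n)]


def rank_keys(keys, max_key):
    """Stable counting sort: ranks[i] is the sorted position of keys[i]."""
    counts = [0] * max_key
    for k in keys:
        counts[k] += 1
    # Exclusive prefix sum: start[k] = number of keys smaller than k.
    start = [0] * max_key
    total = 0
    for k in range(max_key):
        start[k] = total
        total += counts[k]
    ranks = [0] * len(keys)
    for i, k in enumerate(keys):
        ranks[i] = start[k]
        start[k] += 1
    return ranks


def partial_verify(keys, ranks):
    """Cheap check on a few sampled keys (hypothetical sample: the first three)."""
    for i in range(min(3, len(keys))):
        expected = sum(
            1 for j, k in enumerate(keys)
            if k < keys[i] or (k == keys[i] and j < i)
        )
        if ranks[i] != expected:
            return False
    return True


def full_verify(keys, ranks):
    """Step 6: place every key at its rank and check the result is sorted."""
    out = [0] * len(keys)
    for k, r in zip(keys, ranks):
        out[r] = k
    return all(out[i] <= out[i + 1] for i in range(len(out) - 1))


def run_is(n=2**16, max_key=2**11, iterations=10):
    keys = generate_keys(n, max_key)          # steps 1-2
    t0 = time.perf_counter()                  # step 3: time begins
    ranks = []
    for _ in range(iterations):               # step 4: sort + partial verification
        ranks = rank_keys(keys, max_key)
        assert partial_verify(keys, ranks)
    elapsed = time.perf_counter() - t0        # step 5: time ends
    return elapsed, full_verify(keys, ranks)  # step 6
```

Only the loop between steps 3 and 5 is timed, so key generation and full verification do not count toward the reported time.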
  8. Experiments
     • Machines
       • My Computer
         • i686 GNU/Linux
         • 3 GB RAM
         • 2 CPUs at 800 MHz
       • Boada
         • x86_64 GNU/Linux
         • 24 GB RAM
         • 24 CPUs at 1596 MHz
  9. Experiments
     • Procedure
       • The benchmark is not manually instrumented.
       • Paraver traces are generated automatically.
       • LD_PRELOAD is exported to preload the tracing library.
       • Benchmarks are executed with 2, 4, 8, 16, 32, and 64 processors.
       • Benchmark results are analyzed.
       • Generated traces are examined in the Paraver tools.
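The procedure above could be scripted roughly as follows. This sketch only builds the environment and command line for each run; it does not launch anything. Several details are assumptions that the slides do not state: an MPI build started with `mpirun`, binaries named `is.S.<n>` in a `bin` directory, and the tracing library path, which is a placeholder.

```python
import os

# Process counts taken from the slide.
PROCESS_COUNTS = [2, 4, 8, 16, 32, 64]


def build_runs(tracing_lib, bench_dir="bin", kernel="is", size="S"):
    """Return one (environment, command) pair per process count.

    tracing_lib is a hypothetical path to the preloaded tracing library;
    the binary naming scheme and the mpirun launcher are assumptions.
    """
    runs = []
    for n in PROCESS_COUNTS:
        # Copy the current environment and add LD_PRELOAD for this run.
        env = dict(os.environ, LD_PRELOAD=tracing_lib)
        cmd = ["mpirun", "-np", str(n), f"{bench_dir}/{kernel}.{size}.{n}"]
        runs.append((env, cmd))
    return runs
```

Each pair can then be passed to `subprocess.run(cmd, env=env)` on a machine where the benchmark and the tracing library are actually installed.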
  11. Paraver Visualization – Code View (Paraver screenshots: My Computer, Boada)
  12. Paraver Visualization – Communication (Paraver screenshots: My Computer, Boada)
  13. Paraver Visualization – Disk I/O (Paraver screenshots: My Computer, Boada)
  14. Paraver Visualization – Load Balance (Paraver screenshot: My Computer)
  15. Paraver Visualization – Load Balance (Paraver screenshot: Boada)
  16. Paraver Visualization – LD1 Cache Miss (Paraver screenshot: My Computer)
  17. Paraver Visualization – LD1 Cache Miss (Paraver screenshot: Boada)
  18. Paraver Visualization – CPI (Paraver screenshot: My Computer)
  19. Paraver Visualization – CPI (Paraver screenshot: Boada)
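For readers unfamiliar with the CPI metric shown in the views above, here is a minimal sketch of how it is derived. It assumes that total-cycle and total-instruction hardware counter readings are available for an interval; the function name and the numbers in the usage note are made up for illustration and are not measurements from these slides.

```python
def cycles_per_instruction(cycles, instructions):
    """CPI = elapsed CPU cycles / completed instructions for the same interval."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions
```

With hypothetical readings of 3,000 cycles and 2,000 instructions, `cycles_per_instruction(3000, 2000)` gives 1.5. A lower CPI means the processor completes more instructions per cycle.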
  21. Execution Time
     • [Chart: execution time in ms (axis 0 to 16000) versus # of processors (2, 4, 8, 16, 32, 64); series: MyComputer, Boada]
  22. Execution Time
     • $\text{Relative Speedup} = \dfrac{\text{Execution time of MyComputer}}{\text{Execution time of Boada}}$
     • [Chart: relative speedup (axis 0 to 60) versus # of processors (1, 2, 4, 8, 16, 32, 64)]
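The relative speedup ratio can be computed per processor count as in the sketch below. The function name and the dictionary layout (processor count mapped to time) are assumptions, and the sample numbers in the usage note are invented, since the transcript does not preserve the measured values.

```python
def relative_speedup(times_mycomputer, times_boada):
    """Divide MyComputer's time by Boada's time for each shared processor count."""
    shared = sorted(set(times_mycomputer) & set(times_boada))
    return {p: times_mycomputer[p] / times_boada[p] for p in shared}
```

For example, with made-up timings `{2: 1000.0, 4: 1200.0}` for MyComputer and `{2: 500.0, 4: 300.0}` for Boada, the result is `{2: 2.0, 4: 4.0}`: values above 1 mean Boada finished faster at that processor count.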
  24. Benchmarking Time - reminder
     • IS Benchmarking Procedure (generally)
       1. Generate a sequence of N keys.
       2. Load the N keys into the memory system.
       3. Time begins.
       4. Loop: sorting and partial verification.
       5. Time ends.
       6. Full verification.
     • Benchmarking Time = execution time of the parallel algorithm, that is, the timed section between steps 3 and 5
  25. Benchmarking Time
     • [Chart: benchmarking time in seconds (axis 0.000 to 2.000) versus # of processors (1, 2, 4, 8, 16, 32, 64); series: MyComputer, Boada]
  26. Benchmarking Time
     • $\text{Relative Speedup} = \dfrac{\text{Benchmarking time of MyComputer}}{\text{Benchmarking time of Boada}}$
     • [Chart: relative speedup (axis 0 to 70) versus # of processors (1, 2, 4, 8, 16, 32, 64)]
  27. Benchmarking Time
     • SpeedUp of My Computer
     • [Chart: speedup (axis 0 to 1.2) versus # of processors (1, 2, 4, 8, 16, 32, 64)]
  28. Benchmarking Time
     • SpeedUp of Boada
     • [Chart: speedup (axis 0 to 7) versus # of processors (1, 2, 4, 8, 16, 32, 64)]
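The slides do not state how the per-machine speedup is computed. The sketch below assumes the usual definition, the time on 1 processor divided by the time on p processors, and adds parallel efficiency (speedup divided by p) as a companion metric. Function names and sample timings are hypothetical.

```python
def speedup(times):
    """Speedup per processor count: T(1) / T(p). Requires a 1-processor time."""
    t1 = times[1]
    return {p: t1 / t for p, t in sorted(times.items())}


def efficiency(times):
    """Parallel efficiency per processor count: speedup(p) / p."""
    return {p: s / p for p, s in speedup(times).items()}
```

With invented timings `{1: 2.0, 2: 1.0, 4: 0.5}`, `speedup` returns `{1: 1.0, 2: 2.0, 4: 4.0}` and every efficiency is 1.0, which is the ideal case. A speedup curve that flattens or drops, as the charts above suggest for higher processor counts, shows up as efficiency falling below 1.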
  30. Conclusion
     • The IS application
       • does not involve much communication.
       • is dominated by computation and memory loading.
       • has low cache-miss counts and high CPI values in the computation phase.
     • NAS is designed for highly parallel supercomputers.
       • MyComputer is inadequate to meet the requirements of NAS.
       • MyComputer cannot speed up in this application.
       • Boada speeds up until the process count reaches the number of processors it has.
       • MyComputer spends less time on disk I/O operations.
       • CPI values in Boada's computation phase are lower.
