Scalable Parallel Performance Measurement with the Scalasca Toolset

"Tools which provide insight not just numbers or charts" from Scalasca

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
295
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Scalable Parallel Performance Measurement with the Scalasca Toolset

  1. Scalable Parallel Performance Measurement with the Scalasca Toolset
     Bernd Mohr, June 2013
     Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)
  2. Parallel Architectures: State of the Art
     [Diagram: nodes N0 ... Nk connected by a network or switch, or by a mesh of routers. Each node is an SMP or NUMA system whose interconnect links processors P0 ... Pn, memory, and accelerators A0 ... Am. Each processor Pi contains cores Core0 ... Corer with L1, L2, and L3 caches.]
  3. Parallel Performance Challenges
     • Current and future systems (will) consist of
         - Complex configurations
         - With a huge number of components
         - Very likely heterogeneous
     • Deep software hierarchies of large, complex software components will be required to make use of such systems
     • Sophisticated integrated performance measurement, analysis, and optimization capabilities will be required to efficiently operate such systems
     • Tools which provide insight, not just numbers or charts, are needed!
  4. “A picture is worth 1000 words…”
     • “Real world” example
     • MPI ring program
  5. “What about 1000’s of pictures?” (with 100’s of menu options)
  6. Example Automatic Analysis: Late Sender
  7. Scalasca: Example MPI Patterns
     [Figure: four time-line diagrams (axes: time, process; event types: ENTER, EXIT, SEND, RECV, COLLEXIT)]
     • (a) Late Sender
     • (b) Late Receiver
     • (c) Late Sender / Wrong Order
     • (d) Wait at N x N
  8. The Scalasca Project
     • Scalable Analysis of Large Scale Applications
     • Approach
         - Instrument C, C++, and Fortran parallel applications
         - Based on MPI, OpenMP, SHMEM, or hybrid
         - Option 1: scalable call-path profiling
         - Option 2: scalable event trace analysis
             - Collect event traces
             - Search trace for event patterns representing inefficiencies
             - Categorize and rank inefficiencies found
     • Supports MPI 2.2 (P2P, collectives, RMA, IO) and OpenMP 3.0 (excl. nesting)
     • http://www.scalasca.org/
  9. Scalasca Example: CESM Sea Ice Module, Late Sender Analysis
     • Finds waiting at MPI_Waitall() inside the ice boundary halo update
     • Shows distribution of imbalance across system and ranks
  10. Scalasca Example: CESM Sea Ice Module, Late Sender Analysis + Application Topology
     • Shows distribution of imbalance over topology
     • MPI topologies are automatically captured
  11. Scalasca Root Cause Analysis
     • Root-cause analysis
         - Wait states are typically caused by load or communication imbalances earlier in the program
         - Waiting time can also propagate (e.g., indirect waiting time)
         - Enhanced performance analysis to find the root cause of wait states
     • Approach
         - Distinguish between direct and indirect waiting time
         - Identify call path/process combinations delaying other processes and causing first-order waiting time
         - Identify original delay
     [Figure: time-line of processes A, B, and C executing foo and bar with Send and Recv operations. A DELAY on one process causes a direct wait in the Recv of its communication partner, which in turn causes an indirect wait in the Recv of the third process.]
  12. Scalasca Example: CESM Sea Ice Module, Direct Wait Time Analysis
     • Direct wait caused by ranks processing areas near the north and south ice borders
  13. Scalasca Example: CESM Sea Ice Module, Indirect Wait Time Analysis
     • Indirect waits occur for ranks processing warmer areas
  14. Scalasca Example: CESM Sea Ice Module, Delay Costs Analysis
     • Delays NOT caused on ranks processing ice!
  15. NEW: Scalasca on Intel MIC
     Example:
     • TACC Stampede
     • NAS BT-MZ code
     • MPI/OpenMP
     • 8x16 CPU threads (2 MPI/node)
     • 60x16 MIC threads (15 MPI/MIC)
     Supported modes
     • Host-only or MIC-only
     • Symmetric
     Not yet supported modes
     • Offload
  16. Acknowledgements
     • Scalasca team (JSC) (GRS): Michael Knobloch, Bernd Mohr, Peter Philippen, Markus Geimer, Daniel Lorenz, Christian Rössel, David Böhme, Marc-André Hermanns, Pavel Saviankou, Marc Schlütter, Ilja Zhukov, Alexandre Strube, Brian Wylie, Felix Wolf, Anke Visser, Monika Lücke, Aamer Shah, Alexandru Calotoiu, Jie Jiang, Sergei Shudler, Guoyong Mao, Philipp Gschwandtner
     • Sponsors [logos]
  17. Questions?
     • Check out http://www.scalasca.org
     • Or contact us at scalasca@fz-juelich.de
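
The Late Sender pattern from slides 6 and 7, together with the "search, categorize, and rank" approach from slide 8, can be illustrated with a small sketch. This is not Scalasca code: the `Message` record, the function names, and the toy trace are all hypothetical, and the sketch assumes each receive has already been matched to its send. A receive counts as a Late Sender instance when it is entered before the matching send is entered, and the waiting time is the gap between the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One matched send/receive pair from a (hypothetical) event trace."""
    sender: int        # rank that called the send
    receiver: int      # rank that called the receive
    send_enter: float  # time the sender entered the send call
    recv_enter: float  # time the receiver entered the receive call
    recv_exit: float   # time the receiver left the receive call


def late_sender_wait(msg: Message) -> float:
    """Time the receiver spent blocked because the send started late.

    The wait is capped at the receive's exit time and is zero when the
    send was entered before the receive.
    """
    wait = min(msg.send_enter, msg.recv_exit) - msg.recv_enter
    return max(0.0, wait)


def rank_by_wait(messages):
    """Sum Late Sender waiting time per receiving rank, worst first."""
    totals = {}
    for msg in messages:
        wait = late_sender_wait(msg)
        if wait > 0:
            totals[msg.receiver] = totals.get(msg.receiver, 0.0) + wait
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy trace (made-up numbers, for illustration only).
TRACE = [
    Message(0, 1, send_enter=5.0, recv_enter=2.0, recv_exit=6.0),  # late sender
    Message(1, 2, send_enter=1.0, recv_enter=4.0, recv_exit=4.5),  # sender was early
    Message(2, 3, send_enter=7.0, recv_enter=6.5, recv_exit=8.0),  # late sender
    Message(0, 1, send_enter=9.0, recv_enter=8.0, recv_exit=9.5),  # late sender
]

if __name__ == "__main__":
    for rank, wait in rank_by_wait(TRACE):
        print(f"rank {rank}: {wait} time units of Late Sender waiting")
```

On the toy trace, rank 1 accumulates the most waiting time, so it would be the first place to look.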

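
The distinction between direct and indirect waiting time on slide 11 can be made concrete with a deliberately simplified model. It is an assumption made for illustration, not Scalasca's actual algorithm. Processes form a pipeline: each one computes for `work[i]` time units, then receives from its predecessor, then sends to its successor. The part of a wait explained by the predecessor's own longer computation is counted as direct. The remainder, which exists only because the predecessor was itself waiting, is counted as indirect. The function name `split_waits` is hypothetical.

```python
def split_waits(work):
    """Split each process's receive wait into (direct, indirect) parts.

    Simplified pipeline model (an assumption, not Scalasca's method):
    process i computes for work[i], then receives from process i-1,
    then immediately sends to process i+1. Process 0 only sends.
    """
    results = [(0.0, 0.0)]  # process 0 never receives, so it never waits
    send_time = work[0]     # process 0 sends as soon as its work is done
    for i in range(1, len(work)):
        # Total wait: the receive is entered at work[i], the message
        # arrives at send_time.
        total = max(0.0, send_time - work[i])
        # Direct part: what the wait would be if the predecessor had
        # not waited itself and had sent right after its own work.
        direct = min(total, max(0.0, work[i - 1] - work[i]))
        results.append((direct, total - direct))
        # This process forwards the message once its receive completes.
        send_time = work[i] + total
    return results


if __name__ == "__main__":
    # Process A (index 0) is delayed; B and C have equal, shorter work.
    for name, (direct, indirect) in zip("ABC", split_waits([5.0, 2.0, 2.0])):
        print(f"{name}: direct wait {direct}, indirect wait {indirect}")
```

With work times of 5, 2, and 2, process B waits 3 time units directly on A's delay, while C waits 3 time units indirectly: B did no more work than C, so C's wait traces back to the original delay on A.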