A Survey on Performance Analytical Tools for Partitioned Global Address Space
- 5. Performance Analytical Tools (PATs)
They are based on the measure-modify cycle:
Data collection
Data analysis
Data visualization
Optimization
Many such tools have been developed, e.g., HPCToolkit [1] and TAU [2]
To my knowledge, no PAT specialized for software distributed shared memory (DSM) exists
[1] Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., & Tallent, N. R. (2009). HPCTOOLKIT: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6). https://doi.org/10.1002/cpe.1553
[2] Shende, S. S., & Malony, A. D. (2006). The TAU Parallel Performance System. The International Journal of High Performance Computing Applications, 20, 287–311.
- 9. PGAS family
Many PGAS systems with various interfaces have been developed
UPC (Unified Parallel C) [1]
A C language extension
SHMEM
Designed to expose low-level hardware capabilities with minimal overhead
Chapel [2]
A parallel programming language supporting task parallelism
Global Arrays, Co-Array Fortran, etc.
[1] El-Ghazawi, T., & Cantonnet, F. (2002). UPC Performance and Potential: A NPB Experimental Study. In SC '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing (pp. 1–26). https://doi.org/10.1109/sc.2002.10034
[2] Chamberlain, B. L., Callahan, D., & Zima, H. P. (2007). Parallel Programmability and the Chapel Language. The International Journal of High Performance Computing Applications, 21(3), 291–312. https://doi.org/10.1177/1094342007078442
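- 11. UPC Matrix-Vector Multiplication (1)
#include <upc_relaxed.h>

/* Default block size 1: a[i][j] (linear index i*THREADS+j) has affinity
   to thread j, so each row is spread across all threads.
   THREADS appears twice, so a static THREADS environment is assumed
   (thread count fixed at compile time). */
shared int a[THREADS][THREADS];
shared int b[THREADS], c[THREADS];

int main(void)
{
    int i, j;
    /* Affinity expression i: iteration i runs on thread i % THREADS. */
    upc_forall (i = 0; i < THREADS; i++; i) {
        c[i] = 0;
        for (j = 0; j < THREADS; j++)
            c[i] += a[i][j] * b[j];  /* a[i][j] and b[j] are remote for j != i */
    }
    return 0;
}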
From http://www.training.prace-ri.eu/training_material/uploads/tx_pracetmo/UPC_Edinburgh30March2011.pdf
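- 12. UPC Matrix-Vector Multiplication (2)
#include <upc_relaxed.h>

/* Block size THREADS: row i (linear indices i*THREADS .. i*THREADS+THREADS-1)
   has affinity to thread i, so iteration i reads its row locally.
   A static THREADS environment is assumed (thread count fixed at compile time). */
shared [THREADS] int a[THREADS][THREADS];
shared int b[THREADS], c[THREADS];

int main(void)
{
    int i, j;
    upc_forall (i = 0; i < THREADS; i++; i) {
        c[i] = 0;
        for (j = 0; j < THREADS; j++)
            c[i] += a[i][j] * b[j];  /* a[i][j] is local; b[j] is remote for j != i */
    }
    return 0;
}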
From http://www.training.prace-ri.eu/training_material/uploads/tx_pracetmo/UPC_Edinburgh30March2011.pdf
- 41. Other Performance Analytical Tools for PGAS
For X10 [1]
It visualizes implicit data transfer among places and synchronization among activities
For GASNet [2]
It extends PPW (Parallel Performance Wizard) to GASNet, a low-level communication library underlying PGAS models such as UPC and SHMEM
[1] Itahashi, S. (2014). Toward a profiling tool for visualizing implicit behavior in X10. In 2014 X10 Workshop (X10'14) (pp. 1–5). X10 Workshop 2014.
[2] Prakash, P., III, M. B., George, A., & Aggarwal, V. (2011). Performance Analysis Framework for GASNet Middleware, Tools, and Applications. In PGAS '11.