Nucleon TMD Contractions in Lattice QCD
using QUDA [1]
Christos Kallidonis, Sergey Syritsyn
Nucleon EDMs on a Lattice 

at the Physical Point
Sergey N. Syritsyn,
Stony Brook University & RIKEN / BNL Research Center
together with LHP and RBC collaborations
LATTICE 2018
East Lansing, MI, July 22-28, 2018
Courtesy of BMW Collaboration
GPU Hackathon
Brookhaven National Laboratory
Sep. 17-21, 2018
Progress Report
Mentors:
Kate Clark, Mathias Wagner
[1] https://github.com/lattice/quda
with GPU Lattice team:
C. Jung, M. Lin, D. Howarth, J. Tu, B. Wang, D. Guo
Problem at hand
Degrees of freedom:
• (local) volume sites: x = 1,…,512K
• Ns spin: α,β = 1,…,4
• Nc color: a, b = 1,…,3
• Vector index: k = 1,…,12
• Γ-matrix index: i = 1,…,16
• Complex numbers! x2
# cplx multiply-add / site: $N_c^2 N_s^2 \times (1 + N_c N_s) + N_s^3$ (15104 Flops)
Inp. mem/site: $(2 (N_c N_s)^2 + N_c^2) \times \mathrm{cplx} = 4752$ Bytes
Out. mem/site: $N_s^2 \times \mathrm{cplx} = 256$ Bytes
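The per-site counts above can be checked with a few lines of compile-time arithmetic. This is only a sketch: the constant names are illustrative, and it assumes double-precision complex numbers (16 bytes each), which is what makes the quoted byte counts come out. The flop figure is reproduced under a further assumption that is not stated on the slide: each of the $N_c^2 N_s^2 (1 + N_c N_s)$ terms is counted as a full complex multiply-add (8 flops), while each of the $N_s^3$ terms is counted as a complex addition only (2 flops).

```cpp
#include <cstddef>

// Sizes taken from the "Degrees of freedom" list.
constexpr std::size_t Nc = 3;  // colors
constexpr std::size_t Ns = 4;  // spins

// Assumption: double-precision complex, 2 x 8 bytes.
constexpr std::size_t kCplxBytes = 16;

// Input per site: two sets of (Nc*Ns)^2 complex numbers (w_k and v_k)
// plus one Nc x Nc link matrix U.
constexpr std::size_t kInputBytes =
    (2 * (Nc * Ns) * (Nc * Ns) + Nc * Nc) * kCplxBytes;

// Output per site: Ns^2 complex numbers.
constexpr std::size_t kOutputBytes = Ns * Ns * kCplxBytes;

// Assumed flop weights: 8 per complex multiply-add, 2 per complex add.
constexpr std::size_t kFlops =
    8 * (Nc * Nc * Ns * Ns * (1 + Nc * Ns)) + 2 * (Ns * Ns * Ns);

static_assert(kInputBytes == 4752, "matches the quoted input size");
static_assert(kOutputBytes == 256, "matches the quoted output size");
static_assert(kFlops == 15104, "matches the quoted flop count");
```

If the kernel actually ran in single precision, or counted flops differently, these checks would fail, so the agreement is at least consistent with the assumptions.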
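$W_k(x)^b_\alpha = U^{ba}(x)\, w_k(x)^a_\alpha$
$F_k(x)_{\alpha\beta} = W_k(x)^b_\alpha\, v^*_k(x)^b_\beta$
$G(x)_{\alpha\beta} = \sum_k F_k(x)_{\alpha\beta}$
$C^{(i)}(x) = \Gamma^{(i)}_{\beta\alpha}\, G(x)_{\alpha\beta}$
$C^{(i)}(x) = \sum_k \sum_{\alpha,\beta,a,b} \Gamma^{(i)}_{\beta\alpha}\, U(x)^{ba}\, w_k(x)^a_\alpha\, v^*_k(x)^b_\beta$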
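The W, F, G and C steps above can be written out as a plain CPU reference for a single lattice site, which is handy for validating a GPU kernel. This is a minimal sketch and not the QUDA implementation: the type names, the `[k][spin][color]` layout, and the function names are all hypothetical, and the order of the sums follows the W, F, G steps rather than whatever ordering the actual kernel uses. The Γ-matrix is passed in as a dense 4x4 matrix and contracted as a trace.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cplx = std::complex<double>;
constexpr std::size_t Nc = 3, Ns = 4, Nvec = 12;

// Hypothetical per-site containers; layout chosen for readability only.
using ColorMatrix = std::array<std::array<cplx, Nc>, Nc>;  // U[b][a]
using SpinMatrix  = std::array<std::array<cplx, Ns>, Ns>;  // G[alpha][beta]
using VectorSet =
    std::array<std::array<std::array<cplx, Nc>, Ns>, Nvec>;  // w[k][alpha][a]

// G_{alpha beta} = sum_k sum_b (sum_a U^{ba} w_k^a_alpha) * conj(v_k^b_beta)
SpinMatrix contract_site(const ColorMatrix& U, const VectorSet& w,
                         const VectorSet& v) {
    SpinMatrix G{};  // value-initialized to zero
    for (std::size_t k = 0; k < Nvec; ++k)
        for (std::size_t alpha = 0; alpha < Ns; ++alpha)
            for (std::size_t b = 0; b < Nc; ++b) {
                cplx W{};  // W_k^b_alpha = sum_a U^{ba} w_k^a_alpha
                for (std::size_t a = 0; a < Nc; ++a)
                    W += U[b][a] * w[k][alpha][a];
                // F_k and the sum over k are accumulated directly into G.
                for (std::size_t beta = 0; beta < Ns; ++beta)
                    G[alpha][beta] += W * std::conj(v[k][beta][b]);
            }
    return G;
}

// C^{(i)} = sum_{alpha,beta} Gamma^{(i)}_{beta alpha} G_{alpha beta}
cplx gamma_trace(const SpinMatrix& Gamma, const SpinMatrix& G) {
    cplx c{};
    for (std::size_t alpha = 0; alpha < Ns; ++alpha)
        for (std::size_t beta = 0; beta < Ns; ++beta)
            c += Gamma[beta][alpha] * G[alpha][beta];
    return c;
}
```

Calling `contract_site` once per site and `gamma_trace` once per Γ-matrix index i yields the 16 complex outputs per site, which can then be compared against the GPU result within floating-point tolerance.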
Kernel optimization
Iteration-0:
• assign 1 thread per site
• loop over a, b, α, β
• sum over k
• perform trace
Iteration-1:
• QUDA: block/grid auto-tuning functionality
Can do better than that!
Performance per GPU (1/2 K80): ~ 6 GFlop/s
Memory Bandwidth: ~ 1.9 GB/s
Kernel exec. cost: 6 GPU*sec
→ Dominant part of workflow
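The two measured rates can be cross-checked against the per-site costs quoted earlier (15104 Flops, 4752 + 256 Bytes). The sketch below computes the resulting arithmetic intensity and the bandwidth it implies at a given flop rate, assuming each site's input and output cross the memory bus exactly once; the names are illustrative.

```cpp
// Per-site costs as quoted on the "Problem at hand" slide.
constexpr double kFlopsPerSite = 15104.0;
constexpr double kBytesPerSite = 4752.0 + 256.0;

// Arithmetic intensity in flop per byte (about 3.0).
constexpr double kIntensity = kFlopsPerSite / kBytesPerSite;

// Bandwidth in GB/s implied by a flop rate in GFlop/s, assuming every
// site moves exactly its input and output bytes once.
constexpr double implied_bandwidth_gbs(double gflops) {
    return gflops / kIntensity;
}
```

At 6 GFlop/s this gives roughly 2 GB/s, in line with the ~1.9 GB/s reported, so both rates are far below what the hardware can deliver and the kernel is limited by something other than raw memory bandwidth or compute throughput.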
NVIDIA Visual Profiler:
Thanks, Mathias!
Kernel optimization
Iteration-2:
• move required buffers to shared memory
• extend the block dimension to 3D: assign color/spin indices to individual threads
• #pragma unroll the (remaining) loops
• inline relevant functions involving Γ-matrices
Kernel exec. cost: 5.2 GPU*sec, x1.15 improvement
Profiler still complains about very high local memory overhead…
Kernel optimization
Iteration-3:
• Moving the Γ-matrices to constant memory did the trick. Thanks, Kate!
→ previously the compiler could not resolve the array indexing, so buffers spilled to local memory
QUDA auto-tuner report:
Performance: 205 GFlop/s
Memory BW: 65 GB/s
Kernel exec. cost: 0.16 GPU*sec (compared with 5.2 GPU*sec)
→ Now only 4% of workflow
x32 improvement!!
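The quoted improvement factors follow directly from the kernel execution costs reported for each iteration. A small sketch of that arithmetic, with illustrative names:

```cpp
// Ratio of kernel execution costs (both in GPU*sec).
constexpr double speedup(double before, double after) {
    return before / after;
}

// Kernel execution costs as reported on the slides, in GPU*sec.
constexpr double kCostIter1 = 6.0;   // after block/grid auto-tuning
constexpr double kCostIter2 = 5.2;   // shared memory, 3D blocks, unrolling
constexpr double kCostIter3 = 0.16;  // Gamma-matrices in constant memory

constexpr double kGainIter2 = speedup(kCostIter1, kCostIter2);  // about 1.15
constexpr double kGainIter3 = speedup(kCostIter2, kCostIter3);  // about 32.5
constexpr double kGainTotal = speedup(kCostIter1, kCostIter3);  // about 37.5
```

The "x32" figure therefore refers to the last step alone; measured from the auto-tuned Iteration-1 kernel, the cumulative gain is closer to x37.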
On-going work:
• can we squeeze out more Flop/s?
• optimize communication-intensive code segments
• experiment with environment variables
• update/optimize the rest of contraction kernels
