Deep Learning for Fast Simulation
HNSciCloud M-PIL-3.2 meeting, June 2018
S. Vallecorsa, F. Carminati, G. Khattak
Our objective
• Activities are ongoing to speed up Monte Carlo techniques, but they are not enough to cope with the expected HL-LHC needs
• Current fast simulation solutions are detector dependent
• Goal: a general fast simulation tool based on Machine Learning/Deep Learning
• Optimizing training time becomes crucial
→ Improved, efficient and accurate fast simulation
Requirements
• Precise simulation results
• A detailed validation process
• A fast inference step
• A generic, customizable tool
• An easy-to-use and easily extensible framework
• Large hyper-parameter scans and meta-optimisation:
  - training time under control
  - scalability
  - ability to work across platforms
Generative adversarial networks (arXiv:1406.2661)
Simultaneously train two networks that compete and cooperate with each other:
• The generator G generates data from random noise
• The discriminator D learns how to distinguish real data from generated data
The (blind) counterfeiter/detective analogy:
• The counterfeiter shows a "Mona Lisa"
• The detective says it is fake and gives feedback
• The counterfeiter makes a new "Mona Lisa" based on the feedback
• Iterate until the detective is fooled
Image source: https://arxiv.org/pdf/1701.00160v1.pdf
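The adversarial loop above can be sketched in a few lines of numpy. This is a deliberately tiny 1-D illustration of the GAN training scheme, not the 3DGAN architecture from these slides: the generator, discriminator, data distribution, learning rate and step count are all assumptions made for the example.

```python
import numpy as np

# Toy GAN: the generator maps noise z ~ N(0,1) to x = a*z + b and must
# mimic "real" data drawn from N(4, 1.25); the discriminator is a
# logistic regression D(x) = sigmoid(w*x + c).
rng = np.random.default_rng(0)
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.02, 64

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

for step in range(1000):
    z = rng.normal(size=batch)
    fake = a * z + b
    real = rng.normal(4.0, 1.25, size=batch)

    # Discriminator step: maximize log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    g_real = -(1.0 - d_real)          # d(-log D)/d(logit) on real data
    g_fake = d_fake                   # d(-log(1-D))/d(logit) on fakes
    w -= lr * np.mean(g_real * real + g_fake * fake)
    c -= lr * np.mean(g_real + g_fake)

    # Generator step: minimize -log D(fake), i.e. try to fool D.
    d_fake = sigmoid(w * fake + c)
    g_x = -(1.0 - d_fake) * w         # chain rule through D's input
    a -= lr * np.mean(g_x * z)        # dx/da = z
    b -= lr * np.mean(g_x)            # dx/db = 1

samples = a * rng.normal(size=10000) + b
print(f"generated mean={samples.mean():.2f} std={samples.std():.2f}")
```

The two alternating updates are the "compete and cooperate" dynamic: D improves its real/fake classification, then G uses D's gradient as the feedback the detective gives the counterfeiter.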
Generated images
Interpret the detector output as a 3D image:
• A 3D convolutional GAN generates realistic detector output
• Customized architecture (includes auxiliary regression tasks)
• Agreement with standard Monte Carlo in terms of physics is remarkable!
[Figures: GAN-generated electron shower; Y moment (width); average shower section; energy fraction measured by the calorimeter]
Trained on the Caltech ibanks GPU cluster thanks to Prof. M. Spiropulu
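"Interpreting the detector output as a 3D image" simply means arranging the calorimeter cell energies as a 3D array that a convolutional network can consume. A minimal sketch in numpy; the 25×25×25 grid size and the energy values are assumptions for illustration, not the actual detector geometry:

```python
import numpy as np

# Treat the calorimeter response to one particle as a 3D "image":
# one voxel per calorimeter cell, voxel value = deposited energy.
shower = np.zeros((25, 25, 25), dtype=np.float32)

# Deposit energy in cells near the central axis, mimicking a shower core.
rng = np.random.default_rng(1)
for depth in range(25):
    x, y = rng.integers(10, 15, size=2)          # cells near the center
    shower[x, y, depth] = rng.random() * 100.0   # energy, arbitrary units

total_energy = shower.sum()   # a physics quantity the GAN must reproduce
print(shower.shape, f"total deposited energy = {total_energy:.1f}")
```

Physics observables such as the total energy, shower width (the "Y moment" above) and the longitudinal profile are all simple reductions over this array, which is what makes the validation against Monte Carlo straightforward.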
Computing performance
Distributed training is needed.
Inference:
• Monte Carlo: 17 s/particle vs 3DGAN: 7 ms/particle
• → a speedup factor of roughly 2400 on CPU!
Training:
• 45 min/epoch on an NVIDIA P100
• Introduce data-parallel training using mpi-learn (Elastic Averaging Stochastic Gradient Descent)
Calorimeter energy response: the GAN prediction stays stable through 20 nodes!
Strong scaling measured at the CSCS Swiss National Supercomputing Centre (J-R. Vlimant)

Time to create an electron shower:

Method                     Machine                    Time/Shower (ms)
Full Simulation (Geant4)   Intel Xeon Platinum 8180   17000
3D GAN (batch size 128)    Intel Xeon Platinum 8180   7
3D GAN (batch size 128)    NVIDIA P100                0.04
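Elastic Averaging SGD, the scheme mpi-learn uses for data-parallel training, lets each worker take local gradient steps while an elastic term pulls the workers and a central copy of the parameters toward each other. A toy single-process simulation of the update rule; the quadratic loss, worker count and all constants are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Simulated EASGD on f(x) = 0.5 * ||x - target||^2 with noisy gradients
# standing in for per-worker minibatch gradients.
rng = np.random.default_rng(0)
target = np.array([3.0, -2.0])             # the loss minimizer

n_workers, lr, rho = 4, 0.1, 0.1
workers = [rng.normal(size=2) for _ in range(n_workers)]
center = np.zeros(2)                        # shared "master" parameters

for step in range(300):
    for i in range(n_workers):
        grad = (workers[i] - target) + rng.normal(scale=0.5, size=2)
        elastic = rho * (workers[i] - center)        # pull toward center
        workers[i] = workers[i] - lr * (grad + elastic)
    # The center moves toward the workers (the "elastic averaging").
    center = center + lr * rho * sum(w - center for w in workers)

print("center after training:", np.round(center, 2))
```

Because workers only exchange the cheap elastic term with the center rather than synchronizing every gradient, the scheme tolerates the ssh/TCP latencies of a cloud deployment better than fully synchronous SGD.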
DL with the HNSciCloud
First tests during the prototype phase (2017):
• Single-GPU training benchmark (RHEA, T-Systems, IBM)
• P100 (RHEA - Exoscale) vs K80 (IBM)
Current tests:
• MPI-based distributed training (ssh/TCP)
• Local input storage
• Single GPU per node
• 2 P100 GPUs on T-Systems
• Comparison to an HPC environment (CSCS)
• Trials with HTCondor on Exoscale cloud (5 VMs, still under investigation)
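The "MPI-based distributed training over ssh/TCP, single GPU per node" setup corresponds to a standard MPI launch configuration. A hypothetical sketch (the VM hostnames and the training script name are placeholders, not from the slides):

```shell
# Hostfile: five cloud VMs, one MPI slot (one GPU) per node.
cat > hosts <<EOF
vm-01 slots=1
vm-02 slots=1
vm-03 slots=1
vm-04 slots=1
vm-05 slots=1
EOF

# One MPI rank per VM, communicating over ssh/TCP;
# each rank drives the single local GPU.
mpirun -np 5 --hostfile hosts python train_3dgan.py
```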
Next steps
Continue with tests/optimisation:
• Schedulers (SLURM)
• Input storage options
• GPU/node configuration
• Possibility to combine GPUs from different resources
First results are very promising, but additional GPUs are needed.
Thanks!
Questions?
