© 2016 IBM Corporation
Enhanced MPSM3 for applications to quantum biological simulations
Cristiano Malossi, IBM Research - Zurich
A. Pozdneev, V. Weber, T. Laino, C. Bekas, A. Curioni, IBM Research - Zurich
Motivations
The application of quantum Hamiltonians to biological systems is limited by the cost
of performing long calculations on large systems (more than 30k atoms).
Classical force fields and QM/MM are well suited to conformational changes and
localized reactions, respectively. Hence the need to develop scalable algorithms
that allow the application of quantum Hamiltonians to biological systems, for example:
[Figures: NADH:ubiquinone oxidoreductase (large-scale ion motion) and succinate dehydrogenase (large-scale electron transfer)]
Outlook and Goal
Goal: Design an efficient parallel sparse matrix-matrix multiply.
 Introduction: Born-Oppenheimer molecular dynamics.
 Parallelization: midpoint-based parallel sparse matrix-matrix
multiplication for matrices with decay.
 Benchmark: weak and strong scaling, and communication volume on BlueGene/Q¹.
 Summary
¹ IBM and Blue Gene are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies.
Introduction: Born-Oppenheimer molecular dynamics
The core operation of the SCF iterations is the sparse matrix-matrix multiplication.
Each SCF iteration requires the construction of the density matrix.
Each MD step requires the potential energy U to be evaluated at the relaxed ground-state electronic density.
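As context for why the sparse matrix-matrix multiply dominates, here is a minimal sketch, not the code behind these slides, of a density-matrix build by McWeeny purification; the actual purification scheme, thresholds, and names used for the benchmarks may differ, and the chemical potential mu is assumed known.

```python
# Minimal sketch, NOT the authors' code: density-matrix construction by McWeeny
# purification, shown only to illustrate that each DM build reduces to repeated
# sparse matrix-matrix multiplies (the benchmarks later report 17 multiplies per build).
import numpy as np
import scipy.sparse as sp

def build_density_matrix(H, mu, n_iter=17, drop_tol=1e-6):
    n = H.shape[0]
    I = sp.identity(n, format="csr")
    # Initial guess: map the spectrum of (mu*I - H) into [0, 1] using a cheap
    # Gershgorin-style bound on the spectral radius of (H - mu*I).
    radius = abs(H - mu * I).sum(axis=1).max()
    D = 0.5 * I + (mu * I - H) / (2.0 * radius)
    for _ in range(n_iter):
        D2 = D @ D                      # sparse matrix-matrix multiply: the core kernel
        D = 3.0 * D2 - 2.0 * (D2 @ D)   # McWeeny step: D <- 3*D^2 - 2*D^3
        D.data[np.abs(D.data) < drop_tol] = 0.0
        D.eliminate_zeros()             # keep D sparse (matrices with decay)
    return D
```

The thresholding after each step is what keeps the iterates sparse, which is the property the midpoint-based parallel multiply exploits.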
Parallel sparse matrix-matrix multiplication
[Figure: simulation cell containing atoms]
 Atoms in the simulation cell.
Parallel sparse matrix-matrix multiplication
[Figure: simulation cell divided into boxes]
 Simulation cell divided into boxes. Each box and its atoms are owned by
a process.
Parallel sparse matrix-matrix multiplication
[Figure: atoms i and k located in different boxes]
 These two atoms are owned by different processes.
Parallel sparse matrix-matrix multiplication
[Figure: atoms i and k with their midpoint (+) and the matrix block Aik]
 The matrix block Aik is owned by the process in whose box the midpoint between atoms i and k resides.
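To make the ownership rule concrete, the following is a hedged sketch (function, argument names, and grid layout are illustrative, not the authors' API) of deciding which box, and hence which process, owns a block Aik: the box containing the minimum-image midpoint of atoms i and k in an orthorhombic periodic cell.

```python
# Hypothetical helper: box index that owns block A_ik under the midpoint rule,
# assuming an orthorhombic periodic cell split into grid[0] x grid[1] x grid[2]
# boxes, one box per process.
import numpy as np

def owner_box(r_i, r_k, cell, grid):
    r_i = np.asarray(r_i, dtype=float)
    cell = np.asarray(cell, dtype=float)
    grid = np.asarray(grid, dtype=int)
    delta = np.asarray(r_k, dtype=float) - r_i
    delta -= cell * np.round(delta / cell)        # minimum-image displacement i -> k
    midpoint = (r_i + 0.5 * delta) % cell         # wrap the midpoint back into the cell
    box = np.floor(midpoint / cell * grid).astype(int)
    return tuple(box % grid)                      # guard against floating-point edge cases

# Example: two atoms near opposite faces of a 10x10x10 cell on a 4x4x4 box grid.
print(owner_box([0.5, 5.0, 5.0], [9.5, 5.0, 5.0], cell=[10.0, 10.0, 10.0], grid=[4, 4, 4]))
# -> (0, 2, 2): the midpoint through the periodic boundary lies near x = 0.
```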
Parallel sparse matrix-matrix multiplication
[Figure: atom j added; matrix block Bkj with its midpoint (+)]
 Another matrix block, Bkj.
Parallel sparse matrix-matrix multiplication
[Figure: atoms i and j with their midpoint (+); result block Cij]
 The result Cij of the product Aik Bkj is owned by the process where the midpoint between atoms i and j resides.
Parallel sparse matrix-matrix multiplication
 Blocks Aik and Bkj are sent to the process that owns Cij, where the multiplication takes place. Blocks are sent along x, y and z.
[Figure: blocks Aik and Bkj routed to the midpoint (+) of atoms i and j, where Cij = Cij + Aik Bkj is accumulated]
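The shift pattern along x, y and z can be illustrated with the following hedged mpi4py sketch (not the authors' code; the real algorithm routes each block only as far as its midpoint cell requires): data moves between neighbouring boxes on a periodic 3-D Cartesian process grid, one direction at a time.

```python
# Illustrative only: one nearest-neighbour push per direction on a periodic 3-D
# Cartesian grid of processes, the pattern used to route blocks toward the owner
# of C_ij. Run with e.g.: mpirun -n 8 python shift_demo.py (file name is hypothetical).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 3)        # factor the ranks into a 3-D grid
cart = comm.Create_cart(dims, periods=[True] * 3)  # periodic, like the simulation cell

block = np.full((4, 4), float(cart.Get_rank()))    # toy stand-in for a matrix block
for direction in range(3):                         # x, then y, then z
    src, dst = cart.Shift(direction, 1)            # neighbours along this direction
    recv = np.empty_like(block)
    cart.Sendrecv(block, dest=dst, recvbuf=recv, source=src)
    block = recv                                   # the block has moved one box over
```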
Improved MPSM3
 The process that owns the midcell performs the multiplication.
[Figure: two boxes (x) and their midcell (+)]
Improved MPSM3
 All blocks A and B are sent to the process that owns the midcell, where the multiplication takes place. Blocks are sent along x, y and z.
[Figure: blocks A** and B** gathered at the midcell (+), where C = C + AB is computed]
Improved MPSM3
 The process that performs the multiplication needs to redistribute the results to the neighboring processes. Blocks are sent along x, y and z.
[Figure: result blocks Cij and Ci'j' pushed from the midcell (+) back to the processes that own the midpoints of (i, j) and (i', j')]
Improved MPSM3
[Diagram: the three phases of the improved MPSM3: exchange of local matrices, local products, and redistribution of the computed matrix]
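As a rough sketch of the "local products" phase (block containers and names are hypothetical, not the authors' data structures), the midcell process can accumulate Cij += Aik Bkj over every matching k roughly as follows.

```python
# Hedged sketch of the local-products phase: after the exchange, the midcell
# process holds the relevant blocks of A and B and accumulates the block products.
from collections import defaultdict

def local_products(A_blocks, B_blocks):
    """A_blocks: {(i, k): ndarray}, B_blocks: {(k, j): ndarray} -> {(i, j): ndarray}."""
    B_by_row = defaultdict(list)            # index B blocks by their row index k
    for (k, j), Bkj in B_blocks.items():
        B_by_row[k].append((j, Bkj))
    C_blocks = {}
    for (i, k), Aik in A_blocks.items():
        for j, Bkj in B_by_row.get(k, []):
            prod = Aik @ Bkj                # dense block product (shapes assumed to conform)
            if (i, j) in C_blocks:
                C_blocks[(i, j)] = C_blocks[(i, j)] + prod
            else:
                C_blocks[(i, j)] = prod
    return C_blocks
```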
Benchmark: weak scaling
 Time per density-matrix (DM) build vs. the number of MPI tasks, PM6
 About 19 waters per task
 Parallel efficiency: 92% at 110,592 MPI tasks (2.1M waters)
 Number of non-zero elements: 1.6k/water (O1) and 1.0k/water (O2)
Constant walltime with proportional resources
Benchmark: weak scaling (improved MPSM3)
 Time per density-matrix (DM) build vs. the number of MPI tasks, PM6
 About 19 waters per task
 Number of non-zero elements: 1.6k/water [plot annotation: ~10x]
Improved MPSM3 already competes with libdbcsr (https://dbcsr.cp2k.org/) for small system-size to MPI-task ratios
Benchmark: strong scaling
 Time per density-matrix (DM) build vs. the number of MPI tasks, PM6
 110k (S1), 373k (S2) and 1124k (S3) waters
Largest system:
Matrix dimensions: 6,749,184 x 6,749,184
Non-zero elements: 3.9E-3%
Number of multiplies: 17
Sparsity boost vs. dense: 42,760x
Benchmark: strong scaling (improved MPSM3)
 Time per density-matrix (DM) build vs. the number of MPI tasks, PM6
 32k (S0) and 110k (S1) waters
Benchmark: communication volume
 Total communication volume (Isend/Irecv) per DM build vs. the number of MPI tasks
 110k (S1), 373k (S2) and 1124k (S3) waters
 BlueGene/Q
Summary
MPSM3 and its improved version show (with one push per direction):
 close to perfect weak scaling
 very good strong scaling
 communication volume that decreases as the number of tasks increases
 fewer logistic operations (improved version)
Provided proportional resources, an MD step can be performed in a few dozen seconds regardless of system size.
Parallel sparse matrix-matrix multiplication
[Figure: an atom and its interaction radius]
 Interaction of an atom with its neighbors.
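For completeness, a hedged sketch of the cutoff behind the sparsity: only atom pairs within the interaction radius contribute non-zero blocks. This is a brute-force O(n^2) version for clarity (production codes use cell lists), and the function and argument names are illustrative.

```python
# Hedged sketch: atoms i and k give a non-zero block only if their minimum-image
# distance is below the interaction radius ("matrices with decay").
import numpy as np

def neighbor_pairs(coords, cell, radius):
    coords = np.asarray(coords, dtype=float)   # (n, 3) atom positions
    cell = np.asarray(cell, dtype=float)       # orthorhombic cell edge lengths
    pairs = []
    for i in range(len(coords) - 1):
        delta = coords[i + 1:] - coords[i]
        delta -= cell * np.round(delta / cell)             # minimum-image convention
        dist = np.linalg.norm(delta, axis=1)
        for off in np.nonzero(dist < radius)[0]:
            pairs.append((i, int(i + 1 + off)))            # pair (i, k) within the radius
    return pairs
```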
References
 SEMD I: Midpoint-based parallel sparse matrix-matrix multiplication algorithm for matrices with decay.
Valéry Weber, Teodoro Laino, Alexander Pozdneev, Irina Fedulova, and Alessandro Curioni.
Journal of Chemical Theory and Computation, 2015, 11 (7), 3145-3152.
https://doi.org/10.1021/acs.jctc.5b00382
 Enhanced MPSM3 for Applications to Quantum Biological Simulations.
Alexander Pozdneev, Valéry Weber, Teodoro Laino, Costas Bekas, and Alessandro Curioni.
In Proceedings of SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Lake City, Utah, November 13-18, 2016, Article No. 9.
https://dl.acm.org/citation.cfm?id=3014916
