QuantumChemistry500 
NAKATA Maho (RIKEN) 
ISHIMURA Kazuya (IMS) 
HIRANO Toshiyuki (U of Tokyo) 
Jeff HAMMOND (Intel) 
2014/11/18 Tue Nov 18 @ 12:15pm-1:15pm Room 295
Role of HPC Benchmarks 
• Represent important applications. 
• Based upon simple codes that non-experts 
(especially vendors) can optimize. 
• Be somewhat orthogonal to each other. 
• Stress computer hardware in interesting ways. 
• Allow for objective comparison between 
different computing platforms.
Current HPC Benchmarks 
• Top500 = HPL: LU factorization (just DGEMM?). 
• Graph500: Non-numerical benchmark. 
• HPCG: Conjugate gradient PDE solver for simple 
stencil. 
• HPGMG: Geometric multigrid PDE solver. 
• HPCChallenge: Collection of benchmarks. 
• STREAM 
• Scalable Synthetic Compact Applications (SSCA) 
• DOE Mini-apps 
• ...
Quantum Chemistry in HPC 
• QC/DFT is a major component of scientific workloads. 
• Many QC apps are built by users and are un-tracked. 
• VASP and NWChem build the matrix differently; 
QC500 represents the harder way. 
Figure courtesy of Richard Gerber (NERSC)
What is QuantumChemistry 500? 
• Very different properties than existing benchmarks: 
– nontrivial load-balancing (irregular, dynamic tasks) 
– small- to mid-sized messages (between 8B and 100KB) 
– nontrivial to vectorize (short SIMD) 
– balance of memory- and compute- intensive 
– kernels contain branching 
– modest dense linear algebra (not HPL-sized) 
• Allows many implementations as long as the results are numerically the same. 
– Easier entry for novel hardware (VHDL impl???). 
• Optimized implementations already exist: 
NWChem, …, GTFock (OpenMP/SSE/AVX), TeraChem (GPU)
What is QuantumChemistry 500? 
• Chemistry-specific benchmark targeting most common 
method(s). Initial target is Hartree-Fock SCF (DFT-like). 
• Science-driven, scale-invariant focus: 
Performance per node/watt/etc… 
• Allows different algorithms and software as long as the 
answer is the same. 
• Building upon existing HPC codes for initial data; 
encourage new optimized code development. 
• Exercise hardware using challenging kernels not captured 
by any existing benchmark. 
• Avoid Goodhart’s Law (A machine built just for QC500 will 
be good at many things...)
Hartree-Fock/SCF/DFT Theory 
This is the classic algorithm; variations exist. 
● Formation of the matrix is irregular. 
● Matrix elements are highly non-trivial 
(3+ methods exist). 
● Diagonalize via GEVP or DMM. 
Quantum chemists have 
implemented many algorithms 
in many software packages 
and yet it is possible to obtain 
numerical consistency between 
codes!
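The "Diagonalize via GEVP" step refers to a generalized eigenvalue problem of the form F C = S C ε, with F the matrix being formed and S the overlap matrix. A minimal sketch for the 2x2 symmetric case follows; the function name and the matrix entries are made up for illustration and do not come from any QC500 reference input. Production codes call a dense eigensolver instead of the closed form used here.

```python
import math

def solve_gevp_2x2(F, S):
    """Return the two eigenvalues e of F c = e S c, in ascending order.

    Assumes F and S are symmetric 2x2 matrices and S is positive definite.
    """
    a, b, d = F[0][0], F[0][1], F[1][1]
    p, s, q = S[0][0], S[0][1], S[1][1]
    # det(F - e*S) = 0 expands to the quadratic A*e^2 + B*e + C = 0.
    A = p * q - s * s            # det(S), positive for positive definite S
    B = -(a * q + d * p - 2.0 * b * s)
    C = a * d - b * b            # det(F)
    disc = B * B - 4.0 * A * C
    root = math.sqrt(max(disc, 0.0))  # guard against tiny negative rounding
    return ((-B - root) / (2.0 * A), (-B + root) / (2.0 * A))

# Hypothetical two-basis-function example (made-up numbers).
F = [[-1.0, -0.9], [-0.9, -1.0]]
S = [[1.0, 0.5], [0.5, 1.0]]
low, high = solve_gevp_2x2(F, S)
```

For this symmetric example the two values reduce to (a + b)/(1 + s) and (a - b)/(1 - s), which gives a quick hand check of the result.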
Reference inputs 
- Insulin 
- http://www.pdb.org/pdb/101/motm.do?momID=14 
- PDBID: 2HIU 
- 51 AA 
- Antifreeze Protein 
- http://www.rcsb.org/pdb/explore.do?structureId=2PNE 
- PDBID: 2PNE 
- 70 AA 
- Ubiquitin 
- http://www.pdb.org/pdb/101/motm.do?momID=60 
- PDBID: 1UBQ 
- 76 AA 
- HIV-1 Protease 
- http://www.pdb.org/pdb/101/motm.do?momID=6 
- PDBID: 7HVP 
- 99 AA 
2HIU : Insulin 
2PNE : Antifreeze Protein
Input Specification 
• Data given as reference: 
– Atomic coordinates, molecular charge and spin. 
– Basis set (cc-pVXZ – to allow for a range of problem sizes) 
– Sample input files for several quantum chemistry packages 
to enable data collection by operators. 
• Requirements for accuracy/precision of final 
results. 
• Hartree-Fock and DFT (B3LYP) 
• Other methods under investigation for the future.
Implementations 
• ACESIII 
• CFOUR 
• Dalton 
• FireFly 
• GAMESS 
• Gaussian 
• GTFock 
• Molpro 
• Molcas 
• NWChem 
• ORCA 
• ProteinDF 
• Psi4 
• QChem 
• SMASH 
• TeraChem 
• TurboMole 
• etc. 
Submissions must include detailed 
algorithmic and implementation 
specification sufficient for reproduction 
in a different implementation. 
Numerical tolerances must be 
documented. 
The best specification includes complete 
source code.
Reference Results 
● We will use GAMESS, NWChem, and ProteinDF to 
generate reference energy values. 
>>> They must agree to be a valid reference! 
● These codes are free, parallel and widely supported 
by HPC folks. Involved in many procurements so 
vendors are familiar. 
● Reference codes do not have a lot of approximations 
by default (no linear-scaling tricks).
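The requirement that the reference codes agree can be sketched as a spread check over their total energies. This is only an illustration: the function name, the default tolerance of 1e-6 atomic units, and the energies in the example are assumptions, since the slides leave the exact criterion open.

```python
def references_agree(energies, tolerance=1e-6):
    """Return True if all reference total energies lie within `tolerance`.

    `energies` maps a code name to its converged total energy (atomic units).
    The tolerance is an assumed value, not an official QC500 threshold.
    """
    values = list(energies.values())
    if len(values) < 2:
        raise ValueError("need at least two reference codes to compare")
    return max(values) - min(values) <= tolerance

# Made-up energies for illustration only.
candidate = {
    "GAMESS": -100.0000001,
    "NWChem": -100.0000003,
    "ProteinDF": -100.0000002,
}
ok = references_agree(candidate)
```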
Conditions for valid result 
• Converged total energy should match our 
reference value to six (?) decimal places in 
atomic units. 
• Converged orbital energies should match our 
reference values to three (?) decimal places in 
atomic units. 
• These criteria are open for debate. Some may 
argue for higher accuracy...
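One possible reading of the proposed criteria, as a sketch: a value "matches to n decimal places" if it differs from the reference by less than half a unit in the n-th decimal. That interpretation, the function names, and the sample numbers are assumptions; the six and three decimal places are the tentative values from the slide.

```python
def matches_to_decimals(value, reference, decimals):
    """True if value and reference differ by less than 0.5 * 10**(-decimals)."""
    return abs(value - reference) < 0.5 * 10.0 ** (-decimals)

def is_valid_result(total_energy, orbital_energies, ref_total, ref_orbitals,
                    energy_decimals=6, orbital_decimals=3):
    """Check a submitted result against reference values (atomic units)."""
    if len(orbital_energies) != len(ref_orbitals):
        return False
    if not matches_to_decimals(total_energy, ref_total, energy_decimals):
        return False
    return all(
        matches_to_decimals(e, r, orbital_decimals)
        for e, r in zip(orbital_energies, ref_orbitals)
    )

# Made-up numbers for illustration only.
valid = is_valid_result(-76.0267601, [-20.5501, -1.3352],
                        -76.0267603, [-20.5503, -1.3350])
```

Tightening the criteria, as some may argue for, only requires changing the two keyword arguments.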
Results to be submitted 
• Elapsed time 
• Which program package is used & Input file 
– Changes from default (e.g., cutoff value) 
– Details of implementation and algorithms 
• Output file 
– Total energy and orbital energies 
• Machine configuration 
– CPU, memory, network, storage and their peak 
(theoretical) performance 
• All of the above information will be open to the public
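The submission items listed above could be collected in a single record and published as JSON. The sketch below is hypothetical: the class names, field names, and example values are invented for illustration and do not describe an official QC500 submission format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class MachineConfig:
    cpu: str
    memory: str
    network: str
    storage: str
    peak_performance: dict  # component name -> theoretical peak, as text

@dataclass
class Submission:
    elapsed_seconds: float
    package: str
    input_file: str
    changes_from_default: list   # e.g. modified cutoff values
    implementation_notes: str    # details of implementation and algorithms
    output_file: str
    total_energy: float          # atomic units
    orbital_energies: list       # atomic units
    machine: MachineConfig

def to_public_json(submission):
    """Serialize a submission so it can be made open to the public."""
    return json.dumps(asdict(submission), indent=2, sort_keys=True)

# Placeholder values only; none of these are real measurements.
example = Submission(
    elapsed_seconds=1234.5,
    package="ExamplePackage",
    input_file="insulin.inp",
    changes_from_default=["integral cutoff tightened"],
    implementation_notes="direct SCF, dynamic task distribution",
    output_file="insulin.out",
    total_energy=-1.0,
    orbital_energies=[-0.5, 0.1],
    machine=MachineConfig("example CPU", "64 GB", "example fabric",
                          "example filesystem", {"cpu": "1 TFLOP/s"}),
)
public_record = to_public_json(example)
```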
TODO until next year 
• Forming a steering committee of scientific 
experts with deep HPC knowledge: 
• academia/national labs 
• industry (IBM, NVIDIA, …) 
• Digital presence 
• Reference data and validation infrastructure 
• The first benchmark results on several 
supercomputers.
