Sustainable Software for Computational Chemistry and Materials Modeling


Presented at the 1st Workshop on Maintainable Software Practices in e-Science, Chicago, 9 October 2012. Co-located with e-Science 2012.

  2. Outline
      • Overview
      • Challenges and current state
      • Overcoming the barriers, technical and cultural
  3. Computational Chemistry
      • Long history: over 50 years
      • Underpins a broad array of scientific applications (Grand Challenges)
          • Efficient combustion systems
          • Drug design
          • Understanding biological systems
          • Semiconductor design
          • Water sustainability
          • CO2 sequestration
          • Efficient lighting (quantum dots)
          • …
      • Full partner with experiments: "computational experiments" may be more reliable than lab measurements
  4. Scientific Software Innovation Institute for Computational Chemistry and Materials Modeling (S2I2C2M2)
      • Collaboration between computational chemists, computer scientists, applied mathematicians, and computer engineers
      • Goal: overcome obstacles of algorithms and culture, and change the nature of computational chemistry software development
      • Year-long conceptualization phase has been funded by NSF
      • First meeting scheduled for January 2013
  5. People
      • Daniel Crawford (Virginia Tech)
      • Vijay Pande (Stanford)
      • Robert Harrison (Tennessee, ORNL)
      • Manish Parashar (Rutgers)
      • Anna I. Krylov (U.S.C.)
      • Ram Ramanujam (LSU)
      • Theresa Windus (Iowa State)
      • Beverly Sanders (Florida)
      • Emily Carter (Princeton)
      • Bernhard Schlegel (Wayne State)
      • Edmund Chow (Georgia Tech)
      • David Sherrill (Georgia Tech)
      • Erik Deumens (Florida)
      • Lyudmila Slipchenko (Purdue)
      • Mark Gordon (Iowa State)
      • Masha Sosonkina (Iowa State)
      • Martin Head-Gordon (Berkeley)
      • Edward Valeev (Virginia Tech)
      • Todd Martinez (Stanford)
      • Ross Walker (San Diego Supercomputing Center)
      • David McDowell (Georgia Tech)
      • + others
  6. Current State of Computational Chemistry
      • Long history: legacies of modern molecular dynamics and quantum chemistry packages span decades
      • Both open source and commercial
      • Amalgam of programming languages
      • Domain-specific methods
          • Multi-dimensional integral engines
      • General purpose
          • Davidson method for computing eigenvalues of large matrices (ranks in tens of billions)
  7. Software is extremely complex
      • Example: modern ab initio quantum chemistry simulations
          • Computations scale as O(N^7) or higher
          • N represents the size of the molecular system (number of atoms, electrons, or basis functions)
      • Code complexity arises naturally from the problems, but
          • is an obstacle to long-term sustainability
          • is an obstacle to exploitation of (ever-changing) HPC hardware
          • hinders education of the next generation of scientists
          • only a handful of very senior students can make a contribution
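The O(N^7) scaling quoted above can be made concrete with a little arithmetic. The sketch below assumes the cost is a pure power law in N with no prefactor or lower-order terms, which is a simplification; the function name `relative_cost` is illustrative only.

```python
def relative_cost(size_ratio, exponent=7):
    """Cost multiplier when the system size N grows by `size_ratio`,
    assuming cost is proportional to N**exponent (a pure power law)."""
    return size_ratio ** exponent


if __name__ == "__main__":
    # How much more work is a system 2x, 3x, or 10x larger?
    for ratio in (2, 3, 10):
        print(f"N x {ratio}: cost x {relative_cost(ratio):,}")
```

Under this assumption, merely doubling the molecular system multiplies the work by 2^7 = 128, which is why hardware gains alone do not extend the reach of such methods very far.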
  8. Much recent software development has focused on exploiting parallel architectures
      • Varying degrees of success
      • With a few exceptions, still not fully exploiting available systems
      • Utilizing exascale will require rethinking of approach
      • Desperate need for tools to generate high-performance, massively parallel code from high-level specifications
  9. Developers
      • Mostly grad students and post-docs
      • Training in software engineering is left to individual research groups
          • Extent to which this is done varies
          • Large burden for small groups: a community approach has potential benefits for both the software and the students
          • Education tends to be narrow: students learn about the software their advisors are involved with
  10. Science Drivers: Catalysis
      • Catalysts facilitate control of chemical reactions by raising the rates at which chemical bonds are formed or broken
          • Improved selectivity and control over unwanted byproducts
          • Decreased energy consumption
          • Reduction of waste stream
      • Rational design of catalysts for a specific application is one of the Holy Grails of modern chemistry and chemical engineering
          • Requires quantitative information about transition states
          • Intermediates have low concentrations and short lifetimes, which thwart experimental evaluation
          • Will require state-of-the-art computation combined with experiments
  11. Science Drivers: Organic photovoltaic cells
      • Potential applications: thin-film transistors, LEDs, solar cells, optical switches
      • Advantages
          • Devices can be flexible
          • Inexpensive to produce
      • Limitations
          • Reduced power conversion energy
          • But the process leading to current generation is not well understood
  12. Overcoming the Barriers
      • 1-year conceptualization phase funded by NSF
          • First meeting Jan 2013
          • 3 working groups
      • Highest priority
          • Portable parallel infrastructure
          • General-purpose tensor algebra algorithms
          • Protocols for information exchange and code interoperability
          • Education and training
  13. Portable parallel infrastructure
      • Technology trends
          • Massive concurrency on a chip
          • Massive number of sockets in the largest supercomputers
          • Heterogeneity (CPU + GPU)
          • Deep, complex memory hierarchies
          • Memory and communication bandwidth limited
      • Bleeding-edge applications may need to
          • Coordinate over 10^9 threads
          • Tolerate faults
          • Explicitly manage energy consumption
  14. Sustainability of large and widely distributed chemistry codes
      • Enable most computations in chemistry
      • Likely will run on leadership-class machines
      • Working group will include computational chemists, parallel programming experts, and reps from major tech providers (NVIDIA, Intel, IBM)
  15. Sustainability of software developed by smaller research groups
      • Need to understand programming models and tools
      • Need to understand how both community and software can be better organized
          • Accelerate testing of new ideas at sufficient scale to determine their worth
          • Key: being able to write code and integrate it into existing software. Currently, new developments take months or years to migrate from the developers' software to other packages
  16. General-purpose tensor algebra algorithms
      • Tensor algebra is ubiquitous in science and engineering
      • Need new approaches for computing with high-dimensional tensors
          • Current software: 8 or fewer dimensions
          • Emerging methods require 3N dimensions, where N (the number of electrons) may be O(100) or more
      • Need a common framework of reusable software elements
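To see why 3N-dimensional tensors call for new approaches, it helps to count the entries of a dense tensor. The sketch below is only arithmetic under stated assumptions: every dimension has the same extent (`points_per_dim`), the tensor is stored densely, and each entry takes 8 bytes. None of these figures come from the slides, and the function names are illustrative.

```python
def dense_entries(points_per_dim, ndim):
    """Number of entries in a dense tensor with `ndim` dimensions,
    each of extent `points_per_dim` (assumed equal for simplicity)."""
    return points_per_dim ** ndim


def dense_bytes(points_per_dim, ndim, bytes_per_entry=8):
    """Storage for the same tensor, assuming 8-byte (double precision) entries."""
    return dense_entries(points_per_dim, ndim) * bytes_per_entry


if __name__ == "__main__":
    # An 8-dimensional tensor, the upper end quoted for current software.
    print("8 dims, extent 10:", dense_entries(10, 8), "entries")

    # 3N dimensions with N = 100 electrons, even at the minimal extent of 2.
    n_electrons = 100
    entries = dense_entries(2, 3 * n_electrons)
    print("300 dims, extent 2: an integer with", len(str(entries)), "decimal digits")
```

Python integers are exact at any size, so the count does not overflow. The point is that even the smallest nontrivial extent makes dense storage impossible at 3N dimensions, so practical methods have to rely on compressed or factorized representations.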
  17. Challenges for high-rank tensors
      • Challenges
          • Develop robust implementations of algorithms
          • Standardize data structures and algorithms, APIs, software elements, and frameworks
          • Automate the derivation, transformation, and implementation of tensor expressions
      • Will require cross-disciplinary collaborations
      • Infrastructure will include DSLs, runtimes, and compilers, as well as static and dynamic algorithm analyses
  18. Protocols for Information Exchange and Code Interoperability
      • Historically, the culture has been competitive rather than collaborative
          • Sharing of code and data is limited
      • Theoretical methods are driving code towards greater size and complexity
      • Currently, progress may require substantial duplication of effort: code that could be reused is not, due to lack of standards
      • Impedes new science, wastes human labor
  19. Information Sharing
      • Standards for data shared between codes
      • Standards (or methods to convert between standards) for stored data to facilitate mining
      • Data provenance
      • Cannot expect all code to use the same format, but transformation leads to errors, computational inefficiencies, and complex interfaces
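One way to picture a shared data record with provenance is sketched below. It is a minimal illustration rather than a proposed standard: the `EnergyRecord` fields, the JSON layout, and the program name in the usage example are all hypothetical, and the energy value is a placeholder rather than a computed result. The only physical input is the hartree-to-eV factor (CODATA 2018).

```python
import json
from dataclasses import asdict, dataclass, replace

# CODATA 2018 value of one hartree in electronvolts.
HARTREE_TO_EV = 27.211386245988

# Size of one hartree in each unit this sketch knows about.
_PER_HARTREE = {"hartree": 1.0, "eV": HARTREE_TO_EV}


@dataclass(frozen=True)
class EnergyRecord:
    """Hypothetical exchange record: a value, its unit, and its provenance."""
    value: float
    unit: str
    method: str
    program: str
    program_version: str


def convert(record, unit):
    """Return a copy of `record` in `unit`; refuse units we cannot interpret."""
    if record.unit not in _PER_HARTREE or unit not in _PER_HARTREE:
        raise ValueError(f"unknown unit: {record.unit!r} -> {unit!r}")
    in_hartree = record.value / _PER_HARTREE[record.unit]
    return replace(record, value=in_hartree * _PER_HARTREE[unit], unit=unit)


def to_json(record):
    """Serialize with sorted keys so identical records give identical text."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text):
    """Rebuild a record; missing or extra fields raise TypeError."""
    return EnergyRecord(**json.loads(text))


if __name__ == "__main__":
    # Placeholder value and made-up program name, for illustration only.
    rec = EnergyRecord(-1.0, "hartree", "HF", "example-code", "0.1")
    print(to_json(convert(rec, "eV")))
    print(from_json(to_json(rec)) == rec)
```

Carrying the unit and the originating program alongside the number is what lets a receiving code convert explicitly instead of guessing, which is where the transformation errors mentioned above tend to arise.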
  20. Code Sharing
      • Need to establish level of interoperability
          • Coarse-grained: Hartree-Fock code from one app, MP2 code from another
          • Fine-grained: calculate most of the one-electron contributions to a Fock matrix in one program, relativistic and solvent terms in another
      • Need architecture
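The coarse-grained case can be sketched as two packages agreeing on a small interface. Everything named below (`Reference`, `ScfProvider`, `CorrelationProvider`, `PackageAScf`, `PackageBMp2`) is hypothetical and does not describe any existing package's API. The numbers the stubs return are placeholders standing in for real calculations.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


@dataclass(frozen=True)
class Reference:
    """Hypothetical hand-off object produced by a Hartree-Fock step."""
    energy: float
    orbital_energies: Tuple[float, ...]


class ScfProvider(Protocol):
    def run_scf(self, geometry: str) -> Reference: ...


class CorrelationProvider(Protocol):
    def correlation_energy(self, reference: Reference) -> float: ...


class PackageAScf:
    """Stand-in for one application's Hartree-Fock code (placeholder numbers)."""

    def run_scf(self, geometry: str) -> Reference:
        return Reference(energy=-1.0, orbital_energies=(-0.5,))


class PackageBMp2:
    """Stand-in for another application's MP2 code (placeholder number)."""

    def correlation_energy(self, reference: Reference) -> float:
        return -0.25


def total_energy(scf: ScfProvider, corr: CorrelationProvider, geometry: str) -> float:
    """Compose the two steps; neither package needs to know about the other."""
    reference = scf.run_scf(geometry)
    return reference.energy + corr.correlation_energy(reference)


if __name__ == "__main__":
    print(total_energy(PackageAScf(), PackageBMp2(), "H 0 0 0"))
```

The design choice illustrated is that the only shared artifact is the `Reference` hand-off type. Fine-grained interoperability would need a much richer contract (matrix layouts, basis-set conventions), which is presumably why the slide calls for an architecture.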
  21. Education and Training
      • Mastering existing codes is a daunting task for grad students
      • Chemical education culture is worse than in many STEM fields
          • PhD only requires modest coursework
          • OK for most fields of chemistry, where undergrad training is adequate preparation for hands-on lab research
          • Most students are unprepared for research in computational chemistry
          • Ad hoc training by individual research groups is inefficient
      • How should students be prepared for 2020 and beyond?
          • Programming models and tools
          • Multidisciplinary foundation to computational science
          • Reasoning about software
          • Manipulating software with confidence and facility
  22. Summer school
      • Summer school for grad students, supported by the community
          • Fundamental algorithms of computational chemistry
          • Software best practices
  23. If S2I2C2M2 is successful
      • Open access to new software tools and infrastructure
      • Training and educational opportunities for grad students and post-docs in
          • Algorithms
          • Code standards
          • Software best practices
      • NOT the goal to produce a monolithic computational chemistry package to replace existing ones
          • Healthy competition is good
      • Sets of robust and properly validated software components that can be shared will benefit the entire community