Avogadro, Open Chemistry and Semantics


Published on

Avogadro is being rewritten and architected to put semantic chemical meaning at the center of its internal data structures in order to fully support data-centric workflows. Computational and experimental chemistry both suffer when semantic meaning is lost; through the use of expressive formats such as CML, along with lightweight data-exchange formats such as JSON, workflows that previously demanded manual intervention to retain semantic meaning can be used. Integration with projects like JUMBO and Open Babel when conversion is required, coupled with codes such as NWChem where direct support for CML is being added, allow for much richer storage, analysis, and indexing of data. As web-based data sources add more semantic structure to their data, Avogadro will take advantage of those resources.

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Avogadro, Open Chemistry and Semantics

  1. Avogadro, Open Chemistry and Semantics August 21, 2012 Skolnik Symposium Marcus D. Hanwell Kyle Lutz 1  
  2. Introduction to Kitware•  Founded in 1998: 5 former GE Research employees•  105 employees: more than 50 PhDs•  Privately held, profitable from creation, no debt•  Rapidly Growing: >30% in 2011, 7M web-visitors/quarter•  Offices •  2011 Small Business –  Albany, NY Administration’s –  Carrboro, NC Tibbetts Award –  Santa Fe, NM •  HPCWire Readers and Editor’s Choice –  Lyon, France •  Inc’s 5000 List: 2008 –  Bangalore, India to 2011
  3. Avogadro•  Project began in 2006•  Split into library & application (plugin-based)•  One of very few open source editors•  Designed to be extensible from the start•  Generates input & reads output from many codes•  An active and growing community•  Chemistry needs a free, open framework http://avogadro.openmolecules.net/ 3  
  4. Avogadro Paper Published 8/13/12http://www.jcheminf.com/content/4/1/17 4  
  5. Structure to Input Deck 5  
  6. Vision for the Future•  Advancing the state-of-the-art•  Tight integration is needed •  Computational codes •  Clusters/supercomputers •  Data repositories •  Reduce, reuse, recycle!•  Facilitating sharing and searching of data•  Embracing open data, cheminformatics 6  
  7. Opening Up Chemistry•  One of the most closed sciences•  Lots of black box proprietary codes –  Only a few have access to the code –  Publishing results from black box codes –  Many file formats in use, little agreement•  More papers should be including data•  Growing need for open standards•  Open tools needed to make that happen 7  
  8. Introduction to Open Chemistry•  User-friendly integration with –  Computational codes –  HPC/cloud resources –  Database/informatics resources 8  
  9. Introduction•  An open approach to chemistry software Build, Test –  Open source frameworks & Package Community Review –  Developed openly –  Cross-platform –  Tested, verified Software –  Contribution model Repository –  Supported by Kitware experts Developers & Users•  BSD licensed to facilitate research/reuse 9  
  10. Open Chemistry Development Team•  Assembled an inter-disciplinary team•  Domain specialists: quantum chemistry, biology, solid-state materials•  Computer scientists: build systems, queuing, graphics, software process•  Marcus, Kyle, David L., Chris, David C. 10  
  11. OpenChemistry.org•  New website to promote open chemistry•  Hosts project-specific pages•  Provides an identity for related projects•  Promotes shared ownership of projects –  Website –  Code submission/review –  Testing infrastructure –  Wiki, mailing lists, news, galleries 11  
  12. 12  
  13. Applications Being Developed•  Three independent applications•  Communication handled with local sockets•  Avogadro 2 – structure editing, input generation, output viewing, and analysis•  MoleQueue – running local and remote jobs in standalone programs, management•  ChemData – Storage of data, searching, entry, annotation 13  
  14. Open Frameworks•  AvogadroLibs – core data structures and algorithms shared across codes•  OpenQube – a collaboration platform for quantum data ingestion and visualization•  Chemkit – file I/O, exploration and chemoinformatics analysis•  VTK – specialized chemistry visualization/ data structures, use of above 14  
  15. Project Diagram: Libraries/AppsCore,  command  line   GUI/Visualiza:on   HPC   OpenQube   Avogadro   AvogadroLibs   VTK   MoleQueue   Chemkit   ChemData   15  
  16. Typical Workflow Log File Input File Edit/Analyze   Results   Data   Job  Submission   Local Calcula:on   Remote 16  
  17. Proposed Workflow Log File Input File Edit/Analyze   Results   Data   Job  Submission   Local Calcula:on   Remote 17  
  18. Optimal Workflow Log File Input File Avogadro   Results   ChemData   Job  Submission   MoleQueue   Local Remote Calcula:on   18  
  19. Avogadro2•  Project began 2006•  Split into library & application (plugin-based)•  One of very few open source editors•  Still using Qt, C++, Eigen, OpenGL•  Uses AvogadroLibs and OpenQube for core data•  Introduces client-server dataflow/patterns•  Includes new, efficient rendering code•  More liberally licensed – from GPL to BSD 19  
  20. Avogadro: Visualization•  GPU-accelerated rendering•  VTK for advanced visualization•  Support for 2D and 3D data plots•  Optimized data structures –  Large data –  Streaming•  Reworked interface –  Tighter database/workflow integration 20  
  21. MoleQueue: Job Management•  Tighter integration with remote queues•  Integration with databases –  Retains full log of computational jobs –  Triggers actions on completion•  Plugin-based system –  Easy addition of new codes –  Easy addition of new queuing systems•  Provides a client API for applications 21  
  22. MoleQueue•  Supports configuration of a variety of remote clusters and queuing software
  23. New CML I/O•  Development of modular CML code•  Allows for multi-pass parsing of CML•  Keeps the CML closer to application•  Much faster, easier to extend and change•  Moving from simple CML to full semantic documents that can be edited•  Learned from previous work in VTK and Open Babel 23  
  24. File Format: CML & HDF5•  Leverages our experience with XDMF•  CML stores semantic data –  Name, formula, atoms, bonds –  Computational code, theory, basis set•  HDF5 used to store heavy data –  Basis set, intermediate data –  Eigenvectors, SCF matrix –  Volumetric data (MOs, electron density) 24  
  25. Rethinking Input File Generation•  Can we create a CML representation? –  Could be loaded directly by some codes –  Could be translated to input files for others•  Would allow search on input and output•  Could be stored and published•  Makes it easier to set up calculations•  Creates a more uniform experience 25  
  26. Advanced Impostor Rendering•  Using a scene, vertex buffer objects, and OpenGL shading language•  Impostor techniques –  Sphere goes from 100s of triangles to 2! –  No artifacts from triangulation –  Scales to millions of spheres on modest GPU 26  
  27. Impostor Sphere Rendering 27  
  28. Building Community•  Community around chemistry Build, Test projects & Package Community•  Using Kitware’s software process Review –  Ensuring quality with continuous testing –  Code contributions on the web –  Public mailing lists, bug trackers, code review•  Promoting projects and Software Repository participation –  Publications Developers –  Conferences & Users –  Workshops 28  
  29. Software Process•  Source code publicly hosted using Git•  Gerrit for code review•  CTest/CDash for testing/summary –  Gerrit can use CDash@Home •  Test proposed changes before merging•  CDash can now provide binaries –  Built nightly, available for direct download•  Wiki, mailing list, bug tracker 29  
  30. Conclusions•  Real opportunity to make an impact•  Improve research, industry and teaching•  Semantic data at the center of our work –  Storage –  Search –  Interaction with computational codes –  Comparison with experimental data•  Add support for iOS, Android and web 30  
  31. Acknowledgements•  Google Summer of Code for initial summer funding•  Avogadro developers: Geoffrey R. Hutchison, Donald E. Curtis, David C. Lonie, Tim Vandermeersch and many more contributors, users and supporters•  Kitware, Inc. for their unique business model & support•  The Engineer Research and Development Center’s Environmental Laboratory for recent funding•  Open-source projects, standards and services we build on: Qt, Open Babel, GLEW, CML, CACTUS Resolver, many, many more projects•  Support of many code developers including MOPAC, NWChem, Q- Chem and others•  Support from Peter Murray-Rust and the Blue Obelisk 31