Avogadro, Open Chemistry and Semantics


Avogadro is being rewritten and re-architected to put semantic chemical meaning at the center of its internal data structures in order to fully support data-centric workflows. Computational and experimental chemistry both suffer when semantic meaning is lost. With expressive formats such as CML, along with lightweight data-exchange formats such as JSON, workflows that previously demanded manual intervention to retain semantic meaning can now run without it. Integration with projects like JUMBO and Open Babel when conversion is required, coupled with codes such as NWChem where direct support for CML is being added, allows for much richer storage, analysis, and indexing of data. As web-based data sources add more semantic structure to their data, Avogadro will take advantage of those resources.




    Presentation Transcript

    • Avogadro, Open Chemistry and Semantics
      August 21, 2012, Skolnik Symposium
      Marcus D. Hanwell, Kyle Lutz
    • Introduction to Kitware
      •  Founded in 1998: 5 former GE Research employees
      •  105 employees: more than 50 PhDs
      •  Privately held, profitable from creation, no debt
      •  Rapidly growing: >30% in 2011, 7M web-visitors/quarter
      •  Offices
        –  Albany, NY
        –  Carrboro, NC
        –  Santa Fe, NM
        –  Lyon, France
        –  Bangalore, India
      •  2011 Small Business Administration’s Tibbetts Award
      •  HPCWire Readers and Editor’s Choice
      •  Inc’s 5000 List: 2008 to 2011
    • Avogadro
      •  Project began in 2006
      •  Split into library & application (plugin-based)
      •  One of very few open source editors
      •  Designed to be extensible from the start
      •  Generates input & reads output from many codes
      •  An active and growing community
      •  Chemistry needs a free, open framework
      http://avogadro.openmolecules.net/
    • Avogadro Paper Published 8/13/12
      http://www.jcheminf.com/content/4/1/17
    • Structure to Input Deck
    • Vision for the Future
      •  Advancing the state-of-the-art
      •  Tight integration is needed
        –  Computational codes
        –  Clusters/supercomputers
        –  Data repositories
        –  Reduce, reuse, recycle!
      •  Facilitating sharing and searching of data
      •  Embracing open data, cheminformatics
    • Opening Up Chemistry
      •  One of the most closed sciences
      •  Lots of black box proprietary codes
        –  Only a few have access to the code
        –  Publishing results from black box codes
        –  Many file formats in use, little agreement
      •  More papers should be including data
      •  Growing need for open standards
      •  Open tools needed to make that happen
    • Introduction to Open Chemistry
      •  User-friendly integration with
        –  Computational codes
        –  HPC/cloud resources
        –  Database/informatics resources
    • Introduction
      •  An open approach to chemistry software
        –  Open source frameworks
        –  Developed openly
        –  Cross-platform
        –  Tested, verified
        –  Contribution model
        –  Supported by Kitware experts
      •  BSD licensed to facilitate research/reuse
      [Diagram: Build, Test & Package; Community Review; Software Repository; Developers & Users]
    • Open Chemistry Development Team
      •  Assembled an inter-disciplinary team
      •  Domain specialists: quantum chemistry, biology, solid-state materials
      •  Computer scientists: build systems, queuing, graphics, software process
      •  Marcus, Kyle, David L., Chris, David C.
    • OpenChemistry.org
      •  New website to promote open chemistry
      •  Hosts project-specific pages
      •  Provides an identity for related projects
      •  Promotes shared ownership of projects
        –  Website
        –  Code submission/review
        –  Testing infrastructure
        –  Wiki, mailing lists, news, galleries
    • Applications Being Developed
      •  Three independent applications
      •  Communication handled with local sockets
      •  Avogadro 2 – structure editing, input generation, output viewing, and analysis
      •  MoleQueue – running local and remote jobs in standalone programs, management
      •  ChemData – storage of data, searching, entry, annotation
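
The slide above says only that the three applications communicate over local sockets. As a minimal sketch of what such an exchange could look like, the Python below passes newline-delimited JSON messages between two ends of a local socket pair. The message envelope, the `submitJob` method name, and every field name are assumptions made for illustration; they are not taken from the talk or from any MoleQueue API.

```python
import json
import socket


def send_message(sock, message):
    """Serialize a dict as one line of JSON and send it."""
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def recv_message(sock):
    """Read until a newline arrives, then decode one JSON message."""
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("socket closed before a full message arrived")
        buffer += chunk
    return json.loads(buffer.decode("utf-8"))


def handle_request(request):
    """Hypothetical job-manager handler: accept a job and report its state."""
    if request.get("method") == "submitJob":
        return {"id": request["id"], "result": {"jobId": 1, "state": "queued"}}
    return {"id": request["id"], "error": "unknown method"}


def demo():
    """Simulate an editor process talking to a job manager process."""
    editor, queue = socket.socketpair()  # stand-in for a named local socket
    try:
        send_message(editor, {
            "id": 7,
            "method": "submitJob",  # hypothetical method name
            "params": {"program": "NWChem", "inputFile": "water.nw"},
        })
        request = recv_message(queue)
        send_message(queue, handle_request(request))
        return recv_message(editor)
    finally:
        editor.close()
        queue.close()


if __name__ == "__main__":
    print(demo())
```

Keeping the applications in separate processes with a small message protocol between them means each one can be started, replaced, or scripted independently, which fits the "three independent applications" design on the slide.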
    • Open Frameworks
      •  AvogadroLibs – core data structures and algorithms shared across codes
      •  OpenQube – a collaboration platform for quantum data ingestion and visualization
      •  Chemkit – file I/O, exploration and chemoinformatics analysis
      •  VTK – specialized chemistry visualization/data structures, use of above
    • Project Diagram: Libraries/Apps
      [Diagram. Columns: Core, command line; GUI/Visualization; HPC. Components: OpenQube, Avogadro, AvogadroLibs, VTK, MoleQueue, Chemkit, ChemData]
    • Typical Workflow
      [Diagram: Log File, Input File, Edit/Analyze, Results, Data, Job Submission, Calculation (Local, Remote)]
    • Proposed Workflow
      [Diagram: Log File, Input File, Edit/Analyze, Results, Data, Job Submission, Calculation (Local, Remote)]
    • Optimal Workflow
      [Diagram: Log File, Input File, Avogadro, Results, ChemData, Job Submission, MoleQueue, Calculation (Local, Remote)]
    • Avogadro2
      •  Project began 2006
      •  Split into library & application (plugin-based)
      •  One of very few open source editors
      •  Still using Qt, C++, Eigen, OpenGL
      •  Uses AvogadroLibs and OpenQube for core data
      •  Introduces client-server dataflow/patterns
      •  Includes new, efficient rendering code
      •  More liberally licensed – from GPL to BSD
    • Avogadro: Visualization
      •  GPU-accelerated rendering
      •  VTK for advanced visualization
      •  Support for 2D and 3D data plots
      •  Optimized data structures
        –  Large data
        –  Streaming
      •  Reworked interface
        –  Tighter database/workflow integration
    • MoleQueue: Job Management
      •  Tighter integration with remote queues
      •  Integration with databases
        –  Retains full log of computational jobs
        –  Triggers actions on completion
      •  Plugin-based system
        –  Easy addition of new codes
        –  Easy addition of new queuing systems
      •  Provides a client API for applications
    • MoleQueue
      •  Supports configuration of a variety of remote clusters and queuing software
    • New CML I/O
      •  Development of modular CML code
      •  Allows for multi-pass parsing of CML
      •  Keeps the CML closer to application
      •  Much faster, easier to extend and change
      •  Moving from simple CML to full semantic documents that can be edited
      •  Learned from previous work in VTK and Open Babel
    • File Format: CML & HDF5
      •  Leverages our experience with XDMF
      •  CML stores semantic data
        –  Name, formula, atoms, bonds
        –  Computational code, theory, basis set
      •  HDF5 used to store heavy data
        –  Basis set, intermediate data
        –  Eigenvectors, SCF matrix
        –  Volumetric data (MOs, electron density)
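
To make the "name, formula, atoms, bonds" layer concrete, here is a small sketch that reads a CML molecule with Python's standard `xml.etree.ElementTree`. It is not the modular C++ reader described in the talk. It reads atoms in one pass and resolves bond references in a second, which is one simple reading of "multi-pass parsing". The water geometry is approximate and purely illustrative, and the shape of the returned dictionary is an assumption.

```python
import xml.etree.ElementTree as ET

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Illustrative document; coordinates are approximate, in angstroms.
WATER_CML = """
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <name>water</name>
  <formula concise="H 2 O 1"/>
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def parse_cml(text):
    """Return name, atoms and bonds from a single CML molecule element."""
    root = ET.fromstring(text.strip())

    # Pass 1: collect atoms and remember the index of each atom id.
    atoms, index_of = [], {}
    for atom in root.findall("cml:atomArray/cml:atom", CML_NS):
        index_of[atom.get("id")] = len(atoms)
        position = tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
        atoms.append((atom.get("elementType"), position))

    # Pass 2: resolve bond references against the atom ids seen in pass 1.
    bonds = []
    for bond in root.findall("cml:bondArray/cml:bond", CML_NS):
        first, second = bond.get("atomRefs2").split()
        # Bond order stays a string: CML also allows values such as "S" or "D".
        bonds.append((index_of[first], index_of[second], bond.get("order")))

    name = root.findtext("cml:name", default="", namespaces=CML_NS)
    return {"name": name, "atoms": atoms, "bonds": bonds}


if __name__ == "__main__":
    print(parse_cml(WATER_CML))
```

Heavy numerical data such as eigenvectors or volumetric grids would not fit this element-per-value style well, which is the motivation the slide gives for pairing CML with HDF5.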
    • Rethinking Input File Generation
      •  Can we create a CML representation?
        –  Could be loaded directly by some codes
        –  Could be translated to input files for others
      •  Would allow search on input and output
      •  Could be stored and published
      •  Makes it easier to set up calculations
      •  Creates a more uniform experience
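
The "translated to input files" idea can be sketched as a function that turns one structured description of a calculation into a text input deck. The example targets a minimal NWChem-style deck because NWChem is the code named in the abstract. The function name, its parameters, and the default theory and basis are illustrative assumptions, and a real deck would usually need more directives than shown here.

```python
def nwchem_input(title, atoms, theory="dft", basis="6-31G*", task="optimize"):
    """Build a minimal NWChem-style input deck from structured data.

    atoms is a list of (element, (x, y, z)) tuples in angstroms.
    """
    lines = [f'title "{title}"', "geometry units angstroms"]
    for element, (x, y, z) in atoms:
        lines.append(f"  {element} {x:.6f} {y:.6f} {z:.6f}")
    lines += [
        "end",
        "basis",
        f"  * library {basis}",  # same library basis set for every element
        "end",
        f"task {theory} {task}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    water = [
        ("O", (0.000, 0.000, 0.117)),   # approximate, illustrative geometry
        ("H", (0.000, 0.757, -0.469)),
        ("H", (0.000, -0.757, -0.469)),
    ]
    print(nwchem_input("water", water))
```

Because the theory, basis set, and geometry live in structured fields rather than only in the generated text, the same record could be indexed for search or translated for a different code, which is the benefit the slide argues for.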
    • Advanced Impostor Rendering
      •  Using a scene, vertex buffer objects, and OpenGL shading language
      •  Impostor techniques
        –  Sphere goes from 100s of triangles to 2!
        –  No artifacts from triangulation
        –  Scales to millions of spheres on modest GPU
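
The reduction from hundreds of triangles to two works because the sphere is drawn as a single screen-facing quad, and the sphere's surface is computed per pixel instead of being stored as geometry. The sketch below reproduces that per-pixel step in plain Python rather than OpenGL shading language. It assumes an orthographic view and a unit sphere, and it is a generic illustration of the technique, not Avogadro's shader code.

```python
import math


def impostor_fragment(u, v):
    """Surface normal of a unit sphere at quad coordinate (u, v) in [-1, 1].

    Returns None where the quad lies outside the sphere's silhouette,
    which is where a fragment shader would discard the pixel.
    """
    r2 = u * u + v * v
    if r2 > 1.0:
        return None
    z = math.sqrt(1.0 - r2)  # height of the sphere surface above the quad
    return (u, v, z)         # on a unit sphere the position is also the normal


def lambert(normal, light=(0.0, 0.0, 1.0)):
    """Diffuse intensity for a unit normal and a unit light direction."""
    return max(0.0, sum(n * l for n, l in zip(normal, light)))


def shade_quad(size=8):
    """Shade a size x size grid of pixels covering one impostor quad."""
    image = []
    for row in range(size):
        v = 1.0 - 2.0 * (row + 0.5) / size
        line = []
        for col in range(size):
            u = 2.0 * (col + 0.5) / size - 1.0
            normal = impostor_fragment(u, v)
            line.append(None if normal is None else lambert(normal))
        image.append(line)
    return image


if __name__ == "__main__":
    for line in shade_quad():
        print("".join(" " if p is None else "#" if p > 0.7 else "+" for p in line))
```

Since the normal is evaluated analytically at every pixel, the silhouette and shading stay smooth at any zoom level, which is the "no artifacts from triangulation" point on the slide.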
    • Impostor Sphere Rendering
    • Building Community
      •  Community around chemistry projects
      •  Using Kitware’s software process
        –  Ensuring quality with continuous testing
        –  Code contributions on the web
        –  Public mailing lists, bug trackers, code review
      •  Promoting projects and participation
        –  Publications
        –  Conferences
        –  Workshops
      [Diagram: Build, Test & Package; Community Review; Software Repository; Developers & Users]
    • Software Process
      •  Source code publicly hosted using Git
      •  Gerrit for code review
      •  CTest/CDash for testing/summary
        –  Gerrit can use CDash@Home
          •  Test proposed changes before merging
      •  CDash can now provide binaries
        –  Built nightly, available for direct download
      •  Wiki, mailing list, bug tracker
    • Conclusions
      •  Real opportunity to make an impact
      •  Improve research, industry and teaching
      •  Semantic data at the center of our work
        –  Storage
        –  Search
        –  Interaction with computational codes
        –  Comparison with experimental data
      •  Add support for iOS, Android and web
    • Acknowledgements
      •  Google Summer of Code for initial summer funding
      •  Avogadro developers: Geoffrey R. Hutchison, Donald E. Curtis, David C. Lonie, Tim Vandermeersch and many more contributors, users and supporters
      •  Kitware, Inc. for their unique business model & support
      •  The Engineer Research and Development Center’s Environmental Laboratory for recent funding
      •  Open-source projects, standards and services we build on: Qt, Open Babel, GLEW, CML, CACTUS Resolver, many, many more projects
      •  Support of many code developers including MOPAC, NWChem, Q-Chem and others
      •  Support from Peter Murray-Rust and the Blue Obelisk