Open Chemistry: Realizing Open Data, Open Standards, and Open Source



The Blue Obelisk has brought together the computational chemistry community and those who are passionate about Open Chemistry and about realizing the promise of Open Data, Open Standards, and Open Source (ODOSOS), the three pillars the group promotes. We will present work from the past five years that is inspired by these pillars, along with plans for future work.

The group is actively engaged in multiple open-source projects that rely on and promote open standards and open data, including Avogadro (a powerful 3D molecular editor), OpenQube (a library for quantum mechanics), ChemData (a tool for large-scale chemical data analysis and visualization), Chemkit (a library for cheminformatics), MoleQueue (an HPC queue manager), and VTK (a library for scientific data visualization). The Open Chemistry project benefits greatly from the activities of the Blue Obelisk and makes use of several prominent open-source projects, including Qt and MongoDB.

Published in: Technology


  1. Open Chemistry: Realizing Open Data, Open Standards and Open Source

Marcus D. Hanwell, Kyle Lutz, David Lonie, Chris Harris, and David Cole
Scientific Computing, Kitware, Inc, 28 Corporate Drive, Clifton Park, NY 12065.

Open Chemistry

The Open Chemistry project is developing a suite of applications and support libraries to improve the workflow in computational chemistry, biology, materials science and related areas. It is a set of open, connected components that can tackle both small problems on the desktop and big research projects requiring significant time on the world’s top supercomputers.

Figure 5: The workflow that the Open Chemistry components are being developed for. (Diagram labels: Log File, Input File, Simulation, Results, Informatics, Job Submission, HPC integration, Local, Cloud, Supercomputer.)

Chemical Data Explorer

The Chemical Data Explorer is a cross-platform, open-source application that builds on the capabilities of the Visualization Toolkit, Qt and MongoDB. It can connect to a local or remote database, ingest new data from various sources and make that data semantically rich. It can apply informatics techniques to the data it contains to search for structures with particular properties. Work is ongoing to more tightly integrate computational job storage and search.

Avogadro

The Avogadro project is a cross-platform, open-source approach to building chemical structures. It uses external simulation packages in addition to integrated analysis and visualization routines. The work presented here illustrates a workflow for quantum mechanical calculations, allowing the preparation of chemical structures, rough optimization, and subsequent calculation of electron density isosurfaces, molecular orbitals, etc.

Figure 1: Avogadro application (left), ray-traced molecule (center) and the periodic table widget (right).
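To make the "Input File" step of the quantum workflow concrete, the following is a minimal sketch of turning a prepared structure into an input deck for a quantum package. It loosely follows the layout of an NWChem input file; the function name `make_nwchem_input`, its parameters, and the water coordinates are illustrative assumptions, and this is not how Avogadro's own input generators are implemented.

```python
from typing import List, Tuple

# Hypothetical representation: element symbol plus x, y, z in angstroms.
Atom = Tuple[str, float, float, float]


def make_nwchem_input(title: str, atoms: List[Atom],
                      basis: str = "6-31G*",
                      task: str = "scf optimize") -> str:
    """Return the text of a simple NWChem-style input deck (sketch only)."""
    lines = [f'title "{title}"', "geometry units angstroms"]
    for symbol, x, y, z in atoms:
        lines.append(f"  {symbol} {x:.6f} {y:.6f} {z:.6f}")
    # One basis set from the package's library is applied to every element.
    lines += ["end", "basis", f"  * library {basis}", "end", f"task {task}"]
    return "\n".join(lines) + "\n"


# Approximate water geometry, used only as example data.
water = [
    ("O", 0.0, 0.0, 0.117),
    ("H", 0.0, 0.757, -0.469),
    ("H", 0.0, -0.757, -0.469),
]
deck = make_nwchem_input("water", water)
print(deck)
```

A generator like this keeps the structure data separate from the package-specific text, which is one way a plugin-based editor could support several quantum packages.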
Avogadro allows the user to prepare jobs for quantum packages, such as NWChem, GAMESS, Gaussian and Q-Chem. Due to the plugin-based nature of the Avogadro project, many specialized functions can be added for a large range of applications, such as molecular docking, surface modeling and electronic structure.

Figure 3: The Chemical Data Explorer user interface showing a query and structures (top-left), a scatter plot matrix (top-right), scatter plot with tooltip (bottom-left), and K-means clustering (bottom-right).

OpenQube

OpenQube is a small, open-source C++ library that reads key quantum data from calculations produced by codes such as NWChem, GAMESS and Gaussian. It can read in basis sets, eigenvectors and density matrices, and calculate the magnitude of the molecular orbitals and electron density on regularly-spaced grids. The data produced can be used for further analysis and visualization of electronic structure.

MoleQueue

The MoleQueue application provides a graphical interface that integrates high-performance computing (HPC) resources on the desktop. It offers a seamless integration layer for applications, such as Avogadro, to submit jobs to local and remote computational resources. Job lifetime is managed by MoleQueue, and results can be opened in any external program.

Figure 2: The MoleQueue program configuration dialog for a PBS remote system.

  • Graphical configuration of queues and programs
  • Support for Sun Grid Engine, PBS and running calculations locally
  • JSON-RPC protocol for interprocess communication over local sockets or ZeroMQ
  • C++ and Python client libraries

Chemkit

Chemkit is an open-source, C++ library for molecular modeling, cheminformatics, and molecular visualization. It features a modular, plugin-based architecture and includes over 40 plugins that implement 15 file formats, 6 line formats, 4 force-fields, 2 partial charge models, 2 aromaticity models, 8 atom typers and 30 molecular descriptors. In addition, Chemkit includes an integrated visualization library built on OpenGL/Qt, with Python bindings for easy scripting.

Figure 6: Cartoon rendering of protein (left), surface rendering (center), and molecule rendering (right).

Visualization Toolkit and ParaView

The Visualization Toolkit (VTK) is an open-source, C++ toolkit for 2D and 3D graphics, volume rendering, image processing, visualization and modeling. Development began in 1993, and it now has a large community of developers distributed around the world in a diverse set of fields. VTK processes data using a data flow graph (pipeline) in which each algorithm takes zero or more inputs and produces zero or more outputs. VTK is scalable to large data because it has distributed algorithms that use MPI to execute on large computing clusters.

Figure 4: Volume rendered molecular orbital with sliced contour (left), and library dependency graph (right).

ParaView is an open-source, cross-platform data analysis and visualization application. It is one of the flagship open-source projects developed by Kitware, building on VTK and Qt to provide a client-server application that allows users to quickly build visualizations to analyze their data. ParaView was developed to analyze extremely large data sets using distributed memory computing resources. It can be used interactively with the cross-platform GUI, or scripted from Python. VTK and ParaView are being augmented with additional functionality for chemistry through projects such as the Google Summer of Code and Open Chemistry.

Software Process

These projects are open-source, targeting multiple platforms and architectures. A quality-inducing software process is employed using best-of-breed technologies such as Git for distributed version control, Gerrit for code review, CMake for cross-platform building, CTest for unit/regression testing and CDash for software quality feedback. Most code is BSD licensed, and designed with reuse in mind.
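The OpenQube description mentions evaluating molecular orbitals and electron density on regularly-spaced grids. The sketch below illustrates that idea in plain Python under strong simplifying assumptions: the basis consists only of normalized s-type Gaussian primitives, and the names `s_gaussian`, `orbital_on_grid` and `density_from_orbital` are hypothetical. It does not reflect OpenQube's C++ API, which handles contracted basis sets with angular momentum.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = List[List[List[float]]]  # nested lists indexed as grid[i][j][k]


def s_gaussian(center: Point, exponent: float, r: Point) -> float:
    """Value of a normalized s-type Gaussian primitive at point r."""
    d2 = sum((a - b) ** 2 for a, b in zip(r, center))
    norm = (2.0 * exponent / math.pi) ** 0.75
    return norm * math.exp(-exponent * d2)


def orbital_on_grid(centers: Sequence[Point], exponents: Sequence[float],
                    coeffs: Sequence[float], origin: Point,
                    spacing: float, shape: Tuple[int, int, int]) -> Grid:
    """Evaluate psi(r) = sum_i c_i * chi_i(r) at every point of a regular grid."""
    nx, ny, nz = shape
    grid: Grid = []
    for i in range(nx):
        plane = []
        for j in range(ny):
            row = []
            for k in range(nz):
                r = (origin[0] + i * spacing,
                     origin[1] + j * spacing,
                     origin[2] + k * spacing)
                row.append(sum(c * s_gaussian(p, a, r)
                               for c, p, a in zip(coeffs, centers, exponents)))
            plane.append(row)
        grid.append(plane)
    return grid


def density_from_orbital(psi: Grid, occupation: float = 2.0) -> Grid:
    """Electron density contribution of one orbital: occupation * psi^2."""
    return [[[occupation * v * v for v in row] for row in plane] for plane in psi]


# Toy "bonding" orbital: two s functions placed symmetrically on the x axis.
centers = [(-0.7, 0.0, 0.0), (0.7, 0.0, 0.0)]
psi = orbital_on_grid(centers, [0.5, 0.5], [0.6, 0.6],
                      origin=(-2.0, -2.0, -2.0), spacing=1.0, shape=(5, 5, 5))
rho = density_from_orbital(psi)
```

The resulting nested lists correspond to the kind of volumetric data that an isosurface or volume renderer would consume.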
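The data flow graph described for VTK, where each algorithm pulls from zero or more upstream inputs, can be illustrated with a toy pipeline. This is a conceptual sketch only: the `Algorithm` class and the stage names are hypothetical, each node here produces a single output, and none of this is VTK's actual API.

```python
from typing import Callable, List, Sequence


class Algorithm:
    """A pipeline node: updates its upstream nodes, then runs its own function."""

    def __init__(self, func: Callable[..., List[float]],
                 inputs: Sequence["Algorithm"] = ()) -> None:
        self.func = func
        self.inputs = list(inputs)

    def update(self) -> List[float]:
        # Demand-driven execution: requesting output from the last node
        # triggers every upstream node it depends on.
        upstream = [node.update() for node in self.inputs]
        return self.func(*upstream)


# A source has zero inputs, a filter has one, and a merge stage has two.
source = Algorithm(lambda: [0.0, 1.0, 2.0, 3.0])
scale = Algorithm(lambda data: [2.0 * v for v in data], [source])
threshold = Algorithm(lambda data: [v for v in data if v >= 2.0], [scale])
merged = Algorithm(lambda a, b: a + b, [source, threshold])

result = threshold.update()
combined = merged.update()
```

Because stages only know their inputs, they can be rearranged or replaced without touching the rest of the graph, which is the property that makes pipeline designs easy to extend.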
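MoleQueue is described as using a JSON-RPC protocol for interprocess communication. The sketch below shows what building a request and reading a reply could look like, assuming JSON-RPC 2.0 message framing. The method name `submitJob`, the parameter keys, and the reply contents are hypothetical placeholders, not MoleQueue's documented interface, and the transport (local socket or ZeroMQ) is left out.

```python
import itertools
import json
from typing import Any, Dict

_request_ids = itertools.count(1)  # each request gets a fresh id


def make_request(method: str, params: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    })


def parse_response(text: str) -> Any:
    """Return the result of a JSON-RPC 2.0 response, or raise on an error reply."""
    message = json.loads(text)
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"RPC error {err['code']}: {err['message']}")
    return message["result"]


# Hypothetical job submission; the keys below are assumptions for illustration.
request = make_request("submitJob", {
    "queue": "local",
    "program": "nwchem",
    "description": "water optimization",
})

# Hypothetical reply, as a server might send it back for request id 1.
reply = '{"jsonrpc": "2.0", "result": {"jobId": 17}, "id": 1}'
job = parse_response(reply)
```

Keeping the message format independent of the transport is what would let the same client code work over either a local socket or ZeroMQ.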