Chemical Databases and Open Chemistry on the Desktop

The modern chemist has access to large databases containing both experimental and calculated data. The power of HPC resources continues to increase, and more practitioners have routine access to powerful computational chemistry tools. This places an increasingly high burden on users, who must assimilate these resources into their workflow in order to use them effectively. The creation of an open, extensible application framework that puts computational tools, data, and domain-specific knowledge at the fingertips of chemists is increasingly important. A data-centric approach to chemistry, storing all data in a searchable database, will empower users to efficiently collaborate, innovate, and push the frontiers of research. Providing an open, user-friendly and extensible application will open up new tools to experimental chemists, while giving computational chemists the ability to address greater challenges. Additionally, by distributing experimental and computational data across the research community, incorporating cheminformatics analytics techniques, and providing visual search for chemical structures, the workflow of both groups can be significantly improved. This requires suitable formats for data exchange, and databases with appropriate APIs for querying and uploading data, so that data can be shared effectively.

This talk will discuss recent progress made in developing a suite of open chemistry applications on the desktop. The applications can query online databases, such as the NIH structure resolver service, download and manipulate structures, and prepare input files for standalone computational chemistry codes. An application developed to submit jobs to HPC resources, monitor them, and retrieve results will also be shown, along with a desktop chemistry database browser. The Quixote project aims to establish standards for data exchange in computational chemistry, along with data repositories for organizations. Establishing these standards is important to promote open, reproducible chemistry, and their integration into user-friendly desktop applications will encourage their adoption in the standard workflow of researchers.

Published in: Technology

  1. Chemical Databases and Open Chemistry on the Desktop
     5th Meeting on US Government Chemical Databases & Open Chemistry
     August 25, 2011
     Dr. Marcus D. Hanwell
     marcus.hanwell@kitware.com
  2. Outline
     •  Background
     •  Opening up chemistry
     •  Workflows in computational chemistry
     •  Avogadro – chemical editor
     •  Databases on the desktop
     •  Quixote
     •  HPC resource integration
     •  Advanced visualization
  3. My Background
     •  Ph.D. (Physics) – University of Sheffield
     •  Google Summer of Code – Avogadro
     •  Postdoc (Chemistry) – University of Pittsburgh
     •  R&D engineer – Kitware, Inc
     •  Passionate about physics, chemistry, and the growing need to improve computational tools
     •  See the need for powerful open source, cross platform frameworks and applications
     •  Develop(ed): Gentoo, KDE, Kalzium, Avogadro, Open Babel, VTK, ParaView, Titan, CMake
  4. Kitware
     •  Founded in 1998: 5 former GE Research employees
     •  95 employees: 42% PhD
     •  Privately held, profitable from creation, no debt
     •  Rapidly Growing: >30% in 2010, 7M web-visitors/quarter
     •  Offices
        –  Albany, NY
        –  Carrboro, NC
        –  Lyon, France
        –  Bangalore, India
     •  2011 Small Business Administration’s Tibbetts Award
     •  HPCWire Readers and Editor’s Choice
     •  Inc’s 5000 List: 2008 to 2010
  5. Kitware: Core Technologies
     •  CMake
     •  CDash
  6. Opening Up Chemistry
     •  Computational chemistry is currently one of the more closed sciences
     •  Lots of black box proprietary codes
        –  Only a few have access to the code
        –  Publishing results from black box codes
        –  Many file formats in use, little agreement
     •  More papers should be including data
     •  Growing need for open standards
  7. Movements for Open Chemistry
     •  Formed an “unorganization” – Blue Obelisk
        –  Published first article in 2005
        –  Open data, open standards and open source
        –  Meet at ACS and other conferences when possible
        –  Follow-up article currently in press
     •  Quixote collaboration more recently
        –  Provide meaningful data storage and exchange
        –  Principally targeting computational chemistry
  8. Typical Chemistry Workflow
     (Diagram with the labels: Edit/Analyze, Input File, Job Submission, Calculation (Local, Remote), Log File, Results, Data)
  9. Problem: Pretty Complex/Manual
     •  Most steps require user intervention
     •  Obtain starting structure (previous work, databases)
     •  Edit structure
     •  Write input file
     •  Move input file to cluster
     •  Submit to queue
     •  Wait for completion
     •  Retrieve output file
     •  Analyze output file
     •  Extract the relevant data, change formats
     •  Store results
     •  Repeat
  10. Improved Chemistry Workflow
     (Diagram with the labels: Edit/Analyze, Input File, Job Submission, Calculation (Local, Remote), Log File, Results, Data)
  11. Avogadro
     •  Project began 2006
     •  Split into library and application (plugin based)
     •  One of very few open source editors
     •  Designed to be extensible from the start
     •  Generate input & read output from many codes
     •  An active and growing community
     •  Chemistry needs a free, open framework
  12. Avogadro’s Roots
     •  Avogadro project started in 2006
     •  First funded work in 2007 by Marcus Hanwell
        –  Google Summer of Code student
        –  Final year of Ph.D. spent the summer coding
        –  Funded as part of KDE project – Kalzium editor
     •  Built on several other open source projects
        –  Qt, Eigen, Open Babel, Blue Obelisk Data Repository
     •  Also uses open standards, e.g. OpenGL
     •  Cross platform, open source stack
  13. Avogadro Vital Statistics
     •  Supports Linux, Windows and Mac OS X
     •  Contributions from over 20 developers
     •  Over 180,000 downloads over 4 years
     •  Translated into 19 languages
     •  Used by Kalzium for molecular editor
     •  Featured by Trolltech/Nokia:
        –  Qt in use
        –  Qt ambassador program
  14. (Figure only, no text)
  15. Desktop Database
     •  Use of “document store” NoSQL
        –  Doesn’t force too much structure
           •  Some entries have experimental data available
           •  Some have computational jobs
        –  Employ a “pile of stuff” approach
        –  Can store both source and derived data
        –  Calculate identifiers, QSAR properties, etc.
     •  MongoDB is a scalable, open solution
        –  Proven scaling with large web applications
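The "pile of stuff" idea on the Desktop Database slide can be illustrated without a database server. The following is a minimal sketch in plain Python, assuming hypothetical field names (`experimental`, `jobs`) that are not taken from the talk; it only shows that records in a document store need not share one schema and can still be filtered by the fields they happen to carry.

```python
# Hypothetical records: the field names are illustrative assumptions,
# not the schema used by the application described in the talk.
records = [
    {"name": "water", "formula": "H2O",
     "experimental": {"melting_point_K": 273.15}},
    {"name": "benzene", "formula": "C6H6",
     "jobs": [{"code": "NWChem", "status": "finished"}]},
    {"name": "methane", "formula": "CH4"},
]


def having(collection, field):
    """Return the documents that carry the given top-level field."""
    return [doc for doc in collection if field in doc]


# Entries with experimental data and entries with computational jobs
# coexist in the same collection without a fixed structure.
with_experiment = [doc["name"] for doc in having(records, "experimental")]
with_jobs = [doc["name"] for doc in having(records, "jobs")]
print(with_experiment, with_jobs)
```

In a real document store such as MongoDB the same filter would be expressed as a query on field existence rather than a Python loop.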
  16. Chemistry Data Explorer
     •  Qt application
     •  Connects to local or remote database
     •  Uses VTK for visual data exploration
     •  Can ingest new data
        –  Uses Open Babel to generate descriptors
        –  Standard InChI, SMILES, molecular weight
        –  More could be added
     •  All derived from files stored in the database
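As an illustration of a derived descriptor, here is a small stand-in for one of the quantities listed on the slide, the molecular weight. The talk's application uses Open Babel for this; the sketch below does not use Open Babel's API and handles only simple formulas without parentheses, with a deliberately tiny table of standard atomic masses.

```python
import re

# Standard atomic masses for a few elements; a real tool would ship a full table.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(formula):
    """Sum atomic masses for a simple formula such as 'C6H6' (no parentheses)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("H2O"), 3))
```

Storing such values next to the source file keeps the derived data reproducible, since it can always be recomputed from the stored file.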
  17. Chemistry Data Explorer
     (Figure only, no text)
  18. Database Interaction on the Web
     •  Avogadro directly accesses some (read-only) public databases:
        –  PDB, NIH “fetch by name”
        –  Resolve structure to common name using CIR
        –  More could be added
     •  ChemData also uses NIH CIR for data
     •  Quixote aims to support both public and private sharing models – open framework
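A "fetch by name" lookup against the NIH Chemical Identifier Resolver (CIR) comes down to an HTTP GET on a URL built from the identifier and the requested representation. The sketch below assumes the publicly documented CIR URL pattern on `cactus.nci.nih.gov`; the function names are hypothetical, and this is not the code Avogadro or ChemData use.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public CIR service.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation="smiles"):
    """Build a CIR request URL, e.g. name -> SMILES or name -> SDF."""
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


def fetch(identifier, representation="smiles", timeout=10):
    """Perform the lookup; requires network access and may raise on failure."""
    with urlopen(cir_url(identifier, representation), timeout=timeout) as reply:
        return reply.read().decode("utf-8").strip()


print(cir_url("acetic acid", "sdf"))
```

Only `cir_url` is exercised here; `fetch` is left uncalled so the sketch does not depend on the service being reachable.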
  19. Quixote Architecture
     (Figure only, no text)
  20. Avogadro
     (Figure only, no text)
  21. OpenQube – Quantum Data
     •  Reads in key quantum data
        –  Basis set used in calculation
        –  Eigenvectors for molecular orbitals
        –  Density matrix for electron density
        –  Standard geometry
     •  Multithreaded calculation
        –  Produce regular grids of scalar data
        –  Molecular orbitals, electron density…
  22. Molecular Orbitals and Electron Density
     •  Quantum files store basis sets and matrices
        $\mathrm{GTO} = c\, e^{-\alpha r^2}$
        $\phi_i = \sum_{\mu} c_{\mu i}\, \phi_{\mu}$
        $\rho(r) = \sum_{\mu} \sum_{\nu} P_{\mu\nu}\, \phi_{\mu}\, \phi_{\nu}$
     •  Using these equations, and the supplied matrices – calculate cubes
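The cube calculation described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not OpenQube's implementation: it assumes s-type Gaussians only (value c·exp(−αr²) about a center), a molecular orbital as a coefficient-weighted sum of basis functions, and the density as the double sum over the density matrix P. All function names are hypothetical, and the real code is multithreaded C++.

```python
import math


def s_gaussian(alpha, center, point, coeff=1.0):
    """Value of an s-type Gaussian, coeff * exp(-alpha * r^2), at a point."""
    r2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return coeff * math.exp(-alpha * r2)


def molecular_orbital(mo_coeffs, basis_values):
    """phi_i = sum over mu of c_mu_i * phi_mu at one point."""
    return sum(c * phi for c, phi in zip(mo_coeffs, basis_values))


def electron_density(density_matrix, basis_values):
    """rho = sum over mu, nu of P_mu_nu * phi_mu * phi_nu at one point."""
    n = len(basis_values)
    return sum(density_matrix[m][v] * basis_values[m] * basis_values[v]
               for m in range(n) for v in range(n))


def mo_cube(basis, mo_coeffs, origin, spacing, shape):
    """Evaluate one molecular orbital on a regular grid, returned as a flat list.

    basis is a list of (alpha, center, coeff) tuples, one per basis function.
    """
    values = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                phis = [s_gaussian(a, c, point, n) for a, c, n in basis]
                values.append(molecular_orbital(mo_coeffs, phis))
    return values


# Two s-functions on the x axis, combined with equal weights.
basis = [(1.0, (0.0, 0.0, 0.0), 1.0), (1.0, (1.0, 0.0, 0.0), 1.0)]
cube = mo_cube(basis, [0.5, 0.5], origin=(0.0, 0.0, 0.0), spacing=0.5, shape=(3, 2, 2))
print(len(cube))
```

Each grid point is independent of the others, which is why this calculation parallelizes well across threads.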
  23. Calling Stand Alone Programs
     •  Many already supported:
        –  GAMESS, GAMESS-UK, Molpro, Q-Chem, MOPAC, NWChem, Gaussian, Dalton
        –  Easy to add more
     •  Developers of some codes are writing Avogadro-based custom applications:
        –  Q-Chem, Molpro…
     •  DLPOLY author approached me:
        –  Open sourced DLPOLY2, want a GUI
  24. Job Submission & Management
     •  Take input file, submit to queue, monitor, retrieve, repeat
     •  System tray resident Qt application
        –  Manage both local and remote jobs
     •  Interest from developers
        –  Use in other applications
        –  Share development/maintenance burden
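The submit, monitor, retrieve cycle on this slide can be modeled as a small state machine. The sketch below is purely illustrative: the state names, the `Job` class, and `advance` are assumptions made for this example, and a real job manager would change state in response to the queue system rather than on each call.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    RETRIEVED = "retrieved"


# Allowed forward transitions; RETRIEVED is terminal.
NEXT_STATE = {
    JobState.QUEUED: JobState.RUNNING,
    JobState.RUNNING: JobState.FINISHED,
    JobState.FINISHED: JobState.RETRIEVED,
}


@dataclass
class Job:
    input_file: str
    state: JobState = JobState.QUEUED


def advance(job):
    """Move a job one step along its lifecycle; terminal jobs stay put."""
    job.state = NEXT_STATE.get(job.state, job.state)
    return job.state


job = Job("water.inp")
while job.state is not JobState.RETRIEVED:
    print(job.input_file, "->", advance(job).value)
```

Keeping the lifecycle in one table makes it straightforward to treat local and remote jobs uniformly, with only the code that detects each transition differing.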
  25. Open in Avogadro When Complete
     (Figure only, no text)
  26. Advanced Visualization: VTK
     •  New Avogadro plugin:
        –  Takes volumetric data from Avogadro
        –  Uses GPU accelerated rendering in VTK
     •  Excitement from many in the community
     •  Several groups interested in collaborating
     •  Google Summer of Code project
     •  Leverage significant capabilities in VTK
  27. Volume Rendered With Contours
  28. Electron Density Volume Render
  29. Electron Density Ray Tracing
  30. Conclusions
     •  There is still a lot of work to do
     •  Open databases are of critical importance
     •  Need tools to make retrieving and depositing data easier
     •  Improved data exchange is essential to improve reproducibility in chemistry
     •  Create shared collaboration platforms
        –  Deliver improved workflows, enable research
  31. Extra Background Slides
     •  Additional visualization and background slides
  32. Standard Representations
  33. Standard Representations
  34. Biomolecules
  35. Nanomaterials
  36. Simplified Views
  37. Volumetric Data: Molecular Orbitals
  38. Periodic Systems
  39. Hybrid Views: CPK + MO + Ball & Stick
  40. Linked Views of Live Data
  41. 2D: Graphs and Charts
  42. Informatics
  43. 3D Interaction Widgets
  44. VTK: The Toolkit
     •  Collection of C++ libraries
        –  Leveraged by many applications
        –  Divided into logical areas, e.g.
           •  Filtering – data processing in visualization pipeline
           •  InfoVis – informatics visualization
           •  Widgets – 3D interaction widgets
           •  VolumeRendering – 3D volume rendering
     •  Cross platform, using OpenGL
     •  Wrapped in Python, Tcl and Java
  45. VTK Development Team
     •  From Ohloh: Very large, active development team: Over the past twelve months, 100 developers contributed new code to VTK. This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh.
     •  and many others...
  46. ParaView
     •  Parallel visualization application
     •  Open source, BSD licensed
     •  Turn-key application wrapper around VTK
     •  Parallel data processing and rendering
  47. Large Data Visualization
     •  BlueGene/L at LLNL
        –  65,536 compute nodes (32 bit PPC)
        –  1,024 I/O nodes (32 bit PPC)
        –  512 MB of RAM per node
     •  Sandia Red Storm
        –  12,960 compute nodes (AMD Opteron dual)
        –  640 service and I/O nodes
        –  40 TB of DDR RAM in total
  48. 1 Billion Cell Asteroid Simulation
  49. Tiled Displays
  50. Parallel Processing/Rendering
  51. 3D Chemistry Visualization
     •  Some existing features specific to chemistry
        –  Gaussian cube, PDB, and a few others
     •  Excellent handling of volumetric data:
        –  Marching cubes
        –  Volume rendering
        –  Contouring
     •  Advanced rendering:
        –  Point sprites
        –  Manta – real time ray tracing
  52. Titan: VTK and Informatics
     •  Led by Sandia National Laboratories
     •  Substantial expansion of VTK:
        –  Informatics & analysis
     •  Actively developed, growing feature set
     •  Improved 2D rendering and API
     •  Database connectivity, client-server, pipeline based approach
     •  Uses web technologies such as Protovis
     •  Scalable, interactive infoviz
  53. Manta: Real Time Ray Tracing
  54. New Frontiers
     •  New work porting VTK
        –  Use C++ as the common core
           •  iOS port in the early stages
           •  Android port
        –  Use OpenGL ES 2.0 – new rendering code
     •  Also ParaViewWeb – delivering over web
        –  Use image delivery and rendering on server
        –  Also using WebGL for rendering (optionally)
  55. Future Directions
     •  VTK modularization (in progress)
        –  Developing more agile build systems
        –  Automating more with CMake
     •  Using Git more fully to improve stability
        –  Use of master and next
        –  Topic branches – merge when ready
     •  Code review using Gerrit
        –  Integration with continuous integration
        –  Test before merge
