Computational Chemistry Robots

Describes how to design and implement a protocol for high-throughput computation.

Transcript of "Computational Chemistry Robots"

  1. Computational Chemistry Robots. ACS Sep 2005. J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang [email_address]
  2. • Can high-throughput computation provide a reliable “experimental” resource for molecular properties?
     • Can protocols be automated?
     • Can we believe the results?
  3. Aspects of complete automation
     • Humans must validate protocols rather than individual data
     • Low rates of error must be addressed
     • Users should know the rates of error and degree of conformance
  4. Approaches to conformance
     • Explore limits of job behaviour (times, convergence, etc.)
     • Analyse reproducibility
     • Vary and analyse effects of parameters and algorithms
     • Compare output with other “measurements” of same quantity
  5. The overall view (diagram labels): molecules; computation; dissemination
  6. The overall view (diagram labels): molecules; computation; dissemination; check results
  7. Components of System
     • Workflow for management of jobs (Taverna)
     • Natural Language Processing based parsing of outputs (JUMBOMarker)
     • Pairwise comparison of data sets (R)
     • Analysis of mean and variance
     • Detection and analysis of outliers
  8. Computing the NCI database with MOPAC PM5 (in collaboration with J.J.P. Stewart)
  9. Protocol (flow diagram labels): Log Files; Parse; System Crashes; Science Errors; Analysis; Pathological Behaviour; Statistics; Other Science; Disseminate Results; Unsuitable Data; Program Crashes; Inform Developer
  10. Taverna
      • Workflow programs allow a series of small tasks to be linked together to develop more complex tasks
      • Open Source
      • myGRID, eScience
      • European Bioinformatics Institute
      • University of Manchester
  11. An Example Taverna Workflow
  12. Parsing Log Files to CML (diagram labels): Computational Chemistry Log Files; Coordinates; Molecular Formula; Calculation Type; Point Group; Dipole; Total Energy
  13. CompChem Output (diagram labels): Coordinates; Energy Levels; Vibrations; Coordinates; Energy Level; Vibration; CML File; CMLCore; CMLCore; CMLComp; CMLSpect; Input/jobControl; General Parsers
  14. Dissemination of results (diagram labels): LOG FILE; CML FILE; HUMAN DISPLAY; WWMM* Server and DSpace; Outside world; JUMBOMarker, NLP-based log file parser. *WWMM: World Wide Molecular Matrix
  15. InChI: IUPAC International Chemical Identifier
      • A non-proprietary unique identifier for the representation of chemical structures.
      • A normal, canonicalised and serialised form of a chemical connection table.
      • InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq/
  16. Proteus molecules* (diagram labels): Calculation; JUNK; Cured by MOPAC. *Proteus was a shape-changing ocean deity
  17. Proteus molecules (diagram labels): Calculation; Input; JUNK
  18. How do we know our results are valid? (diagram labels): Computational Method 1; Computational Method 2; Experiment
  19. J.J.P. Stewart’s example: Calculated ΔHf – Expt ΔHf
  20. GAMESS (diagram labels): MOPAC results; GAMESS (a), 6-31G* B3LYP; Log Files. (a) Project with Kim Baldridge and Wibke Sudholt
  21. Protocol (flow diagram labels): Log Files; Parse; System Crashes; Science Errors; Analysis; Pathological Behaviour; Statistics; Other Science; Disseminate Results; Unsuitable Data; Program Crashes; Inform Developer
  22. Repeat runs, different methods
      • Multiple runs give the same final structure from the same input
      • Changing memory allocation doesn’t make a difference
  23. Pathological behaviour: early detection (figure labels): 100 min; 6-31G*, B3LYP; 200 min; 15 min; 6-31G*, B3LYP; 10080 min; divinyl ether; trans-crotonaldehyde; Z matrix
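One way to picture early detection of pathological jobs is a run-time check against the typical job. The sketch below is an assumption, not the authors' method: the function name, the job labels and the factor of 10 are all hypothetical, and the four times are simply the values printed on slide 23, without claiming which molecule or input each belongs to.

```python
from statistics import median

def flag_slow_jobs(times_min, factor=10.0):
    """Return names of jobs whose run time exceeds factor x the median run time.

    times_min maps a job name to its run time in minutes. The threshold rule
    is an illustrative assumption, not something stated in the talk.
    """
    typical = median(times_min.values())
    return sorted(name for name, t in times_min.items() if t > factor * typical)

# Job labels are invented; the times are the four values shown on the slide.
times = {"job_a": 15, "job_b": 100, "job_c": 200, "job_d": 10080}
slow = flag_slow_jobs(times)
print(slow)
```

A job flagged this way would be a candidate for the "Pathological Behaviour" branch of the protocol rather than being left to run to a time limit.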
  24. Times to run jobs
  25. Analysis of different computational methods
      • Mean: overall difference
      • Normality: distribution of values
      • Outliers: unusual molecules?
      • Variance: spread of the data, depends on both distributions (standard deviation)
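The four quantities on slide 25 can be sketched for a pairwise comparison of two methods. The deck states that R was used for this step; the Python below is only an illustrative equivalent. The function name, the 3 S.D. outlier rule and the bond lengths are assumptions invented for the example, not data or thresholds from the talk.

```python
from math import sqrt
from statistics import mean, stdev

def compare_methods(a, b, k=3.0):
    """Summarise paired differences a[i] - b[i] between two methods.

    Returns the mean difference, the sample standard deviation, the standard
    error of the mean, and the indices of values more than k standard
    deviations from the mean (a simple, assumed outlier rule).
    """
    diffs = [x - y for x, y in zip(a, b)]
    m = mean(diffs)
    s = stdev(diffs)
    sem = s / sqrt(len(diffs))
    outliers = [i for i, d in enumerate(diffs) if abs(d - m) > k * s]
    return {"mean": m, "sd": s, "sem": sem, "outliers": outliers}

# Invented bond lengths in angstroms for 20 bonds computed by two methods.
method_1 = [1.49] * 10 + [1.50] * 9 + [1.60]
method_2 = [1.50] * 20
summary = compare_methods(method_1, method_2)
print(summary)
```

Note that a single extreme value inflates the standard deviation it is judged against, so a rule like this needs a reasonably large sample to flag anything; the normal QQ plots used later in the deck avoid a fixed cutoff.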
  26. Probability Plot (Normal QQ plot)
  27. Probability Plot (Normal QQ plot), annotated: mean of distribution (approx. -0.03 Å); range over which the sample distribution is approximately normal; outliers; S.D. 0.020 Å
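For readers unfamiliar with the plot, a normal QQ plot pairs each sorted sample value with the standard-normal quantile expected at its rank; points from a normal sample fall on a straight line whose intercept estimates the mean and whose slope estimates the S.D., and outliers peel away at the ends. The sketch below computes those pairs using only the Python standard library. The plotting position (i + 0.5) / n is one common convention chosen here as an assumption, and the sample values are invented.

```python
from statistics import NormalDist

def normal_qq(sample):
    """Pair each sorted sample value with its theoretical standard-normal quantile.

    Returns a list of (theoretical_quantile, observed_value) tuples,
    using the plotting position (i + 0.5) / n for the i-th sorted value.
    """
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Invented differences in angstroms, for illustration only.
points = normal_qq([0.3, -0.1, 0.0])
for q, value in points:
    print(f"{q:+.3f}  {value:+.3f}")
```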
  28. All bonds*: Δr (MOPAC – GAMESS) / Å. *Excludes bonds to hydrogen
  29. All bonds*: Δr (MOPAC – GAMESS) / Å. Good agreement; nearly normal; outliers; S.D. 0.005 Å. *Excludes bonds to hydrogen
  30. Bad molecules and data usually cause outliers (structure diagram labels: 2-; Na; P; O; O; H; H)
  31. Mean Δr (M - G) / Å, with the standard error of the mean / Å in parentheses. All values given to 3 significant figures.

            C               N               O               F               S               Cl
      C     -0.006 (0.000)  0.020 (0.000)   -0.010 (0.000)  -0.014 (0.001)  -0.040 (0.001)  -0.037 (0.001)
      N                     0.006 (0.001)   -0.037 (0.001)                  -0.055 (0.009)
      O                                     -0.087 (0.004)                  -0.070 (0.014)
  32. Δr CC bonds (M - G) / Å
  33. Δr CC bonds (M - G) / Å. Good agreement; nearly normal; outliers; S.D. 0.013 Å; JUNK
  34. Selection of molecules with CC Δr (M - G) > 0.05 Å
  35. Non-aromatic CC bonds adjacent to CFn: Y = 0.0277 X – 0.0061
  36. Δr NN bonds (M - G) / Å
  37. Δr NN bonds (M - G) / Å. Good agreement; nearly normal; kink; S.D. 0.022 Å
  38. Density plot of Δr NN bonds (M - G) / Å
  39. Density plot of Δr NN bonds (M - G) / Å, divided into LEFT and RIGHT sets
  40. Most common fragments found in Left set but not Right set (fragment diagram labels): C(sp3) C(sp3) (sp3) S(sp2) N(ar) N(ar) C(sp2) S(sp2) N(ar) N(ar) C(sp2) Or
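The Left-versus-Right comparison on slide 40 amounts to a set difference followed by a frequency count. The sketch below shows one way this might be done; it is an assumption about the approach, not the authors' code. The function name is hypothetical and the fragment labels are invented placeholders, not the fragments reported on the slide.

```python
from collections import Counter

def fragments_only_in_left(left, right):
    """Rank fragments that occur in left-set molecules but in no right-set molecule.

    left and right are lists of molecules, each molecule given as a list of
    fragment labels. Each fragment is counted once per molecule.
    """
    right_fragments = {frag for mol in right for frag in mol}
    counts = Counter(
        frag
        for mol in left
        for frag in set(mol)
        if frag not in right_fragments
    )
    return counts.most_common()

# Invented fragment labels, for illustration only.
left_set = [["frag_NN_aromatic", "frag_CC"], ["frag_NN_aromatic"], ["frag_CO"]]
right_set = [["frag_CO"], ["frag_CH"]]
ranking = fragments_only_in_left(left_set, right_set)
print(ranking)
```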
  41. Comparison of theory and experiment (diagram labels): GAMESS; Log Files; CIF* (shown five times); CIF 2 CML. *CIF: Crystallographic Information File
  42. Reading Acta Crystallographica Section E
  43. All bonds*: Δr (Cryst. – GAMESS) / Å. Single molecules, no disorder. *Excludes bonds to hydrogen
  44. All bonds*: Δr (Cryst. – GAMESS) / Å. Single molecules, no disorder. Mean Δr -0.011 Å; nearly normal; outliers; S.D. 0.014 Å. *Excludes bonds to hydrogen
  45. Δr CC bonds (C – G) / Å
  46. Δr CC bonds (C – G) / Å. Mean Δr -0.01 Å; nearly normal; S.D. 0.009 Å
  47. Δr CO bonds (C – G) / Å
  48. Δr CO bonds (C – G) / Å. Good agreement; nearly normal; outliers?; S.D. 0.011 Å
  49. Chemistry can cause outliers: Δr = +0.08 Å; H movement
  50. Conclusions
      • Protocols can be automated
      • Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider
      • Computational programs can provide high quality “experimental” molecular properties
  51. Thanks: J.J.P. Stewart, Kim Baldridge, Wibke Sudholt, Simon Tyrrell, Yong Zhang, Peter Murray-Rust, Unilever
  52. Questions
      • Homepage: http://wwmm.ch.cam.ac.uk
      • InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq
      • R: http://www.r-project.org
      • Taverna: http://taverna.sourceforge.net/
      • MOPAC 2002: http://www.cachesoftware.com/mopac/
      • GAMESS: http://www.msg.ameslab.gov/GAMESS/GAMESS.html