Computational Chemistry Robots

Describes how to design and implement a protocol for high-throughput computation.


Published in: Technology

Transcript

  • 1. Computational Chemistry Robots, ACS Sep 2005. J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang
  • 2.
    • Can high-throughput computation provide a reliable “experimental” resource for molecular properties?
    • Can protocols be automated?
    • Can we believe the results?
  • 3. Aspects of complete automation
    • Humans must validate protocols rather than individual data
    • Low rates of error must be addressed
    • Users should know the rates of error and degree of conformance
  • 4. Approaches to conformance
    • Explore limits of job behaviour (times, convergence, etc.)
    • Analyse reproducibility
    • Vary and analyse effects of parameters and algorithms
    • Compare output with other “measurements” of same quantity
  • 5. The overall view: molecules, computation, dissemination
  • 6. The overall view: molecules, computation, dissemination. Check results
  • 7. Components of System
    • Workflow for management of jobs (Taverna)
    • Natural Language Processing based parsing of outputs (JUMBOMarker)
    • Pairwise comparison of data sets (R)
    • Analysis of mean and variance
    • Detection and analysis of outliers
  • 8. Computing the NCI database: MOPAC PM5 (collaboration with J.J.P. Stewart)
  • 9. Protocol Log Files Parse System Crashes Science Errors Analysis Pathological Behaviour Statistics Other Science Disseminate Results Unsuitable Data Program Crashes Inform Developer
  • 10. Taverna
    • Workflow programs allow a series of small tasks to be linked together to develop more complex tasks
    • Open Source
    • myGRID, eScience
    • European Bioinformatics Institute
    • University of Manchester
  • 11. An Example Taverna Workflow
  • 12. Parsing Log Files to CML
    • Input: Computational Chemistry Log Files
    • Extracted: Coordinates, Molecular Formula, Calculation Type, Point Group, Dipole, Total Energy
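
To make the parsing step concrete, here is a minimal sketch of pulling named quantities out of a text log. The log layout, the field labels, the sample values, and the `parse_log` helper are all hypothetical; real MOPAC or GAMESS output looks different, and JUMBOMarker itself is described in the talk as NLP-based and emits CML rather than a Python dictionary. The sketch also reports which fields were not found, since a missing field is one way a run that needs human attention can be spotted.

```python
import re

# Hypothetical log excerpt with made-up values; not real MOPAC or GAMESS output.
SAMPLE_LOG = """\
MOLECULAR FORMULA: C2 H6 O
POINT GROUP: Cs
DIPOLE: 1.69 DEBYE
TOTAL ENERGY: -154.0123 HARTREE
"""

# One pattern per quantity; the labels are assumptions made for this sketch.
PATTERNS = {
    "formula": re.compile(r"^MOLECULAR FORMULA:\s*(.+?)\s*$", re.M),
    "point_group": re.compile(r"^POINT GROUP:\s*(\S+)", re.M),
    "dipole": re.compile(r"^DIPOLE:\s*(-?\d+(?:\.\d+)?)", re.M),
    "total_energy": re.compile(r"^TOTAL ENERGY:\s*(-?\d+(?:\.\d+)?)", re.M),
}
NUMERIC = {"dipole", "total_energy"}


def parse_log(text):
    """Return (found, missing): parsed fields and the names of absent fields."""
    found, missing = {}, []
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            missing.append(name)
            continue
        value = match.group(1)
        found[name] = float(value) if name in NUMERIC else value
    return found, missing
```

Calling `parse_log(SAMPLE_LOG)` returns the four fields and an empty `missing` list; a truncated log returns the names of whatever could not be found.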
  • 13. CompChem Output Coordinates Energy Levels Vibrations Coordinates Energy Level Vibration CML File CMLCore CMLCore CMLComp CMLSpect Input/jobControl General Parsers
  • 14. Dissemination of results: LOG FILE, CML FILE, HUMAN DISPLAY, WWMM* Server and DSpace, Outside world. JUMBOMarker: NLP-based log file parser. * World Wide Molecular Matrix
  • 15. InChI: IUPAC International Chemical Identifier
    • A non-proprietary unique identifier for the representation of chemical structures.
    • A normalised, canonicalised and serialised form of a chemical connection table.
    • InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq/
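
Because an InChI string is a unique, canonical identifier, it can serve as a key for lining up results for the same molecule from two sources. The sketch below assumes that use; the talk does not say how its data sets were matched. The `pair_by_identifier` helper is hypothetical and the numeric values are placeholders, not results from the talk.

```python
def pair_by_identifier(set_a, set_b):
    """Pair values from two data sets that share an identifier.

    Returns (pairs, only_in_a, only_in_b), where pairs maps each shared
    identifier to the tuple (value_from_a, value_from_b).
    """
    shared = set_a.keys() & set_b.keys()
    pairs = {key: (set_a[key], set_b[key]) for key in sorted(shared)}
    only_in_a = sorted(set_a.keys() - shared)
    only_in_b = sorted(set_b.keys() - shared)
    return pairs, only_in_a, only_in_b


# Placeholder values keyed by InChI string (methane, water, ammonia).
method_1 = {"InChI=1S/CH4/h1H4": 1.0, "InChI=1S/H2O/h1H2": 2.0}
method_2 = {"InChI=1S/H2O/h1H2": 2.5, "InChI=1S/H3N/h1H3": 3.0}

pairs, only_1, only_2 = pair_by_identifier(method_1, method_2)
```

Returning the unmatched identifiers alongside the pairs keeps molecules that only one method handled visible instead of silently dropping them.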
  • 16. Proteus molecules*: Calculation, JUNK, Cured by MOPAC. * Proteus was a shape-changing ocean deity
  • 17. Proteus molecules Calculation Input JUNK
  • 18. How do we know our results are valid? Computational Method 1 Computational Method 2 Experiment
  • 19. J.J.P. Stewart’s example: Calculated ΔHf – Expt ΔHf
  • 20. GAMESS: MOPAC results, GAMESS (6-31G*, B3LYP), Log Files. Project with Kim Baldridge and Wibke Sudholt
  • 21. Protocol Log Files Parse System Crashes Science Errors Analysis Pathological Behaviour Statistics Other Science Disseminate Results Unsuitable Data Program Crashes Inform Developer
  • 22. Repeat runs, different methods
    • Multiple runs give same final structure from same input
    • Changing memory allocation doesn’t make a difference
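
One way to check that repeat runs give the same final structure is to compare atom positions directly. The sketch below is an assumption about how such a check could be written, not the authors' procedure: it assumes every run lists atoms in the same order and in the same coordinate frame (no alignment is attempted), and the tolerance of 0.001 Å is an arbitrary illustrative choice.

```python
import math


def max_atom_shift(coords_a, coords_b):
    """Largest per-atom distance between two structures (same atom order and frame)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures have different atom counts")
    return max(math.dist(a, b) for a, b in zip(coords_a, coords_b))


def runs_agree(runs, tol=1e-3):
    """True if every run stays within tol (in Å) of the first run."""
    reference = runs[0]
    return all(max_atom_shift(reference, other) <= tol for other in runs[1:])


# Illustrative two-atom structures, coordinates in Å (made-up numbers).
run_1 = [(0.0, 0.0, 0.0), (1.1000, 0.0, 0.0)]
run_2 = [(0.0, 0.0, 0.0), (1.1004, 0.0, 0.0)]
run_3 = [(0.0, 0.0, 0.0), (1.2000, 0.0, 0.0)]
```

Here `runs_agree([run_1, run_2])` is true and `runs_agree([run_1, run_3])` is false, because the second atom moves by about 0.1 Å.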
  • 23. Pathological behaviour - Early detection 100 min 6-31G*, B3LYP 200 min 15 min 6-31G*, B3LYP 10080 min divinyl ether trans-Crotonaldehyde Z matrix
  • 24. Times to run jobs
  • 25. Analysis of different computational methods
    • Mean - Overall difference
    • Normality - Distribution of values
    • Outliers - Unusual molecules?
    • Variance - Spread of the data, depends on both distributions (standard deviation)
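
The quantities listed on this slide can be computed from paired values in a few lines. The sketch below is illustrative only: the talk names R for this analysis, whereas this uses Python's standard library, and the outlier rule shown (a modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5) is a swapped-in technique, not one the slides describe. The function name and the sample numbers are made up.

```python
import statistics


def summarise_differences(values_a, values_b, cutoff=3.5):
    """Mean, standard deviation and outlier indices of the differences a - b."""
    if len(values_a) != len(values_b):
        raise ValueError("data sets must be paired one-to-one")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    median = statistics.median(diffs)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(d - median) for d in diffs)
    outliers = []
    if mad > 0:
        for index, d in enumerate(diffs):
            modified_z = 0.6745 * (d - median) / mad
            if abs(modified_z) > cutoff:
                outliers.append(index)
    return {
        "mean": statistics.fmean(diffs),
        "sd": statistics.stdev(diffs),
        "outliers": outliers,
    }


# Made-up bond lengths in Å; the last pair is a deliberate outlier.
method_a = [1.50, 1.51, 1.49, 1.52, 1.48, 1.50, 1.70]
method_b = [1.50, 1.50, 1.50, 1.50, 1.50, 1.50, 1.50]
summary = summarise_differences(method_a, method_b)
```

The median-based rule is used here because a single large difference inflates the ordinary standard deviation and can hide itself; flagged indices are candidates for a human to inspect, not automatic rejections.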
  • 26. Probability Plot (Normal QQ plot)
  • 27. Probability Plot (Normal QQ plot)
    • Mean of distribution (Approx - 0.03 Å)
    • Range over which sample distribution is approximately normal
    • Outliers
    • S.D. 0.020 Å
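
A normal QQ plot pairs each ordered sample value with the quantile a standard normal distribution would have at the same rank. If the sample is close to normal the points fall near a straight line whose intercept estimates the mean and whose slope estimates the standard deviation, and outliers show up as points that leave the line at the ends. The sketch below computes those point pairs; the plotting-position formula `(i + 0.5) / n` is one common convention chosen for this example, and other tools may use a slightly different one.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical quantile, ordered sample value) pairs for a normal QQ plot."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1),
    # which inv_cdf requires.
    return [
        (standard_normal.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]


# Made-up differences in Å, for illustration only.
points = normal_qq_points([0.3, -0.1, 0.0])
```

The resulting pairs can be passed to any plotting tool, with the theoretical quantiles on the horizontal axis and the sample values on the vertical axis.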
  • 28. All bonds* Δr (MOPAC – GAMESS) / Å (* Excludes bonds to hydrogen)
  • 29. All bonds* Δr (MOPAC – GAMESS) / Å: Good agreement, Nearly normal, Outliers, S.D. 0.005 Å (* Excludes bonds to hydrogen)
  • 30. Bad molecules and data usually cause outliers [structure diagram: Na, P, O, O, H, H, with a 2- charge]
  • 31. Mean Δr (M - G) / Å, with Standard Error of the Mean / Å in parentheses. All values given to 3 significant figures
    • C-C: -0.006 (0.000)
    • C-N: 0.020 (0.000)
    • C-O: -0.010 (0.000)
    • C-F: -0.014 (0.001)
    • C-S: -0.040 (0.001)
    • C-Cl: -0.037 (0.001)
    • N-N: 0.006 (0.001)
    • N-O: -0.037 (0.001)
    • N-S: -0.055 (0.009)
    • O-O: -0.087 (0.004)
    • O-S: -0.070 (0.014)
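
A breakdown of this kind can be produced by grouping the bond-length differences by the pair of elements in each bond, then taking the mean and the standard error of the mean (the sample standard deviation divided by the square root of the number of bonds) within each group. The sketch below shows that grouping under stated assumptions: the record layout `(element_1, element_2, delta_r)` and the function name are hypothetical, and the sample records are made up.

```python
import math
import statistics
from collections import defaultdict


def mean_and_sem_by_bond_type(records):
    """Group (element_1, element_2, delta_r) records by bond type.

    Returns {bond_type: (mean, sem, count)}; sem is NaN for a single bond.
    """
    groups = defaultdict(list)
    for element_1, element_2, delta_r in records:
        # Sort the two symbols so C-O and O-C land in the same group.
        bond_type = "-".join(sorted((element_1, element_2)))
        groups[bond_type].append(delta_r)

    summary = {}
    for bond_type, values in groups.items():
        count = len(values)
        mean = statistics.fmean(values)
        if count > 1:
            sem = statistics.stdev(values) / math.sqrt(count)
        else:
            sem = float("nan")
        summary[bond_type] = (mean, sem, count)
    return summary


# Made-up differences in Å, for illustration only.
records = [("C", "O", -0.01), ("O", "C", -0.03), ("C", "C", 0.0)]
by_type = mean_and_sem_by_bond_type(records)
```

Keeping the count next to each mean makes it easier to judge how much weight a small standard error deserves, since it shrinks as the number of bonds in a group grows.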
  • 32. Δr CC bonds (M - G) / Å
  • 33. Δr CC bonds (M - G) / Å: Good agreement, Nearly normal, Outliers, S.D. 0.013 Å, JUNK
  • 34. Selection of molecules with CC Δr (M - G) > 0.05 Angstroms
  • 35. Non-aromatic CC bonds adjacent to CFn: Y = 0.0277 X – 0.0061
  • 36. Δr NN bonds (M - G) / Å
  • 37. Δr NN bonds (M - G) / Å: Good agreement, Nearly normal, Kink, S.D. 0.022 Å
  • 38. Density plot of Δr NN bonds (M - G) / Å
  • 39. Density plot of Δr NN bonds (M - G) / Å: LEFT, RIGHT
  • 40. Most common fragments found in Left set but not Right set: C(sp3) C(sp3) (sp3) S(sp2) N(ar) N(ar) C(sp2) S(sp2) N(ar) N(ar) C(sp2) Or
  • 41. Comparison of theory and experiment: GAMESS Log Files, CIF* files, CIF 2 CML. * CIF: Crystallographic Information File
  • 42. Reading Acta Crystallographica Section E
  • 43. All bonds* Δr (Cryst. – GAMESS) / Å: Single molecules, no disorder (* Excludes bonds to hydrogen)
  • 44. All bonds* Δr (Cryst. – GAMESS) / Å: Single molecules, no disorder, Mean Δr - 0.011 Å, Nearly normal, Outliers, S.D. 0.014 Å (* Excludes bonds to hydrogen)
  • 45. Δr CC bonds (C – G) / Å
  • 46. Δr CC bonds (C – G) / Å: Mean Δr - 0.01 Å, Nearly normal, S.D. 0.009 Å
  • 47. Δr CO bonds (C – G) / Å
  • 48. Δr CO bonds (C – G) / Å: Good agreement, Nearly normal, Outliers?, S.D. 0.011 Å
  • 49. Chemistry can cause outliers: Δr = +0.08 Å, H movement
  • 50. Conclusions
    • Protocols can be automated
    • Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider
    • Computational programs can provide high quality “experimental” molecular properties
  • 51. Thanks: J.J.P. Stewart, Kim Baldridge, Wibke Sudholt, Simon Tyrrell, Yong Zhang, Peter Murray-Rust, Unilever
  • 52. Questions
    • Homepage: http://wwmm.ch.cam.ac.uk
    • InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq
    • R: http://www.r-project.org
    • Taverna: http://taverna.sourceforge.net/
    • MOPAC 2002: http://www.cachesoftware.com/mopac/
    • GAMESS: http://www.msg.ameslab.gov/GAMESS/GAMESS.html
