Enc07 Neutral Network Algorithms 070420

In silico prediction of small-molecule properties is widely used in industry and academia today. In particular, NMR spectra are predicted by a variety of software packages. Two main approaches are used:
Database-based. Compounds are compared against a database, and the result is calculated from data for close structural relatives found in that database.
Regression-based. An experimental database is used to fit the parameters of a non-linear regression. The chemical shift is represented as a non-linear function of variables that describe characteristic features of the molecule of interest.

The two approaches require different strategies for further improvement. Database-based results are improved by acquiring a larger database and/or including user-specific data in the calculation.

Published in: Technology

  • Speed is not much of an advantage for someone doing one calculation at a time, but is of tremendous benefit to those who need to do batch calculations for many structures
  • The structure of a simple neural net. The input layer is fed with N inputs, then the values are transformed by the hidden layer and the output neuron produces the final output value.
  • The hierarchical structure of the input vectors used in the current study. Spheres are numbered with Roman numerals, each consisting of 32 cells filled with counts of the substituents. The third sphere is expanded into three to take into account the double bond geometry. CI stands for “Cross-Increments”. These are additional inputs used for the rules-based calculations.
  • Series of screenshots showing how to access Neural Network Predictions in v10
  • Enc07 Neutral Network Algorithms 070420
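The notes above on the network structure and on the input vector can be sketched together. This is a minimal illustration under stated assumptions, not the ACD/Labs implementation: only the 32 cells per sphere and the input layer, hidden layer, single output neuron layout come from the notes. The number of spheres, the hidden-layer size, the tanh activation, the random weights, and the names `encode_atom` and `forward` are all hypothetical, and the cross-increment inputs are left out.

```python
import math
import random

CELLS_PER_SPHERE = 32   # from the slide notes: 32 substituent-count cells per sphere
N_SPHERES = 6           # ASSUMPTION: the notes do not state the number of spheres
N_HIDDEN = 8            # ASSUMPTION: the hidden-layer size is not stated


def encode_atom(sphere_counts):
    """Flatten per-sphere substituent counts into one input vector.

    sphere_counts: list of dicts, one per sphere, mapping a cell index
    (0..31) to the number of substituents of that type in the sphere.
    """
    vector = []
    for counts in sphere_counts:
        cells = [0.0] * CELLS_PER_SPHERE
        for cell, n in counts.items():
            cells[cell] = float(n)
        vector.extend(cells)
    return vector


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> one hidden layer (tanh, assumed) -> one linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Untrained random weights, used only to show the shapes involved.
rng = random.Random(0)
n_inputs = CELLS_PER_SPHERE * N_SPHERES
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
b_out = 0.0

# Hypothetical atom: two substituents of type 3 in sphere I, one of type 7 in sphere II.
atom = [{3: 2}, {7: 1}] + [{} for _ in range(N_SPHERES - 2)]
x = encode_atom(atom)
shift = forward(x, w_hidden, b_hidden, w_out, b_out)
print(len(x), shift)
```

With trained weights, `shift` would be the predicted chemical shift for the encoded atom; here it is just the output of an untrained network.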

    1. Advancements in NMR Predictions: Neural Network vs. HOSE Code Algorithms. Brent Lefebvre, NMR Product Manager. ACD/Labs’ ENC User’s Meeting, April 21, 2007
    2. New Outline
        • Impetus behind us using NN Algorithms
        • Implementation of NN approach
        • Reasons why this is better than past approaches
        • How are they accessed in the software
        • Statistical comparisons
        • The future of NN and HOSE code predictors
    3. Why Neural Networks?
        • The Neural Network algorithm offers a very specific advantage
            • Speed of calculation is hundreds of times faster
            • This enables prediction on-the-fly
                • For Structure Elucidator, a key feature
    4. Why Neural Networks?
        • Also a fresh approach for ACD/Labs to shift prediction
        • We are always researching new ways to improve our software
            • Also see our poster (#150) on our new increments scheme
    5. Realization
        • The Neural Network algorithm was outperforming our version 9 HOSE code!
        • Steps were then taken to migrate this algorithm out of Structure Elucidator and into the ACD/CNMR Predictor
    6. Implementation
    7. Neural Network Algorithm
    8. Implementation
        • Training the Neural Net
            • Entire database from version 9 used
            • Additional database of 187,000 shifts used for accuracy testing
    9. Neural Network Approach
        • How does this neural net implementation compare to others in the industry?
        • What is unique about it?
        • Does this make it better or worse?
    10. Neural Network Approach
        • Our research brought us to some new conclusions
        • Some implementation details differed from previous industry attempts
    11. Neural Network Approach
        • We found that:
            • Characteristics of the Neural Net were NOT the most important factor
            • Structure encoding scheme was most important
            • Size and accuracy of training set is key
                • Our huge quality checked database gave us a tremendous advantage
    12. Using the Neural Network Predictions
        • How are they accessed in the software?
    13. Using the Neural Network Predictions
    14. Using the Neural Network Predictions
    15. Limitations of the Neural Network Predictions
        • Predictions are a black box
            • No calculation protocol as for HOSE code
        • Training of predictions could be possible
            • Does not outperform HOSE code training
    16. Statistics
        • How do NN compare to old and new HOSE code?
        • When should I use NN?
        • What is the new performance?
    17. Prediction Accuracy
        • We calculate our prediction accuracy for HOSE code the same way every year
            • A “Leave-one-out” analysis of our entire database (2 million chemical shifts)
        • This allows us to compare year on year improvement
        • A TRUE analysis of how accurate the predictors are
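A leave-one-out analysis of the kind named on this slide can be sketched as follows. This is only an illustration of the general procedure: the toy predictor (averaging stored shifts that share an environment code), the code strings, and the shift values are all made up for the example and do not represent HOSE codes or the ACD/Labs database.

```python
from statistics import mean


def predict(code, database):
    """Toy database lookup: average the shifts stored under the same code.

    Falls back to the mean of the whole database when the code is unseen.
    """
    matches = [shift for c, shift in database if c == code]
    return mean(matches) if matches else mean(s for _, s in database)


def leave_one_out_errors(database):
    """Predict every record from all the other records and return the errors."""
    errors = []
    for i, (code, observed) in enumerate(database):
        rest = database[:i] + database[i + 1:]  # the record under test is withheld
        errors.append(predict(code, rest) - observed)
    return errors


# Hypothetical records: (environment code, 13C shift in ppm). Values are invented.
toy_db = [
    ("CH3-alkyl", 14.0),
    ("CH3-alkyl", 15.0),
    ("C-aromatic", 128.0),
    ("C-aromatic", 129.0),
    ("C=O", 170.0),
]
errors = leave_one_out_errors(toy_db)
print(errors)
```

The last record has no structural relative left once it is withheld, so its error is large. That is the point of the procedure: no shift is ever predicted from its own database entry.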
    18. L-O-O Analysis (figure panels: Version 8.00, Version 10.05)
    19. Prediction Accuracy
        • Standard Error of Prediction Formula:
    20. Prediction Accuracy
        • CNMR Predictor Standard Error
            • Version 8: 3.11 ppm
            • Version 9: 2.32 ppm
            • Version 10.00: 2.26 ppm
            • Version 10.05: 1.84 ppm
                • A 21% increase in accuracy over version 9!
                • A 41% increase in accuracy over version 8!
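The percentages on this slide can be checked from the quoted standard errors. In the sketch below, `relative_improvement` treats "increase in accuracy" as the fractional reduction in standard error, which reproduces the slide's figures. The standard-error formula itself is not present in this transcript, so `standard_error` uses a root-mean-square definition purely as an assumption; the presenter's formula may differ.

```python
import math


def standard_error(errors):
    """ASSUMED definition: root-mean-square of (predicted - observed) errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_improvement(old_se, new_se):
    """Fractional reduction in standard error going from old to new."""
    return (old_se - new_se) / old_se


# Standard errors (ppm) quoted on the slide.
reported = {"8": 3.11, "9": 2.32, "10.00": 2.26, "10.05": 1.84}

for version in ("9", "8"):
    gain = relative_improvement(reported[version], reported["10.05"])
    print(f"10.05 vs. {version}: {gain:.0%}")
```

This prints 21% for version 9 and 41% for version 8, matching the slide.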
    21. Prediction Accuracy
        • Comparison of HOSE and Neural Network
            • >187,000 chemical shifts used in test
            • NN algorithms: 12% accuracy increase over version 9 HOSE code
            • Version 10 HOSE code: 16% accuracy increase over version 9 HOSE code
        • HOSE Code is better for now
    22. The Future of Neural Nets
        • What is planned for NMR Predictors?
        • How do Neural Networks fit into these plans?
    23. The Future of Neural Nets
        • Version 11 will further integrate the Neural Network Algorithm
            • An intelligent hybrid approach
            • Much like the use of incremental scheme today
        • Stay tuned for more validation results
            • ¹H NMR validation study
    24. Acknowledgements
        • Kirill Blinov
        • Mikhail Kvasha
        • Marina Solnetseva and the database team
        • Ryan Sasaki
