EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of Maximum Common Substructure Search at ChemAxon


Finding the maximum common substructure (MCS) among molecules has many applications in cheminformatics. Among other uses, it supports automated reaction mapping and molecule alignment, and it is a popular measure of similarity. However, the complexity of the problem makes finding the exact MCS of two molecules too time-consuming for most use cases. At ChemAxon, we employ approximation algorithms in MCS search, but building one that is both fast in practice and accurate is still a challenge. To meet the increasing demands on effective MCS search, we have taken our previous approach and applied different heuristics to further improve its speed and accuracy. The results are very promising, both in large-scale tests involving thousands of different compounds and in application-specific cases.
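To make the "measure of similarity" use concrete, here is a minimal sketch of one common way to turn an MCS size into a Tanimoto-style score: the size of the common part divided by the size of the union. The function name `mcs_tanimoto` and the bond counts in the usage line are illustrative assumptions; the talk does not say how its similarity values were computed, and this is not ChemAxon's API.

```python
def mcs_tanimoto(size_a: int, size_b: int, size_mcs: int) -> float:
    """Tanimoto-style similarity from bond (or atom) counts.

    size_a, size_b: sizes of the two molecules.
    size_mcs: size of their common substructure.
    """
    if min(size_a, size_b, size_mcs) < 0:
        raise ValueError("sizes must be non-negative")
    if size_mcs > min(size_a, size_b):
        raise ValueError("the common part cannot exceed either molecule")
    union = size_a + size_b - size_mcs
    # Two empty molecules are treated as identical by convention.
    return size_mcs / union if union else 1.0


# Hypothetical counts: 10 and 11 bonds with 8 in common gives 8 / 13.
print(round(mcs_tanimoto(10, 11, 8), 3))  # 0.615
```

The score is 1.0 only when the MCS covers both molecules completely and 0.0 when nothing is shared, which is why an inaccurate (too small) MCS directly understates similarity.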


  1. The Next Generation of MCS Search at ChemAxon. Péter Englert
  2. How similar are two molecules? Method 1: 0.71. Method 2: 0.47. ... Method X: 0.YZ. What do these numbers mean?
  3. Structural Similarity. What do these molecules have in common? Resulting Tanimoto similarity: 0.615. Similar property principle.
  4. A more complicated example
  5. A more complicated example
  6. Maximum Common Substructure
     • Many applications
       ‐ Similarity search
       ‐ Clustering
       ‐ Reaction mapping
       ‐ Molecule alignment
     • A complex problem
       ‐ Solution often approximated
  7. ChemAxon solutions
     • 2004, JChem 2.3
       ‐ Backtracking algorithm
       ‐ Connected MCS only
     • 2010, JChem 5.4
       ‐ Efficient heuristics
       ‐ Max-clique search based
     • 2013, JChem 6.0
       ‐ Improved MCS module
  8. The new MCS module
     • Improved accuracy and run time
     • Reduced memory usage
     • Reduced fragmentation
     • Many features
       ‐ Connected/disconnected
       ‐ Generic atom/bond handling
       ‐ Multiple results
       ‐ Ring matching (JChem 6.1)
  9. Applications – 3D Alignment
  10. Applications – Reaction Mapping
  11. Applications – Library MCS
  12. Improvements: how much improvement? Major improvements:
     • Better accuracy
     • Improved running time
     • Reduced memory usage
  13. Accuracy
  14. Running time
  15. Extensive tests
  16. Memory usage. Tested on “hard” cases, graphene-like structures.
  17. Examples. JChem 2.3 (connected). Result: 47 bonds, 1 fragment; ~20 minutes.
  18. Examples. JChem 5.12 (fragmented). Result: 83 bonds, 8 fragments; ~2.5 seconds.
  19. Examples. JChem 6.0 (optimal). Result: 92 bonds, 2 fragments; ~0.2 seconds.
  20. Examples. Reduced fragmentation (query and target shown for each version). JChem 5.12: 7 fragments. JChem 6.0: 3 fragments.
  21. Examples. Connected mode improvement (query and target shown for each version). JChem 5.12: 22 bonds. JChem 6.0: 92 bonds.
  22. Examples. Goodbye example (query and target shown for each version). JChem 5.12 and JChem 6.0.
  23. Summary. We have substantially improved our MCS solution based on feedback from the previous versions. Acknowledgements: JChem Base team, Péter Kovács, Miklós Vargyas. Thank you for your kind attention!
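Slide 7 describes the JChem 5.4 search as max-clique based. The textbook form of that idea can be sketched as follows: build the modular product of the two molecular graphs, in which each node is a pair of compatible atoms and each edge joins two pairs that agree on whether their atoms are bonded, then look for a maximum clique. This is an exact, exponential-time illustration on toy graphs, not ChemAxon's heuristic implementation; `make_mol`, `modular_product`, `max_clique` and `mcs_atom_mapping` are hypothetical names, bond orders are ignored, and the result is an atom-induced common substructure that may be disconnected.

```python
from itertools import combinations


def make_mol(symbols, bond_pairs):
    """Toy molecule: atom index -> element symbol, plus a set of unordered bonds."""
    return dict(enumerate(symbols)), {frozenset(p) for p in bond_pairs}


def modular_product(mol_a, mol_b):
    """Adjacency sets of the modular product graph of two toy molecules."""
    atoms_a, bonds_a = mol_a
    atoms_b, bonds_b = mol_b
    # A node pairs an atom of A with an atom of B of the same element.
    nodes = [(i, j) for i in atoms_a for j in atoms_b if atoms_a[i] == atoms_b[j]]
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # an atom may be mapped only once
        bonded_a = frozenset((i1, i2)) in bonds_a
        bonded_b = frozenset((j1, j2)) in bonds_b
        if bonded_a == bonded_b:  # both bonded, or both not bonded
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def max_clique(adj):
    """Maximum clique via Bron-Kerbosch with pivoting and a simple size bound."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = list(clique)
            return
        if len(clique) + len(candidates) <= len(best):
            return  # cannot beat the best clique found so far
        pivot = max(candidates | excluded, key=lambda v: len(adj[v] & candidates))
        for v in list(candidates - adj[pivot]):
            expand(clique + [v], candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand([], set(adj), set())
    return best


def mcs_atom_mapping(mol_a, mol_b):
    """Sorted list of (atom in A, atom in B) pairs forming a maximum common substructure."""
    return sorted(max_clique(modular_product(mol_a, mol_b)))


# Heavy atoms only: ethanol (C-C-O) against 1-propanol (C-C-C-O).
ethanol = make_mol("CCO", [(0, 1), (1, 2)])
propanol = make_mol("CCCO", [(0, 1), (1, 2), (2, 3)])
print(mcs_atom_mapping(ethanol, propanol))  # [(0, 1), (1, 2), (2, 3)]
```

The product graph has up to |A| x |B| nodes and clique search is exponential in the worst case, which is consistent with the deck's remark that the solution is often approximated and with its focus on running time and memory usage for large, graphene-like structures.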
