EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of Maximum Common Substructure Search at ChemAxon

Finding the maximum common substructure (MCS) among molecules has many applications in the field of cheminformatics. It can be used in automated reaction mapping and molecule alignment, and it is also a popular measure of similarity, just to name a few. However, the complexity of the problem makes finding the exact MCS of two molecules too time-consuming for most use cases. At ChemAxon, we employ approximation algorithms in MCS search, but building one which is both practically fast and accurate is still a challenge. To meet the increasing demands on effective MCS search, we have taken our previous approach and applied different heuristics to further improve its speed and accuracy. The results are very promising, both in large-scale tests involving thousands of different compounds, and in application-specific cases.
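
To make the "measure of similarity" use concrete, here is a minimal sketch of one common way to turn an MCS size into a score: a Tanimoto-style ratio of the shared size to the combined size. The function name `mcs_tanimoto` and the example sizes are illustrative assumptions; the talk does not say which formula ChemAxon uses.

```python
def mcs_tanimoto(size_a: int, size_b: int, size_mcs: int) -> float:
    """Tanimoto-style similarity from substructure sizes.

    size_a, size_b: bond (or atom) counts of the two molecules.
    size_mcs: bond (or atom) count of their common substructure.
    """
    if size_mcs > min(size_a, size_b):
        raise ValueError("MCS cannot be larger than either molecule")
    union = size_a + size_b - size_mcs
    # Two empty molecules are treated as identical by convention.
    return size_mcs / union if union else 1.0


# Hypothetical sizes, not taken from the slides.
print(round(mcs_tanimoto(10, 12, 8), 3))
```

The score is 1.0 when the MCS covers both molecules and falls toward 0 as the shared part shrinks. Because it depends on the MCS size, an approximate MCS that is too small will underestimate the similarity.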

Transcript

  • 1. The Next Generation of MCS Search at ChemAxon. Péter Englert
  • 2. How similar are two molecules? Method 1: 0.71. Method 2: 0.47. ... Method X: 0.YZ. What do these numbers mean?
  • 3. Structural Similarity. What do these molecules have in common? Resulting Tanimoto similarity: 0.615. Similar property principle
  • 4. A more complicated example
  • 5. A more complicated example
  • 6. Maximum Common Substructure
      • Many applications
          ‐ Similarity Search
          ‐ Clustering
          ‐ Reaction mapping
          ‐ Molecule alignment
      • A complex problem
          ‐ Solution often approximated
  • 7. ChemAxon solutions
      • 2004, JChem 2.3
          ‐ Backtracking algorithm
          ‐ Connected MCS only
      • 2010, JChem 5.4
          ‐ Efficient heuristics
          ‐ Max-clique search based
      • 2013, JChem 6.0
          ‐ Improved MCS module
  • 8. The new MCS module
      • Improved accuracy and run time
      • Reduced memory usage
      • Reduced fragmentation
      • Many features
          ‐ Connected/disconnected
          ‐ Generic atom/bond handling
          ‐ Multiple results
          ‐ Ring matching (JChem 6.1)
  • 9. Applications – 3D Alignment
  • 10. Applications – Reaction Mapping
  • 11. Applications – Library MCS
  • 12. Improvements
      • Better accuracy
      • Improved running time
      • Reduced memory usage
    How much improvement? Major improvements
  • 13. Accuracy
  • 14. Running time
  • 15. Extensive tests
  • 16. Memory usage. Tested on “hard” cases, graphene-like structures
  • 17. Examples. JChem 2.3 (connected). Result: 47 bonds, 1 fragment. ~20 minutes
  • 18. Examples. JChem 5.12 (fragmented). Result: 83 bonds, 8 fragments. ~2.5 seconds
  • 19. Examples. JChem 6.0 (optimal). Result: 92 bonds, 2 fragments. ~0.2 seconds
  • 20. Examples. Reduced fragmentation. JChem 5.12: 7 fragments. JChem 6.0: 3 fragments. (Query and target shown for each.)
  • 21. Examples. Connected mode improvement. JChem 5.12: 22 bonds. JChem 6.0: 92 bonds. (Query and target shown for each.)
  • 22. Examples. Goodbye example. JChem 5.12 and JChem 6.0. (Query and target shown for each.)
  • 23. Summary. We have substantially improved our MCS solution based on feedback from the previous versions. Acknowledgements: JChem Base team, Péter Kovács, Miklós Vargyas. Thank you for your kind attention!
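
Slide 7 describes the 2010 approach as "max-clique search based". The textbook form of that idea can be sketched as follows: pair up atoms with equal labels, connect two pairs when their bonding agrees in both molecules (the modular product graph), and look for the largest clique. This is an exact, exponential-time illustration on tiny graphs, assuming heavy atoms only and no bond orders; it maximizes matched atoms rather than bonds, and it is not ChemAxon's heuristic implementation. All function names are hypothetical.

```python
from itertools import combinations


def make_bonds(pairs):
    """Store bonds as a set of unordered atom-index pairs."""
    return {frozenset(p) for p in pairs}


def modular_product(atoms_a, bonds_a, atoms_b, bonds_b):
    """Nodes are label-compatible atom pairs; edges join consistent pairs."""
    nodes = [(i, j) for i, x in enumerate(atoms_a)
             for j, y in enumerate(atoms_b) if x == y]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each atom may be matched at most once
        # Consistent if both atom pairs are bonded, or both are not.
        if (frozenset((i, k)) in bonds_a) == (frozenset((j, l)) in bonds_b):
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Bron-Kerbosch with pivoting and a simple size bound."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # cannot beat the best clique found so far
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand([], set(adj), set())
    return best


def common_substructure(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return matched (index_in_a, index_in_b) pairs of a largest match."""
    return sorted(max_clique(modular_product(atoms_a, bonds_a, atoms_b, bonds_b)))


# Heavy-atom graphs only; hydrogens and bond orders are ignored here.
ethanol = (["C", "C", "O"], make_bonds([(0, 1), (1, 2)]))
dimethyl_ether = (["C", "O", "C"], make_bonds([(0, 1), (1, 2)]))
mapping = common_substructure(*ethanol, *dimethyl_ether)
print(len(mapping), mapping)
```

For ethanol and dimethyl ether the largest match has two atoms, a single C-O fragment. Because cliques in the product graph do not have to form a connected substructure, this formulation naturally yields disconnected (fragmented) results, which is the trade-off the examples on slides 17 to 21 illustrate.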