This document discusses using clustering techniques to group molecules by structural similarity. It provides examples of hierarchical clustering of molecules based on their maximum common substructure (MCS). MCS clustering allows intuitive visualization of molecular hierarchies and relationships. The document compares the performance of different clustering algorithms and notes that MCS-based clustering is more interpretable and easier for humans to understand than other methods.
6. Clustering cars
Live demonstration
Group by property
• Shape, size, type, brand, colour
• Many possible arrangements, multiple aspects
Group by similarity
• Categorical perception
7. Why is clustering stars easy?
God did the job for us!
• Stars have an apparent spatial arrangement
• Distance between stars defines clusters
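When points already have a spatial arrangement, "distance defines clusters" can be made literal. The sketch below is one simple way to do it (single-linkage grouping with a distance cutoff, implemented with union-find). The function name `threshold_clusters`, the cutoff `max_dist` and the star coordinates are hypothetical and chosen only for illustration.

```python
import math


def threshold_clusters(points, max_dist):
    """Single-linkage grouping (hypothetical helper): two points share a cluster
    if a chain of neighbours, each pair at most max_dist apart, connects them."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of points that are close enough.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)

    # Collect point indices by their root; each root is one cluster.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 2-D "sky" coordinates: two tight groups and one loner.
stars = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (5.0, 5.0), (5.4, 4.8), (10.0, 0.0)]
print(threshold_clusters(stars, max_dist=1.0))  # [[0, 1, 2], [3, 4], [5]]
```

The only decision left to the user is the cutoff. With an obvious spatial layout, almost any reasonable value gives the same answer.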
8. Why is clustering cars hard?
Lack of innate spatial arrangement
• Artificial arrangement
• Various approaches, no superior one
• “Cars come in all shapes and sizes”
Problem of dimensionality
• Why 2?!
9. So what about molecules?
Are they like stars or rather like cars?
• They come in all shapes and sizes
• Vast number of properties
Chemical spaces
• Select molecular properties
• Estimate or measure them
• Use them as coordinates
• Place your molecules as points in this abstract space
• Group molecules that are close to each other into clusters
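The chemical-space recipe above can be sketched in a few lines. This example makes several assumptions. It uses two properties (molecular weight and logP) with approximate literature values. It rescales each property to [0, 1] so the coordinates share a common scale. It then groups the molecules with a simple leader algorithm. That algorithm, its `radius` parameter and the helper names are illustrative choices and are not taken from the slides.

```python
import math


def min_max_scale(rows):
    """Rescale each property column to [0, 1] so no single unit dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column would give a zero span; fall back to 1.0 to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def leader_clusters(coords, radius):
    """Leader clustering: each point joins the first cluster whose leader lies
    within `radius`; otherwise it starts a new cluster and becomes its leader."""
    leaders, clusters = [], []
    for i, p in enumerate(coords):
        for k, leader in enumerate(leaders):
            if math.dist(p, coords[leader]) <= radius:
                clusters[k].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


# Property table: (molecular weight, logP), approximate values for illustration only.
molecules = {
    "methanol": (32.0, -0.8),
    "ethanol": (46.1, -0.3),
    "benzene": (78.1, 2.1),
    "toluene": (92.1, 2.7),
}
names = list(molecules)
coords = min_max_scale([molecules[n] for n in names])
for group in leader_clusters(coords, radius=0.4):
    print([names[i] for i in group])
# ['methanol', 'ethanol']
# ['benzene', 'toluene']
```

Every choice here changes the outcome: which properties are selected, how they are scaled, the radius, and even the input order for the leader algorithm.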
17. So what’s wrong with that?
• Manual tuning
• Lack of interpretability
Needed:
• Automated (unsupervised) techniques
• Easy-to-grasp, simple-to-understand “explanations”
One possible solution: MCS-based clustering
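Hierarchical clustering is one way to get an automated grouping whose structure can be inspected afterwards. The sketch below performs agglomerative clustering on a precomputed pairwise similarity matrix and returns the merge history, which is the information a dendrogram is drawn from. In MCS-based clustering the matrix would hold an MCS similarity score. The slides do not say which linkage is used, so single linkage is assumed here. The helper name `agglomerate` and the matrix values are hypothetical.

```python
def agglomerate(sim):
    """Single-linkage agglomerative clustering on a symmetric similarity matrix.
    Returns the merge history as (cluster_a, cluster_b, similarity) tuples."""
    clusters = [frozenset([i]) for i in range(len(sim))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: similarity of the most similar pair across the two clusters.
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), s))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history


# Hypothetical pairwise similarities for four molecules (1.0 = identical).
sim = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
for left, right, s in agglomerate(sim):
    print(left, "+", right, "at similarity", s)
# [0] + [1] at similarity 0.8
# [2] + [3] at similarity 0.7
# [0, 1] + [2, 3] at similarity 0.3
```

This naive loop is cubic in the number of molecules and only suits small sets. Library routines such as SciPy's hierarchical clustering are the practical choice for real data.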
18. Maximum Common Substructure
Largest substructure shared by two molecules
MCS
Simple concept! More human, visual.
Yet hard (= expensive (= slow)) to compute.
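The sketch below shows why the concept is simple but the computation is expensive. It finds a maximum common substructure between two tiny molecular graphs by exhaustive backtracking, so the running time grows exponentially with molecule size. It then turns the result into a Tanimoto-style similarity. This is a simplified stand-in and not ChemAxon's algorithm. Atoms are matched by element only, bond orders are ignored, the common substructure is an induced subgraph that may be disconnected, and hydrogens are omitted. The names `mcs_size` and `mcs_similarity` are hypothetical.

```python
def mcs_size(atoms_a, bonds_a, atoms_b, bonds_b):
    """Atom count of a maximum common induced subgraph of two atom-labelled
    graphs, found by exhaustive backtracking (bond orders ignored)."""
    adj_a = {frozenset(b) for b in bonds_a}
    adj_b = {frozenset(b) for b in bonds_b}
    best = 0

    def extend(i, mapping, used):
        nonlocal best
        # Prune: even mapping every remaining atom of A cannot beat the best so far.
        if len(mapping) + (len(atoms_a) - i) <= best:
            return
        if i == len(atoms_a):
            best = len(mapping)
            return
        for j, label in enumerate(atoms_b):
            if j in used or label != atoms_a[i]:
                continue
            # Bonds must agree in both graphs with every atom mapped so far.
            if all((frozenset((i, a)) in adj_a) == (frozenset((j, b)) in adj_b)
                   for a, b in mapping.items()):
                mapping[i] = j
                used.add(j)
                extend(i + 1, mapping, used)
                del mapping[i]
                used.discard(j)
        extend(i + 1, mapping, used)  # alternatively, leave atom i of A unmapped

    extend(0, {}, set())
    return best


def mcs_similarity(mol_a, mol_b):
    """Tanimoto-style score: shared atoms over atoms present in either molecule."""
    common = mcs_size(*mol_a, *mol_b)
    return common / (len(mol_a[0]) + len(mol_b[0]) - common)


# Heavy-atom graphs: element labels plus bonds as index pairs.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(mcs_size(*ethanol, *propanol))      # 3 (the C-C-O fragment)
print(mcs_similarity(ethanol, propanol))  # 0.75
```

Production MCS tools rely on much smarter search and on chemistry-aware matching rules. The exponential worst case remains, which is why MCS is slow compared with property-based distances.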
34. Find out more
Product descriptions & links
www.chemaxon.com/products.html
Forum
www.chemaxon.com/forum
Presentations and posters
www.chemaxon.com/conf
Download
www.chemaxon.com/download.html