Clustering Made Human: US UGM 2008

    Presentation Transcript

    1. Clustering made human. Miklos Vargyas • Solutions for Cheminformatics
    2. Cluster in computing: computer cluster
    3. Cluster in chemistry: Transition metal carbonyl clusters • Dimanganese decacarbonyl • Ditungsten tetra(hpp) • Transition metal halide clusters • Boron hydrides • Gas-phase clusters and fullerenes
    4. Cluster in chemistry/physics: nanoscale particles • Fullerenes • Nano machines (images produced by MarvinSpace)
    5. Star cluster: gravitationally bound groups of stars (image from Wikipedia, the free encyclopedia)
    6. Clustering cars (live demonstration). Group by property • Shape, size, type, brand, colour • Many possible arrangements, multiple aspects. Group by similarity • Categorical perception
    7. Why is clustering stars easy? God did the job for us! • Stars have an apparent spatial arrangement • Distance between stars defines clusters
    8. Why is clustering cars hard? Lack of innate spatial arrangement • Artificial arrangement • Various approaches, no superior one • “Cars come in all shapes and sizes”. Problem of dimensionality • Why 2?!
    9. So what about molecules? Are they like stars or rather like cars? • They come in all shapes and sizes • Vast number of properties. Chemical spaces • Select molecular properties • Estimate or measure them • Use them as coordinates • Place your molecules as points in this abstract space • Group those that are close to each other into clusters
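The chemical-space recipe above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ChemAxon's method: it takes three descriptors (logP, TPSA and mass, the axes plotted on the next slides) with hypothetical values, z-score standardises each descriptor, and then groups points whose Euclidean distance is within an arbitrarily chosen radius (connected components of the radius graph, i.e. single linkage). The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each descriptor column so no single property dominates."""
    cols = list(zip(*rows))
    means = [mean(col) for col in cols]
    sds = [pstdev(col) or 1.0 for col in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def group_by_distance(points, radius):
    """Label points so that any two within `radius` share a cluster
    (connected components of the radius graph, i.e. single linkage)."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue  # already reached from an earlier point
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j, q in enumerate(points):
                if labels[j] == -1 and math.dist(points[i], q) <= radius:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical (logP, TPSA, mass) values for four made-up structures.
rows = [(1.0, 40.0, 200.0), (1.2, 42.0, 210.0), (5.0, 120.0, 600.0), (5.1, 118.0, 590.0)]
print(group_by_distance(standardize(rows), radius=0.5))  # prints [0, 0, 1, 1]
```

Standardisation is the key design choice here: without it, mass (hundreds of daltons) would swamp logP (a few units) in the distance, so the "clusters" would simply be mass bands.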
    10. Example in 2D
    11. Further attempts in 2D [Figure: 2D scatter plots of the structure set using pairs of the descriptors logP, TPSA and mass as axes]
    12. Molecule clusters by similarity
        Jarvis-Patrick clustering • Fast • Tanimoto similarity • Globular clusters • Tendency to create a large number of singletons • Molecular properties & fingerprint
        Command line: SC1000.cfp -m 0 -f 1024 -t 0.6 -c jarp -i 0.1 -o SC1000.jarp.t0.6.c0.1 -g -y -z
        Number of objects = 999 • Number of clusters (without singletons) = 2 • Number of singletons = 8 • Average dissimilarity = 0.66208726 • Minimum dissimilarity = 0.0 • Maximum dissimilarity = 0.9411765
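The slides report Jarvis-Patrick results for parameters t and c without defining them. The Python sketch below implements a threshold-based Jarvis-Patrick variant under an explicit assumption (our reading of the tuning tables, not documented ChemAxon behaviour): t is a Tanimoto dissimilarity cutoff that defines each structure's neighbour list, and c is the minimum fraction of shared neighbours required to merge two neighbouring structures. That reading is at least consistent with the tables, where raising t yields fewer clusters and raising c yields more. Fingerprints are plain Python integers used as bit sets; Tanimoto similarity is |A ∩ B| / |A ∪ B|. All names are hypothetical and unrelated to the command line above.

```python
from itertools import combinations


def tanimoto_dissimilarity(a: int, b: int) -> float:
    """1 - |A & B| / |A | B| for fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: treat as identical
    return 1.0 - bin(a & b).count("1") / union


def jarvis_patrick(fps, t, c):
    """Threshold-based Jarvis-Patrick variant (hypothetical reading of t and c).

    t: Tanimoto dissimilarity cutoff that defines each neighbour list.
    c: minimum fraction of shared neighbours needed to merge two neighbours.
    Returns (clusters, singletons); clusters are lists of indices.
    """
    n = len(fps)
    # Each structure counts as its own neighbour (a common JP convention).
    nbrs = [
        {j for j in range(n) if j == i or tanimoto_dissimilarity(fps[i], fps[j]) <= t}
        for i in range(n)
    ]

    parent = list(range(n))  # union-find forest over structure indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if j not in nbrs[i]:
            continue  # only neighbouring structures may be merged
        shared = len(nbrs[i] & nbrs[j])
        if shared >= c * min(len(nbrs[i]), len(nbrs[j])):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, singletons
```

For example, `jarvis_patrick([0b1111, 0b1110, 0b0111, 0b11110000, 0b11100000, 1 << 11], t=0.3, c=0.5)` groups the first three and the next two fingerprints and leaves the last one as a singleton. The all-pairs loop is quadratic, which is fine for a sketch but not for libraries of 100,000 structures.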
    13. Parameter tuning
        t     c     Clusters   Singletons
        0.6   0.1   2          8
        0.3   0.1   179        248
        0.5   0.1   7          36
    14. The most populated cluster
    15. Parameter tuning
        t     c     Clusters   Singletons
        0.6   0.1   2          8
        0.3   0.1   179        248
        0.5   0.1   7          36
        0.5   0.5   10         37
        0.5   0.8   81         115
    16. Another cluster
    17. So what’s wrong with that? Problems: • Manual tuning • Lack of interpretability. Needs: • Automated (unsupervised) techniques • Easy-to-grasp, simple-to-understand “explanations”. One possible solution: MCS-based clustering
    18. Maximum Common Substructure (MCS): the largest substructure shared by two molecules • Simple concept! More human, visual • Yet hard (= expensive = slow) to compute
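To make "simple concept, yet expensive to compute" concrete, here is a hedged Python sketch of a brute-force backtracking search for a maximum common induced subgraph between two small molecular graphs (heavy atoms as element labels, bonds as index pairs). It is a simplified, hypothetical illustration, not the algorithm used by ChemAxon's tools: it ignores bond orders and does not require the common part to be connected. The underlying problem is NP-hard, and the worst-case running time of this search grows exponentially with the number of atoms.

```python
def max_common_induced_subgraph(labels_a, bonds_a, labels_b, bonds_b):
    """Return a largest atom mapping {atom in A: atom in B} such that mapped
    atoms have equal labels and two mapped atoms are bonded in A exactly when
    their images are bonded in B. Connectivity and bond orders are ignored."""
    adj_a = {frozenset(bond) for bond in bonds_a}
    adj_b = {frozenset(bond) for bond in bonds_b}
    n_a = len(labels_a)
    best = {}

    def consistent(i, v, mapping):
        # Bonds between atom i and already-mapped atoms must match in B.
        return all(
            (frozenset((i, u)) in adj_a) == (frozenset((v, w)) in adj_b)
            for u, w in mapping.items()
        )

    def extend(i, mapping, used):
        nonlocal best
        # Prune: even mapping every remaining atom cannot beat the best so far.
        if len(mapping) + (n_a - i) <= len(best):
            return
        if i == n_a:
            best = dict(mapping)
            return
        for v, label in enumerate(labels_b):
            if v not in used and label == labels_a[i] and consistent(i, v, mapping):
                mapping[i] = v
                used.add(v)
                extend(i + 1, mapping, used)
                del mapping[i]
                used.remove(v)
        extend(i + 1, mapping, used)  # also try leaving atom i unmapped

    extend(0, {}, set())
    return best


# Heavy-atom graphs: ethanol C-C-O and 1-propanol C-C-C-O.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(max_common_induced_subgraph(*ethanol, *propanol))
```

Even with pruning, every atom may be tried against every compatible atom of the other molecule, so the search tree explodes for drug-sized structures; this is the cost that the speed-up slides below address.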
    19. MCS of a structure set
    20. Hierarchical star clusters: star
    21. Hierarchical star clusters: star cluster > star
    22. Hierarchical star clusters: galaxy > star cluster > star
    23. Hierarchical star clusters: local group > galaxy > star cluster > star
    24. Hierarchical star clusters: supercluster > cluster > local group > galaxy > star cluster
    25. Visualisation of hierarchy: dendrogram
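The nested star clusters and the dendrogram describe a hierarchy built by successive merges. As a hedged illustration of how such a hierarchy can be produced, the Python sketch below performs naive agglomerative (bottom-up) clustering on points in a descriptor space: it repeatedly merges the two closest clusters and logs each merge, which is the information a dendrogram plots. This is generic distance-based agglomeration swapped in for illustration; the slides do not describe LibraryMCS's own procedure, which groups structures by MCS rather than Euclidean distance. The function name and the default single linkage are assumptions.

```python
import math


def agglomerate(points, linkage=min):
    """Naive agglomerative clustering; returns (dendrogram, merge log).

    linkage=min gives single linkage, linkage=max complete linkage.
    The dendrogram is a nested tuple of point indices; each merge-log
    entry is (left subtree, right subtree, merge distance).
    """
    if not points:
        raise ValueError("need at least one point")
    # Start with every point as its own cluster: (subtree, member indices).
    clusters = [(i, [i]) for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        return linkage(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters (all pairs, so O(n^2) per step).
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x][1], clusters[y][1])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        (left, members_l), (right, members_r) = clusters[x], clusters[y]
        merges.append((left, right, d))
        clusters = [cl for k, cl in enumerate(clusters) if k not in (x, y)]
        clusters.append(((left, right), members_l + members_r))
    return clusters[0][0], merges


pts = [(0, 0), (0, 1), (5, 0), (5, 1.5), (20, 0)]
tree, merges = agglomerate(pts)
print(tree)  # prints (4, ((0, 1), (2, 3)))
```

Cutting the merge log at a distance threshold recovers flat clusters at any level, much as the star hierarchy can be read at the level of star clusters, galaxies or superclusters.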
    26. Hierarchical MCS
    27. Intuitive visualisation
    28. SAR table view
    29. R-group deconvolution
    30. Speed-up achieved last year (presented at UGM’07) [Figure: running time (sec) versus structure count (0 to 35,000) for the 2006 and 2007 versions, with a linear trend line for 2007]
    31. Speed-up achieved this year [Figure: running time (sec) versus structure count (0 to 35,000) for the 2006, 2007 and 2008 versions]
    32. Speed-up this year [Figure: the same comparison with running time (sec) on a logarithmic axis from 0.1 to 10,000]
    33. Clustering performance comparison [Figure: running time (min) versus structure count (0 to 120,000) for LibraryMCS, Jarvis-Patrick and Ward-Murtagh clustering]
    34. Find out more • Product descriptions & links: www.chemaxon.com/products.html • Forum: www.chemaxon.com/forum • Presentations and posters: www.chemaxon.com/conf • Download: www.chemaxon.com/download.html

    Uploaded by ChemAxon

    Clustering chemical structures alleviates the tedio...

    © All Rights Reserved