Interactive Datamining of Large-Scale Screening Datasets

16th Darmstädter Molecular Modeling Workshop, Darmstadt, Germany, 2002

Transcript of "Interactive Datamining of Large-Scale Screening Datasets"

  1. Interactive Datamining of Large-Scale Screening Datasets
     Frank Oellien, Wolf D. Ihlenfeldt (Computer-Chemie-Centrum, University of Erlangen-Nuremberg)
     Klaus Engel, Thomas Ertl (Visualization and Interactive Systems Group, University of Stuttgart)
     © Oellien, Ihlenfeldt, Engel, Ertl, MMWS 2002
  2. Overview
     Multi-variate and multi-dimensional datasets
     • Motivation
     • Information Visualization Techniques
     • Examples (ChemCodes Inc., NCI)
     • Demo
  3. Overview
     Multi-variate and multi-dimensional datasets
     • Motivation
     • Information Visualization Techniques
     • Examples (ChemCodes Inc., NCI)
     • Demo
  4. Chemical data
     [Chart: current datasets; axis ticks from 0 to 18,000,000; series labels: Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS]
  5. Multi-Variate and Multi-Dimensional Numeric Datasets Today
     Change in chemical synthesis technology
     • new technologies (HTS, combinatorial synthesis) → experiments generate terabytes of data per year
     • development of data mining and visualization tools could not keep pace
     • most critical bottleneck in R&D today!
     → tools for interactive mining and information visualization are needed
  6. Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data
     Standard applications
     • bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets
     • limited to small subsets
     • platform-dependent
     Our goal: applications that
     • are simple to use
     • allow straightforward interpretation of results
     • provide generalized access to tabular numeric data
     • are platform-independent
  7. Overview
     Multi-variate and multi-dimensional datasets
     • Motivation
     • Information Visualization Techniques
     • Examples (ChemCodes Inc., NCI)
     • Demo
  8. 3D Tools for Interactive Information Visualization
     Information visualization applications that use the 3D capabilities of modern clients
     • Glyph-based InfVis approaches
     • Volume-based InfVis approaches
  9. Glyph-based InfVis Tools
     • 3 orthogonal axes
     • color
     • shape
     • size
     • transparency
     • surface effects
     • animation
     • up to ~100 glyphs
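The glyph attributes listed on this slide can be read as visual channels, each bound to one column of a tabular dataset. The following is a minimal sketch of that idea in Python, not the authors' Java/Java3D applet. The `Glyph` class, the `mapping` dictionary, the column names, and the blue-to-red color ramp are all assumptions made for illustration. Surface effects and animation are left out.

```python
from dataclasses import dataclass

SHAPES = ("sphere", "cube", "cone")  # hypothetical shape palette


@dataclass(frozen=True)
class Glyph:
    x: float
    y: float
    z: float
    color: tuple        # (r, g, b), each component in [0, 1]
    shape: str
    size: float
    transparency: float


def normalize(value, lo, hi):
    """Scale value linearly into [0, 1]; a constant column maps to 0.5."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


def record_to_glyph(record, ranges, mapping, shapes=SHAPES):
    """Bind the columns of one record to the visual channels of a glyph.

    mapping: channel name -> column name (assumed layout).
    ranges:  column name -> (min, max) of that column.
    """
    def scaled(channel):
        column = mapping[channel]
        lo, hi = ranges[column]
        return normalize(record[column], lo, hi)

    t = scaled("color")
    return Glyph(
        x=scaled("x"),
        y=scaled("y"),
        z=scaled("z"),
        color=(t, 0.0, 1.0 - t),  # blue for low values, red for high values
        # The shape channel expects an integer category code.
        shape=shapes[record[mapping["shape"]] % len(shapes)],
        size=0.5 + scaled("size"),
        transparency=scaled("transparency"),
    )
```

With three axes plus color, shape, size, and transparency, one glyph in this sketch carries seven data dimensions at once.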
  10. Java/Java3D InfVis Applet
      [Screenshot labels: Java3D Canvas; Tool Panel (filters, selection tools, details); Control Panel]
  11. Java/Java3D InfVis Applet
      3D Render Panel
      • 3D Glyphs
      • 3D Barchart
  12. Java/Java3D InfVis Applet
      3D Tool Panel
      • Dynamic Filter Tools
      • Selection Tools
      • Detail Tools
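A dynamic filter of the kind named on this slide typically keeps only the records that fall inside a set of user-adjustable value ranges. The sketch below shows that behavior under assumptions: the function name `dynamic_filter`, the dictionary-based record layout, and the column names are hypothetical and not taken from the applet.

```python
def dynamic_filter(records, active_ranges):
    """Return the records whose values lie inside every active range.

    records:       list of dicts, one per data point (assumed layout).
    active_ranges: column name -> (lo, hi), bounds inclusive; columns
                   without an entry are not filtered.
    """
    return [
        record
        for record in records
        if all(lo <= record[column] <= hi
               for column, (lo, hi) in active_ranges.items())
    ]


# Hypothetical data: narrowing one range shrinks the visible set.
points = [
    {"id": 1, "temperature": 20, "yield_pct": 40},
    {"id": 2, "temperature": 60, "yield_pct": 97},
    {"id": 3, "temperature": 80, "yield_pct": 99},
]
visible = dynamic_filter(points, {"temperature": (50, 70)})
```

In an interactive tool, a function like this would be re-evaluated whenever a range control moves, and only the surviving records would be redrawn.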
  13. Java/Java3D InfVis Applet
      3D Control Panel
  14. Advantages of Volume-based InfVis Tools
      Databases with millions of data points
      – Glyph-based InfVis approaches
        • produce millions of geometric primitives
        • interactive visualization not possible
      – Volume-based InfVis approaches
        • can handle a large number of data points
        • interactive visualization using low-cost graphics hardware is possible
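One way to see why a volume-based approach scales is to accumulate the data points into a fixed-size 3D grid of voxels. The amount of data to render then depends on the grid resolution, not on the number of points. The sketch below illustrates this with a plain density count. It assumes three numeric columns per point and known bounds with `hi > lo`. The function name and the density-count choice are assumptions, and the slides do not state how the authors build their volumes.

```python
def bin_to_volume(points, bounds, resolution):
    """Count the points that fall into each voxel of a cubic grid.

    points:     iterable of (x, y, z) tuples.
    bounds:     three (lo, hi) pairs, one per axis, each with hi > lo.
    resolution: number of voxels along each axis.
    """
    n = resolution
    volume = [[[0] * n for _ in range(n)] for _ in range(n)]
    for point in points:
        index = []
        for value, (lo, hi) in zip(point, bounds):
            i = int((value - lo) / (hi - lo) * n)
            index.append(min(max(i, 0), n - 1))  # clamp onto the grid
        i, j, k = index
        volume[i][j][k] += 1
    return volume
```

A million points and a thousand points both produce the same `resolution ** 3` voxel values, which is what keeps the rendering cost bounded in this sketch.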
  15. Overview
      Multi-variate and multi-dimensional datasets
      • Motivation
      • Information Visualization Techniques
      • Examples (ChemCodes Inc., NCI)
      • Demo
  16. ChemCodes Reaction Database
      • 100 most important FGs, ~75% chemistry
      • 100 standard reactions
      • Limits of standard reactions
      • Functional Group Compatibility
      • Generating Rules
      Goal: Analysis of the reaction space
  17. ChemCodes - Reaction Optimization I
      • Goal: Reaction Optimization: > 95% Yield
      • 7 Dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG-compatibility
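The optimization goal on this slide can be expressed as a query over records that carry the seven listed dimensions plus a measured yield. The sketch below is an illustration under assumptions: the `ReactionRun` type, its field names, the field types, and the units (hours, degrees Celsius) are hypothetical, and it does not reflect the actual ChemCodes database schema.

```python
from typing import NamedTuple


class ReactionRun(NamedTuple):
    # The seven dimensions named on the slide; types and units are assumed.
    reagent: str
    solvent: str
    time_h: float
    temperature_c: float
    stoichiometry: float
    reagent_order: str
    fg_compatible: bool
    # Measured outcome, in percent.
    yield_percent: float


def runs_meeting_goal(runs, threshold=95.0):
    """Return the runs whose yield exceeds the threshold, best first."""
    hits = [run for run in runs if run.yield_percent > threshold]
    return sorted(hits, key=lambda run: run.yield_percent, reverse=True)
```

The strict comparison mirrors the "> 95%" goal, so a run at exactly 95% is not counted as a hit.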
  18. ChemCodes - Reaction Optimization II
  19. ChemCodes - Reaction Planning
      Functional Group Compatibility Check
      [Chemical structure diagram]
  20. Example 2: NCI Anti-tumor / Anti-viral Database
      • Initiated in April 1990 (modified 1994)
      • ~250,000 compounds
      • ~30,000 with anti-tumor screening data
      Enhanced NCI Database Browser
      • > 30 different molecular properties
      • up to 23 3D conformers per compound
  21. Lead Compound Discovery II
  22. Lead Compound Discovery II
  23. Overview
      Multi-variate and multi-dimensional datasets
      • Motivation
      • Information Visualization Techniques
      • Examples (ChemCodes Inc., NCI)
      • Demo
  24. Acknowledgment
      • Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg
      • Prof. Thomas Ertl, Dipl. Inf. Klaus Engel, Visualization and Interactive Systems, University of Stuttgart
      • Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc.
      • Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH
      • Deutsche Forschungsgemeinschaft