Online Chemical Modeling Environment: Models

AACIMP 2009 Summer School lecture by Yuriy Sushko and Sergii Novotarskyi, "Environmental Chemoinformatics" course.

  1. Online chemical modeling environment: models. Iurii Sushko, Sergey Novotarskiy. Thursday, August 13, 2009
  2. Existing alternatives
     Classical approach: Weka, R, Mathematica
     Advantages:
       1. Most flexible
       2. Suitable for research and in-depth analysis
     Disadvantages:
       1. Complex: suited to mathematicians, informaticians and statisticians, but not to chemists and biologists
       2. Very tedious data preparation
  3. Community-driven source vs. authority-driven source
  4. Collaboration in QSAR
     Possibilities for collaboration in QSAR:
       1. Use others' data
          a. build models based on others' data
          b. validate your models against others' data
       2. Use others' models
          a. validate your data against published models
          b. use the output of published models as input for new ones
          c. compare the performance of published models with your own
     All existing modeling tools lack means of collaboration.
  5. OCHEM advantages
     Collaboration-targeted features:
       1. Tight connection between the database and the modeling tools
       2. Wiki, discussions, comments, tags
     Simplified modeling workflow:
       1. Sensible defaults for most parameters
       2. Only the necessary parameters are requested
       3. Data representation targeted at chemists
       4. Fine-tuning possible for experts
  6. Modeling workflow
       1. Data preparation
       2. Building a model
       3. Analysing the model (applicability domain, AD)
       4. Application of the model
  7. Stage 1: Data preparation
     [Diagram: in the information system, each data point links a property value (e.g., logP = 0.5, Melting Point = 100 °C), its conditions (temperature, pH, species, tissue, method), the introducer (e.g., Bill G., Sergey B.), tags (e.g., Toxicology, Biology, Partition coefficient), the date of modification, the structure (e.g., Benzene, Urea, ...) and the source article (e.g., Garberg, P, "In vitro models for …"). Data manipulation: filtering, editing, organization, working sets.]
  8. Stage 1: Data preparation
     [Diagram: tags (Toxicology, Biology, Partition coefficient) used for filtering; manipulation: editing, organization, working sets.]
  9. Stage 1: Data preparation
  10. Stage 1: Data preparation
  11. Stage 1: Data preparation
  12. Stage 1: Data preparation
  13. Stage 2: Model building - input data
  14. Stage 2: Model building - descriptors (I)
  15. Stage 2: Model building - descriptors (II)
  16. Stage 2: Model building - descriptors (manual)
  17. Stage 3: Analysing the model (I) - basic model statistics
  18. Stage 3: Analysing the model (II) - applicability domain assessment
  19. Stage 4: Application of the model - selection of the model of interest (a model published by another user or a newly created model)
  20. Stage 4: Application of the model - providing the target compounds
  21. Stage 4: Application of the model - prediction results (target compound, prediction, accuracy assessment)
  22. Stage 4: Application of the model - assessing the accuracy of predictions for a target compound
  23. Need for distributed calculations
      Fact: QSAR modeling is computationally intensive.
      Examples of calculations:
        • Training of neural network ensembles
        • Computing 3D conformations
        • Computing complex molecular descriptors
      Solution:
        • Distributed calculation network
        • The user can postpone or cancel tasks, or fetch task results later
  24. Automatic updates and testing
      Calculation servers are updated automatically whenever a new release becomes available.
      Servers are tested automatically after each update.
      Tasks that fail the tests are disabled, so the server stays functional.
  25. Backend: distributed calculation
      Central metaserver, distributed calculation servers
      Automatic server updates, on-the-fly server testing
  26. Basic facts
      About 50,000 experimental measurements of 285 physicochemical properties, published in about 2,000 articles
      Implemented modeling methods: ANN, KNN, MLR, kernel ridge regression
      Integrated descriptors: Dragon, E-State, fragments
  27. Backend: basic facts
      Platform: Java EE
      Database: MySQL
      Server: Tomcat
      ORM: Hibernate
      MVC: Spring framework
      Client side: AJAX, HTML + JavaScript
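
The Stage 1 data model (a measured property value plus its conditions, tags, introducer, structure and source article) and filtering records into a working set can be sketched as follows. This is a minimal illustration under stated assumptions: the `DataPoint` class, the `filter_points` helper and all field names are hypothetical and are not OCHEM's actual schema (the real backend uses Java EE with Hibernate and MySQL). The records are placeholders, not measurements.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataPoint:
    """One experimental record (hypothetical schema, for illustration only)."""
    structure: str                                   # compound identifier
    property_name: str                               # e.g. "logP"
    value: float                                     # measured value
    conditions: dict = field(default_factory=dict)   # e.g. {"pH": 7.4}
    tags: set = field(default_factory=set)           # e.g. {"Toxicology"}
    introducer: str = ""                             # user who entered the record
    article: str = ""                                # source publication
    modified: date = field(default_factory=date.today)


def filter_points(points, property_name=None, tag=None, **conditions):
    """Keep points that match the property, carry the tag, and have
    exactly the given condition values (hypothetical helper)."""
    selected = []
    for p in points:
        if property_name is not None and p.property_name != property_name:
            continue
        if tag is not None and tag not in p.tags:
            continue
        if any(p.conditions.get(k) != v for k, v in conditions.items()):
            continue
        selected.append(p)
    return selected


# Placeholder records, not real data.
records = [
    DataPoint("mol-1", "logP", 0.5, {"pH": 7.4}, {"Partition coefficient"}),
    DataPoint("mol-2", "logP", 1.2, {"pH": 2.0}, {"Partition coefficient"}),
    DataPoint("mol-3", "Melting Point", 100.0, {}, {"Biology"}),
]

# A working set: the subset of records later used to build a model.
working_set = filter_points(records, property_name="logP", pH=7.4)
print([p.structure for p in working_set])  # ['mol-1']
```

Exact-match conditions keep the sketch short; a practical filter would more likely accept numeric ranges (for example, a temperature interval).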
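
KNN is listed among the implemented methods, and Stage 3 includes an applicability domain (AD) assessment. The sketch below pairs a k-nearest-neighbours regressor with one common distance-based AD check: a query is treated as inside the domain if its mean distance to its k nearest training compounds does not exceed the largest leave-one-out value seen in the training set. This AD definition, the Euclidean metric and the function names are assumptions for illustration; the slides do not say which AD measure OCHEM uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(train_X, query, k):
    """(distance, index) pairs of the k training rows closest to query."""
    return sorted((euclidean(x, query), i) for i, x in enumerate(train_X))[:k]


def fit_ad_threshold(train_X, k):
    """Largest leave-one-out mean k-NN distance over the training set."""
    worst = 0.0
    for i, x in enumerate(train_X):
        others = train_X[:i] + train_X[i + 1:]
        worst = max(worst, sum(d for d, _ in nearest(others, x, k)) / k)
    return worst


def predict(train_X, train_y, query, k, threshold):
    """Return (kNN prediction, True if the query lies inside the AD)."""
    nn = nearest(train_X, query, k)
    y_hat = sum(train_y[i] for _, i in nn) / k
    mean_dist = sum(d for d, _ in nn) / k
    return y_hat, mean_dist <= threshold


# Toy descriptor vectors and property values (illustrative only).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
k = 2
threshold = fit_ad_threshold(X, k)              # 1.0 for this toy set
print(predict(X, y, [0.5, 0.5], k, threshold))  # (1.5, True)
print(predict(X, y, [5.0, 5.0], k, threshold))  # (3.0, False)
```

Ties (as at the centre point) are broken by training-set order. With real descriptors, such as Dragon or E-State values, the columns would normally be scaled first so that no single descriptor dominates the distance.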

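The distributed setup described for the backend (a central metaserver dispatching tasks to calculation servers, with the user free to cancel a task or fetch its results later) can be illustrated with an in-process toy. The `MetaServer` class, its method names and the task states are hypothetical; a real deployment would place a network protocol and persistent storage between the metaserver and the calculation servers, which this sketch omits.

```python
import itertools
from collections import deque
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    CANCELLED = "cancelled"


class MetaServer:
    """Toy central dispatcher: users submit tasks, calculation servers pull them."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._queue = deque()
        self._tasks = {}    # task_id -> (task_type, payload)
        self._status = {}   # task_id -> Status
        self._results = {}  # task_id -> result

    def submit(self, task_type, payload):
        """Queue a task and return its id at once; the user need not wait."""
        task_id = next(self._ids)
        self._tasks[task_id] = (task_type, payload)
        self._status[task_id] = Status.QUEUED
        self._queue.append(task_id)
        return task_id

    def cancel(self, task_id):
        """Cancel a task that has not started yet; returns True on success."""
        if self._status.get(task_id) is Status.QUEUED:
            self._status[task_id] = Status.CANCELLED
            return True
        return False

    def next_task(self):
        """Called by a calculation server; skips cancelled tasks."""
        while self._queue:
            task_id = self._queue.popleft()
            if self._status[task_id] is Status.QUEUED:
                self._status[task_id] = Status.RUNNING
                return task_id, self._tasks[task_id]
        return None

    def complete(self, task_id, result):
        self._status[task_id] = Status.DONE
        self._results[task_id] = result

    def fetch(self, task_id):
        """Return (status, result); result stays None until the task is done."""
        return self._status[task_id], self._results.get(task_id)


server = MetaServer()
a = server.submit("descriptors", [1.0, 2.0, 3.0])
b = server.submit("descriptors", [4.0])
server.cancel(b)

# One calculation server drains the queue (here a trivial "calculation").
while (job := server.next_task()) is not None:
    task_id, (_, payload) = job
    server.complete(task_id, sum(payload))

print(server.fetch(a))  # (<Status.DONE: 'done'>, 6.0)
print(server.fetch(b))  # (<Status.CANCELLED: 'cancelled'>, None)
```

Because `submit` returns immediately, the client can disconnect and call `fetch` later; only tasks that are still queued can be cancelled.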
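
The update rule on slide 24 (tasks that fail the post-update tests are disabled while the server keeps serving everything else) amounts to a per-task-type self-test. In the minimal sketch below, each task type has a handler and a test function; only types whose test passes are enabled, and a test that raises counts as a failure. The `CalculationServer` class, the task-type names and the tests are all hypothetical.

```python
from typing import Callable, Dict, Set


class CalculationServer:
    """Toy calculation server that enables only self-tested task types."""

    def __init__(self, handlers: Dict[str, Callable],
                 tests: Dict[str, Callable[[Callable], bool]]):
        self.handlers = handlers
        self.tests = tests
        self.enabled: Set[str] = set()

    def run_self_tests(self) -> Set[str]:
        """Re-run after every update; untested or failing task types stay disabled."""
        self.enabled = set()
        for task_type, handler in self.handlers.items():
            test = self.tests.get(task_type)
            try:
                passed = test is not None and bool(test(handler))
            except Exception:
                passed = False  # a crashing test disables the task type
            if passed:
                self.enabled.add(task_type)
        return self.enabled

    def run(self, task_type, payload):
        if task_type not in self.enabled:
            raise RuntimeError(f"task type {task_type!r} is disabled on this server")
        return self.handlers[task_type](payload)


def average(values):
    return sum(values) / len(values)


def broken_descriptors(structure):
    # Simulates a handler broken by an update (e.g. a missing dependency).
    raise RuntimeError("descriptor backend unavailable")


server = CalculationServer(
    handlers={"average": average, "descriptors": broken_descriptors},
    tests={
        "average": lambda h: h([1.0, 3.0]) == 2.0,
        "descriptors": lambda h: h("CCO") is not None,
    },
)
print(sorted(server.run_self_tests()))    # ['average']
print(server.run("average", [2.0, 4.0]))  # 3.0
```

Disabling per task type, rather than taking the whole server offline, is what keeps the server useful for the calculations that still pass.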