PMML for QSAR Model Exchange

Published in: Technology, Education

  1. PMML for QSAR Model Exchange
     Rajarshi Guha, Ph.D.
     NIH Center for Advancing Translational Sciences
     guhar@mail.nih.gov / http://rguha.net
  2. Background
     • Cheminformatics
       – QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
     • RNAi screening, high content imaging
     • Extensive use of machine learning
     • All tied together with software development (GUIs, libraries)
     • Contributed pmml.lm to the PMML package
  3. Quantitative Structure Activity Relationships
  4. Why is QSAR Useful?
     • Lets us predict whether a chemical is likely to be toxic, avoiding animal testing
     • Prioritize molecules from a high throughput screen of 300K molecules
     • Predict whether a molecule will be (sufficiently) soluble in water
     • Identify molecules with anti-malarial properties
     • Accurate, predictive models can save significant time and money (and cute bunnies)
  5. Lots and Lots of Models
     • Hundreds of such models published in the literature
       – Usually in the form of tables of regression coefficients (if we're lucky)
       – If the paper describes an SVM model, no chance of reproducing the results
     • How can we exchange QSAR models?
  6. QSAR Model Exchange
     • Build models in ….,
     • Save them in PMML
     • Distribute
     • …
     • Profit?
       – Not always
     The bottleneck is evaluating descriptors for the new observations to supply to the model
  7. Cheminformatics in R
     • rcdk provides cheminformatics support in R
       – Load and parse molecular file formats
       – Evaluate numerical descriptors from chemical structures
     [Diagram labels: rcdk, CDK, Jmol, rpubchem, rJava, fingerprint, XML, R Programming Environment]
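  8. Cheminformatics in R

       library(pmml)
       library(rcdk)
       # Boiling point dataset shipped with rcdk; column 1 holds the SMILES strings
       data(bpdata)
       mols <- parse.smiles(bpdata[, 1])
       # "topological" is quoted here on the assumption that the slide meant the
       # descriptor category name; unquoted, it would be an undefined variable
       descNames <- unique(unlist(sapply("topological", get.desc.names)))
       descs <- eval.desc(mols, descNames)
       model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC,
                   data.frame(bpdata, descs))
       # Emit the fitted linear model as a PMML document
       pmml(model)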
  9. R, rcdk, PMML
     • rcdk provides the means to take in molecules and output a PMML encoded model
     • One could record appropriate functions/classes in the document and use that info to evaluate descriptors for new observations
     • Since rcdk is based on the Java CDK library, could also use jpmml, a Java API for PMML documents
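The idea on the last slide, recording the descriptor classes inside the PMML document so a recipient can compute the model inputs for new molecules, might look roughly like the sketch below. Several things here are assumptions rather than anything stated in the slides: the Extension name cdkDescriptorClass is made up for illustration, the three CDK class names are assumed to be the ones that produce the khs.*, TopoPSA and VABC columns, the two new SMILES strings are arbitrary, and the manual scoring at the end merely stands in for a real PMML engine such as jpmml.

```r
library(rcdk)   # parse.smiles(), eval.desc(), bpdata
library(pmml)   # pmml() converts a fitted lm to PMML
library(XML)    # xmlNode(), addChildren(), saveXML(), xmlParse(), getNodeSet()

# CDK descriptor classes assumed to generate the model's input columns.
desc_classes <- c(
  "org.openscience.cdk.qsar.descriptors.molecular.KierHallSmartsDescriptor",
  "org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor",
  "org.openscience.cdk.qsar.descriptors.molecular.VABCDescriptor")

# --- Producer side: fit the model and write an annotated PMML file ---
data(bpdata)
mols <- parse.smiles(bpdata[, 1])
descs <- eval.desc(mols, desc_classes)
model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC,
            data.frame(bpdata, descs))

doc <- pmml(model)
for (cls in desc_classes) {
  # Hypothetical convention: one Extension element per descriptor class.
  doc <- addChildren(doc, xmlNode("Extension",
                                  attrs = c(name = "cdkDescriptorClass",
                                            value = cls)))
}
pmml_file <- file.path(tempdir(), "bp_model.pmml")
saveXML(doc, file = pmml_file)

# --- Consumer side: only the PMML file and rcdk are needed ---
parsed <- xmlParse(pmml_file)
# local-name() sidesteps the version-specific PMML default namespace.
ext <- getNodeSet(parsed,
  "//*[local-name()='Extension'][@name='cdkDescriptorClass']")
recorded <- sapply(ext, xmlGetAttr, "value")

reg <- getNodeSet(parsed, "//*[local-name()='RegressionTable']")[[1]]
intercept <- as.numeric(xmlGetAttr(reg, "intercept"))
terms <- getNodeSet(parsed, "//*[local-name()='NumericPredictor']")
coefs <- setNames(as.numeric(sapply(terms, xmlGetAttr, "coefficient")),
                  sapply(terms, xmlGetAttr, "name"))

# Evaluate the recorded descriptors for new molecules (illustrative SMILES).
new_mols <- parse.smiles(c("CCO", "CCCCF"))
new_desc <- eval.desc(new_mols, recorded)

# Apply the linear model by hand; a PMML engine would normally do this step.
scores <- drop(intercept + as.matrix(new_desc[, names(coefs)]) %*% coefs)
print(scores)
```

The point of the round trip is that the consumer never touches the R model object: the coefficients and the descriptor class names both come from the file, which addresses the descriptor bottleneck mentioned on the "QSAR Model Exchange" slide.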
