
PMML for QSAR Model Exchange



Published in: Technology, Education

  1. PMML for QSAR Model Exchange
     Rajarshi Guha, Ph.D.
     NIH Center for Advancing Translational Sciences / http://
  2. Background
     •  Cheminformatics
        –  QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
     •  RNAi screening, high content imaging
     •  Extensive use of machine learning
     •  All tied together with software development (GUIs, libraries)
     •  Contributed pmml.lm to the PMML package
  3. Quantitative Structure Activity Relationships
  4. Why is QSAR Useful?
     •  Lets us predict whether a chemical is likely to be toxic, avoiding animal testing
     •  Prioritize molecules from a high throughput screen of 300K molecules
     •  Predict whether a molecule will be (sufficiently) soluble in water
     •  Identify molecules with anti-malarial properties
     •  Accurate, predictive models can save significant time and money (and cute bunnies)
  5. Lots and Lots of Models
     •  Hundreds of such models published in the literature
        –  Usually in the form of tables of regression coefficients (if we're lucky)
        –  If the paper describes an SVM model, no chance of reproducing the results
     •  How can we exchange QSAR models?
  6. QSAR Model Exchange
     •  Build models in ….,
     •  Save them in PMML
     •  Distribute
     •  …
     •  Profit?
        –  Not always
     The bottleneck is evaluating descriptors for the new observations to supply to the model
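The bottleneck named on this slide can be made concrete with a small base R sketch. It assumes a linear model that has been exchanged as a named coefficient vector (for example, read from a published table or a PMML regression table). The coefficient values and the helper `score_linear` are hypothetical and exist only for illustration. The point is that the model cannot be applied until every descriptor it names has been evaluated for the new molecules.

```r
# Hypothetical coefficients; the numbers are made up for illustration only.
coefs <- c("(Intercept)" = 1.5, TopoPSA = 0.2, VABC = -0.01)

# Apply an exchanged linear model to a data frame of descriptor values.
# Fails loudly if a descriptor required by the model has not been evaluated.
score_linear <- function(coefs, descs) {
  terms <- setdiff(names(coefs), "(Intercept)")
  missing <- setdiff(terms, colnames(descs))
  if (length(missing) > 0) {
    stop("descriptors not evaluated: ", paste(missing, collapse = ", "))
  }
  x <- as.matrix(descs[, terms, drop = FALSE])
  as.vector(coefs[["(Intercept)"]] + x %*% coefs[terms])
}

# Descriptor values for two new molecules (also made up).
new_descs <- data.frame(TopoPSA = c(20, 40), VABC = c(100, 150))
preds <- score_linear(coefs, new_descs)
```

In practice the `new_descs` table would have to come from the same descriptor software, with the same column names, that was used when the model was built. That is the part a PMML file alone does not provide.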
  7. Cheminformatics in R
     •  rcdk provides cheminformatics support in R
        –  Load and parse molecular file formats
        –  Evaluate numerical descriptors from chemical structures
     [Figure: rcdk, CDK, Jmol, rpubchem, rJava, fingerprint and XML in the R Programming Environment]
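  8. Cheminformatics in R
       library(pmml)
       library(rcdk)
       data(bpdata)
       # Parse the SMILES strings in the first column into molecule objects
       mols <- parse.smiles(bpdata[, 1])
       # Collect the topological descriptor classes (the category name must be quoted)
       descNames <- unique(unlist(sapply("topological", get.desc.names)))
       # Evaluate the descriptors for every molecule
       descs <- eval.desc(mols, descNames)
       # Fit a linear boiling point model on four descriptors and export it as PMML
       model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC,
                   data.frame(bpdata, descs))
       pmml(model)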
  9. R, rcdk, PMML
     •  rcdk provides the means to take in molecules and output a PMML encoded model
     •  One could record appropriate functions/classes in the document and use that info to evaluate descriptors for new observations
     •  Since rcdk is based on the Java CDK library, could also use jpmml, a Java API for PMML documents
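One way the idea of recording functions/classes in the document might look is sketched below in base R. It is an illustration under assumptions, not the author's implementation. The mapping from model terms to CDK descriptor class names is written by hand here and should be checked against the installed CDK. The helpers `to_extensions` and `classes_to_evaluate` and the `extender` value `"rcdk"` are hypothetical. The `extender`, `name` and `value` attributes are those of the PMML `Extension` element, but where such elements may legally sit in a document should be checked against the PMML schema.

```r
# Assumed mapping from model terms to the CDK descriptor classes that
# produce them (hand-written for illustration; verify against your CDK).
pkg <- "org.openscience.cdk.qsar.descriptors.molecular."
desc_classes <- c(
  khs.sCH3 = paste0(pkg, "KierHallSmartsDescriptor"),
  khs.sF   = paste0(pkg, "KierHallSmartsDescriptor"),
  TopoPSA  = paste0(pkg, "TPSADescriptor"),
  VABC     = paste0(pkg, "VABCDescriptor")
)

# Producer side: render one PMML <Extension> element per model term.
to_extensions <- function(classes, extender = "rcdk") {
  sprintf('<Extension extender="%s" name="%s" value="%s"/>',
          extender, names(classes), unname(classes))
}

# Consumer side: the unique descriptor classes needed for a set of terms.
classes_to_evaluate <- function(classes, terms) {
  unique(unname(classes[terms]))
}

ext <- to_extensions(desc_classes)
needed <- classes_to_evaluate(desc_classes, names(desc_classes))
```

A consumer working in R could pass `needed` to rcdk's `eval.desc` for the new molecules. A Java consumer could instantiate the same classes directly from the CDK before handing the values to a PMML evaluator.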