SlideShare a Scribd company logo
PMML	
  for	
  QSAR	
  Model	
  Exchange	
  	
  

            Rajarshi	
  Guha,	
  Ph.D.	
  	
  
          NIH	
  Center	
  for	
  Advancing	
  
           TranslaEonal	
  Sciences	
  


          guhar@mail.nih.gov	
  /	
  h0p://rguha.net	
  
Background	
  
•  CheminformaEcs	
  	
  
   –  QSAR,	
  diversity	
  analysis,	
  virtual	
  screening,	
  	
  
      fragments,	
  polypharmacology,	
  networks	
  
•  RNAi	
  screening,	
  high	
  content	
  imaging	
  
•  Extensive	
  use	
  of	
  machine	
  learning	
  
•  All	
  Eed	
  together	
  with	
  soLware	
  	
  
   development	
  (GUI’s,	
  libraries)	
  
•  Contributed	
  pmml.lm	
  to	
  the	
  PMML	
  
   package	
  
QuanEtaEve	
  Structure	
  AcEvity	
  
       RelaEonships	
  
Why	
  is	
  QSAR	
  Useful?	
  
•  Lets	
  us	
  predict	
  whether	
  a	
  chemical	
  is	
  likely	
  to	
  
   be	
  toxic,	
  avoiding	
  animal	
  tesEng	
  
•  PrioriEze	
  molecules	
  from	
  a	
  high	
  throughput	
  
   screen	
  of	
  300K	
  molecules	
  
•  Predict	
  whether	
  a	
  molecule	
  will	
  be	
  (sufficiently)	
  
   soluble	
  in	
  water	
  
•  IdenEfy	
  molecules	
  with	
  anE-­‐malarial	
  properEes	
  
•  Accurate,	
  predic-ve	
  models	
  can	
  save	
  
   significant	
  -me	
  and	
  money	
  (and	
  cute	
  bunnies)	
  
Lots	
  and	
  Lots	
  of	
  Models	
  
•  Hundreds	
  of	
  such	
  models	
  published	
  in	
  the	
  
   literature	
  
    –  Usually	
  in	
  the	
  form	
  of	
  tables	
  of	
  regression	
  
       coefficients	
  (if	
  we’re	
  lucky)	
  
    –  If	
  the	
  paper	
  describes	
  an	
  SVM	
  model,	
  no	
  chance	
  
       of	
  reproducing	
  the	
  results	
  
•  How	
  can	
  we	
  exchange	
  QSAR	
  models?	
  
QSAR	
  Model	
  Exchange	
  
•    Build	
  models	
  in	
  ….,	
  	
  
•    Save	
  them	
  in	
  PMML	
  
•    Distribute	
  
•    …	
  
•    Profit?	
  
      –  Not	
  always	
  
	
  
The	
  bo0leneck	
  is	
  evalua:ng	
  descriptors	
  for	
  the	
  
new	
  observa:ons	
  to	
  supply	
  to	
  the	
  model	
  
CheminformaEcs	
  in	
  R	
  
•  rcdk	
  provides	
  cheminformaEcs	
  support	
  in	
  R	
  
   –  Load	
  and	
  parse	
  molecular	
  file	
  formats	
  
   –  Evaluate	
  numerical	
  descriptors	
  from	
  chemical	
  
      structures	
  

      rcdk

CDK           Jmol                                      rpubchem

      rJava                  fingerprint                   XML

                R Programming Environment
CheminformaEcs	
  in	
  R	
  

library(pmml)!
library(rcdk)!
data(bpdata)!
mols <- parse.smiles(bpdata[, 1])!
descNames <- unique(unlist(sapply('topological',      !
                                   get.desc.names)))!
descs <- eval.desc(mols, descNames)!
model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC,
data.frame(bpdata,descs))!
pmml(model)!
R,	
  rcdk,	
  PMML	
  
•  rcdk	
  provides	
  the	
  means	
  to	
  take	
  in	
  molecules	
  
   and	
  output	
  a	
  PMML	
  encoded	
  model	
  
•  One	
  could	
  record	
  appropriate	
  funcEons/classes	
  
   in	
  the	
  document	
  and	
  use	
  that	
  info	
  to	
  evaluate	
  
   descriptor	
  for	
  new	
  observaEons	
  
•  Since	
  rcdk	
  is	
  based	
  on	
  the	
  Java	
  CDK	
  library,	
  
   could	
  also	
  use	
  jpmml,	
  a	
  Java	
  API	
  for	
  PMML	
  
   documents	
  

More Related Content

Similar to PMML for QSAR Model Exchange

Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in RRajarshi Guha
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptors
RAJAN ROLTA
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Databricks
 
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Databricks
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
fnothaft
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
fnothaft
 
Richard Cramer 2014 euro QSAR presentation
Richard Cramer 2014 euro QSAR presentationRichard Cramer 2014 euro QSAR presentation
Richard Cramer 2014 euro QSAR presentation
Certara
 
An Overview to Protein bioinformatics
An Overview to Protein bioinformaticsAn Overview to Protein bioinformatics
An Overview to Protein bioinformatics
Joel Ricci-López
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
Geoffrey Fox
 
The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
Enis Afgan
 
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Spark Summit
 
exFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics ExperimentsexFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics Experiments
Tim Clark
 
eXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentseXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentsTim Clark
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistrybaoilleach
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistryguest5929fa7
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Prof. Wim Van Criekinge
 
ADMET.pptx
ADMET.pptxADMET.pptx
ADMET.pptx
Santu Chall
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGS
HAMNAHAMNA8
 

Similar to PMML for QSAR Model Exchange (20)

Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in R
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptors
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
 
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
 
Richard Cramer 2014 euro QSAR presentation
Richard Cramer 2014 euro QSAR presentationRichard Cramer 2014 euro QSAR presentation
Richard Cramer 2014 euro QSAR presentation
 
An Overview to Protein bioinformatics
An Overview to Protein bioinformaticsAn Overview to Protein bioinformatics
An Overview to Protein bioinformatics
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Hcls sci disc-isa2rdf
Hcls sci disc-isa2rdfHcls sci disc-isa2rdf
Hcls sci disc-isa2rdf
 
The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
 
exFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics ExperimentsexFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics Experiments
 
eXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentseXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic Experiments
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
 
ADMET.pptx
ADMET.pptxADMET.pptx
ADMET.pptx
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGS
 

More from Rajarshi Guha

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark Genome
Rajarshi Guha
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in context
Rajarshi Guha
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark Genome
Rajarshi Guha
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMC
Rajarshi Guha
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Rajarshi Guha
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?
Rajarshi Guha
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?
Rajarshi Guha
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network Models
Rajarshi Guha
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATSRajarshi Guha
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & RRajarshi Guha
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical StructuresRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Rajarshi Guha
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the partsRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Rajarshi Guha
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Rajarshi Guha
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research DatabaseRajarshi Guha
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleRajarshi Guha
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Rajarshi Guha
 
Quantifying Text Sentiment in R
Quantifying Text Sentiment in RQuantifying Text Sentiment in R
Quantifying Text Sentiment in RRajarshi Guha
 

More from Rajarshi Guha (20)

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark Genome
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in context
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark Genome
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMC
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network Models
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & R
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical Structures
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the parts
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research Database
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & Reproducible
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?
 
Quantifying Text Sentiment in R
Quantifying Text Sentiment in RQuantifying Text Sentiment in R
Quantifying Text Sentiment in R
 
Smashing Molecules
Smashing MoleculesSmashing Molecules
Smashing Molecules
 

Recently uploaded

GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 

Recently uploaded (20)

GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 

PMML for QSAR Model Exchange

  • 1. PMML  for  QSAR  Model  Exchange     Rajarshi  Guha,  Ph.D.     NIH  Center  for  Advancing   TranslaEonal  Sciences   guhar@mail.nih.gov  /  h0p://rguha.net  
  • 2. Background   •  CheminformaEcs     –  QSAR,  diversity  analysis,  virtual  screening,     fragments,  polypharmacology,  networks   •  RNAi  screening,  high  content  imaging   •  Extensive  use  of  machine  learning   •  All  Eed  together  with  soLware     development  (GUI’s,  libraries)   •  Contributed  pmml.lm  to  the  PMML   package  
  • 3. QuanEtaEve  Structure  AcEvity   RelaEonships  
  • 4. Why  is  QSAR  Useful?   •  Lets  us  predict  whether  a  chemical  is  likely  to   be  toxic,  avoiding  animal  tesEng   •  PrioriEze  molecules  from  a  high  throughput   screen  of  300K  molecules   •  Predict  whether  a  molecule  will  be  (sufficiently)   soluble  in  water   •  IdenEfy  molecules  with  anE-­‐malarial  properEes   •  Accurate,  predic-ve  models  can  save   significant  -me  and  money  (and  cute  bunnies)  
  • 5. Lots  and  Lots  of  Models   •  Hundreds  of  such  models  published  in  the   literature   –  Usually  in  the  form  of  tables  of  regression   coefficients  (if  we’re  lucky)   –  If  the  paper  describes  an  SVM  model,  no  chance   of  reproducing  the  results   •  How  can  we  exchange  QSAR  models?  
  • 6. QSAR  Model  Exchange   •  Build  models  in  ….,     •  Save  them  in  PMML   •  Distribute   •  …   •  Profit?   –  Not  always     The  bo0leneck  is  evalua:ng  descriptors  for  the   new  observa:ons  to  supply  to  the  model  
  • 7. CheminformaEcs  in  R   •  rcdk  provides  cheminformaEcs  support  in  R   –  Load  and  parse  molecular  file  formats   –  Evaluate  numerical  descriptors  from  chemical   structures   rcdk CDK Jmol rpubchem rJava fingerprint XML R Programming Environment
  • 8. CheminformaEcs  in  R   library(pmml)! library(rcdk)! data(bpdata)! mols <- parse.smiles(bpdata[, 1])! descNames <- unique(unlist(sapply('topological', ! get.desc.names)))! descs <- eval.desc(mols, descNames)! model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC, data.frame(bpdata,descs))! pmml(model)!
  • 9. R,  rcdk,  PMML   •  rcdk  provides  the  means  to  take  in  molecules   and  output  a  PMML  encoded  model   •  One  could  record  appropriate  funcEons/classes   in  the  document  and  use  that  info  to  evaluate   descriptor  for  new  observaEons   •  Since  rcdk  is  based  on  the  Java  CDK  library,   could  also  use  jpmml,  a  Java  API  for  PMML   documents