Exploring virtual compound space with Bayesian statistics
Willem P van Hoorn, Chemistry, Pfizer Global Research and Development, Sandwich, UK
Pipeline Pilot UGM, San Diego, Mar 2007
Overview
An embarrassment of riches
As an aside: this is a trillion in US and modern British usage, and a billion in traditional British usage: http://en.wikipedia.org/wiki/Names_of_large_numbers
Methods
Bayesian Learning, single category
Data set (assay data) -> fingerprint bits (~ substructures) -> Bayesian model
Training classes: "Good" = actives, "Bad" = inactives
Rev Thomas Bayes, ca 1702 - 1761
Bayesian Learning, multiple categories
Pfizer library file -> fingerprint bits (~ substructures) -> Bayesian models
Categories: Library 1 … Library N
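The multi-category idea can be sketched in a few lines. This is a minimal illustration, not the Pfizer implementation: it assumes one one-vs-rest model per library and the Laplacian-corrected bit weight log((A + 1) / (N * P + 1)) that is commonly described for Pipeline Pilot Bayesian models, where N is the number of training compounds containing a bit, A the number of those in the library, and P the library's share of the training set. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_models(training):
    """Build one bit-weight table per library (one-vs-rest).

    training: list of (library_id, set_of_fingerprint_bits).
    The Laplacian-corrected weight is an assumption, not taken from the slides.
    """
    n_total = len(training)
    bit_total = Counter()               # N: compounds containing each bit
    bit_in_lib = defaultdict(Counter)   # A: of those, compounds in each library
    lib_size = Counter()
    for lib, bits in training:
        lib_size[lib] += 1
        for bit in bits:
            bit_total[bit] += 1
            bit_in_lib[lib][bit] += 1

    models = {}
    for lib, size in lib_size.items():
        base_rate = size / n_total      # P: library share of the training set
        models[lib] = {
            bit: math.log((bit_in_lib[lib][bit] + 1) / (n * base_rate + 1))
            for bit, n in bit_total.items()
        }
    return models


def rank_libraries(models, probe_bits, top_n=16):
    """Score a probe against every library model and return the top_n."""
    scores = {
        lib: sum(weights.get(bit, 0.0) for bit in probe_bits)
        for lib, weights in models.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical toy data: integers stand in for fingerprint bits.
training = [
    ("LibA", {1, 2, 3}), ("LibA", {1, 2, 4}),
    ("LibB", {5, 6, 7}), ("LibB", {5, 6, 8}),
    ("Singleton", {1, 5, 9}),
]
models = train_models(training)
ranked = rank_libraries(models, probe_bits={1, 2, 3})
for lib, score in ranked:
    print(f"{lib:10s} {score:7.3f}")
```

Bits never seen in training contribute nothing to the score, so an unfamiliar probe simply scores low everywhere rather than raising an error.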
Building the multi-category Bayesian models
Source: Pfizer compound database
Pfizer library file: all compounds made in-house and externally by combinatorial chemistry: O(6) compounds, O(3) libraries
12.5K Pfizer singleton diversity subset
Figure labels: 50%, 50%
A singleton library?
Multi-category Bayesian predictions
A probe (UK-92480, Sildenafil)
By default the top 16 libraries are calculated:
Library      Score
Singleton    104.57
Library1      84.10
Library2      43.97
...             ...
Library15     12.63
Bayesian predictions are exemplified by Nearest Neighbour search
Exemplify libraries by identifying nearest neighbours from the library file, default top 6.
Final output: 16 x 6 = 96 compounds (one-plate screenable hypothesis)
Example: 1. Singleton (in file: 12500)
Compounds shown: 849914-95-0, 139755-82-1, 298214-47-8, no CAS, 155879-54-2, 223430-18-0; UK-A, UK-B, UK-C, UK-D, UK-E
Ranked libraries: Singleton 104.57, Library1 84.10, Library2 43.97, ..., Library15 12.63
What is searched?
A note on coverage of chemical space
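The exemplification step above can be sketched as a per-library nearest-neighbour search. The slides do not name the similarity measure, so Tanimoto similarity on fingerprint bit sets is an assumption here, as are the function names and the synthetic library file.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def exemplify(probe_bits, ranked_libraries, library_file, per_library=6):
    """Pick the nearest neighbours of the probe from each ranked library.

    library_file: {library_id: {compound_id: set_of_bits}} (hypothetical layout).
    Returns a flat list of (library_id, compound_id, similarity).
    """
    plate = []
    for lib in ranked_libraries:
        compounds = library_file.get(lib, {})
        nearest = heapq.nlargest(
            per_library,
            compounds.items(),
            key=lambda item: tanimoto(probe_bits, item[1]),
        )
        plate.extend(
            (lib, cid, round(tanimoto(probe_bits, bits), 3))
            for cid, bits in nearest
        )
    return plate


# Synthetic data: 16 libraries with 8 compounds each.
library_file = {
    f"Lib{i}": {f"L{i}-C{j}": {i, j, i + j} for j in range(8)}
    for i in range(16)
}
plate = exemplify({0, 1, 2}, list(library_file), library_file)
print(len(plate))  # 16 libraries x 6 neighbours
```

With the defaults from the slide (top 16 libraries, top 6 neighbours) the output fills one 96-well plate; a library holding fewer than 6 compounds would leave wells empty.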
Validation
Random test set
Exclude singleton library. O(6) compounds, random x%: 9452 test compounds
Top 5 predicted library IDs compared with the real library ID
Test set   Found in top 1   Found in top 5   Not in top 5
9452       9068, 96%        9411, 99.6%      41, 0.4%
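The validation measure above is a top-k recall: a test compound counts as found if its real library ID appears among the first k predicted libraries. A minimal sketch, with a hypothetical function name and toy predictions:

```python
def top_k_recall(true_ids, ranked_predictions, k):
    """Count test compounds whose true library is among the top k predictions.

    true_ids: the real library ID of each test compound.
    ranked_predictions: for each compound, library IDs ordered best first.
    Returns (hits, fraction_found).
    """
    hits = sum(
        1
        for truth, ranked in zip(true_ids, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits, hits / len(true_ids)


# Toy example with four test compounds and three candidate libraries.
true_ids = ["A", "B", "C", "D"]
ranked_predictions = [
    ["A", "B", "C"],
    ["C", "B", "A"],
    ["A", "B", "C"],
    ["A", "B", "C"],
]
for k in (1, 2, 3):
    print(k, top_k_recall(true_ids, ranked_predictions, k))
```

Compound "D" is never found because its library is absent from every ranking, which is the toy analogue of the "Not in top 5" column.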
41 compounds with correct library not in top 5
compound_ID          correct library_id     ranked_as   Bayesian score
PF-A                 Internal Library 1     349         -29.0651987519029
Another PF number    Internal Library 2     243         -13.3644982689003
Another PF number    Internal Library 3     69          -0.400118961865439
Another PF number    Internal Library 4     63          0.614583090788282
Another PF number    Internal Library 1     53          -0.13987606970494
Another PF number    Internal Library 5     50          -7.32271642948761
Another PF number    Internal Library 1     35          3.41709994966454
Another PF number    Internal Library 3     22          9.57829190295786
Another PF number    Internal Library 1     22          10.0504136444794
Another PF number    External Library 1     20          22.8956131731457
Another PF number    External Library 2     19          18.8320528385981
Another PF number    Internal Library 3     15          12.1074842056827
Another PF number    Internal Library 1     14          54.6179465790837
Another PF number    Internal Library 3     13          16.6244027916311
Another PF number    Internal Library 1     12          6.74173586963795
Another PF number    Internal Library 1     12          17.0964105412622
Another PF number    Internal Library 1     11          58.7994305333701
Another PF number    External Library 3     10          58.2193181435516
Another PF number    Internal Library 1     10          12.5102031415206
Another PF number    Internal Library 6     9           19.4093857882624
Another PF number    Internal Library 1     8           20.6383651456158
Another PF number    External Library 3     8           73.0633503114444
Another PF number    External Library 4     8           18.5429747446516
Another PF number    Internal Library 1     8           36.9730841061725
Another PF number    Internal Library 1     7           34.8859762378176
Another PF number    Internal Library 3     7           17.3617539873978
Another PF number    Internal Library 1     7           30.8582847036755
Another PF number    Internal Library 1     7           41.848859585633
Another PF number    External Library 5     7           25.8587564812026
Another PF number    External Library 6     6           33.5395919145182
Another PF number    External Library 7     6           39.9074521984672
Another PF number    External Library 8     6           32.3248563198852
Another PF number    External Library 9     6           23.9421542596281
Another PF number    External Library 10    6           95.1965176091739
Another PF number    Internal Library 1     6           53.8715604224809
Another PF number    Internal Library 1     6           28.2709230508615
Another PF number    Internal Library 1     6           48.1827060771728
Another PF number    Internal Library 1     6           17.7689907755174
Another PF number    Internal Library 7     6           57.5694207876578
Another PF number    Internal Library 7     6           53.9529913359943
Another PF number    Internal Library 7     6           56.184768481901
Worst mispredicted: PF-A
PF-A: amide formation, Monomer 1 + Monomer 2
No registration error
Internal library 1: 29,800 compounds registered, monomers known for 28,670
120 of these contain Monomer 1, but only 1 compound contains Monomer 2: PF-A is an atypical product
General remark: in-house libraries have broad scope and are therefore harder to predict
So what was found?
Bayesian predictions:
1. External library 11: Amide formation
2. External library 12: Amide formation
…
Figure labels: Similar to monomer 1; Similar to monomer 2
Six Bayesian categorisation models are available
Fingerprint
How to compensate for different sizes of libraries in the training set?
Recall of known library ID as a function of model
Exclude singleton library. O(6) compounds, random x%: 9452 test compounds
Top 5 predicted library IDs compared with the real library ID
Model              Test set   Found in top 1   Found in top 5   Not in top 5
ECFP               9452       9068, 96%        9411, 99.6%      41, 0.4%
ECFP_EstPGood      9452       8372, 89%        9439, 99.9%      13, 0.1%
ECFP_Enrichment    9452       5692, 60%        9247, 97.8%      205, 2.2%
FCFP               9452       8920, 94%        9367, 99.1%      85, 0.9%
FCFP_EstPGood      9452       8547, 90%        9441, 99.9%      11, 0.1%
FCFP_Enrichment    9452       6093, 64%        9344, 98.9%      108, 1.1%
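As a worked check of the arithmetic in the recall table, the short script below recomputes the percentages from the counts and confirms that "found in top 5" plus "not in top 5" equals the test set size for every model. The counts are copied from the slide; the variable and function names are illustrative only.

```python
TEST_SET = 9452

# model: (found in top 1, found in top 5, not in top 5), counts from the slide
RESULTS = {
    "ECFP": (9068, 9411, 41),
    "ECFP_EstPGood": (8372, 9439, 13),
    "ECFP_Enrichment": (5692, 9247, 205),
    "FCFP": (8920, 9367, 85),
    "FCFP_EstPGood": (8547, 9441, 11),
    "FCFP_Enrichment": (6093, 9344, 108),
}


def percentages(counts, total=TEST_SET):
    """Convert raw counts to percentages of the test set, one decimal place."""
    return tuple(round(100 * c / total, 1) for c in counts)


for model, counts in RESULTS.items():
    top1, top5, missed = counts
    # Every test compound is either found in the top 5 or not.
    assert top5 + missed == TEST_SET
    print(f"{model:16s} {percentages(counts)}")

best_top1 = max(RESULTS, key=lambda m: RESULTS[m][0])
best_top5 = max(RESULTS, key=lambda m: RESULTS[m][1])
print("best top 1:", best_top1, "| best top 5:", best_top5)
```

On these counts the plain ECFP model has the highest top-1 recall, while FCFP_EstPGood has marginally the highest top-5 recall.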
Comparison of six Bayesian models
Interpreting the results
Opening the Bayesian black box
Figure labels: Probe; Fingerprint bits; A library
Probe is highlighted by what each library recognises
1. Singleton  2. In-house 1  3. In-house 2  4. External 1  5. External 2  6. External 3  7. External 4  8. External 5
9. External 6  10. External 7  11. External 8  12. External 9  13. External 10  14. External 11  15. External 12  16. External 13
In-house 2 yields compounds similar to the left-hand side of the probe
In-house 1 yields compounds similar to the right-hand side of the probe
Highlighted probes compared to actual compounds retrieved
2. In-house 1  3. In-house 2  4. External 1
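One way to picture the highlighting is to project each library's bit weights back onto the probe atoms that set those bits. The sketch below is an assumption about how such a map could be built, not the method used in the slides: it spreads each positive bit weight equally over the atoms belonging to that bit. The bit numbers, atom indices and weights are invented for illustration.

```python
from collections import defaultdict


def atom_contributions(bit_to_atoms, library_weights):
    """Spread each positively weighted bit over the probe atoms that set it.

    bit_to_atoms: {bit: set of probe atom indices covered by that bit}
    library_weights: {bit: weight} for one library's Bayesian model
    Returns {atom_index: summed contribution}; atoms with no positive
    contribution are left out.
    """
    contributions = defaultdict(float)
    for bit, atoms in bit_to_atoms.items():
        weight = library_weights.get(bit, 0.0)
        if weight <= 0 or not atoms:
            continue  # only bits the library "recognises" are highlighted
        share = weight / len(atoms)
        for atom in atoms:
            contributions[atom] += share
    return dict(contributions)


# Hypothetical probe: atoms 0-2 on one side, atoms 5-6 on the other.
bit_to_atoms = {101: {0, 1}, 102: {1, 2}, 103: {5, 6}}
in_house_1 = {103: 1.2, 101: -0.4}   # recognises one side only
in_house_2 = {101: 0.8, 102: 0.6}    # recognises the other side

print(atom_contributions(bit_to_atoms, in_house_1))
print(atom_contributions(bit_to_atoms, in_house_2))
```

Two libraries can light up different halves of the same probe, which matches the observation that In-house 1 and In-house 2 retrieve compounds resembling opposite sides of the molecule.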
How about the singleton file?
How about the Pfizer singleton file?
Pfizer compound database
All singletons: O(6) compounds
O(6) liquid singleton compounds mapped to O(3) libraries
As expected, the "Singleton" library dominates
Generally: good spread
Figure labels: O(4); Singleton
Pfizer solids and vendor compounds have been mapped to libraries
O(5) solid samples for which no liquid sample is available
O(6) structures from ChemNavigator not in Pfizer files
Figure labels: 7 unmapped libraries; 6 unmapped libraries; O(4); O(5); Singleton; Singleton
Mapped singleton/vendor compounds can be searched by similarity
4 x 96: Library compounds; Singleton compounds, liquid; Singleton compounds, solid; Singleton compounds, vendor
Figure labels: 16; 147676-92-4
Implementation
Bayesian search implemented as web service
User
Last model update, overview of coverage, etc
~5-10 min
pdf report: Ranked libraries + NN examples
Ranked libraries: 1. Singleton  2. In-house 1  3. In-house 2  4. External 1  5. External 2  6. External 3  7. External 4  8. External 5  9. External 6  10. External 7  11. External 8  12. External 9  13. External 10  14. External 11  15. External 12  16. External 13
NN examples: 1. Singleton (in file: 12500): 849914-95-0, 139755-82-1, 298214-47-8, no CAS, 155879-54-2, 223430-18-0; UK-A, UK-B, UK-C, UK-D, UK-E
A happy user
Conclusions
Acknowledgements
