Predicting Activity Cliffs - Can We Use Machine Learning for Special Cases?
  Rajarshi Guha, NIH Center for Translational Therapeutics
  August 4, 2011, Joint Statistical Meeting, Miami Beach
Outline
  • Structure-activity landscapes
  • Characterization
  • Prediction
Structure Activity Relationships
  • Similar molecules will have similar activities
  • Small changes in structure will lead to small changes in activity
  • One implication is that SARs are additive
  • This is the basis for QSAR modeling
  Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
Exceptions Are Easy to Find
  [Figure: example compounds with Ki = 39.0 nM, 1.8 nM, 10.0 nM, and 1.0 nM]
  Tran, J.A. et al., Bioorg. Med. Chem. Lett., 2007, 15, 5166–5176
Structure Activity Landscapes
  • Rugged gorges or rolling hills?
  • Small structural changes associated with large activity changes represent steep slopes in the landscape
  • But traditionally, QSAR assumes gentle slopes
  • Machine learning is not very good for special cases
  Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535
Characterizing the Landscape
  • A cliff can be numerically characterized using the Structure Activity Landscape Index (SALI)
  • SALI is computed for every pair of molecules, giving a pairwise matrix
  • Cliffs are characterized by elements of this matrix with very large values
  Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
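The SALI formula was an image on the slide and does not survive in this transcript. In the cited Guha and Van Drie paper it is defined as SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), where A is the activity and sim is the structural similarity of the pair. The sketch below computes the pairwise matrix under some assumptions that are not in the slides: fingerprints are represented as sets of on-bit indices, similarity is Tanimoto, pairs with identical fingerprints are given an infinite value, and the toy data are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # two empty fingerprints: treated as identical (assumption)
    return len(fp_a & fp_b) / len(union)


def sali(act_i, act_j, similarity):
    """SALI(i, j) = |A_i - A_j| / (1 - sim(i, j))."""
    diff = abs(act_i - act_j)
    if similarity >= 1.0:
        # Identical representations make the ratio undefined. Returning inf
        # (or 0 when the activities also match) is an assumption of this sketch.
        return float("inf") if diff > 0 else 0.0
    return diff / (1.0 - similarity)


def sali_matrix(activities, fingerprints):
    """Symmetric matrix of pairwise SALI values with a zero diagonal."""
    n = len(activities)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = sali(activities[i], activities[j],
                         tanimoto(fingerprints[i], fingerprints[j]))
            matrix[i][j] = matrix[j][i] = value
    return matrix


# Hypothetical toy data: three molecules with pKi-like activities.
activities = [7.4, 8.7, 8.0]
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 6, 7, 8}]
matrix = sali_matrix(activities, fingerprints)
```

In this toy set, molecules 0 and 1 are the most similar pair and also differ most in activity, so their matrix element is the largest: that is the cliff.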
Visualizing SALI Values
  • The SALI graph
  • Compounds are nodes
  • Nodes i,j are connected if SALI(i,j) > X
  • Only display connected nodes
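The graph construction above can be sketched in a few lines. The slide only states the connection rule; pointing each edge from the less active to the more active compound is an assumption made here so that the edges carry an ordering. The function names and the toy matrix are hypothetical.

```python
def sali_graph(sali, activities, cutoff):
    """Edges (low, high) for every pair whose SALI value exceeds the cutoff.

    Each edge points from the less active to the more active compound
    (an assumption of this sketch; the slide only gives the cutoff rule).
    """
    edges = []
    n = len(activities)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                low, high = (i, j) if activities[i] <= activities[j] else (j, i)
                edges.append((low, high))
    return edges


def connected_nodes(edges):
    """Only nodes that take part in at least one edge are displayed."""
    return sorted({node for edge in edges for node in edge})


# Hypothetical 4-compound example: a symmetric SALI matrix and activities.
sali = [
    [0.00, 3.25, 0.70, 0.10],
    [3.25, 0.00, 0.82, 0.20],
    [0.70, 0.82, 0.00, 0.05],
    [0.10, 0.20, 0.05, 0.00],
]
activities = [7.4, 8.7, 8.0, 7.9]

edges_loose = sali_graph(sali, activities, cutoff=0.5)
edges_strict = sali_graph(sali, activities, cutoff=1.0)
```

Raising the cutoff X prunes the graph down to the steepest cliffs; compound 3 never appears because none of its pairs exceeds either cutoff.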
What Can We Do With SALI Values?
  • SALI characterizes cliffs and non-cliffs
  • For a given molecular representation, SALI values give us an idea of the smoothness of the SAR landscape
  • Models try to encode this landscape
  • Use the landscape to guide descriptor or model selection
Descriptor Space Smoothness
  • Edge count of the SALI graph for varying cutoffs
  • Measures smoothness of the descriptor space
  • Can reduce this to a single number (AUC)
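A minimal sketch of this summary: count the edges that survive each cutoff, then integrate the resulting curve. The trapezoidal rule, the cutoff grid, and the toy SALI values are assumptions for illustration; the slides do not say how the AUC was computed.

```python
def edge_count_curve(pair_values, cutoffs):
    """Number of SALI graph edges (pairs with SALI > cutoff) at each cutoff."""
    return [sum(1 for value in pair_values if value > cutoff)
            for cutoff in cutoffs]


def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(xs) - 1))


# Hypothetical SALI values for the six pairs of a 4-compound set.
pair_values = [3.25, 0.70, 0.82, 0.10, 0.20, 0.05]
cutoffs = [0.0, 0.5, 1.0, 2.0, 4.0]

counts = edge_count_curve(pair_values, cutoffs)
auc = trapezoid_auc(cutoffs, counts)
```

A curve that drops to zero quickly means few pairs have large SALI values under that representation, so a smaller area suggests a smoother descriptor space.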
Feature Selection Using SALI
  • Instead of fingerprints, we use molecular descriptors
  • SALI denominator now uses Euclidean distance
  • 2D & 3D random descriptor sets
  • None are really good: too rough, or too flat
  [Figure: panels for the 2D and 3D descriptor sets]
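With descriptors, the denominator becomes a distance rather than one minus a similarity. A minimal sketch, assuming the Euclidean distance is used directly as the denominator (the slide does not give the exact form) and that the descriptor vectors have already been put on comparable scales; the function name and values are hypothetical.

```python
import math


def sali_descriptor(act_i, act_j, desc_i, desc_j):
    """|A_i - A_j| divided by the Euclidean distance between descriptor vectors."""
    distance = math.dist(desc_i, desc_j)  # available in Python 3.8+
    diff = abs(act_i - act_j)
    if distance == 0.0:
        # Identical descriptor vectors: undefined ratio, handled here by
        # returning inf (or 0 for equal activities). This is an assumption.
        return float("inf") if diff > 0 else 0.0
    return diff / distance


# Hypothetical pair: activities differ by 2 units, descriptors are 5 apart.
value = sali_descriptor(6.0, 8.0, [0.0, 0.0], [3.0, 4.0])
```

Because the distance is unbounded, descriptors on very different scales would dominate the denominator, which is why some scaling step is assumed here.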
Measuring Model Quality
  • A QSAR model should easily encode the “rolling hills”
  • A good model captures the most significant cliffs
  • Can be formalized as: how many of the edge orderings of a SALI graph does the model predict correctly?
  • Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X
  • Repeat for varying X and obtain the SALI curve
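The definition of S(X) above can be sketched as follows. An edge counts as correctly predicted when the model ranks the two compounds in the same order as the observed activities. Returning the total edge count alongside the raw count (so that a fraction can be derived) and the handling of ties are choices made for this sketch; the data are hypothetical.

```python
def sign(a, b):
    """-1, 0 or 1 depending on how a compares with b."""
    return (a > b) - (a < b)


def sali_curve_point(sali, observed, predicted, cutoff):
    """Return (correct, total) edge orderings for the SALI graph at one cutoff."""
    correct = total = 0
    n = len(observed)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                total += 1
                # Correct when predicted and observed activities order the pair alike.
                if sign(observed[i], observed[j]) == sign(predicted[i], predicted[j]):
                    correct += 1
    return correct, total


def sali_curve(sali, observed, predicted, cutoffs):
    """S(X) evaluated over a range of cutoffs X."""
    return [sali_curve_point(sali, observed, predicted, x) for x in cutoffs]


# Hypothetical data: symmetric SALI matrix, observed and predicted activities.
sali = [
    [0.00, 3.25, 0.70, 0.10],
    [3.25, 0.00, 0.82, 0.20],
    [0.70, 0.82, 0.00, 0.05],
    [0.10, 0.20, 0.05, 0.00],
]
observed = [7.4, 8.7, 8.0, 7.9]
predicted = [7.6, 8.2, 8.3, 7.5]

curve = sali_curve(sali, observed, predicted, cutoffs=[0.0, 0.5, 1.0])
```

Reading the curve from left to right shows how well the model orders pairs as the graph is restricted to steeper and steeper cliffs.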
SALI Curves
Predicting the Landscape
  • Rather than predicting activity directly, we can try to predict the SAR landscape
  • Implies that we attempt to directly predict cliffs
  • Observations are now pairs of molecules
  • A more complex problem
  • Choice of features is trickier
  • Still face the problem of cliffs as outliers
  • Somewhat similar to predicting activity differences
  Scheiber et al., Statistical Analysis and Data Mining, 2009, 2, 115–122
Motivation
  • Predicting activity cliffs corresponds to extending the SAR landscape
  • Identify whether a new molecule will perform better or worse compared to the specific molecules in the dataset
  • Can be useful for guiding lead optimization, but not necessarily useful for lead hopping
Predicting Cliffs
  • Dependent variables are pairwise SALI values, calculated using fingerprints
  • Independent variables are molecular descriptors, but considered pairwise
      • Absolute difference of descriptor pairs, or
      • Geometric mean of descriptor pairs
      • …
  • Develop a model to correlate pairwise descriptors to pairwise SALI values
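The two aggregation functions named above can be sketched as below: each pair of molecules gets one feature vector, built element by element from the two descriptor vectors. The function names and descriptor values are hypothetical, and the geometric mean as written assumes non-negative descriptor values.

```python
import math
from itertools import combinations


def abs_diff(desc_a, desc_b):
    """Element-wise absolute difference of two descriptor vectors."""
    return [abs(a - b) for a, b in zip(desc_a, desc_b)]


def geo_mean(desc_a, desc_b):
    """Element-wise geometric mean; assumes non-negative descriptor values."""
    return [math.sqrt(a * b) for a, b in zip(desc_a, desc_b)]


def pairwise_features(descriptors, aggregate):
    """Map each unordered pair (i, j), i < j, to its aggregated feature vector."""
    return {(i, j): aggregate(descriptors[i], descriptors[j])
            for i, j in combinations(range(len(descriptors)), 2)}


# Hypothetical descriptor vectors for three molecules.
descriptors = [[1.0, 4.0], [4.0, 9.0], [9.0, 1.0]]

features_abs = pairwise_features(descriptors, abs_diff)
features_geo = pairwise_features(descriptors, geo_mean)
```

Both functions are symmetric in their arguments, so the feature vector for a pair does not depend on which molecule is listed first, which matches the symmetry of the SALI value being predicted.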
A Test Case
  • We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50 values
  • Evaluate topological and physicochemical descriptors
  • Developed random forest models
      • On the original observed values (30 observations)
      • On the SALI values (435 observations)
  Cavalli, A. et al., J. Med. Chem., 2002, 45, 3844–3853
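The jump from 30 to 435 observations follows from counting unordered pairs: 30 × 29 / 2 = 435. A short check, with variable names chosen only for illustration:

```python
import math
from itertools import combinations

n_molecules = 30

# Number of unordered pairs of distinct molecules: n * (n - 1) / 2.
n_pairs = math.comb(n_molecules, 2)

# The same pairs enumerated explicitly, as (i, j) index tuples with i < j.
pairs = list(combinations(range(n_molecules), 2))
```

The pairwise dataset therefore grows quadratically with the number of molecules, but its rows are not independent because each molecule appears in 29 of them.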
Double Counting Structures?
  • The dependent and independent variables both encode structure
  • But correlations between individual pairwise descriptors and the SALI values are fairly low
Model Summaries
  Original pIC50: RMSE = 0.97
  SALI, AbsDiff: RMSE = 1.10
  SALI, GeoMean: RMSE = 1.04
  • All models explain a similar % of variance of their respective datasets
  • Using geometric mean as the descriptor aggregation function seems to perform best
  • SALI models are more robust due to larger size of the dataset
Test Case 2
  • Considered the Holloway docking dataset, 32 molecules with pIC50 values and Einter
  • Similar strategy as before
  • Need to transform SALI values
  • Descriptors show minimal correlation
  Holloway, M.K. et al., J. Med. Chem., 1995, 38, 305–317
Model Summaries
  Original pIC50: RMSE = 1.05
  SALI, AbsDiff: RMSE = 0.48
  SALI, GeoMean: RMSE = 0.48
  • The SALI models perform much more poorly in terms of % of variance explained
  • Descriptor aggregation method does not seem to have much effect
  • The SALI models appear to perform decently on the cliffs, but miss the most significant ones
Model Summaries
  Original pIC50: RMSE = 1.05
  SALI, AbsDiff: RMSE = 9.76
  SALI, GeoMean: RMSE = 10.01
  • With untransformed SALI values, models perform similarly in terms of % of variance explained
  • The most significant cliffs correspond to stereoisomers
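The RMSE values on these summary slides are on different scales (pIC50 units, transformed SALI, untransformed SALI), which is why the slides compare models by % of variance explained. The sketch below shows the distinction using one common definition, 100 × (1 - SS_res / SS_tot); whether the original analysis used exactly this definition is an assumption, and the numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the response."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pct_variance_explained(observed, predicted):
    """100 * (1 - SS_res / SS_tot); unit-free, so comparable across scales."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 100.0 * (1.0 - ss_res / ss_tot)


# Hypothetical response and predictions, then the same data scaled by 10.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
observed_x10 = [10.0 * v for v in observed]
predicted_x10 = [10.0 * v for v in predicted]

rmse_small = rmse(observed, predicted)
rmse_large = rmse(observed_x10, predicted_x10)
pve_small = pct_variance_explained(observed, predicted)
pve_large = pct_variance_explained(observed_x10, predicted_x10)
```

Rescaling the response multiplies the RMSE by the same factor but leaves the % of variance explained unchanged, so a larger RMSE on untransformed SALI values does not by itself mean a worse model.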
Test Case 3
  • 38 adenosine receptor antagonists with reported Ki values; use 35 for training and 3 for testing
  • Random forest model on the SALI values performed reasonably well (RMSE = 7.51, R² = 0.62)
  • Upper end of SALI range is better predicted
  Kalla, R.V. et al., J. Med. Chem., 2006, 48, 1984–2008
Test Case 3
  • Generally, performance is poorer for smaller cliffs
  • For any given hold-out molecule, the range of error in SALI prediction is large
  • Suggests that some form of domain applicability metric would be useful
Model Caveats
  • Models based on SALI values depend on there being an SAR in the original activity data
  • Scrambling results for these models are poorer than the original models but aren't as random as expected
Conclusions
  • SALI is the first step in characterizing the SAR landscape
  • Allows us to directly analyze the landscape, as opposed to individual molecules
  • Being able to predict the landscape could serve as a useful way to extend an SAR landscape

Predicting Activity Cliffs - Can Machine Learning Handle Special Cases?

  • 1. Predicting Activity Cliffs - Can We Use Machine Learning for Special Cases? Rajarshi Guha NIH Center for Translational Therapeutics August 4, 2011 Joint Statistical Meeting, Miami Beach
  • 2. Outline Structure-activity landscapes Characterization Prediction
  • 3. Structure Activity Relationships Similar molecules will have similar activities Small changes in structure will lead to small changes in activity One implication is that SAR’s are additive This is the basis for QSAR modeling Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
  • 4. Exceptions Are Easy to Find Ki = 39.0 nM Ki = 1.8 nM Ki = 10.0 nM Ki = 1.0 nM Tran, J.A. et al., Bioorg. Med. Chem. Lett., 2007, 15, 5166–5176
  • 5. Structure Activity Landscapes Rugged gorges or rolling hills? Small structural changes associated with large activity changes represent steep slopes in the landscape But traditionally, QSAR assumes gentle slopes Machine learning is not very good for special cases Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
  • 6. Characterizing the Landscape A cliff can be numerically characterized Structure Activity Landscape Index (SALI) Cliffs are characterized by elements of the matrix with very large values Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
  • 7. Visualizing SALI Values The SALI graph Compounds are nodes Nodes i,j are connected if SALI(i,j) > X Only display connected nodes
  • 8. What Can We Do With SALI’s? SALI characterizes cliffs & non-cliffs For a given molecular representation, SALI’s gives us an idea of thesmoothness of the SAR landscape Models try and encodethis landscape Use the landscape to guidedescriptor or model selection
  • 9. Descriptor Space Smoothness Edge count of the SALI graph for varying cutoffs Measures smoothness of the descriptor space Can reduce this to a single number (AUC)
  • 10. Feature Selection Using SALI Instead of fingerprints, we use molecular descriptors SALI denominator now uses Euclidean distance 2D & 3D random descriptor sets None are really good Too rough, or Too flat 2D 3D
  • 11. Measuring Model Quality A QSAR model should easily encode the “rolling hills” A good model captures the most significant cliffs Can be formalized as How many of the edge orderings of a SALI graph does the model predict correctly? Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X Repeat for varying X and obtain the SALI curve
  • 13. Predicting the Landscape Rather than predicting activity directly, we can try to predict the SAR landscape Implies that we attempt to directly predict cliffs Observations are now pairs of molecules A more complex problem Choice of features is trickier Still face the problem of cliffs as outliers Somewhat similar to predicting activity differences Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
  • 14. Motivation Predicting activity cliffs corresponds to extending the SAR landscape Identify whether a new molecule will perform better or worse compared to the specific molecules in the dataset Can be useful for guiding lead optimization, but not necessarily useful for lead hopping
  • 15. Predicting Cliffs Dependent variables are pairwise SALI values, calculated using fingerprints Independent variables are molecular descriptors – but considered pairwise Absolute difference of descriptor pairs, or Geometric mean of descriptor pairs … Develop a model to correlate pairwise descriptors to pairwise SALI values
  • 16. A Test Case We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50’s Evaluate topological and physicochemical descriptors Developed random forest models On the original observed values (30 obs) On the SALI values (435 observations) Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853
  • 17. Double Counting Structures? The dependent and independent variables both encode structure. But pretty low correlations between individual pairwise descriptors and the SALI values
  • 18. Model Summaries Original pIC50 RMSE = 0.97 SALI, AbsDiff RMSE = 1.10 SALI, GeoMean RMSE = 1.04 All models explain similar % of variance of their respective datasets Using geometric mean as the descriptor aggregation function seems to perform best SALI models are more robust due to larger size of the dataset
  • 19. Test Case 2 Considered the Holloway docking dataset, 32 molecules with pIC50’s and Einter Similar strategy as before Need to transform SALI values Descriptors show minimal correlation Holloway, M.K. et al, J Med Chem, 1995, 38, 305-317
  • 20. Model Summaries Original pIC50 RMSE = 1.05 SALI, AbsDiff RMSE = 0.48 SALI, GeoMean RMSE = 0.48 The SALI models perform much worse in terms of % of variance explained Descriptor aggregation method does not seem to have much effect The SALI models appear to perform decently on the cliffs – but miss the most significant ones
  • 21. Model Summaries Original pIC50 RMSE = 1.05 SALI, AbsDiff RMSE = 9.76 SALI, GeoMean RMSE = 10.01 With untransformed SALI values, models perform similarly in terms of % of variance explained The most significant cliffs correspond to stereoisomers
  • 22. Test Case 3 38 adenosine receptor antagonists with reported Ki values; use 35 for training and 3 for testing Random forest model on the SALI values performed reasonably well (RMSE = 7.51, R2 = 0.62) Upper end of SALI range is better predicted Kalla, R.V. et al, J. Med. Chem., 2006, 48, 1984-2008
  • 23.
  • 24. Generally, performance is poorer for smaller cliffs. For any given hold out molecule, range of error in SALI prediction is large Suggests that some form of domain applicability metric would be useful
  • 25. Model Caveats Models based on SALI values are dependent on there being an SAR in the original activity data Scrambling results for these models are poorer than the original models but aren’t as random as expected
  • 26. Conclusions SALI is the first step in characterizing the SAR landscape Allows us to directly analyze the landscape, as opposed to individual molecules Being able to predict the landscape could serve as a useful way to extend an SAR landscape
  • 27. Acknowledgements John Van Drie, Gerry Maggiora, Mic Lajiness, Jurgen Bajorath
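
The quantities described on slides 6, 7, 9 and 11 can be sketched in a few lines of Python. This is only an illustration, not the speaker's code. It assumes the SALI definition from the cited Guha and Van Drie paper, SALI(i,j) = |A_i - A_j| / (1 - sim(i,j)), with Tanimoto similarity on fingerprints stored as sets of on-bits. The function names, the toy data and the handling of identical fingerprints are all assumptions made for the example. S(X) is taken literally from slide 11 as a count of correctly ordered edges.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sali_pairs(activities, fingerprints):
    """Return {(i, j): SALI value} for every pair i < j.

    Assumption: pairs with identical fingerprints (distance 0) get inf
    when their activities differ and 0.0 when they do not.
    """
    values = {}
    for i, j in combinations(range(len(activities)), 2):
        diff = abs(activities[i] - activities[j])
        dist = 1.0 - tanimoto(fingerprints[i], fingerprints[j])
        if dist == 0.0:
            values[(i, j)] = float("inf") if diff > 0 else 0.0
        else:
            values[(i, j)] = diff / dist
    return values


def sali_graph_edges(sali, cutoff):
    """Slide 7: nodes i, j are connected if SALI(i, j) > cutoff."""
    return sorted(pair for pair, value in sali.items() if value > cutoff)


def edge_count_curve(sali, cutoffs):
    """Slide 9: edge count of the SALI graph for varying cutoffs."""
    return [len(sali_graph_edges(sali, x)) for x in cutoffs]


def correctly_ordered_edges(sali, observed, predicted, cutoff):
    """Slide 11: number of edges at this cutoff whose activity ordering
    (which compound of the pair is more active) the model reproduces."""
    count = 0
    for i, j in sali_graph_edges(sali, cutoff):
        obs = observed[i] - observed[j]
        pred = predicted[i] - predicted[j]
        if obs * pred > 0:
            count += 1
    return count


# Toy data (hypothetical): three compounds with pIC50-like activities.
activities = [5.0, 7.0, 5.5]
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7, 8, 9}]
predicted = [5.2, 6.5, 6.8]

sali = sali_pairs(activities, fingerprints)
edges = sali_graph_edges(sali, 1.0)
curve = edge_count_curve(sali, [0, 1, 2, 10])
s_at_1 = correctly_ordered_edges(sali, activities, predicted, 1.0)
```

In the toy data, compounds 0 and 1 share most of their bits but differ by two log units, so that pair gets the largest SALI value and is the last edge to drop out as the cutoff rises.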

Editor's Notes

  1. Outliers in a cliff prediction model are not as severe since SALI changes more slowly than just activity differences
  2. For SALI = 0, had to set log10(SALI) = 0. Similar performance if we use SALI and not log10(SALI); at least more % of variance is explained. Still fail on most significant cliffs
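
The pairwise setup from slide 15 and the transform in note 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the reported models. The helper names are hypothetical, descriptors are assumed to be non-negative so that the geometric mean is defined, and the model-fitting step (random forests in the talk) is left out.

```python
import math
from itertools import combinations


def abs_diff(x, y):
    """Absolute difference of two descriptor vectors, element by element."""
    return [abs(a - b) for a, b in zip(x, y)]


def geo_mean(x, y):
    """Geometric mean of two descriptor vectors, element by element.

    Assumption: all descriptor values are non-negative.
    """
    return [math.sqrt(a * b) for a, b in zip(x, y)]


def pairwise_features(descriptors, aggregate):
    """Build one feature vector per molecule pair (i < j)."""
    return {
        (i, j): aggregate(descriptors[i], descriptors[j])
        for i, j in combinations(range(len(descriptors)), 2)
    }


def log_sali(value):
    """log10 transform of a SALI value, with log10(SALI) set to 0 when
    SALI = 0, as described in note 2."""
    return math.log10(value) if value > 0 else 0.0


# Toy descriptor vectors (hypothetical values).
descriptors = [[1.0, 4.0], [4.0, 9.0], [0.0, 1.0]]
diff_features = pairwise_features(descriptors, abs_diff)
mean_features = pairwise_features(descriptors, geo_mean)

# 30 molecules give 30 * 29 / 2 = 435 pairs, matching slide 16.
n_pairs = len(pairwise_features([[0.0]] * 30, abs_diff))
```

Each pair's feature vector would then be matched with that pair's (optionally log-transformed) SALI value as the response when fitting a model.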