DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
Advanced Circuits, Architecture, and Computing Lab
Molecular Docking for Drug Discovery: Machine-Learning Approaches for Native Pose Prediction for Protein-Ligand Complexes
Hossam M. Ashtawy (ashtawy@egr.msu.edu)
Nihar R. Mahapatra (nrm@egr.msu.edu)
Department of Electrical & Computer Engineering
Michigan State University, East Lansing, MI, U.S.A.
Tenth International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2013)
June 20, 2013
© 2013
Motivation
• Accurately predicting the binding affinity (BA) of large sets of diverse protein-ligand complexes remains one of the most challenging unsolved problems in computational biomolecular science.
• Conventional scoring functions (SFs) have been shown to have limited predictive and docking power.
• The size and diversity of the set of protein-ligand complexes with known experimental BA are limited.
– Large and diverse datasets of protein-ligand complexes help in building more accurate statistics-based SFs.
Outline
• Motivation
• Background and Scope of Work
– Scoring Functions
– Our Approach and Scope of Work
• Materials and Methods
– Compound Database and Characterization
– Machine Learning Methods
• Experiments, Results, and Discussion
– Tuning, Training, and Testing Scoring Functions
– Evaluation and Comparison of Scoring Functions
• Concluding Remarks
Background and Scope of Work
Docking & Scoring
Scoring & Docking Challenges
• Lack of accurate accounting of intermolecular physicochemical interactions
• Imprecise solvent modeling
• Uncertainties in collected experimental affinity data
• Inability to capture inherent nonlinear relationships correlating intermolecular interactions to binding affinity or native binding pose
Our Approach & Scope of Work
• Predict the binding pose explicitly.
• Use sophisticated machine-learning methods to model the closeness of a pose to the native conformation.
• Use these nonparametric techniques in conjunction with physicochemical features describing intermolecular interactions between proteins and ligands.
• Train predictive models on a large and diverse dataset of high-quality protein-ligand complexes.
• Evaluate the docking accuracy of the resulting SFs on diverse protein families.
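Closeness of a pose to the native conformation is reported elsewhere in this deck as an RMSD in Å. The following is a minimal sketch of that quantity, assuming both poses list the same ligand atoms in the same order and already share the protein's coordinate frame (no superposition and no symmetry correction). The function name and the coordinates are illustrative and do not come from the slides.

```python
import math


def pose_rmsd(pose, native):
    """Root-mean-square deviation (in Å) between two poses of one ligand.

    Assumes matching atom order and a shared coordinate frame
    (hypothetical simplification: no alignment, no symmetry handling).
    """
    if len(pose) != len(native) or not pose:
        raise ValueError("poses must be non-empty with equal atom counts")
    squared = sum(
        (a - b) ** 2
        for atom_p, atom_n in zip(pose, native)
        for a, b in zip(atom_p, atom_n)
    )
    return math.sqrt(squared / len(pose))


# Toy three-atom ligand; the second pose is a rigid 1 Å shift along x.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in native]

print(pose_rmsd(native, native))
print(pose_rmsd(shifted, native))
```

A model trained on this target can rank candidate poses directly by predicted distance from the native pose, rather than by predicted binding affinity.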
Materials and Methods
Compound Database: PDBbind [1]
• Protein-ligand complexes were obtained from PDBbind 2007.
• PDBbind is a selective compilation of the Protein Data Bank (PDB) database.
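PDBbind: Refined Set
Starting from the PDB, a complex enters the refined set of PDBbind only if it passes the following filters:
• Ligand MW ≤ 1000
• Number of non-hydrogen atoms of the ligand ≥ 6
• Only one ligand is bound to the protein
• Protein and ligand are non-covalently bound
• Resolution of the complex crystal structure ≤ 2.5 Å
• Elements in the complex must be C, N, O, P, S, F, Cl, Br, I, or H
• Known Kd or Ki
The retained structures then undergo hydrogenation and protonation/deprotonation, which yields the refined set of PDBbind.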
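The refined-set criteria amount to a conjunction of simple per-complex checks. The sketch below encodes them as a single predicate. It is only an illustration: the `Complex` record and its field names are hypothetical, the thresholds are the ones listed on the slide, and PDBbind's actual curation pipeline is not shown in the source.

```python
from dataclasses import dataclass

ALLOWED_ELEMENTS = {"C", "N", "O", "P", "S", "F", "Cl", "Br", "I", "H"}


@dataclass
class Complex:
    """Hypothetical summary record for one protein-ligand complex."""
    ligand_mw: float
    ligand_heavy_atoms: int
    n_ligands: int
    covalently_bound: bool
    resolution: float  # crystal-structure resolution in Å
    elements: frozenset
    has_kd_or_ki: bool


def passes_refined_filters(c):
    """True if the complex satisfies every criterion listed on the slide."""
    return (
        c.ligand_mw <= 1000
        and c.ligand_heavy_atoms >= 6
        and c.n_ligands == 1
        and not c.covalently_bound
        and c.resolution <= 2.5
        and c.elements <= ALLOWED_ELEMENTS  # subset test
        and c.has_kd_or_ki
    )


# Made-up examples: one passing complex, one rejected for poor resolution.
good = Complex(350.4, 25, 1, False, 1.8, frozenset({"C", "N", "O", "H", "S"}), True)
blurry = Complex(350.4, 25, 1, False, 3.1, frozenset({"C", "N", "O", "H"}), True)

print(passes_refined_filters(good), passes_refined_filters(blurry))
```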
PDBbind: Core Set [2]
• Start from the refined set.
• Run a similarity search using BLAST with a similarity cutoff of 90%.
• Keep clusters with ≥ 4 complexes.
• Require that the binding affinity of the highest-affinity complex is 100-fold that of the lowest-affinity one.
• Take the first (highest-), middle-, and lowest-affinity complexes from each cluster.
• The result is the core set of PDBbind.
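The selection of core-set representatives can be sketched as follows, assuming the BLAST clustering has already been done and each cluster is given as a list of (complex id, affinity) pairs. Several choices here are assumptions rather than details from the slide: affinities are taken on a −log10 scale (so a 100-fold ratio is a span of 2 units), the 100-fold condition is treated as a minimum, and "middle" is taken as the element at index `len // 2` of the ranked list.

```python
def select_core(clusters, min_size=4, min_log_span=2.0):
    """Pick highest-, middle-, and lowest-affinity complexes per cluster.

    clusters: dict mapping cluster id -> list of (complex_id, affinity),
    with affinity on a -log10(Kd or Ki) scale (assumption).
    """
    core = {}
    for cluster_id, members in clusters.items():
        if len(members) < min_size:
            continue  # cluster too small
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        if ranked[0][1] - ranked[-1][1] < min_log_span:
            continue  # affinity range narrower than 100-fold
        middle = ranked[len(ranked) // 2]
        core[cluster_id] = [ranked[0][0], middle[0], ranked[-1][0]]
    return core


# Made-up clusters: only "A" is large enough and spans 100-fold.
clusters = {
    "A": [("a1", 9.1), ("a2", 7.4), ("a3", 6.0), ("a4", 5.2), ("a5", 8.0)],
    "B": [("b1", 6.0), ("b2", 6.5), ("b3", 6.9), ("b4", 7.1)],
    "C": [("c1", 4.0), ("c2", 9.0)],
}

print(select_core(clusters))
```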
Decoy Generation [2]
• Start from a protein-ligand complex.
• Generate a random low-energy conformation.
• Generate ~2000 conformations using 4 different docking protocols.
• Discard poses > 10 Å from the native pose.
• Group the poses into ten 1 Å bins based on their RMSD values.
• Further cluster each bin into 10 clusters.
• Choose the pose with the lowest energy from each sub-cluster.
• The result is 100 decoys.
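The binning and selection steps of decoy generation can be sketched as below. This assumes each generated pose already carries its RMSD from the native pose, an energy, and a sub-cluster label within its bin. The slide does not say how the sub-clustering is done, so the labels are taken as given here, and all names and values are illustrative.

```python
def select_decoys(poses, max_rmsd=10.0, n_bins=10):
    """Keep the lowest-energy pose per (RMSD bin, sub-cluster).

    poses: iterable of (pose_id, rmsd, energy, sub_cluster), where
    sub_cluster is a precomputed label within the bin (assumption).
    """
    bin_width = max_rmsd / n_bins
    best = {}
    for pose_id, rmsd, energy, sub_cluster in poses:
        if rmsd > max_rmsd:
            continue  # too far from the native pose
        # A pose at exactly max_rmsd falls into the last bin.
        bin_index = min(int(rmsd / bin_width), n_bins - 1)
        key = (bin_index, sub_cluster)
        if key not in best or energy < best[key][1]:
            best[key] = (pose_id, energy)
    return {key: pose_id for key, (pose_id, _) in best.items()}


# Made-up poses: p1 and p2 compete for the same slot, p5 is discarded.
poses = [
    ("p1", 0.4, -7.2, 0),
    ("p2", 0.8, -8.1, 0),
    ("p3", 0.6, -6.0, 1),
    ("p4", 3.2, -5.5, 0),
    ("p5", 12.5, -9.9, 0),
]

print(select_decoys(poses))
```

With 10 bins and 10 sub-clusters per bin, at most 100 slots exist, which matches the 100 decoys per complex on the slide.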
Compound Characterization
• Features were extracted as calculated by the following scoring functions:
– X-Score (6 features)
– AffiScore (30 features)
– RF-Score (36 features)
– GOLD (14 features)
Training and Test Datasets
• Primary training set (Pr): 1,105 (Y = BA); 39,085 (Y = RMSD)
• Core test set (Cr): 16,554
Machine Learning Methods
• Single models
– Multiple linear regression (MLR)
– Multivariate adaptive regression splines (MARS)
– k-Nearest neighbors (kNN)
– Support vector machine (SVM)
• Ensemble models
– Random forests (RF)
– Boosted regression trees (BRT)
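To make the regression setting concrete, here is a minimal sketch of one of the listed single models, kNN, used to predict a pose's RMSD from its feature vector. It is a toy illustration and not the authors' implementation: the two-dimensional features, the value k = 3, and the unscaled Euclidean distance are all assumptions chosen for brevity.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target (e.g. pose RMSD) as the mean over the k nearest
    training points under Euclidean distance."""
    ranked = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up training poses: feature vectors and their true RMSD values (Å).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0.5, 0.7, 0.6, 8.0, 8.4, 7.9]

near_native = knn_predict(train_X, train_y, (0.1, 0.1))
far_decoy = knn_predict(train_X, train_y, (5.0, 5.0))
print(near_native, far_decoy)
```

Candidate poses of a ligand would then be ranked by ascending predicted RMSD, so the pose predicted to be closest to the native conformation comes first.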
Conventional SFs
SF type          Discovery Studio   SYBYL                GOLD             Schrodinger   Standalone   |SFs|
Empirical        PLP, JAIN, LUDI    ChemScore, F-Score   ChemScore, ASP   GlideScore    X-Score      9
Knowledge-based  LigScore, PMF      PMF-Score            -                -             DrugScore    4
Force-field      -                  D-Score, G-Score     GoldScore        -             -            3
|SFs|            5                  5                    3                1             2            16
Experiments, Results, and Discussion
SF Construction & Application Workflow
SF Parameter Tuning
Evaluation of Scoring Functions
• Docking power: measures the ability of an SF to distinguish a promising binding mode from a less promising one.
• $S^{C}_{N}$ (in %): the success rate, i.e., the percentage of times an SF is able to find a pose whose RMSD is within a predefined cutoff value of C Å when only the N topmost poses, ranked by their predicted scores, are considered.
• C: e.g., 0, 1, 2, and 3 Å; N: e.g., 1, 2, 3, and 5
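The success rate for a cutoff C and a top-N list can be computed as in the sketch below. It assumes each complex comes with a list of (predicted score, true RMSD) pairs and that a lower predicted score means a better rank, as it would for an SF that predicts RMSD; for an affinity-like score the sort order would be reversed. The data are made up for illustration.

```python
def success_rate(complexes, cutoff, top_n):
    """Percentage of complexes with at least one pose of RMSD <= cutoff (Å)
    among the top_n poses ranked by predicted score (lower is better)."""
    hits = 0
    for poses in complexes:
        top = sorted(poses, key=lambda p: p[0])[:top_n]
        if any(rmsd <= cutoff for _, rmsd in top):
            hits += 1
    return 100.0 * hits / len(complexes)


# Each inner list holds (predicted_score, true_rmsd) for one complex.
complexes = [
    [(0.8, 0.0), (1.9, 2.5), (3.0, 4.1)],  # native pose ranked first
    [(1.2, 2.4), (1.5, 0.0), (4.0, 6.3)],  # native pose ranked second
    [(0.9, 5.0), (2.2, 3.5), (2.8, 1.8)],  # best pose ranked third
    [(1.1, 0.9), (2.0, 0.0), (3.3, 7.0)],  # near-native pose ranked first
]

for cutoff, top_n in [(0, 1), (1, 1), (0, 2), (2, 3)]:
    print(cutoff, top_n, success_rate(complexes, cutoff, top_n))
```

Loosening either the cutoff C or the list length N can only keep the rate the same or raise it, which is why both are reported together.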
Success rates of Conv. & ML SFs: Cr
[Figure: success-rate plot, not recovered. Surviving annotations: > 60%, ~50%, > 70%, ~80%, $S^{2}_{1}$, < 5%]
Success rates of Conv. & ML SFs: Core test set
[Figure: success-rate plots, not recovered. Surviving annotations: GOLD::ASP from 82% to 92%; RF::RG from 87% to 96%; $S^{0}_{5}$ ~60%; $S^{0}_{5}$ ~77%]
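Success rates of Conv. & ML SFs: HIV & TRY
[Figure: success-rate plots, not recovered. Surviving panel annotations, in extraction order:]
• MLR: $S^{1}_{1}$ = 72%, $S^{3}_{1}$ = 90%; MLR: $S^{1}_{1}$ = 80%, $S^{3}_{1}$ = 95%
• MLR: $S^{0}_{1}$ = 50%, $S^{0}_{5}$ = 90%; MARS: $S^{0}_{1}$ = 48%, $S^{0}_{5}$ = 83%
• MLR: $S^{1}_{1}$ = 41%, $S^{3}_{1}$ = 80%; MLR: $S^{1}_{1}$ = 66%, $S^{3}_{1}$ = 90%
• MARS: $S^{0}_{1}$ = 36%, $S^{0}_{5}$ = 80%; MARS: $S^{0}_{1}$ = 23%, $S^{0}_{5}$ = 68%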
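Success rates of Conv. & ML SFs: CAR & THR
[Figure: success-rate plots, not recovered. Surviving panel annotations, in extraction order:]
• MLR: $S^{1}_{1}$ = 22%, $S^{3}_{1}$ = 53%; SVM: $S^{1}_{1}$ = 32%, $S^{3}_{1}$ = 62%
• MLR: $S^{0}_{1}$ = 40%, $S^{0}_{5}$ = 79%; MARS: $S^{0}_{1}$ = 24%, $S^{0}_{5}$ = 74%
• MLR: $S^{0}_{1}$ = 15%, $S^{0}_{5}$ = 34%; MARS: $S^{0}_{1}$ = 9%, $S^{0}_{5}$ = 33%
• MLR: $S^{1}_{1}$ = 58%, $S^{3}_{1}$ = 82%; MLR: $S^{1}_{1}$ = 92%, $S^{3}_{1}$ = 95%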
Conclusion
Concluding Remarks
• ML models trained to explicitly predict RMSD values significantly outperform all conventional SFs.
• RMSD values estimated by such models have a correlation of 0.7 on average with the true RMSD values, whereas predicted BAs have a correlation as low as 0.2 with the measured RMSD values.
• The empirical SF GOLD::ASP achieved a success rate of 70% in identifying a pose that lies within 1 Å of the native pose across 195 different complexes.
• Our top RMSD-based SF, MARS::XARG, has a success rate of ~80% on the same test set.
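The correlation figures quoted above compare predicted values with true RMSD values. The slides do not say which correlation coefficient was used; the sketch below assumes Pearson's r and runs on made-up numbers, so it only illustrates how such a figure could be computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up true RMSDs (Å) and two hypothetical sets of predictions.
true_rmsd = [0.0, 1.0, 2.0, 3.0, 4.0]
offset_pred = [0.5, 1.5, 2.5, 3.5, 4.5]   # perfectly linear in the truth
inverted_pred = [4.0, 3.0, 2.0, 1.0, 0.0]  # perfectly anti-correlated

print(pearson(true_rmsd, offset_pred))
print(pearson(true_rmsd, inverted_pred))
```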
Thank You!
References
[1] Berman, H., et al.: The Protein Data Bank. Nucleic Acids Research 28 (1) (2000) 235–242.
[2] Cheng, T., Li, X., Li, Y., Liu, Z., Wang, R.: Comparative assessment of scoring functions on a diverse test set. Journal of Chemical Information and Modeling 49 (4) (2009) 1079–1093.
