Dealing with ‘exotic’ similarity metrics
How to set up a (ChemAxon-powered) Similarity-driven Virtual Screening server…
Dragos Horvath, dhorvath@unistra.fr
UMR 7140 CNRS – Université de Strasbourg
Introduction & Definitions
• Similarity-based Virtual Screening (SVS):
– Search a database of candidates m for analogues similar to a query
compound M with the wanted properties, hoping that the
« similarity principle » magic will operate.
• Molecular Similarity S(M,m):
– distance (metric) between the two Descriptor Space (DS)
points $D(M)$, $D(m)$; let us call these $D$, $d$ for simplicity.
• Similarity Radius s defines « how similar is similar »
– Delimits a sphere in descriptor space around M, thought to
contain a minimum of inactive, but a maximum of active m.
• Virtual Hits – aka True & False « Positives » (TP,FP):
– Compounds m with S(M,m)<s
Compound Sets
• For server calibration:
– Candidate database: 165 ChEMBL ligand sets with >50
molecules of reported pKi values with respect to the 165
associated receptors & enzymes (targets T).
– Queries of T: $M^T_1, M^T_2, \ldots, M^T_i$, $i = 1..Q_T$: the top
1/5 (max 100) actives on T, plus 1/5 (max 100) of binders of
medium potency; queries can be classified by pharmacophore
complexity (Nr. of populated FPT1 triplets)
– 10,000 randomly picked commercial molecules from ZINC,
assumed to be inactive “decoys”.
• Operational database:
– 1.5 M commercial compounds, from various sources
– Above « reference » molecules, for annotation purposes
Descriptor Spaces
All are Feature Counts
$D_i(M)$ = integer (positive or null) population level of « feature » i
(a substructure or a pharmacophore triplet) in molecule M
Dissimilarity Scores…
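• Based on the comparison of descriptor vectors $D$, $d$, using the following quantities:
– $N(M)$ and $\mathrm{NORM}(M) = \sum_{i=1}^{N(M)} D_i^2$
– $N_{AND}(m,M)$ and $\mathrm{AND}(m,M) = \sum_{i=1}^{N_{OR}(m,M)} D_i \times d_i$
– $N_{EXC}(M,m)$ and $\mathrm{EXC}(M,m) = \sum_{i \mid d_i = 0} D_i^2$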
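Euclidean & Related…
$$E(m,M) = \sqrt{\sum_{i=1}^{N_{OR}(m,M)} (D_i - d_i)^2}$$
$$R(m,M) = \sqrt{\frac{\sum_{i=1}^{N_{OR}(m,M)} (D_i - d_i)^2}{N_{OR}(m,M)}}$$
$$A(m,M) = \frac{\sum_{i=1}^{N_{OR}(m,M)} |D_i - d_i|}{N_{OR}(m,M)}$$
$$RW(m,M) = R(m,M)\,\frac{N_{EXC}(m,M) + N_{EXC}(M,m)}{N_{OR}(m,M)}$$
$$AW(m,M) = A(m,M)\,\frac{N_{EXC}(m,M) + N_{EXC}(M,m)}{N_{OR}(m,M)}$$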
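(A)Symmetric Correlation Scores – Tanimoto & Tversky
$$T(M,m) = 1 - \frac{\mathrm{AND}(M,m)}{\mathrm{NORM}(M) + \mathrm{NORM}(m) - \mathrm{AND}(M,m)}$$
$$Tv(M,m,\alpha) = 1 - \frac{\mathrm{AND}(M,m)}{\alpha\,\mathrm{EXC}(M,m) + (1-\alpha)\,\mathrm{EXC}(m,M) + \mathrm{AND}(M,m)}$$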
Situations where:
(a) candidate m misses a feature seen in active
M, and
(b) it contains some novel feature not seen in M
may be distinguished! At α>0.5, cases (a) will be
relatively more penalized than the symmetric
situation (b).
A rough guess of α should suffice! Three
implementations of Tv are considered:
• Tv+ (α=0.9)
• Tv (α=0.7)
• Tv- (α=0.3)
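As a minimal sketch of the scores defined above (not the server's actual implementation), the Python below computes NORM, AND, EXC, T, Tv and the Euclidean-type scores for feature-count vectors stored as dictionaries. The dictionary representation and all function names are illustrative assumptions. So is the reading of N_OR as the number of features populated in either molecule, and of N_EXC(X, Y) as the number populated in X but absent from Y. E and R are taken with square roots and A with absolute differences, which is also an assumption.

```python
import math
from typing import Dict

Counts = Dict[str, int]  # feature label -> population level (zeros may be omitted)


def norm(D: Counts) -> float:
    """NORM: sum of squared population levels."""
    return float(sum(v * v for v in D.values()))


def and_(D: Counts, d: Counts) -> float:
    """AND: sum of D_i * d_i; only shared features contribute."""
    return float(sum(v * d.get(k, 0) for k, v in D.items()))


def exc(D: Counts, d: Counts) -> float:
    """EXC(D, d): sum of D_i^2 over features with d_i = 0."""
    return float(sum(v * v for k, v in D.items() if d.get(k, 0) == 0))


def n_or(D: Counts, d: Counts) -> int:
    """Assumed N_OR: number of features populated in at least one molecule."""
    return len({k for k, v in D.items() if v} | {k for k, v in d.items() if v})


def n_exc(D: Counts, d: Counts) -> int:
    """Assumed N_EXC(D, d): number of features populated in D but not in d."""
    return sum(1 for k, v in D.items() if v and d.get(k, 0) == 0)


def tanimoto(D: Counts, d: Counts) -> float:
    a = and_(D, d)
    denom = norm(D) + norm(d) - a
    # Convention chosen here: two empty vectors get dissimilarity 0.
    return 1.0 - a / denom if denom else 0.0


def tversky(D: Counts, d: Counts, alpha: float) -> float:
    """D is the query, d the candidate; alpha weights features lost from D."""
    a = and_(D, d)
    denom = alpha * exc(D, d) + (1.0 - alpha) * exc(d, D) + a
    return 1.0 - a / denom if denom else 0.0


def euclidean(D: Counts, d: Counts) -> float:
    keys = set(D) | set(d)
    return math.sqrt(sum((D.get(k, 0) - d.get(k, 0)) ** 2 for k in keys))


def rms(D: Counts, d: Counts) -> float:
    """R: Euclidean distance normalised by N_OR."""
    n = n_or(D, d)
    return euclidean(D, d) / math.sqrt(n) if n else 0.0


def mean_abs(D: Counts, d: Counts) -> float:
    """A: mean absolute difference over the N_OR populated features."""
    n = n_or(D, d)
    keys = set(D) | set(d)
    return sum(abs(D.get(k, 0) - d.get(k, 0)) for k in keys) / n if n else 0.0


def exclusivity_weight(D: Counts, d: Counts) -> float:
    """Fraction of populated features that are exclusive to one molecule."""
    n = n_or(D, d)
    return (n_exc(D, d) + n_exc(d, D)) / n if n else 0.0


def rw(D: Counts, d: Counts) -> float:
    return rms(D, d) * exclusivity_weight(D, d)


def aw(D: Counts, d: Counts) -> float:
    return mean_abs(D, d) * exclusivity_weight(D, d)


# Toy example (made-up features): one candidate loses "b", the other gains "c".
query = {"a": 2, "b": 1}
misses_b = {"a": 2}
adds_c = {"a": 2, "b": 1, "c": 1}
for label, alpha in (("Tv+", 0.9), ("Tv", 0.7), ("Tv-", 0.3)):
    print(label, tversky(query, misses_b, alpha), tversky(query, adds_c, alpha))
```

On this toy example, the candidate that loses a query feature is scored as more dissimilar than the one that gains a feature at α=0.9 and α=0.7, and the ordering flips at α=0.3.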
2. Fine, but « how similar is similar »?
• You may be a believer in the dogma « Tanimoto>0.85 »
(T<0.15)
– But the Bible mentions not the other metrics, less subject
to religious fervor.
• Alternatively, try to infer reasonable choices of
similarity radii for each Chemical Space (CS – the
combination of Descriptor Space & Similarity score)
– For each query, on every target, compute s* corresponding
to the « optimal » SVS scenario.
– This also makes it possible to measure & benchmark SVS success
with respect to its Operational Premises (CS, nature of Target,
degree of complexity of the query, etc).
A basic SVS Optimality Criterion: Ω
                  Λ(M,m) ≤ λ                  Λ(M,m) > λ
S(M,m) ≤ s        True Positives (TP)         False Positives (FP)
S(M,m) > s        False (?) Negatives (FN)    True Negatives (TN)
[Figure: Ω(s) plotted against the cutoff s, with Ω = 1.0 marked. Ω(s) is computed from the error counts $N_{FP}(s)$ and $N_{FN}(s)$ and their counterparts $N_{FP}^{(E)}$ and $N_{FN}^{(E)}$.]
Activity (profile) differences Λ(m,M)
$$\Lambda(M,m) = \begin{cases} 0 & \text{if } pK_i(M) - pK_i(m) < 0.5 \\ 1 & \text{if } pK_i(M) - pK_i(m) > 3.0 \\ \dfrac{pK_i(M) - pK_i(m) - 0.5}{2.5} & \text{otherwise} \end{cases}$$
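The activity difference Λ and the four outcome classes (TP, FP, FN, TN) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the pKi difference is taken as signed (query minus candidate), which is how the formula reads here, and the activity threshold λ is left as a free parameter because no value is given.

```python
from typing import Dict, Iterable, Tuple


def activity_difference(pki_query: float, pki_candidate: float) -> float:
    """Lambda(M, m): 0 below a 0.5 pKi gap, 1 above 3.0, linear in between."""
    diff = pki_query - pki_candidate  # signed difference (assumption)
    if diff < 0.5:
        return 0.0
    if diff > 3.0:
        return 1.0
    return (diff - 0.5) / 2.5


def confusion_counts(
    pairs: Iterable[Tuple[float, float]], s: float, lam: float
) -> Dict[str, int]:
    """Count outcomes over (S, Lambda) pairs at dissimilarity cutoff s.

    A pair is selected when S <= s and counts as activity-similar when
    Lambda <= lam (lam is a hypothetical threshold chosen by the caller).
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, lam_value in pairs:
        selected = score <= s
        activity_similar = lam_value <= lam
        if selected and activity_similar:
            counts["TP"] += 1
        elif selected:
            counts["FP"] += 1
        elif activity_similar:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Toy data: (dissimilarity score, Lambda) for four query/candidate pairs.
toy_pairs = [(0.1, 0.0), (0.2, 1.0), (0.6, 0.0), (0.9, 1.0)]
toy_counts = confusion_counts(toy_pairs, s=0.3, lam=0.5)
print(toy_counts)
```

Counts of this kind, taken at each cutoff s, are the $N_{FP}(s)$ and $N_{FN}(s)$ that enter Ω(s).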
The Ascertained Optimality Excess Ξ
[Figure: Ω against the fraction of compound pairs selected at cutoff s, for meaningful S values and for random S values (Ω with Var(Ω)); Ξ is annotated on the plot. Ξ(s) is expressed in terms of Ω(s), $\Omega_{rand}(s)$ and Var($\Omega_{rand}$).]
Workflow
ForEach Target T
  Set database db = set of tested ligands (known pKi) + decoy set (pKi=0);
  ForEach Query M of T
    ForEach DescriptorSpace D
      ForEach SimilarityScore S
        # Start current SVS experiment, defined by Target, Query, Descriptors & Similarity Score
        ForEach m != M in db
          Calculate S(M,m)|D;
        EndLoop m;
        Scan over s → Ξ(s) and return s* such that Ξ(s*) is maximal;
        Classify SVS(T,M,D,S) with respect to Ξ(s*) as « failed », « acceptable », « good » or « excellent »;
      EndLoop S;
    EndLoop D;
  EndLoop M;
EndLoop T;
Analyze Success Rates & s* distributions in terms of various Operational Premises (nature of T,
complexity of M, choice of D, of S or of D-S combinations)
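The innermost step of the workflow, scanning the cutoff s and keeping the s* that maximizes the objective, might look like the Python sketch below. The names `scan_cutoff` and `stand_in_objective` are hypothetical. The objective is deliberately a stand-in (minus the fraction of misclassified pairs) and not the Ξ(s) of the study; any function of the scores, the Λ values and s can be passed in its place.

```python
from typing import Callable, Optional, Sequence, Tuple

Objective = Callable[[Sequence[float], Sequence[float], float], float]


def stand_in_objective(
    scores: Sequence[float], lambdas: Sequence[float], s: float, lam: float = 0.5
) -> float:
    """Stand-in for the excess: minus the fraction of FP + FN at cutoff s.

    lam is a hypothetical activity threshold; this is NOT the study's formula.
    """
    errors = sum(
        1 for score, lam_value in zip(scores, lambdas)
        if (score <= s) != (lam_value <= lam)
    )
    return -errors / len(scores)


def scan_cutoff(
    scores: Sequence[float],
    lambdas: Sequence[float],
    objective: Objective = stand_in_objective,
    cutoffs: Optional[Sequence[float]] = None,
) -> Tuple[Optional[float], float]:
    """Return (s*, objective(s*)), trying each observed score as a cutoff."""
    if cutoffs is None:
        cutoffs = sorted(set(scores))
    best_s: Optional[float] = None
    best_value = float("-inf")
    for s in cutoffs:
        value = objective(scores, lambdas, s)
        if value > best_value:  # ties keep the smallest cutoff
            best_s, best_value = s, value
    return best_s, best_value


# Toy experiment: S(M, m) and Lambda(M, m) for five candidates of one query.
toy_scores = [0.1, 0.2, 0.4, 0.7, 0.9]
toy_lambdas = [0.0, 0.0, 0.2, 1.0, 1.0]
s_star, best = scan_cutoff(toy_scores, toy_lambdas)
print(s_star, best)
```

Running this once per (Target, Query, Descriptor Space, Similarity Score) combination yields the s* values whose distributions are analyzed in the final step.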
Insights: (1) – So much for dogmas!
[Figure: % of Tanimoto-based queries of Good optimality level at d*, plotted against the optimal cutoff d* (s*) from 0.04 to 0.6, for the descriptor spaces FPT1 and treeSY03.]
Use the distribution to « teach » the web server how to rank prospective SVS hits:
Top Hits (0), Good Hits (1), Average Hits (2), Acceptable Hits (3), Are these Hits? (4), Ignore…
Insights: (2) – Tversky at α>0.5: an
excellent similarity scoring scheme.
[Figure: bar chart of the relative « market share » of each metric (Tv+, Tv, T, RW, AW, Tv-, E, A, R): fraction of SVS runs based on the shown metric, out of all SVS experiments having reached the given success level. Series: R:all-acceptable, R:all-good, R:all-excellent.]
Tv+ may pick actives that are more
complex than queries (NK1 example)
Insights: (3) – Trends with respect to
target classes could be evidenced…
[Figure: bar chart of the relative « market share » of each metric (Tv+, Tv, T, RW, AW, Tv-, E, A, R): fraction of SVS runs, within target classes, based on the shown metric, out of all SVS experiments having reached the given success level. Series: R:all-good, R:Kinases-good, R:monoamineGPCR-good, R:otherGPCR-good.]
Insights: (4) – when the query compound is
complex, the metric matters less
[Figure: bar chart of the relative « market share » of each metric (Tv+, Tv, T, RW, AW, Tv-, E, A, R): fraction of SVS runs, within query complexity classes, based on the shown metric, out of all SVS experiments having reached the given success level. Series: R:all-good, R:pharm-high-good, R:pharm-low-good.]
Some conclusions
• The study has highlighted many interesting aspects
– Intrinsic usefulness of Tversky scores biased towards penalizing query
feature loss: α=0.9…0.7 will do!
– Other target-, query complexity-, query activity-, descriptor space-dependent
trends of SVS success
– Some inevitable sources of bias, showing that not even ChEMBL is
large/diverse enough to cover it all…
• Main message: use this protocol – or related – to calibrate web
servers, rather than sticking to well-studied metrics and descriptors
for which « Universal » similarity cutoffs are believed to hold.
• Try infochim.u-strasbg.fr/webserv/VSEngine.html – to our
knowledge, the only public SVS server to support atypical, but
powerful metrics coupled to chemically relevant, pH-sensitive
descriptor spaces… all while exploiting the power of ChemAxon
tools!

Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 

Recently uploaded (20)

Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 

EUGM 2013 - Dragos Horváth (Laboratoire de Chemoinformatique, Univ Strasbourg-CNRS): Dealing with 'exotic' similarity metrics - live on the Web

  • 1. Dealing with ‘exotic’ similarity metrics How to set up a (ChemAxon-powered) Similarity-driven Virtual Screening server… Dragos Horvath, dhorvath@unistra.fr UMR 7140 CNRS – Université de Strasbourg
  • 2. Introduction & Definitions • Similarity-based Virtual Screening (SVS): – Search, in a database of candidates m, for analogues similar to a query compound M of wanted properties, hoping that the « similarity principle » magic will operate. • Molecular Similarity S(M,m): – distance (metric) between the two Descriptor Space (DS) points $D(M)$, $D(m)$; let us call these $D$, $d$, for simplicity. • Similarity Radius s defines « how similar is similar » – Delimits a sphere in descriptor space around M, thought to contain a minimum of inactive, but a maximum of active m. • Virtual Hits – aka True & False « Positives » (TP,FP): – Compounds m with S(M,m)<s
  • 3. Compound Sets • For server calibration: – Candidate database: 165 ChEMBL ligand sets with >50 molecules of reported pKi values with respect to the 165 associated receptors & enzymes (targets T). – Queries of T: $M^T_1, M^T_2, \dots, M^T_i$, $i=1..Q_T$, composed of the top 1/5 (max 100) actives on T, plus 1/5 (max 100) of binders of medium potency; they can be classified by pharmacophore complexity (number of populated FPT1 triplets) – 10,000 randomly picked commercial molecules from ZINC, assumed to be inactive “decoys”. • Operational database: – 1.5 M commercial compounds, from various sources – Above « reference » molecules, for annotation purposes
  • 8. Descriptor Spaces: All are Feature Counts. $D_i(M)$ = integer (positive or null) population level of « feature » i (a substructure or a pharmacophore triplet) in molecule M
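  • 9. Dissimilarity Scores… • Based on the comparison of descriptor vectors $D$, $d$, through three sums and their associated counts $N(M)$, $N_{AND}(m,M)$ and $N_{EXC}(M,m)$:
      $NORM(M) = \sum_{i=1}^{N(M)} D_i^2$
      $AND(m,M) = \sum_{i=1}^{N_{OR}(m,M)} D_i \times d_i$
      $EXC(M,m) = \sum_{i \mid d_i=0} D_i^2$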
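The three sums of slide 9 can be sketched in a few lines of Python. This is a minimal illustration only: it assumes each molecule is stored as a sparse dictionary mapping a feature identifier to its count (absent keys meaning a count of zero), and the feature names `f1`…`f4` are invented for the example.

```python
def norm(D):
    """NORM(M): sum of squared feature counts of one molecule."""
    return sum(v * v for v in D.values())

def and_(D, d):
    """AND(m, M): sum of D_i * d_i; only features present in both contribute."""
    return sum(v * d[k] for k, v in D.items() if k in d)

def exc(D, d):
    """EXC(M, m): sum of D_i^2 over features of M whose count in m is zero."""
    return sum(v * v for k, v in D.items() if d.get(k, 0) == 0)

# Hypothetical feature-count vectors (sparse: missing key = count 0)
M = {"f1": 2, "f2": 1, "f3": 3}
m = {"f1": 1, "f3": 3, "f4": 2}

print(norm(M), and_(M, m), exc(M, m), exc(m, M))
```

Note that EXC is not symmetric: `exc(M, m)` collects what the candidate lacks, `exc(m, M)` what it adds.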
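  • 10. Euclidean & Related…
      $E(m,M) = \sqrt{\sum_{i=1}^{N_{OR}(m,M)} (D_i - d_i)^2}$
      $R(m,M) = \sqrt{\frac{\sum_{i=1}^{N_{OR}(m,M)} (D_i - d_i)^2}{N_{OR}(m,M)}}$
      $A(m,M) = \frac{\sum_{i=1}^{N_{OR}(m,M)} |D_i - d_i|}{N_{OR}(m,M)}$
      $RW(m,M) = R(m,M) \times \frac{N_{EXC}(m,M) + N_{EXC}(M,m)}{N_{OR}(m,M)}$
      $AW(m,M) = A(m,M) \times \frac{N_{EXC}(m,M) + N_{EXC}(M,m)}{N_{OR}(m,M)}$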
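A hedged sketch of the Euclidean family follows. Several points are assumptions rather than statements from the slides: molecules are sparse count dictionaries, $N_{OR}$ is taken as the number of features populated in either molecule, $N_{EXC}(M,m)$ as the number of features populated in M but absent from m, and E and R are taken to carry a square root. Two molecules with no populated feature at all would divide by zero here.

```python
import math

def n_or(D, d):
    """Assumed N_OR: number of features populated in either molecule."""
    return len({k for k, v in D.items() if v} | {k for k, v in d.items() if v})

def n_exc(D, d):
    """Assumed N_EXC(M, m): number of features populated in M but absent from m."""
    return sum(1 for k, v in D.items() if v and not d.get(k, 0))

def _diffs(D, d):
    """Per-feature count differences over the union of features."""
    return [D.get(k, 0) - d.get(k, 0) for k in set(D) | set(d)]

def euclid(D, d):
    """E: Euclidean distance (square root assumed)."""
    return math.sqrt(sum(x * x for x in _diffs(D, d)))

def rms(D, d):
    """R: squared differences averaged over N_OR (square root assumed)."""
    return math.sqrt(sum(x * x for x in _diffs(D, d)) / n_or(D, d))

def avg_abs(D, d):
    """A: absolute differences averaged over N_OR."""
    return sum(abs(x) for x in _diffs(D, d)) / n_or(D, d)

def exclusion_weight(D, d):
    """Factor turning R into RW and A into AW: share of non-shared features."""
    return (n_exc(D, d) + n_exc(d, D)) / n_or(D, d)

M = {"f1": 2, "f2": 1, "f3": 3}   # hypothetical query
m = {"f1": 1, "f3": 3, "f4": 2}   # hypothetical candidate

w = exclusion_weight(M, m)
print(euclid(M, m), rms(M, m), avg_abs(M, m), rms(M, m) * w, avg_abs(M, m) * w)
```

The weighted variants shrink towards zero when the two molecules populate the same features and differ only in counts.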
  • 11. (A)Symmetric Correlation Scores – Tanimoto & Tversky
      $T(M,m) = 1 - \frac{AND(M,m)}{NORM(M) + NORM(m) - AND(M,m)}$
      $Tv(M,m,\alpha) = 1 - \frac{AND(M,m)}{\alpha\,EXC(M,m) + (1-\alpha)\,EXC(m,M) + AND(M,m)}$
      Situations where (a) candidate m misses a feature seen in active M, and (b) it contains some novel feature not seen in M, may be distinguished! At α>0.5, cases (a) will be relatively more penalized than the symmetric situation (b). A rough guess of α should suffice! Three implementations of Tv are considered: • Tv+ (α=0.9) • Tv (α=0.7) • Tv- (α=0.3)
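The asymmetry of the Tversky score is easiest to see on a toy case. The sketch below assumes sparse count dictionaries and invented feature names; returning 0.0 when both molecules are empty is a convention chosen here, not something the slides specify.

```python
def norm(D):
    return sum(v * v for v in D.values())

def and_(D, d):
    return sum(v * d[k] for k, v in D.items() if k in d)

def exc(D, d):
    return sum(v * v for k, v in D.items() if d.get(k, 0) == 0)

def tanimoto(D, d):
    """T(M, m) = 1 - AND / (NORM(M) + NORM(m) - AND)."""
    a = and_(D, d)
    denom = norm(D) + norm(d) - a
    return 1.0 - a / denom if denom else 0.0

def tversky(D, d, alpha):
    """Tv(M, m, alpha) = 1 - AND / (alpha*EXC(M,m) + (1-alpha)*EXC(m,M) + AND)."""
    a = and_(D, d)
    denom = alpha * exc(D, d) + (1 - alpha) * exc(d, D) + a
    return 1.0 - a / denom if denom else 0.0

query   = {"f1": 1, "f2": 1}             # hypothetical active M
missing = {"f1": 1}                      # case (a): lacks a query feature
extra   = {"f1": 1, "f2": 1, "f3": 1}    # case (b): adds a novel feature

for name, alpha in (("Tv+", 0.9), ("Tv", 0.7), ("Tv-", 0.3)):
    print(name, tversky(query, missing, alpha), tversky(query, extra, alpha))
```

With α=0.9 the candidate that lost a query feature scores as clearly more dissimilar than the one that gained a feature; with α=0.3 the ordering flips.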
  • 12. 2. Fine, but « how similar is similar »? • You may be a believer in the dogma « Tanimoto>0.85 » (T<0.15) – But the Bible mentions not the other metrics, less subject to religious fervor. • Alternatively, try to infer reasonable choices of similarity radii for each Chemical Space (CS – the combination of Descriptor Space & Similarity score) – For each query, on every target, compute s* corresponding to the « optimal » SVS scenario. – This also makes it possible to measure & benchmark SVS success with respect to its Operational Premises (CS, nature of Target, degree of complexity of the query, etc).
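  • 13. A basic SVS Optimality Criterion: Ω
      [Figure: plot of Ω against the similarity radius s, with the level Ω = 1.0 marked.]
      Compound pairs are classified by their score S(M,m) against the radius s and by their activity (profile) difference Λ(M,m) against a threshold λ:
      – True Positives (TP): S(M,m) ≤ s and Λ(M,m) ≤ λ
      – False Positives (FP): S(M,m) ≤ s and Λ(M,m) > λ
      – False (?) Negatives (FN): S(M,m) > s and Λ(M,m) ≤ λ
      – True Negatives (TN): S(M,m) > s and Λ(M,m) > λ
      $\Omega(s) = \frac{N_{FP}(s) + N_{FN}(s)}{N_{FP}^{(E)} + N_{FN}^{(E)}}$
      Activity (profile) differences Λ(m,M):
      $\Lambda(M,m) = \begin{cases} 0 & \text{if } |pK_i(M) - pK_i(m)| < 0.5 \\ 1 & \text{if } |pK_i(M) - pK_i(m)| > 3.0 \\ \frac{|pK_i(M) - pK_i(m)| - 0.5}{2.5} & \text{otherwise} \end{cases}$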
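The activity difference Λ and the four-way classification of a compound pair can be sketched as follows. The piecewise thresholds 0.5 and 3.0 come from the slide; the use of "≤" for both comparisons and the value passed as `lam_cut` (λ) are assumptions made for this example.

```python
def activity_difference(pki_query, pki_cand):
    """Lambda(M, m): 0 below a 0.5 pKi gap, 1 above 3.0, linear in between."""
    diff = abs(pki_query - pki_cand)
    if diff < 0.5:
        return 0.0
    if diff > 3.0:
        return 1.0
    return (diff - 0.5) / 2.5

def classify_pair(score, lam, s, lam_cut):
    """Label one (M, m) pair given its score S, its Lambda, and the two cutoffs."""
    similar = score <= s              # inside the similarity sphere
    same_activity = lam <= lam_cut    # activity difference small enough
    if similar:
        return "TP" if same_activity else "FP"
    return "FN" if same_activity else "TN"

# Hypothetical pairs: (score S, pKi of query, pKi of candidate)
pairs = [(0.10, 8.0, 7.8), (0.12, 8.0, 0.0), (0.60, 8.0, 7.9), (0.70, 8.0, 0.0)]
for S, pk_q, pk_c in pairs:
    lam = activity_difference(pk_q, pk_c)
    print(S, lam, classify_pair(S, lam, s=0.15, lam_cut=0.5))
```

Counting the FP and FN labels over all pairs at a given s gives the numerator terms of Ω(s).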
  • 14. A basic SVS Optimality Criterion: Ω (same Ω-versus-s plot, Ω(s) expression and TP/FP/FN/TN classification as on slide 13)
  • 15. The Ascertained Optimality Excess Ξ
      [Figure: Ω plotted against the fraction of compound pairs selected at cutoff s, for random S values and for meaningful S values; Var(Ω) and Ξ are marked on the plot.]
      Ξ(s) is expressed in terms of Ω(s), Ω_rand and Var(Ω_rand).
  • 16. Workflow
      ForEach Target T
        Set database db = set of tested ligands (known pKi) + decoy set (pKi=0);
        ForEach Query M of T
          ForEach DescriptorSpace D
            ForEach SimilarityScore S
              # Start current SVS experiment, defined by Target, Query, Descriptors & Similarity Score
              ForEach m != M in db
                Calculate S(M,m)|D;
              EndLoop(m)
              Scan over s → optimality excess Ξ(s) and return s* such that Ξ(s*) = maximal;
              Classify SVS(T,M,D,S) wrt Ξ(s*) as « failed », « acceptable », « good » or « excellent »;
            EndLoop S;
          EndLoop D;
        EndLoop M;
      EndLoop T;
      Analyze Success Rates & s* distributions in terms of various Operational premises (nature of T, complexity of M, choice of D, of S or of D-S combinations)
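The inner part of this workflow (one query of one target, looped over descriptor spaces and scores, followed by the scan over s) might look as follows in Python. This is a sketch under stated assumptions, not the server's code: the molecule records, the `manhattan` score and above all `toy_objective` (TP minus FP at a fixed λ of 0.5) are hypothetical stand-ins, the last one replacing the optimality excess, whose exact expression is not recoverable from the slides.

```python
def manhattan(D, d):
    """Stand-in dissimilarity score: sum of absolute count differences."""
    return sum(abs(D.get(k, 0) - d.get(k, 0)) for k in set(D) | set(d))

def lam(M, m):
    """Activity difference Lambda of slide 13 (decoys carry pKi = 0)."""
    diff = abs(M["pKi"] - m["pKi"])
    return min(1.0, max(0.0, (diff - 0.5) / 2.5))

def toy_objective(S, L, s, lam_cut=0.5):
    """Hypothetical placeholder for the optimality excess: TP minus FP at cutoff s."""
    tp = sum(1 for x, l in zip(S, L) if x <= s and l <= lam_cut)
    fp = sum(1 for x, l in zip(S, L) if x <= s and l > lam_cut)
    return tp - fp

def best_radius(S, L, objective):
    """Scan every observed score as a candidate cutoff; return (s*, objective at s*)."""
    s_star = max(sorted(set(S)), key=lambda s: objective(S, L, s))
    return s_star, objective(S, L, s_star)

def calibrate(query, db, spaces, metrics, objective):
    """Run every (descriptor space, score) experiment for one query."""
    others = [m for m in db if m is not query]      # ForEach m != M in db
    L = [lam(query, m) for m in others]
    results = {}
    for space_name, describe in spaces.items():      # ForEach DescriptorSpace
        D = describe(query)
        for metric_name, score in metrics.items():   # ForEach SimilarityScore
            S = [score(D, describe(m)) for m in others]
            results[(space_name, metric_name)] = best_radius(S, L, objective)
    return results

# Hypothetical data: two actives and two decoys around one query
query = {"counts": {"a": 2, "b": 1}, "pKi": 8.0}
db = [
    query,
    {"counts": {"a": 2, "b": 1, "c": 1}, "pKi": 7.8},
    {"counts": {"a": 1}, "pKi": 7.6},
    {"counts": {"x": 3}, "pKi": 0.0},
    {"counts": {"a": 2, "y": 2}, "pKi": 0.0},
]
results = calibrate(query, db,
                    spaces={"counts": lambda mol: mol["counts"]},
                    metrics={"manhattan": manhattan},
                    objective=toy_objective)
print(results)
```

Wrapping `calibrate` in loops over targets and their queries, then binning the returned objective values into « failed » … « excellent », completes the outer structure of the pseudocode.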
  • 17. Insights: (1) – So much for dogmas!
      [Figure: % of Tanimoto-based queries of Good optimality level at the optimal radius (labelled d* / s* on the axes), plotted over radius values from 0.04 to 0.6, for the FPT1 and treeSY03 descriptor spaces.]
      Use the distribution to « teach » the web server how to rank prospective SVS hits: Top Hits (0), Good Hits (1), Average Hits (2), Acceptable Hits (3), Are these Hits? (4), Ignore…
  • 18. Insights: (2) – Tversky at α>0.5: an excellent similarity scoring scheme.
      [Figure: relative « market share » of each metric (Tv+, Tv, T, RW, AW, Tv-, E, A, R), i.e. the fraction of SVS runs based on the shown metric out of all SVS experiments having reached given success levels; series: all-acceptable, all-good, all-excellent.]
  • 19. Tv+ may pick actives that are more complex than queries (NK1 example)
  • 20. Insights: (3) – Trends with respect to target classes could be evidenced…
      [Figure: relative « market share » of each metric (Tv+, Tv, T, RW, AW, Tv-, E, A, R), i.e. the fraction of SVS runs, within target classes, based on the shown metric out of all SVS experiments having reached given success levels; series: all-good, Kinases-good, monoamine GPCR-good, other GPCR-good.]
  • 21. Insights: (4) – when the query compound is complex, the metric matters less
      [Figure: relative « market share » of each metric (Tv+, Tv, T, RW, AW, Tv-, E, A, R), i.e. the fraction of SVS runs, within query complexity classes, based on the shown metric out of all SVS runs having reached given success levels; series: all-good, pharm-high-good, pharm-low-good.]
  • 22. Some conclusions • The study has highlighted many interesting aspects – Intrinsic usefulness of Tversky scores biased towards penalizing the loss of query features: α = 0.7…0.9 will do! – Other target-, query complexity-, query activity- and descriptor space-dependent trends of SVS success – Some inevitable sources of bias, showing that not even ChEMBL is large/diverse enough to cover it all… • Main message: use this protocol, or a related one, to calibrate web servers, rather than sticking to well-studied metrics and descriptors for which « Universal » similarity cutoffs are believed to hold. • Try infochim.u-strasbg.fr/webserv/VSEngine.html – to our knowledge, the only public SVS server to support atypical but powerful metrics coupled to chemically relevant, pH-sensitive descriptor spaces… all while exploiting the power of ChemAxon tools!