EUGM 2013 - Dragos Horváth (Laboratoire de Chemoinformatique, Univ Strasbourg-CNRS): Dealing with 'exotic' similarity metrics - live on the Web

The latest developments of our ChemAxon-powered similarity-driven virtual screening servers included implementation of both classical and ‘exotic’ similarity search metrics, notably the famous Tversky score everybody cites, but no one uses in practical drug discovery. Why? Are they really useful? Would our implementation help to promote their use?


Transcript

  • 1. Dealing with 'exotic' similarity metrics: how to set up a (ChemAxon-powered) Similarity-driven Virtual Screening server… Dragos Horvath, dhorvath@unistra.fr, UMR 7140 CNRS – Université de Strasbourg
  • 2. Introduction & Definitions
    – Similarity-based Virtual Screening (SVS): search, in a database of candidates m, for similar analogues of a query compound M of wanted properties, hoping that the « similarity principle » magic will operate.
    – Molecular Similarity S(M,m): a distance (metric) between the two Descriptor Space (DS) points D(M) and D(m) – let us call these D and d, for simplicity.
    – Similarity Radius s defines « how similar is similar »: it delimits a sphere in descriptor space around M, thought to contain a minimum of inactive, but a maximum of active m.
    – Virtual Hits – aka True & False « Positives » (TP, FP): compounds m with S(M,m) < s.
  • 3. Compound Sets
    – For server calibration:
      – Candidate database: 165 ChEMBL ligand sets with >50 molecules of reported pKi values with respect to the 165 associated receptors & enzymes (targets T).
      – Queries of T: M_T1, M_T2 … M_Ti, i = 1..Q_T, composed of the top 1/5 (max 100) actives on T, plus 1/5 (max 100) of the binders of medium potency; queries can be classified by pharmacophore complexity (nr. of populated FPT1 triplets).
      – 10,000 randomly picked commercial molecules from ZINC, assumed to be inactive « decoys ».
    – Operational database:
      – 1.5 M commercial compounds, from various sources.
      – The above « reference » molecules, for annotation purposes.
  • 4. Compound Sets (figure slide; no transcript text)
  • 5. Compound Sets (figure slide; no transcript text)
  • 6. Descriptor Spaces (figure slide; no transcript text)
  • 7. Descriptor Spaces (figure slide; no transcript text)
  • 8. Descriptor Spaces: all are Feature Counts. D_i(M) = the integer (positive or null) population level of « feature » i (a substructure or a pharmacophore triplet) in molecule M.
  • 9. Dissimilarity Scores… based on the comparison of descriptor vectors D, d:
    – NORM(M) = Σ_{i=1}^{N(M)} D_i²
    – AND(m,M) = Σ_{i=1}^{N_OR(m,M)} D_i × d_i
    – EXC(M,m) = Σ_{i | d_i = 0} D_i²
    with the companion feature counts N(M), N_AND(m,M), N_OR(m,M) and N_EXC(M,m) defined over the corresponding index sets.
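These building blocks are simple to compute for sparse feature-count vectors. Below is a minimal Python sketch, assuming descriptors are stored as dicts mapping a feature index to its positive count; the function names mirror the slide's notation, and everything else is illustrative rather than the server's actual code.

```python
# Sparse feature-count descriptors: {feature index: positive integer count}.
# The dict representation is an assumption of this sketch.

def norm(D):
    """NORM(M): sum of squared counts over the N(M) features populated in M."""
    return sum(v * v for v in D.values())

def and_(D, d):
    """AND(m, M): sum of D_i * d_i; terms vanish unless both counts are nonzero,
    so iterating over the intersection of populated features suffices."""
    return sum(D[i] * d[i] for i in D.keys() & d.keys())

def exc(D, d):
    """EXC(M, m): sum of D_i^2 over features present in M but absent from m."""
    return sum(v * v for i, v in D.items() if i not in d)
```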
  • 10. Euclidean & Related…
    – E(m,M) = Σ_{i=1}^{N_OR(m,M)} (D_i − d_i)²
    – R(m,M) = [Σ_{i=1}^{N_OR(m,M)} (D_i − d_i)²] / N_OR(m,M)
    – A(m,M) = [Σ_{i=1}^{N_OR(m,M)} |D_i − d_i|] / N_OR(m,M)
    – RW(m,M) = R(m,M) × [N_EXC(m,M) + N_EXC(M,m)] / N_OR(m,M)
    – AW(m,M) = A(m,M) × [N_EXC(m,M) + N_EXC(M,m)] / N_OR(m,M)
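The Euclidean-family scores then reduce to sums over the union of populated features. A sketch under the same dict-based representation as above; the helper names (n_or, n_exc, weighting, etc.) are assumptions of this sketch:

```python
def n_or(D, d):
    """N_OR(m, M): number of features populated in at least one molecule."""
    return len(D.keys() | d.keys())

def n_exc(D, d):
    """N_EXC(M, m): number of features populated in M but absent from m."""
    return sum(1 for i in D if i not in d)

def e_score(D, d):
    """E(m, M): summed squared count differences over the union of features."""
    return sum((D.get(i, 0) - d.get(i, 0)) ** 2 for i in D.keys() | d.keys())

def r_score(D, d):
    """R(m, M): E(m, M) averaged over the N_OR(m, M) populated features."""
    return e_score(D, d) / n_or(D, d)

def a_score(D, d):
    """A(m, M): mean absolute count difference over the union of features."""
    total = sum(abs(D.get(i, 0) - d.get(i, 0)) for i in D.keys() | d.keys())
    return total / n_or(D, d)

def weighting(D, d):
    """Exclusive-feature weight: [N_EXC(m, M) + N_EXC(M, m)] / N_OR(m, M)."""
    return (n_exc(D, d) + n_exc(d, D)) / n_or(D, d)

def rw_score(D, d):
    """RW(m, M) = R(m, M) * exclusive-feature weight."""
    return r_score(D, d) * weighting(D, d)

def aw_score(D, d):
    """AW(m, M) = A(m, M) * exclusive-feature weight."""
    return a_score(D, d) * weighting(D, d)
```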
  • 11. (A)Symmetric Correlation Scores – Tanimoto & Tversky
    – T(M,m) = 1 − AND(M,m) / [NORM(M) + NORM(m) − AND(M,m)]
    – Tv(M,m,α) = 1 − AND(M,m) / [α·EXC(M,m) + (1 − α)·EXC(m,M) + AND(M,m)]
    Situations where (a) candidate m misses a feature seen in active M, and (b) m contains some novel feature not seen in M, may be distinguished! At α > 0.5, cases (a) are penalized relatively more than the symmetric situation (b). A rough guess of α should suffice! Three implementations of Tv are considered: Tv+ (α = 0.9), Tv (α = 0.7) and Tv− (α = 0.3).
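Tanimoto and Tversky follow directly from the NORM/AND/EXC building blocks. A sketch reusing the norm, and_ and exc helpers from the slide-9 code block, with the slide's three α settings; again illustrative only:

```python
def tanimoto(D, d):
    """T(M, m) = 1 - AND / (NORM(M) + NORM(m) - AND); 0 for identical vectors."""
    shared = and_(D, d)
    return 1.0 - shared / (norm(D) + norm(d) - shared)

def tversky(D, d, alpha):
    """Tv(M, m, alpha) = 1 - AND / (alpha*EXC(M, m) + (1 - alpha)*EXC(m, M) + AND).
    With alpha > 0.5, query features missing from candidate m are penalized
    more heavily than novel features of m that are absent from query M."""
    shared = and_(D, d)
    return 1.0 - shared / (alpha * exc(D, d) + (1.0 - alpha) * exc(d, D) + shared)

# The three variants considered on the slide:
def tv_plus(D, d):  return tversky(D, d, 0.9)   # Tv+
def tv(D, d):       return tversky(D, d, 0.7)   # Tv
def tv_minus(D, d): return tversky(D, d, 0.3)   # Tv-
```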
  • 12. Fine, but « how similar is similar »?
    – You may be a believer in the dogma « Tanimoto > 0.85 » (T < 0.15) – but the Bible mentions not the other metrics, less subject to religious fervor.
    – Alternatively, try to infer reasonable choices of similarity radii for each Chemical Space (CS – the combination of a Descriptor Space & a Similarity score):
      – For each query, on every target, compute the s* corresponding to the « optimal » SVS scenario.
      – This also allows one to measure & benchmark SVS success with respect to its Operational Premises (CS, nature of the Target, degree of complexity of the query, etc.).
  • 13. A basic SVS Optimality Criterion: W
    – Each pair (M,m) is classified by its similarity S(M,m) against the cutoff s and by its activity-profile difference Λ(M,m) against a threshold λ: True Positives (TP), False Positives (FP), False (?) Negatives (FN) and True Negatives (TN).
    – W(s) is built from the false-positive and false-negative counts N_FP(s), N_FN(s) at cutoff s, normalized by their expected counts N_FP(E), N_FN(E).
    – Activity (profile) differences: Λ(M,m) = 0 if |pKi(M) − pKi(m)| < 0.5; = 1 if |pKi(M) − pKi(m)| > 3.0; = [|pKi(M) − pKi(m)| − 0.5] / 2.5 otherwise.
    [Plot: W as a function of s.]
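The Λ ramp and the confusion-matrix bookkeeping are straightforward to code. A sketch in the same style; the value of the λ threshold separating « same activity » from « different activity » pairs is an assumption here:

```python
def activity_difference(pKi_M, pKi_m):
    """Lambda(M, m): 0 below a 0.5 pKi-unit gap, 1 above 3.0 units,
    and a linear ramp (|gap| - 0.5) / 2.5 in between."""
    delta = abs(pKi_M - pKi_m)
    if delta < 0.5:
        return 0.0
    if delta > 3.0:
        return 1.0
    return (delta - 0.5) / 2.5

def confusion_counts(pairs, s, lam=0.5):
    """Classify (S(M,m), Lambda(M,m)) pairs at similarity cutoff s: a pair is
    selected if S < s and deemed same-activity if Lambda < lam. The lam
    threshold value is an assumption of this sketch."""
    tp = fp = fn = tn = 0
    for s_val, lam_val in pairs:
        if s_val < s and lam_val < lam:
            tp += 1
        elif s_val < s:
            fp += 1
        elif lam_val < lam:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn
```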
  • 14. (Build of the previous slide, repeating the optimality-criterion scheme with its W(s) plot; no additional text.)
  • 15. The Ascertained Optimality Excess X
    – Compare the W reached with meaningful S values against the distribution of W obtained with random S values, at the same fraction of compound pairs selected at cutoff s:
      X(s) = [W(s) − ⟨W_rand⟩] / √Var(W_rand)
    [Plot: W vs. the fraction of compound pairs selected at cutoff s, for random and for meaningful S values.]
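Given any implementation of W, the excess X can be estimated empirically by re-scoring randomly shuffled similarity values, which yields ⟨W_rand⟩ and Var(W_rand). A sketch; the shuffle-based baseline and the number of shuffles are this sketch's assumptions:

```python
import random
import statistics

def optimality_excess(W, s_values, lam_values, s, n_shuffles=100):
    """X(s) = [W(s) - <W_rand>] / sqrt(Var(W_rand)). W is any callable taking
    (similarities, activity differences, cutoff) and returning a number; the
    random baseline is obtained by shuffling the similarity column, which
    breaks the S-Lambda pairing while preserving both marginal distributions."""
    observed = W(s_values, lam_values, s)
    shuffled = list(s_values)
    rand_w = []
    for _ in range(n_shuffles):
        random.shuffle(shuffled)
        rand_w.append(W(shuffled, lam_values, s))
    return (observed - statistics.mean(rand_w)) / statistics.stdev(rand_w)
```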
  • 16. Workflow
    ForEach Target T
      Set database db = set of tested ligands (known pKi) + decoy set (pKi = 0);
      ForEach Query M of T
        ForEach DescriptorSpace D
          ForEach SimilarityScore S
            # Start current SVS experiment, defined by Target, Query, Descriptors & Similarity Score
            ForEach m != M in db
              Calculate S(M,m)|D ;
            EndLoop(m)
            Scan over s → X(s) and return s* such that X(s*) is maximal;
            Classify SVS(T,M,D,S) wrt X(s*) as « failed », « acceptable », « good » or « excellent »;
          EndLoop(S)
        EndLoop(D)
      EndLoop(M)
    EndLoop(T)
    Analyze Success Rates & s* distributions in terms of the various Operational Premises (nature of T, complexity of M, choice of D, of S, or of D–S combinations).
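Tying the pieces together, the calibration workflow maps naturally onto nested Python loops. The sketch below reuses activity_difference and optimality_excess from the previous blocks; all data structures (targets as dicts of queries and ligand databases, descriptor spaces and metrics as callables) are assumptions for illustration, not the server's actual API:

```python
def calibrate(targets, descriptor_spaces, metrics, radii, W):
    """Nested calibration loops from the slide. Assumed argument shapes:
    targets maps T -> (queries, db) where db maps a molecule to its pKi
    (decoys at 0); descriptor_spaces maps a name to a describe() callable;
    metrics maps a name to a score(D, d) callable; radii is the grid of
    cutoffs s to scan; W is the optimality criterion of the previous slides."""
    results = []
    for T, (queries, db) in targets.items():
        for M in queries:
            for D_name, describe in descriptor_spaces.items():
                D = describe(M)
                for S_name, score in metrics.items():
                    # Score every candidate m != M against the query.
                    s_vals, lam_vals = [], []
                    for m, pKi_m in db.items():
                        if m == M:
                            continue
                        s_vals.append(score(D, describe(m)))
                        lam_vals.append(activity_difference(db[M], pKi_m))
                    # Scan the radius grid for the maximal optimality excess X(s*).
                    s_star = max(radii, key=lambda s: optimality_excess(
                        W, s_vals, lam_vals, s))
                    results.append((T, M, D_name, S_name, s_star))
    return results
```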
  • 17. Insights (1) – So much for dogmas!
    [Histogram: % of Tanimoto-based queries of Good optimality level at each optimal radius s*, for the FPT1, tree and SY03 descriptor spaces; s* spans roughly 0.04–0.6.]
    Use the s* distribution to « teach » the web server how to rank prospective SVS hits: Top Hits (0), Good Hits (1), Average Hits (2), Acceptable Hits (3), Are these Hits? (4), Ignore…
  • 18. Insights (2) – Tversky at α > 0.5: an excellent similarity scoring scheme.
    [Bar chart: relative « market share » of each metric (Tv+, Tv, T, RW, AW, Tv−, E, A, R) – the fraction of SVS runs based on the shown metric, out of all SVS experiments having reached the « acceptable », « good » or « excellent » success level.]
  • 19. Tv+ may pick actives that are more complex than the queries (NK1 example).
  • 20. Insights (3) – Trends with respect to target classes could be evidenced…
    [Bar chart: relative « market share » of each metric (Tv+, Tv, T, RW, AW, Tv−, E, A, R) within target classes – all targets, Kinases, monoamine GPCRs and other GPCRs – at the « good » success level.]
  • 21. Insights (4) – When the query compound is complex, the metric matters less.
    [Bar chart: relative « market share » of each metric within query-complexity classes – all queries vs. high and low pharmacophore complexity – at the « good » success level.]
  • 22. Some conclusions
    – The study has highlighted many interesting aspects:
      – The intrinsic usefulness of Tversky scores biased towards penalizing the loss of query features: α = 0.9…0.7 will do!
      – Other target-, query complexity-, query activity- and descriptor space-dependent trends of SVS success.
      – Some inevitable sources of bias, showing that not even ChEMBL is large/diverse enough to cover it all…
    – Main message: use this protocol – or a related one – to calibrate web servers, rather than sticking to well-studied metrics and descriptors for which « universal » similarity cutoffs are believed to hold.
    – Try infochim.u-strasbg.fr/webserv/VSEngine.html – to our knowledge, the only public SVS server to support atypical but powerful metrics, coupled to chemically relevant, pH-sensitive descriptor spaces… all while exploiting the power of ChemAxon tools!
