Smashing Molecules
Published in: Technology, Education
Transcript

  • 1. Smashing Molecules
    How Molecular Fragments Allow Us to Explore Large Chemical Spaces
    Rajarshi Guha & Trung Nguyen
    NIH Center for Translational Therapeutics
    Chemaxon UGM, September 2011
  • 2. Outline
    • Fragments as the building blocks of chemistry
    • Fragments and SAR
    • Fragments and activity profiles
  • 3. Big Data for Some Problems
    • Halevy et al discuss the effectiveness of extremely large datasets
    • Their application focuses on machine translation (see the Google n-gram corpus)
    • They suggest that such extremely large datasets are useful because they effectively encompass all n-grams (phrases) commonly used
    • Domain is relatively constrained
    Halevy et al, IEEE Intelligent Systems, 2009, 24, 8-12
  • 4. Google Scale in Chemistry?
    • What would be the equivalent of an n-gram corpus in chemistry?
      – Fragments
      – A more direct analogy can be made by using LINGOs
    • It is possible to generate arbitrarily large (virtual) compound and fragment collections
    • But would such a collection span all of "commonly used" chemistry?
      – Depending on the initial compound set, yes
      – But we're also interested in going beyond such a "commonly used" set
    Fink T, Reymond JL, J Chem Inf Model, 2007, 47, 342
  • 5. Fragment Diversity
    • Consider a set of bioactives such as the LOPAC collection, 1280 compounds
    • Using exhaustive fragmentation we get 2,460 unique fragments
    • On the MLSMR (~372K compounds), we get 164,583 fragments
    [Figure: histogram of log fragment frequency (x-axis) against percent of total (y-axis)]
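The counting behind a fragment frequency histogram can be sketched as follows. This is a minimal illustration, assuming each molecule has already been fragmented by some external tool; the function name, the input layout (molecule ID mapped to a list of fragment identifiers) and the choice of base-10 logarithm are assumptions made for the example, not details taken from the slides.

```python
from collections import Counter
from math import log10


def fragment_frequencies(fragments_by_molecule):
    """Count, for each fragment, the number of molecules that contain it.

    fragments_by_molecule: dict mapping a molecule ID to an iterable of
    fragment identifiers (hypothetical layout, e.g. canonical SMILES).
    """
    counts = Counter()
    for fragments in fragments_by_molecule.values():
        # A fragment found twice in one molecule still counts once.
        counts.update(set(fragments))
    return counts


def log_frequencies(counts):
    """Return log10 of each fragment's molecule count, as plotted on the x-axis."""
    return {frag: log10(n) for frag, n in counts.items()}
```

The number of unique fragments is then `len(counts)`, and the values of `log_frequencies(counts)` can be binned to draw the histogram.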
  • 6. Fragment Diversity
    • Distribution of MLSMR fragments in BCUT space
    [Figure: two scatter plots of PC 1 against PC 2; panels: "All fragments" and "Fragments occurring in 5 to 50 molecules"]
  • 7. What Do We Do with Fragments?
    • Assuming we obtain fragments from a large enough collection, what do we do?
      – Learning from fragments: QSARs, generative models
      – Use fragments as filters, alternative to clustering
      – Explore chemotypes and activity
      – Scaffold level promiscuity
    White, D and Wilson, RC, J Chem Inf Model, 2010, 50, 1257-1274
  • 8. Scaffold Activity Diagrams
    • Network oriented view of fragment (scaffold) collections
      – Similar in idea to Scaffold Hunter etc
      – Not purely hierarchical
    • Color by arbitrary properties
    • Quickly assess utility of a scaffold
    • Try it online
  • 9. What Makes a Good Scaffold?
    • What makes a good scaffold?
      – Size, complexity, …
      – Do the members represent an SAR or not?
      – Intuition and experience also play a role
  • 10. Scaffold QSAR
    • Evaluate topological and physicochemical descriptors for the R-groups
    • Fit PLS or ridge regression model
    • Characterize the SAR landscape
    [Figure: scatter plot of predicted against observed values]
  • 11. Scaffold QSAR - Drawbacks
    • Many scaffolds have few (5 to 10) members
    • Invariably, more features than observations
    • If the number of R-groups is large, the feature matrix can be very sparse
      – Less of a problem for combinatorial libraries
    • A linear fit may not be the best approach to correlating R-groups to the activities
      – Difficult to choose a model type a priori
  • 12. Fragment Activity Profiles
    • Using scaffolds in HTS triage usually leads to two questions
      – What is known about the chemical series with respect to the intended target?
      – What compound classes are known to modulate the intended target, and how similar are they to the series in question?
    • We're interested in exploring summaries of activity, grouped by scaffolds and targets
  • 13. Fragment Activity Profiles
    • We use ChEMBL (08) as the source of bioactivity across multiple targets
    • Preprocess the database
      – Generate scaffolds (exhaustive enumeration of combinations of SSSRs)
      – Normalize activity data so that we can compare the activity of a molecule across different assays
  • 14. Database Setup
    • Preprocessing steps available as a Java servlet
      – http://tripod.nih.gov/files/chembl-servlets.zip
    • Need ChEMBL installed in Oracle; we add some extra tables
      – Fragment structures and computed properties
      – Aggregated assay activity summary
        • Only consider assays with IC50s in nM and uncensored data, more than 5 observations and a MAD > 0
      – (Robust) z-scored activities
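A robust z-score of the kind mentioned above replaces the mean and standard deviation with the median and the median absolute deviation (MAD). The sketch below is one common formulation, not the servlet's actual code: the function names are hypothetical, and the 1.4826 scale factor (which makes the MAD comparable to a standard deviation for normally distributed data) is an assumption, since the slides do not say whether a scale factor was applied.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median (unscaled)."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def robust_z_scores(values, min_observations=6, scale=1.4826):
    """Robust z-scores for one assay, or None if the assay is filtered out.

    Filters mirror the slide: more than 5 observations and MAD > 0.
    The scale constant is an assumption, not taken from the slides.
    """
    if len(values) < min_observations:
        return None
    centre = median(values)
    spread = mad(values)
    if spread == 0:
        return None
    return [(v - centre) / (scale * spread) for v in values]
```

Because the median and MAD are barely affected by a handful of extreme IC50 values, a single outlier gets a large score without inflating the spread used for every other compound in the assay.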
  • 15. Some Fragment Statistics
    • Considered Z-score range of -40 to 15
    • There were 12,887 molecules lying outside this range
    [Figure: two histograms; axis labels: log(Number of molecules), Percentage of assays, Z-score, Number of compounds]
  • 16. Some Fragment Statistics
    • Next, identify fragments with 8 to 20 atoms and occurring in 100 to 900 molecules
    • Gives us 1,746 fragments
    [Figure: histogram of number of molecules (x-axis) against percentage of fragments (y-axis)]
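The selection step can be written as a simple filter. This is only an illustrative sketch: the `Fragment` record and its field names are hypothetical stand-ins for rows of the fragment table, and treating both ranges as inclusive is an assumption, since the slide does not state whether the bounds are inclusive.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    # Hypothetical record; field names are not taken from the real schema.
    frag_id: int
    n_atoms: int
    n_molecules: int


def select_fragments(fragments, atoms=(8, 20), molecules=(100, 900)):
    """Keep fragments whose size and parent-molecule count fall in range.

    Both ranges are treated as inclusive (an assumption).
    """
    return [
        f for f in fragments
        if atoms[0] <= f.n_atoms <= atoms[1]
        and molecules[0] <= f.n_molecules <= molecules[1]
    ]
```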
  • 17. Some Fragment Statistics
    • We can query the fragment tables to get activity summaries for individual fragments
    • For these examples we consider the full range of Z-scores
    [Figure: 3 × 3 grid of Z-score histograms (y-axis: percent of total) for fragments 40169, 64473, 115654, 5390, 5486, 13485, 778, 2723 and 4058; N ranges from 1280 to 2641 per panel]
  • 18. Exploring Activity Profiles
    [Figure: annotated screenshot; labels: "Fragments from ChEMBL", "Activity distributions of parent molecules across all targets", "Z-scores for individual molecules against a specific target"]
  • 19. Exploring Activity Profiles
    • User can draw a molecule and fragment on the fly
    • Use generated fragments to create activity histograms
  • 20. Target Selection
    • Employs the ChEMBL target hierarchy
    • Can select target families or individual targets
  • 21. Similar Fragments with Similar Profiles?
    • Consider 658 fragments with > 10 atoms and occurring in 500 to 1200 molecules
    • Overall, the fragments tend to be dissimilar
      – 95th percentile is just 0.50
    • 1,873 pairs do exhibit Tc > 0.8
    [Figure: histogram of Tanimoto similarity (x-axis) against percentage of pairs (y-axis)]
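For reference, the Tanimoto coefficient (Tc) between two binary fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below shows the pairwise comparison under stated assumptions: fingerprints are represented as Python sets of on-bit indices, the function names are hypothetical, and the slides do not say which fingerprint was actually used.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similar_pairs(fingerprints, threshold=0.8):
    """Return ID pairs whose Tanimoto coefficient exceeds the threshold.

    fingerprints: dict mapping a fragment ID to its set of on-bit indices.
    """
    return [
        (i, j)
        for (i, fp_i), (j, fp_j) in combinations(fingerprints.items(), 2)
        if tanimoto(fp_i, fp_j) > threshold
    ]
```

With 658 fragments this loop visits 658 × 657 / 2 = 216,153 pairs, which is small enough for a brute-force comparison.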
  • 22. Comparing Activity Profiles
    • Compare activity profiles with the K-S statistic
    • Color corresponds to p-value of the K-S test
    • No obvious correlation between fragment similarity & activity profile similarity
    • Probably not rigorous when a scaffold has few parent molecules
    [Figure: scatter plot of Tanimoto similarity (x-axis, 0.80 to 1.00) against K-S statistic (y-axis), colored by p-value]
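The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distribution functions of two samples, here the Z-scores of the parent molecules of two fragments. The following is a minimal pure-Python sketch of the statistic only; the function name is hypothetical and the p-value used for coloring in the plot is not computed.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |ECDF_a(v) - ECDF_b(v)| over all v."""
    xs, ys = sorted(sample_a), sorted(sample_b)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at a data point.
    for v in xs + ys:
        cdf_a = bisect_right(xs, v) / len(xs)
        cdf_b = bisect_right(ys, v) / len(ys)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A value of 0 means the two empirical distributions coincide and a value of 1 means they do not overlap at all. With only a few parent molecules per scaffold the ECDFs are coarse, which is the caveat raised in the last bullet.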
  • 23. Exploring Profiles for Fragment Pairs
    • Compare activity distributions across all targets in a pairwise fashion
    • Can also generate comparison for a single target, but requires data for all the fragments
  • 24. Looking for Selective Fragments
    • Interesting to visually explore fragment pairs
    • Can become tedious, especially in a database as big as ChEMBL
    • Can we automate this type of analysis?
      – Identify fragment pairs with very different activity distributions?
      – Identify fragments with a preference for a certain target (class)?
  • 25. Targetwise Activity Profiles
    • Count number of parent molecules tested against the target class
    • Evaluate mean activity of parent molecules within a target
    • Selectivity of 1-phenylimidazole for CYP450 has been noted
    Wilkinson et al, Biochem Pharmacol, 1983, 32, 997-1003
    [Figure: bar chart of mean Z-score by target class, labelled 4056459, with a count shown for each class; classes include adrenergic, chemokine and dopamine receptors and a large group of CYP isoforms]
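The two summary steps, counting parent molecules and averaging their activity per target class, amount to a group-by. The sketch below assumes the data for one fragment arrives as (molecule ID, target class, Z-score) tuples; that layout and the function name are assumptions for illustration and do not reflect the actual table structure.

```python
from collections import defaultdict
from statistics import mean


def targetwise_profile(records):
    """Summarise one fragment's parent molecules by target class.

    records: iterable of (molecule_id, target_class, z_score) tuples
    (hypothetical layout).
    Returns {target_class: (distinct molecule count, mean z-score)}.
    """
    scores = defaultdict(list)
    molecules = defaultdict(set)
    for molecule_id, target_class, z_score in records:
        scores[target_class].append(z_score)
        molecules[target_class].add(molecule_id)
    return {
        target_class: (len(molecules[target_class]), mean(scores[target_class]))
        for target_class in scores
    }
```

A fragment with a target-class preference shows up as one class whose mean Z-score stands apart from the rest, provided the count for that class is large enough to trust.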
  • 26. Targetwise Activity Profiles
    • Identified benzylpyrrolidine as a fragment with preference for a specific target class
    • But reported as dopamine agonists
    [Figure: bar chart of mean Z-score by target class, labelled 4055899, with a count shown for each class]
  • 27. Fragment or Scaffold?
    • I've been using fragment & scaffold interchangeably, which is not always true
    • Chemists have an intuitive idea of what a scaffold is
    • Can we encode the idea of scaffold-like or fragment-like?
    • We use the concept of Signal-to-Noise Ratio
      SNR = µ / σ
      – µ: size of fragment
      – σ: SD of number of atoms not in the fragment, considered over the parent molecules
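The SNR definition can be sketched in a few lines. This assumes fragment and molecule sizes are given as atom counts and uses the population standard deviation; the slides do not say whether the population or sample form was used, so that choice, like the function name and the handling of a zero SD, is an assumption.

```python
from statistics import pstdev


def fragment_snr(fragment_size, parent_sizes):
    """SNR = fragment size / SD of the atom counts outside the fragment.

    fragment_size: number of atoms in the fragment.
    parent_sizes: atom counts of the parent molecules containing it.
    """
    outside = [size - fragment_size for size in parent_sizes]
    spread = pstdev(outside)  # population SD (an assumption)
    if spread == 0:
        # Every parent carries the same number of extra atoms.
        return float("inf")
    return fragment_size / spread
```

A large fragment whose parents differ by only a few atoms gets a high SNR (scaffold-like), while a small fragment embedded in molecules of very different sizes gets a low one.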
  • 28. Fragment or Scaffold
    • Partial distribution of SNR values for fragments with atom count > 8 & < 20
    [Figure: histogram of SNR (x-axis, 0 to 6) against percentage of fragments (y-axis)]
  • 29. Fragment or Scaffold
    • Large SNRs associated with Murcko-like fragments
    • A useful SNR cutoff is an open question
    [Figure: six example fragment structures with SNR = 8.50, 9.10, 12.09, 0.83, 0.43 and 0.36]
  • 30. Activity Profiles & SNR
    • Given a fragment, evaluate SD of the number of atoms in the parent molecules that are not part of the fragment
    • Label the parent molecules based on the number of atoms not in the fragment
      – If number of atoms not in the fragment > SD, non core-like
      – Otherwise core-like
    • Visualize the activity distributions of the parent molecules, grouped by the label
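The labelling rule above can be sketched as a function that splits the parent molecules' Z-scores into the two groups that are then plotted. The input layout (pairs of atom count and Z-score), the function name and the use of the population SD are assumptions made for this illustration.

```python
from statistics import pstdev


def split_by_core_likeness(fragment_size, parents):
    """Group parent-molecule Z-scores by the core-like label.

    parents: list of (n_atoms, z_score) pairs for molecules containing
    the fragment (hypothetical layout).
    """
    outside = [n_atoms - fragment_size for n_atoms, _ in parents]
    spread = pstdev(outside)  # population SD (an assumption)
    groups = {"core-like": [], "not core-like": []}
    for extra_atoms, (_, z_score) in zip(outside, parents):
        # More extra atoms than one SD: the fragment is not the core.
        label = "not core-like" if extra_atoms > spread else "core-like"
        groups[label].append(z_score)
    return groups
```

Each of the two returned lists can be passed to a histogram routine to compare the activity distributions of the two groups.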
  • 31. Activity Profiles & SNR
    [Figure: Z-score histograms (y-axis: percentage of total), each split into "Core-like" and "Not core-like" panels; High SNR: fragments 20967 and 44591; Low SNR: fragments 801 and 68604]
  • 32. Downloads
    • Scaffold activity networks
    • Fragment Activity Profiler
      – SQL & servlet sources
      – Client sources
      – Online version
