Smashing Molecules

  • 1. Smashing Molecules: How Molecular Fragments Allow Us to Explore Large Chemical Spaces
    Rajarshi Guha & Trung Nguyen, NIH Center for Translational Therapeutics
    Chemaxon UGM, September 2011
  • 2. Outline
    • Fragments as the building blocks of chemistry
    • Fragments and SAR
    • Fragments and activity profiles
  • 3. Big Data for Some Problems
    • Halevy et al. discuss the effectiveness of extremely large datasets
    • Their application focuses on machine translation; see the Google n-gram corpus
    • They suggest that such extremely large datasets are useful because they effectively encompass all commonly used n-grams (phrases)
    • The domain is relatively constrained
    Halevy et al., IEEE Intelligent Systems, 2009, 24, 8-12
  • 4. Google Scale in Chemistry?
    • What would be the equivalent of an n-gram corpus in chemistry?
      – Fragments
      – A more direct analogy can be made using LINGOs
    • It is possible to generate arbitrarily large (virtual) compound and fragment collections
    • But would such a collection span all of "commonly used" chemistry?
      – Depending on the initial compound set, yes
      – But we're also interested in going beyond such a "commonly used" set
    Fink T, Reymond JL, J Chem Inf Model, 2007, 47, 342
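LINGOs treat a SMILES string as plain text and split it into overlapping substrings of fixed length, which is what makes them the closest chemical analogue of n-grams. The sketch below assumes q = 4 (the substring length used in the original LINGO work) and simplifies preprocessing: it replaces every digit with 0 (intended for ring-closure labels) and maps Cl and Br to single characters, but it does not canonicalize the SMILES or treat bracket atoms specially, as a real pipeline would. The helper names are hypothetical.

```python
from collections import Counter

def normalize_smiles(smiles: str) -> str:
    """Simplified LINGO-style normalization (assumption: input is already canonical).

    Two-letter halogens become single characters so they are not split across
    substrings, and every digit (assumed to be a ring-closure label) becomes '0'.
    """
    s = smiles.replace("Cl", "L").replace("Br", "R")
    return "".join("0" if ch.isdigit() else ch for ch in s)

def lingos(smiles: str, q: int = 4) -> Counter:
    """Multiset of overlapping length-q substrings: the chemical 'n-grams'."""
    s = normalize_smiles(smiles)
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def lingo_tanimoto(a: str, b: str, q: int = 4) -> float:
    """Tanimoto similarity over LINGO multisets (sum of min / sum of max counts)."""
    la, lb = lingos(a, q), lingos(b, q)
    keys = set(la) | set(lb)
    num = sum(min(la[k], lb[k]) for k in keys)
    den = sum(max(la[k], lb[k]) for k in keys)
    return num / den if den else 0.0
```

For benzene, `lingos("c1ccccc1")` yields `cccc` twice plus `c0cc`, `0ccc` and `ccc0` once each; a corpus of such counts over a compound collection plays the role of the n-gram corpus.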
  • 5. Fragment Diversity
    • Consider a set of bioactives such as the LOPAC collection, 1,280 compounds
    • Using exhaustive fragmentation we get 2,460 unique fragments
    • On the MLSMR (~372K compounds), we get 164,583 fragments
    [Figure: histogram of fragments, percent of total vs. log fragment frequency]
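The histogram on this slide plots, for each unique fragment, the log of the number of molecules it occurs in. Given a hypothetical mapping from each molecule ID to the fragments produced by exhaustive fragmentation (the fragmentation itself needs a cheminformatics toolkit and is not shown), the counting and binning steps can be sketched as follows. Counting a fragment once per molecule and using unit-width log10 bins are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable

def fragment_frequencies(mol_to_frags: Dict[str, Iterable[str]]) -> Counter:
    """Number of molecules in which each unique fragment occurs."""
    freq: Counter = Counter()
    for frags in mol_to_frags.values():
        freq.update(set(frags))  # count a fragment at most once per molecule
    return freq

def log_frequency_histogram(freq: Counter) -> Dict[int, float]:
    """Percent of unique fragments in each unit-width log10(frequency) bin."""
    # floor(log10(n)) for a positive integer n is its digit count minus one,
    # which avoids floating-point edge cases at exact powers of ten.
    bins = Counter(len(str(n)) - 1 for n in freq.values())
    total = len(freq)
    return {b: 100.0 * c / total for b, c in sorted(bins.items())}
```

`len(fragment_frequencies(...))` gives the number of unique fragments (2,460 for LOPAC and 164,583 for the MLSMR on the slide).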
  • 6. Fragment Diversity
    • Distribution of MLSMR fragments in BCUT space, projected onto the first two principal components
    [Figure: PC 2 vs. PC 1; left panel: all fragments; right panel: fragments occurring in 5 to 50 molecules]
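The plots project a multi-dimensional BCUT descriptor space onto its first two principal components. Assuming the BCUT descriptors have already been computed into a fragments × descriptors matrix (computing BCUTs requires a toolkit and is not shown), a minimal PCA projection with NumPy looks like this. Autoscaling each column before the SVD is an assumption; the slide does not say whether the descriptors were scaled.

```python
import numpy as np

def pca_scores(X: np.ndarray, n_components: int = 2, scale: bool = True) -> np.ndarray:
    """Project the rows of X onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each descriptor column
    if scale:
        sd = Xc.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # leave constant columns unscaled
        Xc = Xc / sd
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

Plotting column 0 against column 1 of the result, for all fragments and for a frequency-filtered subset, gives the two panels.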
  • 7. What Do We Do with Fragments?
    • Assuming we obtain fragments from a large enough collection, what do we do with them?
      – Learning from fragments: QSARs, generative models
      – Use fragments as filters, an alternative to clustering
      – Explore chemotypes and activity
      – Scaffold-level promiscuity
    White D, Wilson RC, J Chem Inf Model, 2010, 50, 1257-1274
  • 8. Scaffold Activity Diagrams
    • Network-oriented view of fragment (scaffold) collections
      – Similar in idea to Scaffold Hunter, etc.
      – Not purely hierarchical
    • Color by arbitrary properties
    • Quickly assess the utility of a scaffold
    • Try it online
  • 9. What Makes a Good Scaffold?
    • What makes a good scaffold?
      – Size, complexity, …
      – Do the members represent an SAR or not?
      – Intuition and experience also play a role
  • 10. Scaffold QSAR
    • Evaluate topological and physicochemical descriptors for the R-groups
    • Fit a PLS or ridge regression model
    • Characterize the SAR landscape
    [Figure: predicted vs. observed activity for the fitted model]
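A ridge fit suits this setting because the R-group descriptor matrix for a scaffold usually has more columns than rows. The sketch below uses the dual form of ridge regression, w = Xᶜᵀ(XᶜXᶜᵀ + λI)⁻¹yᶜ, which only solves an n × n system for n scaffold members. The descriptor matrix, the default penalty `lam = 1.0`, and centering without scaling are assumptions; the slide does not state how the models were configured, and PLS is not shown.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float = 1.0):
    """Ridge regression in dual form; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # center so the intercept is not penalized
    n = Xc.shape[0]
    # Solve the n x n system (Xc Xc^T + lam I) alpha = yc, then map back to weights.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)
    w = Xc.T @ alpha
    b = y_mean - x_mean @ w
    return w, b

def ridge_predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Predicted activities for rows of X."""
    return X @ w + b
```

The dual solution is algebraically identical to the usual (XᶜᵀXᶜ + λI)⁻¹Xᶜᵀyᶜ, so the choice only affects cost: with 5 to 10 members and many descriptors, the dual system is much smaller.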
  • 11. Scaffold QSAR: Drawbacks
    • Many scaffolds have few (5 to 10) members
    • Invariably, there are more features than observations
    • If the number of R-groups is large, the feature matrix can be very sparse
      – Less of a problem for combinatorial libraries
    • A linear fit may not be the best approach to correlating R-groups with activities
      – Difficult to choose a model type a priori
  • 12. Fragment Activity Profiles
    • Using scaffolds in HTS triage usually leads to two questions
      – What is known about the chemical series with respect to the intended target?
      – What compound classes are known to modulate the intended target, and how similar are they to the series in question?
    • We're interested in exploring summaries of activity, grouped by scaffolds and targets
  • 13. Fragment Activity Profiles
    • We use ChEMBL (version 08) as the source of bioactivity across multiple targets
    • Preprocess the database
      – Generate scaffolds (exhaustive enumeration of combinations of SSSRs)
      – Normalize activity data so that we can compare the activity of a molecule across different assays
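Exhaustive enumeration over SSSR rings means considering every subset of a molecule's smallest set of smallest rings. Assuming ring perception has already been done (a toolkit step not shown) and that a hypothetical `ring_links` list records which pairs of rings are fused or joined by a linker, the combinatorial core can be sketched as below. Keeping only connected ring subsets, and omitting the step that adds linker atoms back to form the actual fragment, are assumptions of this sketch.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

def _connected(subset: Sequence[int], links: Set[FrozenSet[int]]) -> bool:
    """True if the chosen rings form one connected component via `links`."""
    members = set(subset)
    seen, stack = {subset[0]}, [subset[0]]
    while stack:
        r = stack.pop()
        for s in members - seen:
            if frozenset((r, s)) in links:
                seen.add(s)
                stack.append(s)
    return seen == members

def ring_combinations(n_rings: int, ring_links: List[Tuple[int, int]]) -> List[FrozenSet[int]]:
    """All connected, non-empty subsets of ring indices 0..n_rings-1."""
    links = {frozenset(pair) for pair in ring_links}
    result = []
    for k in range(1, n_rings + 1):
        for subset in combinations(range(n_rings), k):
            if _connected(subset, links):
                result.append(frozenset(subset))
    return result
```

The number of subsets grows as 2^k in the number of rings k, which stays manageable for typical drug-like molecules but not for large polycyclic systems.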
  • 14. Database Setup
    • Preprocessing steps are available as a Java servlet
      – http://tripod.nih.gov/files/chembl-servlets.zip
    • Needs ChEMBL installed in Oracle; we add some extra tables
      – Fragment structures and computed properties
      – Aggregated assay activity summary
        • Only consider assays with IC50s in nM and uncensored data, more than 5 observations, and a MAD > 0
      – (Robust) z-scored activities
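The aggregation rules on this slide (IC50 in nM, uncensored values, more than five observations per assay, MAD > 0, robust z-scores) can be sketched per assay as below. Several details are assumptions: the record layout is hypothetical, though its fields mirror ChEMBL's `standard_type`, `standard_relation`, `standard_value` and `standard_units` columns; the z-score is computed on the values as given, since the slides do not say whether IC50s were log-transformed; and the MAD is left unscaled (multiply by 1.4826 for a normal-consistent estimate).

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

# Hypothetical record: (assay_id, molecule_id, standard_type,
#                       standard_relation, standard_value, standard_units)
Record = Tuple[str, str, str, str, float, str]

def robust_z_by_assay(records: Iterable[Record], min_obs: int = 6) -> Dict[Tuple[str, str], float]:
    """Robust z-score (x - median) / MAD per assay, after the slide's filters."""
    by_assay = defaultdict(list)
    for assay, mol, stype, rel, value, units in records:
        # Keep only uncensored IC50 values reported in nM.
        if stype == "IC50" and rel == "=" and units == "nM":
            by_assay[assay].append((mol, value))
    z = {}
    for assay, rows in by_assay.items():
        if len(rows) < min_obs:  # "more than 5 observations"
            continue
        values = [v for _, v in rows]
        med = median(values)
        mad = median(abs(v - med) for v in values)  # unscaled MAD
        if mad == 0:  # skip assays whose activities do not vary
            continue
        for mol, v in rows:
            z[(assay, mol)] = (v - med) / mad
    return z
```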
  • 15. Some Fragment Statistics
    • Considered a Z-score range of -40 to 15
    • There were 12,887 molecules lying outside this range
    [Figure: histograms of log(number of molecules) and of Z-scores; y-axes: percentage of assays, number of compounds]
  • 16. Some Fragment Statistics
    • Next, identify fragments with 8 to 20 atoms that occur in 100 to 900 molecules
    • This gives us 1,746 fragments
    [Figure: histogram, percentage of fragments vs. number of molecules]
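The two selection steps on these slides, trimming Z-scores to the range -40 to 15 and keeping fragments with 8 to 20 atoms that occur in 100 to 900 molecules, are simple range filters. The sketch treats both ranges as inclusive, which is an assumption, and the `Fragment` record is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class Fragment:  # hypothetical record
    frag_id: int
    n_atoms: int
    n_molecules: int  # number of parent molecules containing the fragment

def in_z_range(z: float, lo: float = -40.0, hi: float = 15.0) -> bool:
    """True if a Z-score lies in the range considered on the slide."""
    return lo <= z <= hi

def select_fragments(frags: Iterable[Fragment],
                     atoms: Tuple[int, int] = (8, 20),
                     mols: Tuple[int, int] = (100, 900)) -> List[Fragment]:
    """Keep fragments whose atom count and parent-molecule count fall in range."""
    return [f for f in frags
            if atoms[0] <= f.n_atoms <= atoms[1]
            and mols[0] <= f.n_molecules <= mols[1]]
```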
  • 17. Some Fragment Statistics
    • We can query the fragment tables to get activity summaries for individual fragments
    • For these examples we consider the full range of Z-scores
    [Figure: Z-score histograms (percent of total) for nine individual fragments, each panel annotated with the fragment ID and the number of parent molecules N]
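The extra tables live in Oracle alongside ChEMBL, and their schema is not shown in the talk. Purely as an illustration, with a hypothetical two-table layout (`fragment_molecule` linking fragment IDs to ChEMBL `molregno` keys, and `activity_z` holding per-assay robust z-scores), a per-fragment activity summary is a join plus an aggregation. The sketch uses SQLite from the Python standard library so it runs without an Oracle instance.

```python
import sqlite3

def fragment_summary(conn: sqlite3.Connection, frag_id: int):
    """Return (n_parent_molecules, n_values, mean_z) for one fragment."""
    sql = """
        SELECT COUNT(DISTINCT fm.molregno), COUNT(az.z), AVG(az.z)
        FROM fragment_molecule AS fm
        JOIN activity_z AS az ON az.molregno = fm.molregno
        WHERE fm.frag_id = ?
    """
    return conn.execute(sql, (frag_id,)).fetchone()

def fragment_z_scores(conn: sqlite3.Connection, frag_id: int):
    """All z-scores of the fragment's parent molecules, e.g. for a histogram."""
    rows = conn.execute(
        "SELECT az.z FROM fragment_molecule AS fm "
        "JOIN activity_z AS az ON az.molregno = fm.molregno "
        "WHERE fm.frag_id = ?",
        (frag_id,),
    )
    return [r[0] for r in rows]
```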
  • 18. Exploring Activity Profiles
    • Fragments from ChEMBL
    • Activity distributions of parent molecules across all targets
    • Z-scores for individual molecules against a specific target
  • 19. Exploring Activity Profiles
    • The user can draw a molecule and fragment it on the fly
    • Use the generated fragments to create activity histograms
  • 20. Target Selection
    • Employs the ChEMBL target hierarchy
    • Can select target families or individual targets
  • 21. Similar Fragments with Similar Profiles?
    • Consider 658 fragments with > 10 atoms that occur in 500 to 1,200 molecules
    • Overall, the fragments tend to be dissimilar
      – The 95th percentile of the pairwise Tanimoto similarities is just 0.50
    • 1,873 pairs do exhibit Tc > 0.8
    [Figure: histogram, percentage of pairs vs. Tanimoto similarity]
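Both statistics on this slide, the 95th percentile of all pairwise Tanimoto coefficients and the number of pairs with Tc > 0.8, follow from an all-pairs comparison. The sketch represents each fragment fingerprint as a set of on-bit indices; which fingerprint the authors used is not stated, and the nearest-rank percentile definition is an assumption.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tc = |A & B| / |A | B| for binary fingerprints stored as on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def pairwise_tanimoto(fps: Dict[str, Set[int]]) -> List[Tuple[str, str, float]]:
    """Tanimoto coefficient for every unordered pair of fragments."""
    return [(i, j, tanimoto(fps[i], fps[j])) for i, j in combinations(sorted(fps), 2)]

def percentile_nearest_rank(values: List[float], p: float) -> float:
    """Smallest value such that at least p percent of the data are <= it."""
    ranked = sorted(values)
    k = max(1, math.ceil(p / 100.0 * len(ranked)))
    return ranked[k - 1]
```

With `pairs = pairwise_tanimoto(fps)`, the pair count above the cutoff is `sum(1 for *_, t in pairs if t > 0.8)` and the percentile is `percentile_nearest_rank([t for *_, t in pairs], 95)`. For 658 fragments this is about 216,000 pairs, which is still cheap to enumerate directly.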
  • 22. Comparing Activity Profiles
    • Compare activity profiles with the K-S statistic
    • Color corresponds to the p-value of the K-S test
    • No obvious correlation between fragment similarity and activity profile similarity
    • Probably not rigorous when a scaffold has few parent molecules
    [Figure: K-S statistic vs. Tanimoto similarity (0.80 to 1.00) for fragment pairs, colored by p-value]
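The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical CDFs of two samples, here the Z-scores of the parent molecules of two fragments. A pure-Python sketch of the statistic is below; for the p-value used to color the plot, `scipy.stats.ks_2samp` returns both the statistic and a p-value.

```python
from bisect import bisect_right
from typing import Sequence

def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """D = max |F_x(t) - F_y(t)| over all observed values t (samples must be non-empty)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:
        # Empirical CDF: fraction of each sample that is <= t.
        fx = bisect_right(xs, t) / n
        fy = bisect_right(ys, t) / m
        d = max(d, abs(fx - fy))
    return d
```

Because both empirical CDFs are step functions that only jump at observed values, checking the pooled sample points is enough to find the supremum. With only a handful of parent molecules the statistic is coarse, which is the caveat on the slide.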
  • 23. Exploring Profiles for Fragment Pairs
    • Compare activity distributions across all targets in a pairwise fashion
    • Can also generate the comparison for a single target, but this requires data for all the fragments
  • 24. Looking for Selective Fragments
    • Interesting to visually explore fragment pairs
    • Can become tedious, especially in a database as big as ChEMBL
    • Can we automate this type of analysis?
      – Identify fragment pairs with very different activity distributions?
      – Identify fragments with a preference for a certain target (class)?
  • 25. Targetwise Activity Profiles
    • Count the number of parent molecules tested against each target
    • Evaluate the mean activity of the parent molecules within each target
    • Selectivity of 1-phenylimidazole for CYP450 has been noted
    Wilkinson et al., Biochem Pharmacol, 1983, 32, 997-1003
    [Figure: mean Z-score of the parent molecules for each target, grouped by target class; targets include adrenergic, angiotensin, chemokine, dopamine, histamine, opioid and serotonin receptors and a range of CYP450 isoforms]
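The two steps listed on the slide, counting the parent molecules tested against each target and averaging their activity within the target, amount to a group-by. The sketch assumes a hypothetical list of `(target, molecule, z)` tuples for the parent molecules of one fragment. The `min_molecules` cutoff and sorting targets by ascending mean Z-score are assumptions; whether low or high scores indicate potency depends on how the Z-scores were oriented, which the slides do not state.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Set, Tuple

def targetwise_profile(rows: Iterable[Tuple[str, str, float]],
                       min_molecules: int = 1) -> List[Tuple[str, int, float]]:
    """Per target: (target, number of distinct parent molecules, mean z-score)."""
    z_by_target: Dict[str, List[float]] = defaultdict(list)
    mols_by_target: Dict[str, Set[str]] = defaultdict(set)
    for target, mol, z in rows:
        z_by_target[target].append(z)
        mols_by_target[target].add(mol)
    profile = [(t, len(mols_by_target[t]), mean(zs))
               for t, zs in z_by_target.items()
               if len(mols_by_target[t]) >= min_molecules]
    return sorted(profile, key=lambda p: p[2])  # ascending mean z-score
```

A fragment whose profile is dominated by one target family, such as the CYP450 isoforms on this slide, is a candidate selective fragment.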
  • 26. Targetwise Activity Profiles
    • Identified benzylpyrrolidine as a fragment with a preference for a specific target class
    • But reported as dopamine agonists
    [Figure: mean Z-score of the parent molecules for each target, grouped by target class; targets include adrenergic, dopamine, histamine, opioid and serotonin receptors]
  • 27. Fragment or Scaffold?
    • I've been using "fragment" and "scaffold" interchangeably, which is not always correct
    • Chemists have an intuitive idea of what a scaffold is
    • Can we encode the idea of scaffold-like or fragment-like?
    • We use the concept of a signal-to-noise ratio, $\mathrm{SNR} = \mu / \sigma$, where $\mu$ is the size of the fragment and $\sigma$ is the SD of the number of atoms not in the fragment, considered over the parent molecules
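Computing the SNR for a fragment needs only its atom count and the atom counts of its parent molecules. In the minimal sketch below, the use of the population SD (`statistics.pstdev`) is an assumption, since the slide does not say which SD was used, and returning infinity when every parent carries the same number of extra atoms is a convention of this sketch.

```python
import math
from statistics import pstdev
from typing import Sequence

def fragment_snr(fragment_atoms: int, parent_atoms: Sequence[int]) -> float:
    """SNR = mu / sigma: fragment size over the SD of the non-fragment atom counts."""
    extra = [n - fragment_atoms for n in parent_atoms]  # atoms not in the fragment
    sigma = pstdev(extra)
    return math.inf if sigma == 0 else fragment_atoms / sigma
```

A large SNR means the parents decorate the fragment with a consistent amount of substitution, which is the behavior expected of a scaffold rather than an incidental substructure.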
  • 28. Fragment or Scaffold
    • Partial distribution of SNR values for fragments with atom count > 8 and < 20
    [Figure: histogram, percentage of fragments vs. SNR (0 to 6)]
  • 29. Fragment or Scaffold
    • Large SNRs are associated with Murcko-like fragments
    • A useful SNR cutoff is an open question
    [Example fragment structures with SNR = 8.50, 9.10 and 12.09 (high) and SNR = 0.83, 0.43 and 0.36 (low)]
  • 30. Activity Profiles & SNR
    • Given a fragment, evaluate the SD of the number of atoms in the parent molecules that are not part of the fragment
    • Label each parent molecule as follows
      – If the number of atoms not in the fragment > SD, not core-like
      – Otherwise, core-like
    • Visualize the activity distributions of the parent molecules, grouped by label
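The labelling rule translates directly into code: compute the SD of the non-fragment atom counts over all parent molecules, then call a parent core-like when its own count of non-fragment atoms does not exceed that SD. Using the population SD is an assumption, and the molecule IDs in the input mapping are hypothetical.

```python
from statistics import pstdev
from typing import Dict

def label_parents(fragment_atoms: int, parent_atoms: Dict[str, int]) -> Dict[str, str]:
    """Label each parent molecule 'core-like' or 'not core-like' for one fragment."""
    extra = {mol: n - fragment_atoms for mol, n in parent_atoms.items()}
    sd = pstdev(list(extra.values()))  # SD of non-fragment atom counts
    return {mol: ("not core-like" if e > sd else "core-like") for mol, e in extra.items()}
```

The resulting labels can then split the parent molecules' Z-scores into the two histograms shown for each fragment.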
  • 31. Activity Profiles & SNR
    [Figure: Z-score histograms (percentage of total) of core-like and not-core-like parent molecules for two high-SNR fragments (20967, 44591) and two low-SNR fragments (801, 68604)]
  • 32. Downloads
    • Scaffold activity networks
    • Fragment Activity Profiler
      – SQL & servlet sources
      – Client sources
      – Online version