Transcript of "Smashing Molecules"

  1. Smashing Molecules: How Molecular Fragments Allow Us to Explore Large Chemical Spaces
     Rajarshi Guha & Trung Nguyen, NIH Center for Translational Therapeutics
     Chemaxon UGM, September 2011
  2. Outline
     • Fragments as the building blocks of chemistry
     • Fragments and SAR
     • Fragments and activity profiles
  3. Big Data for Some Problems
     • Halevy et al discuss the effectiveness of extremely large datasets
     • Their application focuses on machine translation; see the Google n-gram corpus
     • They suggest that such extremely large datasets are useful because they effectively encompass all n-grams (phrases) commonly used
     • The domain is relatively constrained
     Halevy et al, IEEE Intelligent Systems, 2009, 24, 8-12
  4. Google Scale in Chemistry?
     • What would be the equivalent of an n-gram corpus in chemistry?
       – Fragments
       – A more direct analogy can be made using LINGOs
     • It is possible to generate arbitrarily large (virtual) compound and fragment collections
     • But would such a collection span all of "commonly used" chemistry?
       – Depending on the initial compound set, yes
       – But we're also interested in going beyond such a "commonly used" set
     Fink T, Reymond JL, J Chem Inf Model, 2007, 47, 342
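The n-gram analogy is easiest to see with LINGOs, which are overlapping fixed-length substrings of a SMILES string. The sketch below is only an illustration and is not the method used in this work. It uses q = 4 and skips the SMILES normalization that the published LINGO method applies. `lingo_similarity` is a simple multiset Tanimoto overlap, not the original LINGO similarity formula.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping length-q substrings (LINGOs) of a SMILES string.

    Simplification: the SMILES normalisation steps of the published LINGO
    method (e.g. rewriting ring-closure digits) are not applied here.
    """
    if len(smiles) < q:
        return Counter()
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_similarity(a: str, b: str, q: int = 4) -> float:
    """Multiset Tanimoto overlap of two LINGO profiles (an illustrative choice)."""
    la, lb = lingos(a, q), lingos(b, q)
    keys = set(la) | set(lb)
    if not keys:
        return 0.0
    shared = sum(min(la[k], lb[k]) for k in keys)
    total = sum(max(la[k], lb[k]) for k in keys)
    return shared / total


# Phenol: the 9-character SMILES yields 6 LINGOs, with "cccc" occurring twice.
print(lingos("c1ccccc1O"))
```

Counting LINGOs over a whole compound collection gives a corpus of substring frequencies, much like an n-gram table.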
  5. Fragment Diversity
     • Consider a set of bioactives such as the LOPAC collection, 1280 compounds
     • Using exhaustive fragmentation we get 2,460 unique fragments
     • On the MLSMR (~372K compounds), we get 164,583 fragments
     [Figure: histogram of percent of total vs. log fragment frequency]
  6. Fragment Diversity
     [Figure: two panels of PC 1 vs. PC 2, "All fragments" and "Fragments occurring in 5 to 50 molecules"]
     • Distribution of MLSMR fragments in BCUT space
  7. What Do We Do with Fragments?
     • Assuming we obtain fragments from a large enough collection, what do we do?
       – Learning from fragments: QSARs, generative models
       – Use fragments as filters, an alternative to clustering
       – Explore chemotypes and activity
       – Scaffold-level promiscuity
     White, D and Wilson, RC, J Chem Inf Model, 2010, 50, 1257-1274
  8. Scaffold Activity Diagrams
     • Network-oriented view of fragment (scaffold) collections
       – Similar in idea to Scaffold Hunter etc.
       – Not purely hierarchical
     • Color by arbitrary properties
     • Quickly assess the utility of a scaffold
     • Try it online
  9. What Makes a Good Scaffold?
     • What makes a good scaffold?
       – Size, complexity, …
       – Do the members represent an SAR or not?
       – Intuition and experience also play a role
  10. Scaffold QSAR
     • Evaluate topological and physicochemical descriptors for the R-groups
     • Fit a PLS or ridge regression model
     • Characterize the SAR landscape
     [Figure: predicted vs. observed activity]
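The fit step can be illustrated with ridge regression, one of the two model types named on the slide. This is a dependency-free sketch under stated assumptions, and PLS is not shown. The R-group descriptor matrix `X` and activities `y` are hypothetical inputs. The data are column-centred so that the intercept is not penalised. The penalty `lam` is fixed here but would normally be chosen by cross-validation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam=1.0):
    """Ridge regression on centred data; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the ridge penalty on the diagonal: (X^T X + lam*I) beta = X^T y
    xtx = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, x_mean))
    return intercept, beta


def ridge_predict(model, X):
    """Predicted activities for descriptor rows X."""
    intercept, beta = model
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]
```

With `lam > 0`, the matrix XᵀX + λI is positive definite. The system therefore stays solvable even when there are more descriptors than compounds, the usual situation for small scaffold series.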
  11. Scaffold QSAR: Drawbacks
     • Many scaffolds have few (5 to 10) members
     • Invariably, more features than observations
     • If the number of R-groups is large, the feature matrix can be very sparse
       – Less of a problem for combinatorial libraries
     • A linear fit may not be the best approach to correlating R-groups to the activities
       – Difficult to choose a model type a priori
  12. Fragment Activity Profiles
     • Using scaffolds in HTS triage usually leads to two questions
       – What is known about the chemical series with respect to the intended target?
       – What compound classes are known to modulate the intended target, and how similar are they to the series in question?
     • We're interested in exploring summaries of activity, grouped by scaffolds and targets
  13. Fragment Activity Profiles
     • We use ChEMBL (08) as the source of bioactivity across multiple targets
     • Preprocess the database
       – Generate scaffolds (exhaustive enumeration of combinations of SSSRs)
       – Normalize activity data so that we can compare the activity of a molecule across different assays
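Here is a minimal sketch of the exhaustive ring-combination step. It assumes the SSSR rings of a molecule are already available as sets of atom indices; ring perception itself is not shown. Enumerating every non-empty subset gives 2^n − 1 candidates for n rings. The optional `require_fused` filter is a hypothetical addition that keeps only rings connected through shared atoms. It does not handle ring systems joined by linker atoms.

```python
from itertools import combinations


def _is_fused(combo):
    """True if the rings in combo form one group connected through shared atoms."""
    seen, stack = {0}, [0]
    while stack:  # depth-first search; two rings are neighbours if they share an atom
        i = stack.pop()
        for j in range(len(combo)):
            if j not in seen and combo[i] & combo[j]:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(combo)


def ring_combinations(rings, require_fused=False):
    """Yield the atom set of every non-empty combination of SSSR rings.

    rings: list of sets of atom indices, one per ring (hypothetical input format).
    """
    for k in range(1, len(rings) + 1):
        for combo in combinations(rings, k):
            if require_fused and not _is_fused(combo):
                continue
            yield frozenset().union(*combo)
```

For a naphthalene-like pair of rings plus one separate ring, all 7 subsets are produced. Only 4 of them (the three single rings and the fused pair) survive `require_fused=True`.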
  14. Database Setup
     • Preprocessing steps available as a Java servlet
       – http://tripod.nih.gov/files/chembl-servlets.zip
     • Need ChEMBL installed in Oracle; we add some extra tables
       – Fragment structures and computed properties
       – Aggregated assay activity summary
         • Only consider assays with IC50s in nM and uncensored data, more than 5 observations and a MAD > 0
       – (Robust) z-scored activities
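The following is a minimal sketch of the assay filter and robust z-scoring described above. The record fields (`type`, `units`, `relation`, `value`) are hypothetical stand-ins for the corresponding ChEMBL activity columns. Censored values are assumed to carry a relation other than `=`. The robust z-score is taken as (x − median) / MAD with an unscaled MAD. The slides do not say whether values are log-transformed first or whether the MAD is rescaled (1.4826 is a common consistency constant).

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def zscore_assay(records, min_obs=5):
    """Return robust z-scores for one assay, or None if the assay is filtered out.

    Keeps only uncensored IC50 values reported in nM, then requires more than
    `min_obs` observations and a non-zero MAD.
    """
    values = [r["value"] for r in records
              if r["type"] == "IC50" and r["units"] == "nM" and r["relation"] == "="]
    if len(values) <= min_obs:
        return None
    spread = mad(values)
    if spread == 0:  # a zero MAD would make every z-score undefined
        return None
    centre = median(values)
    return [(v - centre) / spread for v in values]
```

Using the median and MAD instead of the mean and SD keeps a few extreme IC50 values from dominating the normalization within an assay.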
  15. Some Fragment Statistics
     • Considered a Z-score range of -40 to 15
     • There were 12,887 molecules lying outside this range
     [Figure: two histograms, one over log(number of molecules) and one over Z-score]
  16. Some Fragment Statistics
     • Next, identify fragments with 8 to 20 atoms and occurring in 100 to 900 molecules
     • Gives us 1,746 fragments
     [Figure: histogram of percentage of fragments vs. number of molecules]
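This selection step can be sketched as an inverted index from fragments to parent molecules, followed by a range filter. The input mappings are hypothetical, and the bounds are assumed to be inclusive; the slide does not say.

```python
from collections import defaultdict


def parent_index(mol_fragments):
    """Invert molecule -> fragment ids into fragment id -> set of parent molecules."""
    index = defaultdict(set)
    for mol, frags in mol_fragments.items():
        for frag in frags:
            index[frag].add(mol)
    return index


def select_fragments(mol_fragments, atom_counts, atom_range=(8, 20), parent_range=(100, 900)):
    """Return fragment ids whose atom count and number of parents fall inside the ranges."""
    lo_atoms, hi_atoms = atom_range
    lo_par, hi_par = parent_range
    return {frag for frag, parents in parent_index(mol_fragments).items()
            if lo_atoms <= atom_counts[frag] <= hi_atoms
            and lo_par <= len(parents) <= hi_par}
```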
  17. Some Fragment Statistics
     • We can query the fragment tables to get activity summaries for individual fragments
     • For these examples we consider the full range of Z-scores
     [Figure: Z-score histograms (percent of total) for nine example fragments, each panel labeled with a fragment ID and a count N]
  18. Exploring Activity Profiles
     [Screenshot callouts: fragments from ChEMBL; activity distributions of parent molecules across all targets; Z-scores for individual molecules against a specific target]
  19. Exploring Activity Profiles
     • User can draw a molecule and fragment it on the fly
     • Use the generated fragments to create activity histograms
  20. Target Selection
     • Employs the ChEMBL target hierarchy
     • Can select target families or individual targets
  21. Similar Fragments with Similar Profiles?
     • Consider 658 fragments with > 10 atoms and occurring in 500 to 1200 molecules
     • Overall, the fragments tend to be dissimilar
       – The 95th percentile is just 0.50
     • 1,873 pairs do exhibit Tc > 0.8
     [Figure: histogram of percentage of pairs vs. Tanimoto similarity]
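Here is a sketch of the pairwise similarity summary. The slides do not name the fingerprint, so fingerprints are represented as hypothetical sets of on-bit indices. The percentile uses the nearest-rank definition, one of several in common use. For 658 fragments this compares 216,153 pairs, which is small enough for a plain loop.

```python
import math
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[rank]


def pairwise_summary(fingerprints, pct=95, threshold=0.8):
    """Return (pct-th percentile of pairwise Tc values, number of pairs with Tc > threshold)."""
    sims = [tanimoto(fingerprints[i], fingerprints[j])
            for i, j in combinations(range(len(fingerprints)), 2)]
    return percentile(sims, pct), sum(s > threshold for s in sims)
```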
  22. Comparing Activity Profiles
     • Compare activity profiles with the K-S statistic
     • Color corresponds to the p-value of the K-S test
     • No obvious correlation between fragment similarity & activity profile similarity
     • Probably not rigorous when a scaffold has few parent molecules
     [Figure: K-S statistic vs. Tanimoto similarity (0.80 to 1.00) for fragment pairs, colored by p-value]
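The K-S comparison can be sketched without external libraries. The two-sample statistic is the largest vertical gap between the empirical CDFs of the two sets of Z-scores. Checking every observed value is enough, because the empirical CDFs only change at those points. The inputs are hypothetical lists of parent-molecule Z-scores for two fragments. The p-value used for coloring is not computed here; if SciPy is available, `scipy.stats.ks_2samp` returns both the statistic and a p-value.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_x(t) - F_y(t)| over observed t."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)  # fraction of x values <= t
        fy = bisect_right(ys, t) / len(ys)  # fraction of y values <= t
        d = max(d, abs(fx - fy))
    return d
```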
  23. Exploring Profiles for Fragment Pairs
     • Compare activity distributions across all targets in a pairwise fashion
     • Can also generate a comparison for a single target, but this requires data for all the fragments
  24. Looking for Selective Fragments
     • Interesting to visually explore fragment pairs
     • Can become tedious, especially in a database as big as ChEMBL
     • Can we automate this type of analysis?
       – Identify fragment pairs with very different activity distributions?
       – Identify fragments with a preference for a certain target (class)?
  25. Targetwise Activity Profiles
     [Figure: mean Z-score of the parent molecules of fragment 4056459 for each target, grouped by target class; targets include several CYP450 isoforms]
     • Count the number of parent molecules tested against the target
     • Evaluate the mean activity of parent molecules within a target
     • Selectivity of 1-phenylimidazole for CYP450 has been noted
     Wilkinson et al, Biochem Pharmacol, 1983, 32, 997-1003
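The two steps on this slide, counting the parent molecules tested against each target and then averaging their activity, amount to a group-by. In this sketch the activity rows are hypothetical `(molecule_id, target, z_score)` tuples. The mean is taken over all measurements rather than first averaging repeated measurements of the same molecule; the slides do not say which was used.

```python
from collections import defaultdict
from statistics import mean


def targetwise_profile(activities, parents):
    """Map each target to (number of distinct parent molecules tested, mean Z-score).

    activities: iterable of (molecule_id, target, z_score) rows.
    parents: set of parent molecule ids for one fragment.
    """
    zs = defaultdict(list)
    mols = defaultdict(set)
    for mol, target, z in activities:
        if mol in parents:
            zs[target].append(z)
            mols[target].add(mol)
    return {t: (len(mols[t]), mean(zs[t])) for t in zs}
```

Sorting the result by mean Z-score and keeping targets with enough tested parents is one simple way to flag a fragment's preferred targets automatically.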
  26. Targetwise Activity Profiles
     [Figure: mean Z-score of the parent molecules of fragment 4055899 for each target, grouped by target class]
     • Identified benzylpyrrolidine as a fragment with preference for a specific target class
     • But reported as dopamine agonists
  27. Fragment or Scaffold?
     • I've been using fragment & scaffold interchangeably, but the two are not always the same
     • Chemists have an intuitive idea of what a scaffold is
     • Can we encode the idea of scaffold-like or fragment-like?
     • We use the concept of a signal-to-noise ratio (SNR):
       $$\mathrm{SNR} = \frac{\mu}{\sigma}$$
       where μ is the size of the fragment and σ is the SD of the number of atoms not in the fragment, considered over the parent molecules
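The SNR definition on this slide can be transcribed directly. Two details the slide leaves open are filled in as assumptions. First, the population standard deviation is used. Second, a fragment whose parents all carry the same number of extra atoms (σ = 0) is given an infinite SNR.

```python
import math
from statistics import pstdev


def fragment_snr(fragment_atoms, parent_atoms):
    """SNR = mu / sigma for one fragment.

    mu: number of atoms in the fragment.
    sigma: population SD of (parent atom count - fragment atom count) over the parents.
    """
    extra = [n - fragment_atoms for n in parent_atoms]
    sigma = pstdev(extra)
    return math.inf if sigma == 0 else fragment_atoms / sigma
```

Because the fragment size is constant across its parents, σ here equals the standard deviation of the parent molecules' atom counts. A large fragment embedded in parents of similar size therefore scores high, which matches the scaffold-like intuition.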
  28. Fragment or Scaffold
     • Partial distribution of SNR values for fragments with atom count > 8 & < 20
     [Figure: histogram of percentage of fragments vs. SNR (0 to 6)]
  29. Fragment or Scaffold
     • Large SNRs are associated with Murcko-like fragments
     • A useful SNR cutoff is an open question
     [Figure: example fragments with SNR = 8.50, 9.10 and 12.09, and with SNR = 0.83, 0.43 and 0.36]
  30. 30. AcKvity  Profiles  &  SNR  •  Given  a  fragment,  evaluate  SD  of  the  number  of   atoms  in  the  parent  molecules  that  are  not  part   of  the  fragment  •  Label  the  parent  molecules  based  on     –  If  number  of  atoms  not  in  the  fragment  >  SD,  non   core-­‐like   –  Otherwise  core-­‐like  •  Visualize  the  ac9vity  distribu9ons  of  the  parent   molecules,  grouped  by  the  label    
  31. 31. AcKvity  Profiles  &  SNR   -50 0 50 -50 0 50 20967 20967 44591 44591 Core-like Not core-like Core-like Not core-likePercentage of Total 80 60 40 20 -50 0 50 -50 0 50 High  SNR   Z-Score -30 -20 -10 0 10 -30 -20 -10 0 10 801 801 68604 68604 Core-like Not core-like Core-like Not core-likePercentage of Total 80 60 40 20 Low  SNR   -30 -20 -10 0 10 -30 -20 -10 0 10 Z-Score
  32. Downloads
     • Scaffold activity networks
     • Fragment Activity Profiler
       – SQL & servlet sources
       – Client sources
       – Online version