Blum


  1. When Bayes meets Darwin: a journey in population genomics
       michael.blum@imag.fr
       Laboratoire TIMC-IMAG, Grenoble
  2. In "The Descent of Man", Darwin concluded that the visual differences between human populations were not adaptive to any significant degree […]
       "Natural selection has almost become irrelevant in human evolution. There's been no biological change in humans in 40,000 or 50,000 years"
       Stephen J. Gould
  3. But here is a counter-example
       • Tibetan populations have adapted to their high-altitude, low-oxygen environment thanks to an increased respiratory rate and increased blood flow.
       • These traits are transmitted from generation to generation.
       • The Tibetan plateau has been inhabited for ~20,000 years.
  4. Local adaptation
       • Human adaptation to high altitude is an instance of local adaptation.
       • Understanding how individuals adapt to their local environment is central in biology. Plants adapt to their environment, bacteria adapt to antibiotics…
       • Definition of local adaptation: greater fitness (a measure of reproductive success) of individuals in their local habitats due to natural selection.
       How to find genomic regions involved in local adaptation?
  5. Data description
  6. Single Nucleotide Polymorphism (SNP)
       Indiv 1   ....ACCCG....
                 ....AACCG....
                 Number of copies: 1, 0
       Indiv 2   ....ACCCT....
                 ....ACCCT....
                 Number of copies: 0, 2
       • 3 billion base pairs in the human genome
       • Commercial SNP chips, 100€ for 500,000 SNPs
       • dbSNP: > $10^6$ SNPs
  7. Single Nucleotide Polymorphism (SNP)
       Data matrix Y
                  Locus 1   Locus 2   Locus 3
       Indiv 1       1         0         2
       Indiv 2       0         2         0
       Indiv 3       0         0         0
       Indiv 4       0         1         1
       Indiv 5       1         1         1
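The genotype coding on the two SNP slides can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's pipeline: the function name `genotype_counts` and the choice of which allele is counted at each position (the `derived` dictionary) are assumptions made here so that the counts match the two individuals shown on the SNP slide.

```python
def genotype_counts(hap1, hap2, derived):
    """Count copies (0, 1 or 2) of a chosen allele at each SNP position.

    hap1, hap2: the two haplotype strings of one diploid individual.
    derived: dict mapping a 0-based position to the allele being counted
             (a hypothetical choice made for this illustration).
    """
    return [(hap1[pos] == allele) + (hap2[pos] == allele)
            for pos, allele in sorted(derived.items())]


# Assumed counted alleles: "A" at the second base, "T" at the fifth base.
derived = {1: "A", 4: "T"}

indiv1 = genotype_counts("ACCCG", "AACCG", derived)
indiv2 = genotype_counts("ACCCT", "ACCCT", derived)

# Stacking one row per individual gives the first rows of a data matrix Y.
Y = [indiv1, indiv2]
print(Y)  # [[1, 0], [0, 2]]
```

Each row of `Y` is one individual and each column one SNP, which is the layout of the data matrix used in the rest of the talk.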
  8. Main principle of population genomics
       • Genome-wide patterns are influenced by neutral processes: migration, admixture, expansion.
       • Genes involved in local adaptation are outliers.
  9. Adaptation to altitude
       Manhattan plot
       Xu et al. MBE 2011
  10. Human HGDP data
  11. Genome-wide patterns
  12. Principal component analysis
       [Figure: scatter plot of PC1 versus PC2; groups: Africa, America, Oceania, Asia, Middle-East, Europe, East Asia]
  13. Principal component analysis
       Novembre et al. Nature 2008
  14. Genome scan for local adaptation: a Bayesian PCA approach
  15. Singular Value Decomposition (SVD) viewpoint of PCA
       In matrix notation, we have $Y = UV$, where Y is the (n,p) genotype matrix, U is the (n,K) score matrix and V is the (K,p) loadings matrix.
       Variations around SVD in machine learning: matrix factorization, low-rank approximation, probabilistic PCA, factor analysis, …
  16. Singular Value Decomposition (SVD) viewpoint of PCA
       An optimal approximation of rank K for the matrix of genotypes Y:
       $Y_i = \sum_{k=1}^{K} u_i^k V_k$
       $Y_i$: genotype of the ith individual (0,1,1,2,0,0,…)
       $V_k = (v^{k,1}, v^{k,2}, v^{k,3}, \dots)$: vector of loadings, of the same length as $Y_i$
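The rank-K factorization can be illustrated with NumPy's SVD. This is a minimal sketch under stated assumptions: the helper name `low_rank` is hypothetical, the toy matrix is the one from the data-matrix slide, and no centering or scaling is applied, because the slides do not say how Y is preprocessed.

```python
import numpy as np

# Toy genotype matrix from the data-matrix slide: 5 individuals x 3 loci.
Y = np.array([[1, 0, 2],
              [0, 2, 0],
              [0, 0, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)


def low_rank(Y, K):
    """Return scores U (n, K), loadings V (K, p) and the rank-K product U @ V."""
    left, sing, right_t = np.linalg.svd(Y, full_matrices=False)
    U = left[:, :K] * sing[:K]   # scores: left singular vectors scaled by singular values
    V = right_t[:K, :]           # loadings: first K right singular vectors
    return U, V, U @ V


U, V, Y_hat = low_rank(Y, K=2)
print(U.shape, V.shape)            # (5, 2) (2, 3)

# Frobenius-norm error of the rank-2 approximation; it equals the third
# (dropped) singular value of Y.
print(np.linalg.norm(Y - Y_hat))
```

Row i of `U @ V` is the sum over k of `U[i, k] * V[k]`, which is the per-individual formula on the slide written in matrix form.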
  17. Bayesian principal component analysis
       • A probabilistic version of PCA (Tipping and Bishop 1999):
         $Y_i = \sum_{k=1}^{K} u_i^k V_k + \varepsilon_i$
       • The variance-inflation model for outlier detection (Box and Tiao 1968):
         $p(v_j) = (1 - \pi)\, N(0, \sigma^2) + \pi\, N(0, c^2 \sigma^2)$,
         where $\pi$ is the genome-wide outlier probability, and the prior for $c^2$ is uniform on $(1, c^2_{\max})$.
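To see how the variance-inflation mixture separates outliers, one can compute the probability that a single loading comes from the inflated component when the hyperparameters are held fixed. This is only a sketch: the talk samples all parameters with a Gibbs sampler, whereas the function `outlier_probability` and the numeric values of `pi`, `sigma` and `c` below are hypothetical choices for illustration.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def outlier_probability(v, pi, sigma, c):
    """P(inflated component | v) under (1 - pi) N(0, sigma^2) + pi N(0, c^2 sigma^2)."""
    inflated = pi * normal_pdf(v, c * sigma)
    regular = (1.0 - pi) * normal_pdf(v, sigma)
    return inflated / (inflated + regular)


# Hypothetical hyperparameters, fixed here for illustration only.
pi, sigma, c = 0.04, 1.0, 5.0

small = outlier_probability(0.0, pi, sigma, c)  # loading near zero
large = outlier_probability(5.0, pi, sigma, c)  # loading five sigmas out
print(small, large)
```

A loading near zero is less likely to be an outlier than the prior `pi` suggests, while a loading several standard deviations from zero is assigned almost entirely to the inflated component.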
  18. Accounting for local correlation in the genome
       Local correlation because of recombination
       Ising model (outlier $Z_j = 1$, non-outlier $Z_j = 0$):
       $P(Z_j = 1) \propto \pi \exp\left(\beta \sum_{k \sim j} Z_k\right)$,
       where $\beta > 0$ is a hyperparameter and $k \sim j$ denotes the neighbours of SNP j.
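The Ising prior can be turned into a conditional probability once a weight for the state Z_j = 0 is chosen. The slide only gives the weight for Z_j = 1, so the sketch below makes two assumptions that are not in the talk: the weight for Z_j = 0 is taken to be (1 - pi), and the neighbours of SNP j are the two adjacent SNPs j - 1 and j + 1. The function name `ising_conditional` is hypothetical.

```python
import math


def ising_conditional(z, j, pi, beta):
    """P(Z_j = 1 | neighbouring indicators), under the assumptions stated above.

    z: list of 0/1 outlier indicators along the genome.
    Assumption: unnormalized weight (1 - pi) for Z_j = 0.
    Assumption: neighbours of j are j - 1 and j + 1 when they exist.
    """
    neighbours = [k for k in (j - 1, j + 1) if 0 <= k < len(z)]
    w1 = pi * math.exp(beta * sum(z[k] for k in neighbours))
    w0 = 1.0 - pi
    return w1 / (w1 + w0)


pi, beta = 0.04, 1.0  # hypothetical values for illustration
print(ising_conditional([0, 0, 0], 1, pi, beta))  # no outlier neighbour: equals pi
print(ising_conditional([1, 0, 1], 1, pi, beta))  # two outlier neighbours: larger
```

With beta > 0, a SNP surrounded by outliers gets a higher prior probability of being an outlier itself, which is how the model encodes local correlation along the genome.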
  19. A hierarchical Bayesian model
       Gibbs sampler for sampling the posterior
       [Figure: graphical model with nodes π, β, σ, Z, K, U, V, Y, c, σ0, cmax]
  20. Low-rank approximation for outlier detection in video sequences
  21. Bayesian scores for detecting outliers
       • Bayes factors: a Bayesian alternative to P-values
         $BF = P(Y_j \mid \text{outlier}) / P(Y_j \mid \text{non-outlier})$
       • Posterior odds
         $P(\text{outlier} \mid Y_j) / P(\text{non-outlier} \mid Y_j) = \text{prior odds} \times BF$
       • For any list of outlier SNPs, a false discovery rate can be estimated based on posterior odds.
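The chain from Bayes factor to posterior odds to an estimated false discovery rate can be written out directly. This is a sketch, not the speaker's code: the function names are hypothetical, and the FDR estimate used here (the mean posterior probability of being a non-outlier over the reported list) is a common Bayesian estimator that the slides do not spell out.

```python
def posterior_probability(bayes_factor, prior_prob):
    """Convert a Bayes factor into P(outlier | Y_j) via posterior odds = prior odds * BF."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1.0 + post_odds)


def estimated_fdr(posterior_probs):
    """Expected fraction of false discoveries in a list of reported SNPs.

    Assumption: estimated as the mean of 1 - P(outlier | Y_j) over the list.
    """
    return sum(1.0 - p for p in posterior_probs) / len(posterior_probs)


# Hypothetical Bayes factors for three reported SNPs, with prior probability 0.04.
bayes_factors = [1000.0, 100.0, 10.0]
probs = [posterior_probability(bf, 0.04) for bf in bayes_factors]
print(probs)
print(estimated_fdr(probs))
```

With a prior outlier probability of 0.04, the prior odds are 1/24, so a Bayes factor of 24 is needed just to reach even posterior odds.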
  22. Ex 1: a simulation study in a divergence model
       Neutral divergence (ms)
       Divergence with selection (SimuPOP)
       4% out of 10,000 SNPs under selection
  23. Other methods for genome scan of local adaptation
       • Fst: a measure of differentiation between populations
       • BayeScan (Foll and Gaggiotti 2008)
       • Both methods assume (implicitly or explicitly) a mechanistic model of instantaneous divergence
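As a point of comparison, Fst for a single biallelic SNP can be computed from allele frequencies in two populations. The sketch below uses one simple heterozygosity-based definition, Fst = (H_T - H_S) / H_T, with equally weighted populations; this is one of several estimators in use and is not necessarily the one behind the results in the talk. The function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Fst = (H_T - H_S) / H_T for one biallelic SNP and two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)   # expected heterozygosity in the pooled population
    if h_total == 0.0:
        return 0.0                           # monomorphic overall: no differentiation to measure
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


print(fst_two_populations(0.5, 0.5))  # 0.0, identical frequencies
print(fst_two_populations(0.0, 1.0))  # 1.0, fixed differences
print(fst_two_populations(0.2, 0.8))  # intermediate differentiation, about 0.36
```

A genome scan based on Fst flags the SNPs whose value lies in the upper tail of the genome-wide distribution.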
  24. Population structure
       [Figure: PC2 for neutral and adaptive SNPs]
  25. Selection scan
       [Figure: log10(BF) for each of the 10,000 SNPs; panels: PC 1, PC 2, PC 3]
  26. Comparing methods of selection scan
       Advantage of non-parametric methods in data-rich situations
       [Figure: false discovery rate as a function of divergence time for BayeScan, Fst and PCAdapt]
  27. Ex 2: a spatially-explicit simulation with a gradient of selection
  28. Population structure
       [Figure: panels PC 1, PC 2, PC 3]
  29. Selection scan
       [Figure: log10(BF) for each of the 2,000 SNPs; panels: PC 1, PC 2, PC 3]
  30. Application to the human HGDP data
       [Figure: scatter plot of PC1 versus PC2; groups: Africa, Americas, Oceania, Asia, Middle-East, Europe, East Asia]
  31. Manhattan plot
       Top hit is on chromosome 16
       [Figure: scores for PC2, PC3 and PC4 against physical position; the peak is labelled ABCC11]
  32. Geographic distribution of the top SNP
       Involved in earwax type (cerumen) and transpiration
  33. Enrichment analysis
       Are PC2 outliers enriched for genes involved in immunity?
       [Figure: scatter plot of PC1 versus PC2; groups: Africa, Americas, Oceania, Asia, Middle-East, Europe, East Asia]
  34. Big data
       What can you do with millions of SNPs?
       Scalable Bayesian computation?
       Standard PCA and permutation tests.
  35. A George Box (1919-2013) story to conclude
       • Box wanted to write a paper with Cox because having a Box and Cox paper would be fun.
       • They decided to write a paper on transformations.
       • One author wrote the Bayesian version and the other wrote the maximum likelihood version. We do not know who wrote what.
       • In the end, it did not make much practical difference.
  36. Nicolas Duforet-Frebourg
  37. Spatial autocorrelation explains the PCA pattern
  38. Choice of K
       [Figure: mean squared error as a function of K, for K from 2 to 12]
  39. Robustness w.r.t. the choice of K
       [Figure: false discovery rate as a function of divergence time for K=1, K=2 and K>2]
