Mixed	
  linear	
  model	
  analyses	
  of	
  
human	
  complex	
  traits	
  using	
  SNP	
  
data	
  	
  
Jian	
  Yang	
 ...
Why	
  do	
  we	
  need	
  a	
  mixed	
  linear	
  
model?	
  
Linear	
  model	
  
•  y	
  =	
  b0	
  +	
  x1b1	
  +	
  x2b2	
  +	
  …	
  +	
  xpbp	
  +	
  e	
  
	
  
y	
  =	
  phenotyp...
Linear	
  model	
  
•  In	
  matrix	
  form	
  
y	
  =	
  Xb	
  +	
  e	
  
y	
  =	
  {yj}n	
  x	
  1;	
  X	
  =	
  {Xij}n	...
Special	
  cases	
  
•  Simple	
  regression	
  
y	
  =	
  b0	
  +	
  x1b1	
  +	
  e	
  
b1-­‐hat	
  =	
  b-­‐hat	
  =	
  ...
Limita1ons	
  
•  n	
  >	
  p:	
  sample	
  size	
  needs	
  to	
  be	
  >	
  than	
  the	
  
number	
  of	
  parameters	
...
What	
  is	
  a	
  mixed	
  linear	
  model	
  (MLM)?	
  
•  y	
  =	
  Xb	
  +	
  Zu	
  +	
  e	
  
Fixed	
  effects:	
  b	
...
Parameter	
  es1ma1on	
  
•  Es1ma1on	
  of	
  variance	
  components	
  (σ2
u)	
  	
  
	
  logL	
  =	
  -­‐1/2(log|V|	
  ...
MLM	
  analysis	
  of	
  human	
  complex	
  traits	
  
•  Animal	
  and	
  plant	
  breeding	
  
–  predic1ng	
  breeding...
Background	
  
Mendelian	
  traits	
   Complex	
  traits	
  
Cys1c	
  fibrosis	
  
Human	
  height	
  
Schizophrenia	
  
Obesity	
  
Major	
  ques1ons	
  
•  Are	
  these	
  traits	
  heritable?	
  
•  If	
  so,	
  what	
  is	
  the	
  heritability?	
  
•...
Risk	
  of	
  schizophrenia	
  (%)	
  
13	
  
Resemblance	
  between	
  twins	
  for	
  human	
  height	
  
Heritability	
...
14	
  
8	
  genes	
  for	
  human	
  complex	
  traits	
  before	
  2002	
  
Glazier	
  et	
  al.	
  2002	
  Science	
  
I...
15	
  
Genome-­‐wide	
  AssociaBon	
  Study	
  (GWAS)	
  
Manolio	
  2010	
  NEJM	
  
Genome-­‐wide	
  threshold	
  P	
  =...
16	
  
An	
  explosion	
  of	
  gene	
  discoveries	
  
~5000	
  geneBc	
  variants	
  associated	
  with	
  ~650	
  trait...
 
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
Height:	
  
•  ...
Fiwng	
  all	
  SNPs	
  in	
  a	
  MLM	
  
•  y	
  =	
  Wu	
  +	
  e	
  
	
  W	
  =	
  {wij}n	
  x	
  m,	
  wij	
  =	
  st...
Family	
  studies:	
  comparing	
  phenotypic	
  similarity	
  to	
  family	
  relatedness	
  	
  
–	
  Our	
  method:	
  ...
20	
  
GWAS	
  	
  vs	
  All-­‐SNP	
  esBmaBon	
  
Yang	
  et	
  al.	
  2011	
  Nat	
  Genet	
  
Lee	
  et	
  al.	
  2012	...
Genome	
  par11oning	
  
•  Single	
  component	
  MLM	
  
y	
  =	
  g	
  +	
  e	
  (or	
  y	
  =	
  Wu	
  +	
  e)	
  
	
 ...
1"
2"
3"4"
5"
6"
7"
8"
9"
10"
11"12"
13"
14"
15"
16"
17"
18"
19"
20"
21"22"
0.00"
0.01"
0.02"
0.03"
0" 50" 100" 150" 200" ...
ParBBoning	
  the	
  geneBc	
  variance	
  based	
  on	
  
funcBonal	
  annotaBon	
  	
  
23	
  
GeneBc	
  signals	
  are	...
More	
  …	
  
•  Bivariate	
  analysis	
  –	
  es1ma1ng	
  the	
  gene1c	
  
correla1on	
  between	
  two	
  traits	
  or	...
25	
  
Linear	
  model	
  (simple	
  regression	
  based	
  associaBon	
  test)	
  
y	
  =	
  b0	
  +	
  x1b1	
  +	
  e	
 ...
Popula1on	
  stra1fica1on	
  es1mated	
  
from	
  SNP	
  data	
  	
  
Solu1on:	
  MLM	
  based	
  associa1on	
  
analysis	
  
•  y	
  =	
  Xb	
  +	
  Zu	
  +	
  e	
  or	
  y	
  =	
  Xb	
  +	
 ...
Excluding	
  the	
  SNP	
  from	
  calcula1ng	
  the	
  
gene1c	
  rela1onship	
  matrix	
  	
  
So^ware	
  tool	
  
h_p://gump.qimr.edu.au/gcta/	
  
Complex	
  Traits	
  Genomics	
  Group	
  (UQ)	
  
•  Peter	
  Visscher	
  
•  Naomi	
  Wray	
  
•  Hong	
  Lee	
  
	
  
U...
The	
  Australian	
  NeurogeneBcs	
  
Conference	
  	
  
	
  
at	
  the	
  Queensland	
  Brain	
  InsBtute	
  (QBI),	
  Th...
Upcoming SlideShare
Loading in …5
×

Jian Yang - Mixed linear model analyses of human complex traits using SNP data

2,682 views

Published on

Most traits and common diseases in humans, such as height, cognitive ability, psychiatric disorders and obesity, are influenced by many genes and their interplay with environmental factors. These diseases/traits are called “complex” traits to differentiate them from “Mendelian” traits that are caused by single genes. Understanding the
genetic architecture of human complex traits, e.g. how much of the difference between people’s susceptibilities to diseases are accounted for by their difference in DNA sequence, how many genes are involved in the etiology of diseases, where the genes are located and how much effects of the genes are on the disease risks, is essential to
diagnosis, discovery of new drug targets and prevention. To date, thousands gene loci as represented by single nucleotide polymorphisms (SNPs) have been identified to be associated with hundreds of human complex traits by the genome‐wide association study (GWAS) technique. In this lecture, I will be introducing the use of mixed linear
model in the analyses of GWAS data, to estimate the proportion of variance for a trait that can be explained by all SNPs (or called SNP heritability), to quantify the extent to which two traits (or diseases) share a common genetic basis (genetic correlation) using all SNPs, and to control for population structure in genome‐wide association analyses of individuals SNPs.

First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/

Published in: Science
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,682
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
40
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Jian Yang - Mixed linear model analyses of human complex traits using SNP data

  1. 1. Mixed  linear  model  analyses  of   human  complex  traits  using  SNP   data     Jian  Yang   Queensland  Brain  Ins1tute   The  University  of  Queensland   1  
  2. 2. Why  do  we  need  a  mixed  linear   model?  
  3. 3. Linear  model   •  y  =  b0  +  x1b1  +  x2b2  +  …  +  xpbp  +  e     y  =  phenotype     xi  =  independent  variable     y  ~  N(b0  +  x1b1  +  x2b2  +  …  +  xpbp,  σ2 e)     b0  =  mean  term   b1  …  bp  =  effect  sizes  (regression  coefficients)     e  =    residual,  e  ~  N(0,  σ2 e)    
  4. 4. Linear  model   •  In  matrix  form   y  =  Xb  +  e   y  =  {yj}n  x  1;  X  =  {Xij}n  x  p;  b  =  {bi}p  x  1;  e  =  {ej}n  x  1     •  Es1ma1on   b-­‐hat  =  (XTX)-­‐1XTy   var(b-­‐hat)  =  σ2 e(XTX)-­‐1  
  5. 5. Special  cases   •  Simple  regression   y  =  b0  +  x1b1  +  e   b1-­‐hat  =  b-­‐hat  =  X1 Ty  /  (X1 TX1)   E(b1-­‐hat)  =  b1  =  cov(x1,  y)  /  var(x1)     var(b1-­‐hat)  =  σ2 e  /  [n*var(x1)]   •  Condi1onal  analysis   y  |  b2  …  bp  =  b0  +  x1b1  +  e    
  6. 6. Limita1ons   •  n  >  p:  sample  size  needs  to  be  >  than  the   number  of  parameters   •  All  the  effect  sizes  are  treated  as  fixed  (we   have  no  idea  about  the  varia1on  in  effect   sizes)   •  What  if  n  <<  p?  
  7. 7. What  is  a  mixed  linear  model  (MLM)?   •  y  =  Xb  +  Zu  +  e   Fixed  effects:  b  (special  case:  X  =  1  and  b  =  b0)   Random  effects:  u  =  {ui},  u  ~  N(0,  σ2 uA)   A  =  correla1on  matrix  between  ui  and  uj   E(y)  =  Xb   var(y)  =  V  =  ZAZTσ2 u  +  Iσ2 e  
  8. 8. Parameter  es1ma1on   •  Es1ma1on  of  variance  components  (σ2 u)      logL  =  -­‐1/2(log|V|  +  log|XTV-­‐1X|  +  yTPy    P  =  V-­‐1  -­‐  V-­‐1X(XTV-­‐1X)-­‐1XTV-­‐1     •  Predic1on  of  random  effects  (u)      u-­‐hat  =  σ2 u-­‐hat  ZTPy     •  Es1ma1on  of  fixed  effects  (b)    b-­‐hat  =  (XTV-­‐1X)-­‐1XTy            Linear  model:  b-­‐hat  =  (XTX)-­‐1XTy    
  9. 9. MLM  analysis  of  human  complex  traits   •  Animal  and  plant  breeding   –  predic1ng  breeding  values   –  linkage  mapping  (QTL  mapping)   •  Human  gene1cs  (before  2007)   –  pedigree  based    analysis  of  variance  (heritability)   –  linkage  mapping   •  Human  gene1cs  (aker  2007)   –  esBmaBng  SNP-­‐based  heritability   –  associa1on  analysis   –  gene1c  risk  predic1on  
  10. 10. Background  
  11. 11. Mendelian  traits   Complex  traits   Cys1c  fibrosis   Human  height   Schizophrenia   Obesity  
  12. 12. Major  ques1ons   •  Are  these  traits  heritable?   •  If  so,  what  is  the  heritability?   •  How  many  genes  involved  and  where  are  they   located?    
  13. 13. Risk  of  schizophrenia  (%)   13   Resemblance  between  twins  for  human  height   Heritability  =  ~80%   Heritability  =  ~80%   Heritability  =  40%~60%   Resemblance  between  relaBves  for  body  mass  index  (BMI)   Relatedness  CorrelaBon   Full-­‐sibs      0.36   Father-­‐son    0.28   Complex  traits  such  as  height,  BMI  and  SCZ  are  highly  heritable.  
  14. 14. 14   8  genes  for  human  complex  traits  before  2002   Glazier  et  al.  2002  Science   IdenBfying  genes  underlying  complex  traits   1700   30   8  
  15. 15. 15   Genome-­‐wide  AssociaBon  Study  (GWAS)   Manolio  2010  NEJM   Genome-­‐wide  threshold  P  =  5×10-­‐8   Linear  model  (simple  regression)   y  =  b0  +  x1b1  +  e   y  =  trait  value   x1  =  SNP  genotype  (0,  1  or  2)   b1-­‐hat  =  X1 Ty  /  (X1 TX1)  =  cov(x1,y)  /  var(x1)   SE2(b1-­‐hat)  =  σ2 e  /  [n  var(x1)]  
  16. 16. 16   An  explosion  of  gene  discoveries   ~5000  geneBc  variants  associated  with  ~650  traits  /  diseases   Glazier  et  al.  2002  Science   Prior  to  GWAS   GWAS   0" 1000" 2000" 3000" 4000" 5000" 6000" 2006" 2007" 2008" 2009" 2010" 2011" 2012" 2013" first"half" Number'of'SNPs' Year'
  17. 17.                                             Height:   •  180  loci     •  ~180K  samples   •  <  10%  of  variance  explained   •  heritability  =  ~80%   17   Schizophrenia:   •  22  loci     •  ~21K  cases  /  ~38K  controls     •  <  3%  of  variance  explained   •  heritability  =  ~80%   The  missing  heritability  problem   BMI:   •  32  loci   •  ~250K  samples   •  ~1%  of  variance  explained     •  Heritability  =  40%  ~  60%   Lango  Allen  et  al.  2010  Nature   Speliotes  et  al.  2010  Nat  Genet   Ripke  et  al.  2013  Nat  Genet  
  18. 18. Fiwng  all  SNPs  in  a  MLM   •  y  =  Wu  +  e    W  =  {wij}n  x  m,  wij  =  standardised  SNP  genotype    u  ~  N(0,  Iσ2 u)    var(y)  =  ZZTσ2 u  +  Iσ2 e    variance  explained  =  mσ2 u    /  (mσ2 u    +  σ2 e)       •  Let  g  =  Zu    y  =  g  +  e    g  ~  N(0,  Aσ2 g),  A  =  gene1c  rela1onship  matrix    var(y)  =  Aσ2 g  +  Iσ2 e    variance  explained  =  σ2 g    /  (σ2 g    +  σ2 e)     •  var(y)  =  (1/m)ZZT(mσ2 u)  +  Iσ2 e      A  =  ZZT  /  m  
  19. 19. Family  studies:  comparing  phenotypic  similarity  to  family  relatedness     –  Our  method:  comparing  phenotypic  similarity  to  gene8c  similarity   (es8mated  from  SNPs)  in  unrelated  individuals     GWAS:  tes8ng  a  SNP  at  a  8me  in  unrelated  samples     –  Our  method:  Es8ma8ng  the  contribu8on  from  all  SNPs  together   19   ~50%  of  variaBon  explained  by  all  SNPs  for  height  vs.  ~10%  from  GWAS   Reconciling  family  studies  and  GWAS  
  20. 20. 20   GWAS    vs  All-­‐SNP  esBmaBon   Yang  et  al.  2011  Nat  Genet   Lee  et  al.  2012  Nat  Genet   Yang  et  al.  2010  Nat  Genet   Many  geneBc  variants  each  with  a  small  effect  contribuBng  to  the  trait   variaBon   0%  10%  20%  30%  40%  50%   Height   Schizophrenia   Obesity  (BMI)  GWAS   Our  method  
  21. 21. Genome  par11oning   •  Single  component  MLM   y  =  g  +  e  (or  y  =  Wu  +  e)     •  Mul1-­‐component  MLM   y  =  g1  +  g2  +  …  +  g22  +  e    var(y)  =  A1σ2 g1  +  A2σ2 g2  +  …  +  A22σ2 g22  +  Iσ2 e  
  22. 22. 1" 2" 3"4" 5" 6" 7" 8" 9" 10" 11"12" 13" 14" 15" 16" 17" 18" 19" 20" 21"22" 0.00" 0.01" 0.02" 0.03" 0" 50" 100" 150" 200" 250" 300" Heritability* Chromosome*length* 22   Yang  et  al.  2011  Nat  Genet   Lee  et  al.  2012  Nat  Genet  Yang  et  al.  unpublished   1 2 3 4 5 6 7 89 10 11 12 13 14 15 16 17 18 19 20 21 22 0 0.01 0.02 0.03 0.04 0.05 0.06 0 50 100 150 200 250 Heritability Chromosome  length  (Mb) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 0 0.005 0.01 0.015 0.02 0.025 0 50 100 150 200 250 Heritability Chromosome  length  (Mb) ~12,000  individuals   9000  cases     12,000  controls     Schizophrenia  Height   BMI   ~25,000  individuals   ParBBoning  the  geneBc  variance  into  individual  chromosomes   GeneBc  variants  distributed  across  the  whole  genome  
  23. 23. ParBBoning  the  geneBc  variance  based  on   funcBonal  annotaBon     23   GeneBc  signals  are  enriched  in  or  close  to  funcBonal  genes   Yang  et  al.  2011  Nat  Genet   Lee  et  al.  2012  Nat  Genet   Schizophrenia   30%   35%   35%   CNS+  genes   intergenic   Other  genes  83%   17%   Height   68%   32%   Genic  es1mate   Intergenic  es1mate   BMI  
  24. 24. More  …   •  Bivariate  analysis  –  es1ma1ng  the  gene1c   correla1on  between  two  traits  or  two  diseases   using  SNP  data  (Deary  et  al.  2012  Nature;  Lee   et  al.  2013  Nat  Genet)   •  Fiwng  a  mixture  distribu1on  rather  than  a   single  normal  distribu1on  to  the  random   effects  –  e.g.  Zhou  et  al.  2013  PLoS  Genet    
  25. 25. 25   Linear  model  (simple  regression  based  associaBon  test)   y  =  b0  +  x1b1  +  e   y  =  trait  value;  x1  =  SNP  genotype  (0,  1  or  2)     b1-­‐hat  =  X1 Ty  /  (X1 TX1)  =  cov(x1,y)  /  var(x1)   SE(b1-­‐hat)  =  σ2 e  /  [n  var(x1)]     Assump1on:  e  is  independent  and  iden1cally  distributed     Issues:   1)  Relatedness:  there  are  rela1ves  in  the  sample  –   inflated  false  posi1ve  rate   2)  Popula1on  stra1fica1on:  individuals  of  different   ancestries  –  spurious  associa1on;  e.g.  trait  =  ea1ng   with  chops1cks,  data  =  a  random  sample  of  US   popula1on.  
  26. 26. Popula1on  stra1fica1on  es1mated   from  SNP  data    
  27. 27. Solu1on:  MLM  based  associa1on   analysis   •  y  =  Xb  +  Zu  +  e  or  y  =  Xb  +  g  +  e      V  =  var(y)  =  Aσ2 g  +  Iσ2 e     •  Tes1ng  for  fixed  effects  given  sample  structure    b-­‐hat  =  (XTV-­‐1X)-­‐1XTy    var(b-­‐hat)  =  σ2 e(XTV-­‐1X)-­‐1   •  Issue:  a  SNP  is  fi}ed  twice.      
  28. 28. Excluding  the  SNP  from  calcula1ng  the   gene1c  rela1onship  matrix    
  29. 29. So^ware  tool   h_p://gump.qimr.edu.au/gcta/  
  30. 30. Complex  Traits  Genomics  Group  (UQ)   •  Peter  Visscher   •  Naomi  Wray   •  Hong  Lee     University  of  Melbourne   •  Mike  Goddard     QIMR  cohort   •  Nick  Mar1n   •  Grant  Montgomery   GENEVA  Consor8um   •  Teri  Manolio   •  Bruce  Weir     dbGaP   30   Acknowledgements  
  31. 31. The  Australian  NeurogeneBcs   Conference       at  the  Queensland  Brain  InsBtute  (QBI),  The   University  of  Queensland,  on  September  11th  and   12th,  2014     h_p://web.qbi.uq.edu.au/anc2014/  

×