Genome-wide association studies Misha Kapushesky Slides: Johan Rung, EBI St. Petersburg Russia 2010
20100515 bioinformatics kapushesky_lecture06

  1. 1. Genome-wide association studies Misha Kapushesky Slides: Johan Rung, EBI St. Petersburg Russia 2010
  2. 2. Overview <ul><li>Methods for genome-wide association studies </li></ul><ul><li>Montreal GWAS for Type 2 Diabetes </li></ul><ul><li>GWAS results - context and caveats </li></ul>
  3. 3. Study coverage <ul><li>Associating phenotype/disease state to genetic variation </li></ul><ul><li>Cost per genotype has decreased </li></ul><ul><li>Instead of a candidate gene approach, just scan the entire genome </li></ul><ul><li>SNP microarrays covering up to 5M SNPs on one chip </li></ul><ul><li>Increased sample sizes </li></ul>
  4. 4. Recombination
  5. 5. Linkage disequilibrium Two markers on the genome are in linkage disequilibrium when they are inherited together more often than would be expected by chance. This leads to high correlation between nearby markers within the same haplotype block
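      As a hedged illustration of "inherited together more often than expected by chance", the sketch below computes the usual pairwise measures D, D' and r² from haplotype and allele frequencies. The function name `ld_measures` and its argument names are chosen here for illustration and are not taken from the slides.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> dict:
    """Pairwise linkage disequilibrium for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and B at marker 2
    p_a, p_b: frequencies of allele A (marker 1) and allele B (marker 2)
    """
    # D is the excess of the observed haplotype frequency over independence.
    d = p_ab - p_a * p_b
    # D' rescales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two markers.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return {"D": d, "D_prime": d_prime, "r2": r2}


if __name__ == "__main__":
    # Perfectly correlated markers versus independent markers.
    print(ld_measures(0.5, 0.5, 0.5))
    print(ld_measures(0.25, 0.5, 0.5))
```

      A high r² between a typed and an untyped SNP is what makes detection by proxy possible.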
  6. 6. Haplotypes and genotype tagging
  7. 7. Association studies <ul><li>Linkage disequilibrium enables association studies through detection by proxy: not every variant needs to be typed </li></ul>
  8. 8. Study power [Figure: cases and controls; panels A and B, each with items numbered 1 to 4]
  9. 9. Study power <ul><li>The power of a study is the probability of correctly detecting a true positive </li></ul><ul><li>To calculate this, you need: </li></ul><ul><ul><li>risk model </li></ul></ul><ul><ul><li>genotype relative risk </li></ul></ul><ul><ul><li>allele frequency </li></ul></ul><ul><ul><li>number of cases and controls </li></ul></ul><ul><ul><li>population penetrance </li></ul></ul><ul><ul><li>acceptable rate of false positives </li></ul></ul>
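      A rough sketch of how these inputs combine, under simplifying assumptions that are not in the slides: a multiplicative risk model, a two-sided allelic test comparing allele frequencies between cases and controls, and controls carrying the population allele frequency (a rare-disease approximation, so penetrance does not appear explicitly). The function name `allelic_power` is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(p: float, grr: float, n_cases: int, n_controls: int, alpha: float) -> float:
    """Approximate power of a two-sided allelic test.

    p: risk allele frequency in the population
    grr: genotype relative risk per risk allele (multiplicative model, assumed)
    alpha: acceptable false positive rate per test
    """
    nd = NormalDist()
    # Under a multiplicative model the risk allele frequency in cases is enriched.
    p_case = grr * p / (grr * p + (1 - p))
    p_ctrl = p  # rare-disease approximation (assumption)
    # Each individual contributes two alleles.
    n1, n2 = 2 * n_cases, 2 * n_controls
    se = math.sqrt(p_case * (1 - p_case) / n1 + p_ctrl * (1 - p_ctrl) / n2)
    z = abs(p_case - p_ctrl) / se
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls beyond either critical value.
    return nd.cdf(z - z_alpha) + nd.cdf(-z - z_alpha)


if __name__ == "__main__":
    for n in (1000, 5000, 20000):
        print(n, round(allelic_power(0.3, 1.2, n, n, 5e-8), 3))
```

      Running the loop shows how power for a modest relative risk grows with sample size at a stringent significance threshold.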
  10. 10. How many SNPs should be tested? Studies of small regions revealed linkage disequilibrium blocks in which common SNPs are highly correlated (usually <10,000–30,000 base pairs in African populations or 30,000–50,000 base pairs in the newer European and Asian populations) (22). This motivated the HapMap Project (www.hapmap.org [12]), which has validated approximately 4 million SNPs, including 2.8 million of the estimated 10 million common SNPs in major world populations, while creating competition among biotechnology companies to develop high-throughput genotyping technologies. Sequencing and genotyping studies showed that sets of 500,000 (European populations) to 1,000,000 (African populations) SNPs could "tag" (serve as proxies for) approximately 80% of common SNPs (23).
  11. 11. Quality controls <ul><li>Call rates for samples and SNPs </li></ul><ul><li>Exclusion of low frequency SNPs </li></ul><ul><li>Exclusion of SNPs out of Hardy-Weinberg Equilibrium </li></ul><ul><li>Clean (or take into account) population stratification </li></ul>
  12. 12. Hardy-Weinberg Equilibrium <ul><li>If the alleles A and B have frequencies p and q, you would expect the following genotype frequencies: </li></ul><ul><ul><li>AA: p² </li></ul></ul><ul><ul><li>AB: 2pq </li></ul></ul><ul><ul><li>BB: q² </li></ul></ul>
  13. 13. Hardy-Weinberg Equilibrium <ul><li>When observed genotype frequencies deviate from the ones expected under HWE, this is indicative of </li></ul><ul><ul><li>population stratification </li></ul></ul><ul><ul><li>different mutation rates between males and females </li></ul></ul><ul><ul><li>different fitness between alleles </li></ul></ul><ul><ul><li>genotype calling problems </li></ul></ul><ul><ul><li>true association at the locus </li></ul></ul>
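      One common way to check for deviation from HWE is a one-degree-of-freedom χ² test that compares observed genotype counts with the p², 2pq, q² expectations. The sketch below is a minimal version of that test; the function name `hwe_chi2` is illustrative, and exact tests are often preferred when counts are small.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> tuple:
    """Chi-square test for Hardy-Weinberg equilibrium at one biallelic SNP.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi2(25, 50, 25))   # counts exactly at HWE proportions
    print(hwe_chi2(50, 0, 50))    # no heterozygotes at all
```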
  14. 14. Binary or real-valued phenotypes <ul><li>Binary traits are typically disease state labels (case or control) </li></ul><ul><li>Real-valued traits are quantitatively measured phenotypes </li></ul><ul><ul><li>blood sugar </li></ul></ul><ul><ul><li>lipids </li></ul></ul><ul><ul><li>height </li></ul></ul><ul><ul><li>BMI </li></ul></ul><ul><ul><li>gene expression </li></ul></ul>
  15. 15. Molecular vs disease phenotypes <ul><li>Disease phenotypes are the result of combinations of molecular phenotypes in the body </li></ul><ul><li>Progression with time </li></ul><ul><li>Precision of phenotype measurement </li></ul>
  16. 16. Molecular vs disease phenotypes <ul><li>Many physiological phenotypes involved in disease dynamics </li></ul>
  17. 17. Molecular vs disease phenotypes Molecular phenotypes can give more precise information about disease state
  18. 18. Association statistics <ul><li>Association statistics for binary traits are most often based on a χ²-statistic computed from the genotype count table, or on a logistic regression model </li></ul><ul><li>The χ²-statistic summarizes the departure from independence between disease state and genotype </li></ul>
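  19. 19. Association statistics
      Genotype   | aa | aA | AA | Sum
      Cases      | r0 | r1 | r2 | R
      Controls   | s0 | s1 | s2 | S
      Count      | n0 | n1 | n2 | N
      <ul><li>For aa in cases, you would expect R·n0/N individuals if disease state and genotype were independent </li></ul><ul><li>The sum over all cells of the squared differences between observed and expected counts, each divided by the expected count, is χ²-distributed (2 degrees of freedom for this table) </li></ul>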
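      The genotype test on the count table can be sketched as follows, using the r, s, R, S, N notation of the table. The function name `genotype_chi2` is illustrative; the p-value uses the closed form for 2 degrees of freedom and therefore assumes all three genotype columns are non-empty.

```python
import math


def genotype_chi2(cases: tuple, controls: tuple) -> tuple:
    """Chi-square test of independence on a 2 x 3 genotype count table.

    cases = (r0, r1, r2), controls = (s0, s1, s2) for genotypes aa, aA, AA.
    Returns (chi2, p_value) assuming 2 degrees of freedom.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    chi2 = 0.0
    for r, s in zip(cases, controls):
        n = r + s  # column total n0, n1 or n2
        if n == 0:
            continue  # empty genotype class contributes nothing
        expected_case = R * n / N
        expected_ctrl = S * n / N
        chi2 += (r - expected_case) ** 2 / expected_case
        chi2 += (s - expected_ctrl) ** 2 / expected_ctrl
    # Survival function of the chi-square distribution with 2 df.
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


if __name__ == "__main__":
    print(genotype_chi2((20, 10, 10), (10, 10, 20)))
```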
  20. 20. Regression <ul><li>For real-valued phenotypes, use linear regression </li></ul><ul><li>For binary phenotypes, use logistic regression </li></ul>
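      For the real-valued case, a minimal sketch is an ordinary least squares fit of the trait on the genotype coded as the number of copies of one allele (0, 1 or 2), which corresponds to an additive model. The coding, the function name `additive_linear_regression` and the absence of covariates are assumptions made for illustration; logistic regression for binary traits needs an iterative fit and is not shown.

```python
import math


def additive_linear_regression(genotypes: list, trait: list) -> tuple:
    """Least squares fit of trait = intercept + beta * genotype.

    genotypes: allele counts coded 0, 1, 2 (additive coding, assumed)
    trait: quantitative phenotype values, same length as genotypes
    Returns (beta, intercept, standard_error_of_beta).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    beta = sxy / sxx                      # effect per allele copy
    intercept = mean_y - beta * mean_g
    # Residual variance gives the standard error of the slope.
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, intercept, se


if __name__ == "__main__":
    g = [0, 1, 2, 0, 1, 2]
    y = [1.0, 2.1, 2.9, 1.1, 1.9, 3.0]
    print(additive_linear_regression(g, y))
```

      The ratio of beta to its standard error is the test statistic for association under this model.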
  21. 21. Population stratification <ul><li>Population stratification occurs when groups or subpopulations within your sample are more related than would be expected by chance </li></ul><ul><li>This introduces correlations and inflates association statistics, and needs to be corrected for </li></ul>
  22. 22. Genomic control
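      The slide itself carries only a title. As a hedged sketch of the usual idea behind genomic control: the inflation factor λ is estimated as the median of the observed 1-df χ² statistics divided by the median of the χ² distribution with 1 df (about 0.455), and every statistic is then divided by λ. The function name `genomic_control` and the choice to never deflate (λ floored at 1) are assumptions of this sketch.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats: list) -> tuple:
    """Estimate the inflation factor and rescale 1-df chi-square statistics.

    Returns (lambda, corrected_statistics).
    """
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Only correct for inflation, never boost the statistics (sketch assumption).
    lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]


if __name__ == "__main__":
    stats = [0.2, 0.5, 0.9, 1.4, 6.0]
    print(genomic_control(stats))
```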
  23. 23. Eigenstrat
  24. 24. Imputation <ul><li>Using a reference population (like HapMap or 1000 Genomes) we can infer the genotype of SNPs that were not tested </li></ul><ul><li>IMPUTE or MACH are commonly used </li></ul><ul><li>Yields probabilistic genotypes that need special treatment </li></ul>
  25. 25. Imputation Wu et al, Nat. Genet. 41, 991-995, 2009
  26. 26. Montreal GWAS
  27. 27. Type 2 diabetes <ul><li>Blood glucose levels are regulated by insulin release </li></ul><ul><li>Increased blood glucose levels trigger the release of insulin, which signals to cells in muscle to take up glucose </li></ul><ul><li>Through β-cell dysfunction or insulin resistance, insulin regulation is impaired, leading to increased glucose levels and eventually type 2 diabetes </li></ul>
  28. 28. Type 2 diabetes
  29. 29. Genetics of type 2 diabetes <ul><li>Before GWAS, T2D genetics was studied with linkage studies and candidate gene approaches </li></ul><ul><li>Results in particular for MODY variants, caused by disruptions of single genes </li></ul><ul><li>Genome-wide association studies and SNP arrays made it possible to study complex diseases </li></ul><ul><li>Five large GWAS for T2D in 2007 </li></ul><ul><li>DIAGRAM meta-analysis in 2008 </li></ul>
  30. 30. Montreal GWAS <ul><li>Part of a larger T2D project at McGill and Genome Quebec </li></ul><ul><li>After initial planning for candidate gene genotyping, we switched to a GWAS strategy </li></ul>
  31. 31. Multi-stage GWAS <ul><li>Two main strategies for increasing study power </li></ul><ul><li>Meta-analyses increase effective sample size by combining results from different studies </li></ul><ul><li>Multi-stage approaches scan the whole genome with relatively low power, followed by focusing in on the hits with higher power </li></ul><ul><li>Maximizing power in a single study in a cost-effective way </li></ul>
  32. 32. Multi-stage GWAS
  33. 33. Study design <ul><li>CASE-CONTROL T2D ASSOCIATION </li></ul><ul><ul><li>Stage 1: Genome-wide scan - 392,365 SNPs, French (N=1,376), 679 cases, 697 controls </li></ul></ul><ul><ul><li>Fast-track confirmation - 57 SNPs, French (N=5,511), 2,617 cases, 2,894 controls. Previously published, Nature, Feb 2007 </li></ul></ul><ul><ul><li>Focused Stage 2 - 16,273 SNPs, French (N=4,977), 2,245 cases, 2,732 controls </li></ul></ul><ul><ul><li>Focused Stage 3 - 28 SNPs, Danish (N=7,698), 3,334 cases, 4,364 controls </li></ul></ul><ul><li>QT ASSOCIATION IN POPULATIONS </li></ul><ul><ul><li>Stage 4: population effect study - 1 SNP (rs2943641), population based study samples: French (N=3,351), Finnish (N=5,183), Danish (N=5,824) </li></ul></ul><ul><li>Fasting glucose, normoglycemic individuals - Stage 1: French (N=654); Stage 2: rs560887 (N=9,353). Previously published, Science, May 2007 </li></ul>
  34. 34. Stage 1 samples <ul><li>French individuals: 690 cases, 670 controls </li></ul><ul><li>Criteria for cases: </li></ul><ul><ul><li>T2D </li></ul></ul><ul><ul><li>First degree relative with T2D </li></ul></ul><ul><ul><li>Non-obese (BMI < 31 kg/m² , 25.8 ± 2.8 kg/m²) </li></ul></ul><ul><li>Controls from DESIR, a prospective French cohort </li></ul><ul><ul><li>Normal glucose tolerance for the 9 years of the study </li></ul></ul>
  35. 35. Stage 1 SNPs <ul><li>Tested on Illumina Human1 (100k) and HumanHap300 (300k) </li></ul><ul><li>392,935 unique SNPs from the combined arrays </li></ul>
  36. 36. Stage 1 results
  37. 37. Fast-track validation <ul><li>Top 57 fast-tracked and tested on a Sequenom panel on 2,617 cases, 2,894 controls </li></ul><ul><li>Relaxed criteria for cases </li></ul><ul><ul><li>BMI < 35 kg/m² (28.9 ± 3.7 kg/m²) </li></ul></ul><ul><li>Sladek et al., Nature 445, 881-885, 2007 </li></ul>
  38. 38. Results
      SNP        | Chr | Position  | pMAX         | Closest gene
      rs7903146  | 10  | 114748339 | 1.5 x 10^-34 | TCF7L2
      rs13266634 | 8   | 118253964 | 6.1 x 10^-8  | SLC30A8
      rs1111875  | 10  | 94452862  | 3.0 x 10^-6  | HHEX
      rs7923837  | 10  | 94471897  | 7.5 x 10^-6  | HHEX
      rs7480010  | 11  | 42203294  | 1.1 x 10^-4  | LOC387761
      rs3740878  | 11  | 44214378  | 1.2 x 10^-4  | EXT2
      rs11037909 | 11  | 44212190  | 1.8 x 10^-4  | EXT2
      rs1113132  | 11  | 44209979  | 3.3 x 10^-4  | EXT2
  39. 39. SLC30A8 Chimienti et al. Biometals 18:313
  40. 40. HHEX [Figure: linkage disequilibrium (D') plot of the HHEX locus showing the genes KIF11, HHEX and IDE; D' scale from 0 to 1]
  41. 41. HHEX controls pancreatic development <ul><li>Habener, Endocrinology 146:1025 </li></ul><ul><li>Hex homeobox gene-dependent tissue positioning is required for organogenesis of the ventral pancreas. Bort (2004) </li></ul><ul><li>Heart induction by Wnt antagonists depends on the homeodomain transcription factor Hex. Foley (2005) </li></ul><ul><li>The homeobox gene Hex is required in definitive endodermal tissues for normal forebrain, liver and thyroid formation. Martinez Barbera (2000) </li></ul>
  42. 42. Stage 2 <ul><li>Top 5% of GWAS hits were selected for design of a focused Stage 2 </li></ul><ul><li>Control for population bias with EIGENSTRAT </li></ul><ul><li>iSelect array with 16,405 SNPs, tested on 2,245 cases, 2,732 controls (French) </li></ul><ul><li>Analysis with EIGENSTRAT and selection of 28 SNPs for a focused Stage 3 </li></ul>
  43. 43. QC
      Sample exclusions:
      Exclusion criterion        | Samples
      Call rate < 95%            | 27
      Continental stratification | 296
      Sex mismatch               | 64
      Related individuals        | 70
      Total                      | 457
      SNP filters:
      Chromosome | SNPs   | Failed HWE | Failed MAF | Successful
      TOTAL      | 16,360 | 48         | 43         | 16,273
  44. 44. EIGENSTRAT correction [Figure: analysis workflow with two filtering steps: filters for MAF, HWE, call rate; filters for MAF, HWE, call rate and r²]
  45. 45. Results - stage 1 vs stage 2
  46. 46. Results - taking out known loci
  47. 48. Stage 3 <ul><li>The top 28 SNPs were tested using a Sequenom panel in ~7,700 Danish cases and controls </li></ul><ul><li>We confirm association of TCF7L2, WFS1, CDKAL1 and find one new association: rs2943641 near IRS1 </li></ul>
  48. 49. rs2943641 <ul><li>We studied the effect of variation in rs2943641 on T2D risk and metabolic phenotypes in general populations: </li></ul><ul><li>DESIR: 3,351 French adults </li></ul><ul><li>Inter99: 5,824 Danish adults </li></ul><ul><li>NFBC 1986: 5,183 Finnish adolescents </li></ul>
  49. 50. Metabolic traits <ul><li>A variety of indexes to capture β-cell function and insulin resistance </li></ul><ul><li>HOMA-B and HOMA-IR based on fasting levels of glucose and insulin </li></ul><ul><li>For Inter99, we had access to OGTT data and could calculate other measures of insulin response </li></ul><ul><ul><li>time course data </li></ul></ul><ul><ul><li>AUC </li></ul></ul><ul><ul><li>corrected insulin response (CIR) </li></ul></ul><ul><ul><li>disposition indexes </li></ul></ul>
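      The slides do not spell out the HOMA formulas. The sketch below uses the commonly cited original approximations (glucose in mmol/l, insulin in µU/ml); whether the study used exactly these or an updated HOMA model is not stated here, so treat this as an assumption. The function names and the pmol/l to µU/ml conversion factor of 6.945 are also assumptions, and other conversion factors are in use.

```python
def pmol_to_micro_units(insulin_pmol_per_l: float, factor: float = 6.945) -> float:
    """Convert insulin from pmol/l to microU/ml (conversion factor is an assumption)."""
    return insulin_pmol_per_l / factor


def homa_ir(glucose_mmol_per_l: float, insulin_micro_units_per_ml: float) -> float:
    """Insulin resistance index from fasting glucose and fasting insulin."""
    return glucose_mmol_per_l * insulin_micro_units_per_ml / 22.5


def homa_b(glucose_mmol_per_l: float, insulin_micro_units_per_ml: float) -> float:
    """Beta-cell function index (percent); only defined for glucose above 3.5 mmol/l."""
    return 20.0 * insulin_micro_units_per_ml / (glucose_mmol_per_l - 3.5)


if __name__ == "__main__":
    insulin = pmol_to_micro_units(60.0)
    print(homa_ir(5.2, insulin), homa_b(5.2, insulin))
```

      Note that indexes computed from group means will generally differ from the mean of per-individual indexes reported in a table.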
  50. 51. Oral Glucose Tolerance Test
  51. 52. Metabolic traits 1 (values by rs2943641 genotype)
      Metabolic trait                | Cohort    | C/C         | C/T         | T/T         | P add | P dom  | P rec
      Age                            | NFBC 1986 | 16          | 16          | 16          |       |        |
      Age                            | DESIR     | 47.1 ± 9.8  | 47.5 ± 9.9  | 47.6 ± 10.1 |       |        |
      Age                            | INTER99   | 44.9 ± 7.9  | 45.4 ± 7.8  | 45.2 ± 7.6  |       |        |
      Sex                            | NFBC 1986 | 1062/1092   | 1153/1208   | 322/346     |       |        |
      Sex                            | DESIR     | 645/728     | 728/812     | 216/222     |       |        |
      Sex                            | INTER99   | 776/942     | 974/1070    | 307/354     |       |        |
      BMI (kg/m²)                    | NFBC 1986 | 21.3 ± 3.8  | 21.3 ± 3.7  | 21.1 ± 3.5  | 0.24  | 0.43   | 0.21
      BMI (kg/m²)                    | DESIR     | 24.5 ± 3.7  | 24.4 ± 3.5  | 24.4 ± 3.4  | 0.55  | 0.63   | 0.61
      BMI (kg/m²)                    | INTER99   | 25.6 ± 3.9  | 25.4 ± 4.1  | 25.7 ± 4.2  | 0.57  | 0.094  | 0.24
      Fasting plasma glucose (mmol/l) | NFBC 1986 | 5.13 ± 0.41 | 5.14 ± 0.40 | 5.13 ± 0.41 | 0.77  | 0.62   | 0.90
      Fasting plasma glucose (mmol/l) | DESIR     | 5.21 ± 0.44 | 5.20 ± 0.42 | 5.18 ± 0.43 | 0.05  | 0.32   | 0.07
      Fasting plasma glucose (mmol/l) | INTER99   | 5.31 ± 0.40 | 5.31 ± 0.41 | 5.33 ± 0.39 | 0.66  | 0.93   | 0.32
      Fasting serum insulin (pmol/l) | NFBC 1986 | 78.7 ± 48.6 | 76.8 ± 44.5 | 71.7 ± 32.1 | 0.001 | 0.03   | 0.0009
      Fasting serum insulin (pmol/l) | DESIR     | 50.6 ± 32.9 | 48.4 ± 29.7 | 49.1 ± 29.1 | 0.05  | 0.003  | 0.76
      Fasting serum insulin (pmol/l) | INTER99   | 38.8 ± 24.7 | 36.4 ± 21.9 | 37.6 ± 23.3 | 0.018 | 0.0043 | 0.49
  52. 53. Metabolic traits 2 (values by rs2943641 genotype)
      Metabolic trait           | Cohort    | C/C           | C/T           | T/T           | P add       | P dom       | P rec
      HOMA-B                    | NFBC 1986 | 141 ± 95.1    | 136 ± 80.1    | 131 ± 91.6    | 0.006       | 0.05        | 0.009
      HOMA-B                    | DESIR     | 109 ± 87.0    | 103 ± 64.8    | 108 ± 92.2    | 0.16        | 0.006       | 0.24
      HOMA-B                    | INTER99   | 75.2 ± 65.6   | 68.3 ± 42.2   | 71.0 ± 49.9   | 0.005       | 0.0011      | 0.32
      HOMA-IR                   | NFBC 1986 | 2.52 ± 1.63   | 2.47 ± 1.58   | 2.29 ± 1.06   | 0.007       | 0.07        | 0.005
      HOMA-IR                   | DESIR     | 1.95 ± 1.35   | 1.86 ± 1.20   | 1.88 ± 1.17   | 0.03        | 0.004       | 0.95
      HOMA-IR                   | INTER99   | 1.54 ± 1.00   | 1.44 ± 0.89   | 1.49 ± 0.95   | 0.026       | 0.0058      | 0.59
      Insulin 30'               | INTER99   | 300 ± 183     | 277 ± 172     | 281 ± 169     | 0.0019      | 8.1 x 10^-4 | 0.14
      Insulin 120'              | INTER99   | 176 ± 138     | 163 ± 127     | 162 ± 124     | 0.0059      | 0.011       | 0.057
      AUC insulin               | INTER99   | 22000 ± 13800 | 20300 ± 12900 | 20500 ± 12700 | 6.9 x 10^-4 | 2.2 x 10^-4 | 0.12
      Glucose 30'               | INTER99   | 8.19 ± 1.53   | 8.17 ± 1.56   | 8.22 ± 1.50   | 0.72        | 0.34        | 0.55
      Glucose 120'              | INTER99   | 5.51 ± 1.11   | 5.51 ± 1.11   | 5.47 ± 1.15   | 0.54        | 0.99        | 0.23
      AUC glucose               | INTER99   | 182 ± 101     | 181 ± 102     | 180 ± 99.5    | 0.44        | 0.48        | 0.59
      AUC insulin / AUC glucose | INTER99   | 32.5 ± 17.4   | 30.1 ± 16.2   | 30.6 ± 16.1   | 6.0 x 10^-4 | 1.6 x 10^-4 | 0.13
      CIR                       | INTER99   | 1140 ± 4210   | 1000 ± 1130   | 1000 ± 1060   | 0.045       | 0.066       | 0.17
      ISI                       | INTER99   | 0.151 ± 0.095 | 0.16 ± 0.098  | 0.156 ± 0.096 | 0.026       | 0.0058      | 0.59
      Disp. Index (CIR * ISI)   | INTER99   | 180 ± 1610    | 147 ± 220     | 143 ± 174     | 0.73        | 1.0         | 0.50
  53. 54. IRS1 locus - rs2943641
  54. 55. IRS1 <ul><li>G972R is a missense polymorphism in IRS1 that is known to impair insulin signalling (rs1801278) (Almind 1993) </li></ul><ul><li>G972R is associated with insulin resistance and insulin release (Clausen 1995, Sesti 2001) </li></ul><ul><li>In mice, IRS1 disruption causes disrupted insulin action, both in target tissues and in β-cells (Nandi 2004) </li></ul><ul><li>Also linked to insulin resistance, glucose intolerance, islet hyperplasia (Tamemoto 1994, Araki 1994, Terauchi 1997, Withers 1998) </li></ul><ul><li>G972R is not conclusively associated with T2D (Florez 2004, Florez 2007, Jellema 2003, Zeggini 2004) </li></ul><ul><li>We detect no epistasis between rs2943641 and G972R in DESIR or NFBC, only nominal significance in Inter99 </li></ul><ul><li>Evidence for link between rs2943641 and IRS1? </li></ul>
  55. 56. rs2943641 - IRS1 protein association
  56. 57. rs2943641 - IRS1 protein association
      Measure                                     | rs2943641 CC  | rs2943641 CT  | rs2943641 TT  | P Add | P Dom | P Rec
      n (male/female)                             | 74 (35/39)    | 88 (51/37)    | 28 (10/18)    |       |       |
      Age (years)                                 | 42.5 ± 17.1   | 43.5 ± 16.9   | 43.2 ± 17.6   |       |       |
      BMI (kg/m²)                                 | 25.0 ± 3.8    | 24.9 ± 3.9    | 25.3 ± 4.1    | 0.3   | 0.7   | 0.2
      Rd insulin clamp (mg/kg FFM/min)            | 10.4 ± 3.5    | 11.0 ± 3.2    | 11.7 ± 3.7    | 0.2   | 0.2   | 0.4
      Di (x 10^-7)                                | 1.7 ± 1.1     | 1.8 ± 1.3     | 1.8 ± 1.1     | 0.8   | 0.8   | 0.9
      IRS-1 protein basal (AU)                    | 296.7 ± 167.7 | 314.0 ± 155.1 | 413.1 ± 227.6 | 0.03  | 0.3   | 0.009
      IRS-1 protein insulin (AU)                  | 276.6 ± 143.6 | 280.9 ± 156.4 | 313.3 ± 147.9 | 0.3   | 0.7   | 0.2
      IRS-1-associated PI3K activity basal (AU)   | 25.0 ± 12.6   | 26.6 ± 15.4   | 30.1 ± 17.2   | 0.3   | 0.4   | 0.4
      IRS-1-associated PI3K activity insulin (AU) | 47.1 ± 29.9   | 56.6 ± 32.1   | 72.2 ± 41.3   | 0.001 | 0.02  | 0.002
  57. 58. Conclusions <ul><li>The multi-stage study detected T2D risk loci that were later confirmed in other cohorts (SLC30A8, HHEX) </li></ul><ul><li>Variation in rs2943641 is associated with </li></ul><ul><ul><li>T2D risk </li></ul></ul><ul><ul><li>increased insulin levels </li></ul></ul><ul><ul><li>impaired insulin sensitivity </li></ul></ul><ul><ul><li>IRS1 protein levels </li></ul></ul><ul><ul><li>IRS1 activity in insulin signaling pathway </li></ul></ul><ul><li>Study provided a "full story" from GWAS scan to functional evidence thanks to rich phenotyping </li></ul>
  58. 59. Paper Rung et al., Nature Genetics, 41, 1110-1115, 2009
  59. 60. Acknowledgements <ul><li>Johan Rung </li></ul><ul><li>Rob Sladek </li></ul><ul><li>Philippe Froguel </li></ul><ul><li>Oluf Pedersen </li></ul><ul><li>Constantin Polychronakos </li></ul><ul><li>Ghislain Rocheleau </li></ul><ul><li>Alexander Mazur </li></ul><ul><li>Lishuang Shen </li></ul><ul><li>David Serre </li></ul><ul><li>Philippe Boutin </li></ul><ul><li>Daniel Vincent </li></ul><ul><li>Alexandre Belisle </li></ul><ul><li>Samy Hadjadj </li></ul><ul><li>Beverley Balkau </li></ul><ul><li>Barbara Heude </li></ul><ul><li>Guillaume Charpentier </li></ul><ul><li>Tom Hudson </li></ul><ul><li>Sebastien Brunet </li></ul><ul><li>François Bacot </li></ul><ul><li>Rosalie Frechette </li></ul><ul><li>Valérie Catudal </li></ul><ul><li>Philippe Laflamme </li></ul><ul><li>Stephane Cauchi </li></ul><ul><li>Christian Dina </li></ul><ul><li>David Meyre </li></ul><ul><li>Christine Cavalcanti-Proença </li></ul><ul><li>Anders Albrechtsen </li></ul><ul><li>Torben Hansen </li></ul><ul><li>Knut Borch-Johnsen </li></ul><ul><li>Torsten Lauritzen </li></ul><ul><li>Marjo-Riitta Järvelin </li></ul><ul><li>Jaana Laitinen </li></ul><ul><li>Emmanuelle Durand </li></ul><ul><li>Paul Elliott </li></ul><ul><li>Michel Marre </li></ul><ul><li>Alexander Montpetit </li></ul><ul><li>Charlotta Pisinger </li></ul><ul><li>Barry Posner </li></ul><ul><li>Anneli Pouta </li></ul><ul><li>Marc Prentki </li></ul><ul><li>Rasmus Ribel-Madsen </li></ul><ul><li>Aimo Ruokonen </li></ul><ul><li>Anelli Sandbaek </li></ul><ul><li>Jean Tichet </li></ul><ul><li>Martine Vaxillaire </li></ul><ul><li>Jorgen Wojtaszewski </li></ul><ul><li>Allan Vaag </li></ul>
  60. 61. GWAS into context <ul><li>Complexity of interactions in biological systems... </li></ul>
  61. 62. Complexity <ul><li>...a lot of complexity </li></ul>
  62. 63. [Figure: network diagram with nodes labelled A to G]
  63. 64. Redundancy
  64. 65. Network structure <ul><li>Biological networks have a scale-free structure </li></ul><ul><li>Most genes have few connections </li></ul><ul><li>Few genes have many connections </li></ul>[Figure: log-log plot with axes Log(# edges) and Log(# genes)]
  65. 66. Signal propagation <ul><li>The structure of biological networks results in robustness against random errors </li></ul><ul><li>Most mutations, even knockouts, can go unnoticed because of redundancy and network wiring </li></ul><ul><li>Low probability of knocking out a hub </li></ul>
  66. 67. Common diseases <ul><li>What is most common: disease caused by many variants with low effect, or by a few rare variants with strong effects? </li></ul><ul><li>GWAS so far have by necessity focused on common variants </li></ul><ul><li>Many known rare variants are associated with common diseases, or with phenotypes that may contribute and progress to disease </li></ul>
  67. 68. Common disease / common variant <ul><li>The hypothesis that most common diseases are caused by a large number of variants, common in a general population, but each adding just a small risk </li></ul><ul><li>GWAS results find many loci for common complex diseases, with small risk </li></ul><ul><li>But... GWAS detected loci so far only explain a very small fraction of the observed variation </li></ul>
  68. 69. Rare variants <ul><li>With improved and lower cost sequencing, we can address rare variants </li></ul><ul><li>Not just SNPs </li></ul><ul><li>Utility of “extreme cohorts” </li></ul><ul><li>Ex. “A new highly penetrant form of obesity due to deletions on chromosome 16p11.2” (Nature Feb 4, 2010) </li></ul>
  69. 70. Polygenic contributions <ul><li>Groups of SNPs that are not genome-wide significant individually have been shown to be associated with phenotype </li></ul><ul><li>Individual SNPs cannot be inferred, just "group action" </li></ul><ul><li>Supports the idea of many weak variants responsible for effect </li></ul><ul><li>Ex. "Common polygenic variation contributes to risk of schizophrenia and bipolar disorder" (Nature 460, 748-752) </li></ul>
  70. 71. Meta-analysis caveats <ul><li>Meta-analysis on heterogeneous data </li></ul><ul><ul><li>Phenotypes </li></ul></ul><ul><ul><li>Quality control </li></ul></ul><ul><ul><li>Platforms </li></ul></ul><ul><ul><li>Genotype calling </li></ul></ul><ul><ul><li>Analysis </li></ul></ul>
  71. 72. Future directions for GWAS <ul><li>Sequencing is cheaper and yielding higher quality data </li></ul><ul><li>Better basis for studying and detecting rare variants and their effect on diseases or phenotypes </li></ul><ul><li>Copy number variants </li></ul><ul><li>Genetic interactions, GxE interactions </li></ul><ul><li>More samples => higher power </li></ul>
  72. 73. Future directions for GWAS <ul><li>Complex phenotypes </li></ul><ul><li>Association of genetic loci to </li></ul><ul><ul><li>genome-wide expression levels </li></ul></ul><ul><ul><li>protein levels </li></ul></ul><ul><ul><li>metabolite levels </li></ul></ul>
  73. 74. Future directions for GWAS <ul><li>More data shared => better quality of results </li></ul><ul><li>As in other branches of science, data sharing, transparency and openness should be promoted </li></ul>
  74. 75. Resources <ul><li>Analysis software packages </li></ul><ul><ul><li>PLINK - http://pngu.mgh.harvard.edu/~purcell/plink/ </li></ul></ul><ul><ul><li>*Abel - http://mga.bionet.nsc.ru/~yurii/ABEL/ </li></ul></ul><ul><ul><li>MERLIN - http://www.sph.umich.edu/csg/abecasis/merlin/ </li></ul></ul><ul><li>Imputations </li></ul><ul><ul><li>IMPUTE - http://mathgen.stats.ox.ac.uk/impute/impute.html </li></ul></ul><ul><ul><li>MACH - http://www.sph.umich.edu/csg/abecasis/MACH/ </li></ul></ul><ul><li>Population structure </li></ul><ul><ul><li>Eigenstrat - http://genepath.med.harvard.edu/~reich/Software.htm </li></ul></ul><ul><ul><li>EMMA(X) - http://genetics.cs.ucla.edu/emmax/index.html </li></ul></ul><ul><li>Meta-analysis </li></ul><ul><ul><li>METAL - http://www.sph.umich.edu/csg/abecasis/METAL/ </li></ul></ul><ul><ul><li>GWAMA - http://www.well.ox.ac.uk/gwama/index.shtml </li></ul></ul><ul><li>Data </li></ul><ul><ul><li>EGA - http://www.ebi.ac.uk/ega/ </li></ul></ul><ul><ul><li>dbGAP - http://www.ncbi.nlm.nih.gov/gap </li></ul></ul>