Human genetic variation and its contribution to complex traits

Guest lecture by Prof. Dr. ir. Bart Deplancke introducing the basic principles of systems genetics.

Published in: Technology

Transcript

  • 1. Deplancke Lab (deplanckelab.epfl.ch): Monica Albarca, Jean-Daniel Feuz, Carine Gubelmann, Korneel Hens, Alina Isakova, Irina Krier, Andreas Massouras, Sunil Raghav, Jovan Simicevic, Sebastian Waszak, Wiebke Westhall. You?
  • 2. Laboratory of Systems Biology and Genetics, Bart Deplancke (bart.deplancke@epfl.ch). Human genetic variation and its contribution to complex traits. 26 June 2000
  • 3. The human genome: first announcement
    • In June 2000: first announcement of a working draft (haplotype!), with the Nature and Science papers following in February 2001: International Human Genome Sequencing Consortium (2001) Nature 409:860-921; Venter et al. (2001) Science 291:1304-1351.
    • James Kent (UCSC); Eugene Myers (Celera).
    • In June 2001: finished chromosome 20, with others following until the finishing of chromosome 1 in May 2006: Gregory et al. (2006), Nature, 441, 315-321.
  • 4. Why are we so phenotypically different?
  • 5. Classes of human genetic variation
    Common versus rare: refers to the frequency of the minor allele in the human population.
    • Common variants: minor allele frequency (MAF) > 1% in the population. Also described as polymorphisms.
    • Rare variants: MAF < 1%.
    Neutrality:
    • The vast majority of genetic variants are likely neutral, i.e. they make no contribution to phenotypic variation.
    • Some may reach significant frequencies, but this is due to chance.
    Two different nucleotide composition classes:
    • Single nucleotide variants
    • Structural variants
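The sketch below makes the MAF cut-off concrete. It computes the minor allele frequency of one biallelic site from diploid genotypes and then applies the 1% threshold from the slide. The genotype encoding (two-letter strings such as "AG") and the function names are hypothetical choices for illustration. Sites with more than two alleles are rejected rather than handled.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """MAF of one biallelic site.

    genotypes: list of diploid genotypes as two-letter strings, e.g. "AG"
    (a hypothetical encoding chosen for this sketch).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    if len(counts) > 2:
        raise ValueError("only biallelic sites are supported")
    return min(counts.values()) / sum(counts.values())


def classify_variant(maf, threshold=0.01):
    """'common' if MAF > 1%, otherwise 'rare' (exactly 1% is counted as rare here)."""
    return "common" if maf > threshold else "rare"


sample = ["AA", "AG", "GG", "AA", "AG"]
maf = minor_allele_frequency(sample)
print(maf, classify_variant(maf))
```

The boundary case of exactly 1% is not defined on the slide. Assigning it to "rare" is an arbitrary choice.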
  • 6. Single nucleotide variants
    Five aligned sequences, with SNP alleles T/G, T/G and A/C:
    ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
    ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
    ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
    ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
    ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
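SNP positions in an alignment like this can be found by scanning column by column for sites where the sequences disagree. This minimal sketch assumes gap-free sequences of equal length, so insertions and deletions are out of scope. The function name is hypothetical. Applied to the first segment of the alignment, it recovers the T/G site, with alleles reported in alphabetical order.

```python
def snp_columns(aligned):
    """Return (0-based position, "allele1/allele2") for every variable column.

    aligned: equal-length, gap-free sequences (an assumption of this sketch).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        alleles = {seq[pos] for seq in aligned}
        if len(alleles) > 1:
            sites.append((pos, "/".join(sorted(alleles))))
    return sites


first_segment = ["ATTGCAATCCGTGG", "ATTGCAAGCCGTGG", "ATTGCAAGCCGTGG",
                 "ATTGCAATCCGTGG", "ATTGCAAGCCGTGG"]
print(snp_columns(first_segment))  # [(7, 'G/T')]
```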
  • 7. How are SNPs detected? High-density oligonucleotide arrays (Chee et al., Science, 1996)
    • Simple 5’ to 3’ read-out
    • Flanking issues
    • Unique oligonucleotide primers generate minimally overlapping long-range PCR products of 10-kb average length
  • 8. How are SNPs detected? Other strategies (from Rothberg et al., Nature Biotech., 2001)
    • Clustered alignment
    • Reduced representation shotgun sequencing followed by genomic alignment to a reference sequence
    • Gene-centric studies
  • 9. The SNP database: dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/)
    Three “out of Africa” genomes:
    • 1.2 million (67%) shared by all three, 1.7 million (52%) by any two, 1.0 million (30%) unique
    • Overall, 5.2 million SNPs in the three genomes, the majority being present in dbSNP
    • Data indicate that most SNVs are common rather than rare
  • 10. Single nucleotide variants
    • The human genome is estimated to contain > 11 million SNPs (~7 million with MAF > 5%, the rest between 1 and 5%).
    • It is unknown how many rare or even novel (“de novo”) SNVs exist.
    • SNP alleles in the same genomic interval are often correlated with one another → “linkage disequilibrium (LD)” = non-random association of alleles. LD varies in a complex and unpredictable manner across the genome and between different populations.
    • International HapMap Project → can we divide the genome into groups of highly correlated SNPs that are generally inherited together (“LD bins”)?
    [Figure: number of tag SNPs required to capture common Phase II SNPs]
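LD between two SNPs is often summarized by the coefficient D = p_AB − p_A·p_B. It is also summarized by r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)], the squared correlation between alleles. This minimal sketch computes r² from phased two-SNP haplotypes. The encoding is hypothetical: uppercase marks one allele and lowercase the other, e.g. "Ab". A value of r² = 1 means the two SNPs are statistically indistinguishable, which is the property that LD bins exploit.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased two-SNP haplotypes.

    Each haplotype is a two-character string: first character = allele at SNP 1
    ("A" or "a"), second = allele at SNP 2 ("B" or "b"); hypothetical encoding.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denominator


print(ld_r2(["AB", "AB", "ab", "ab"]))  # 1.0: alleles always travel together
print(ld_r2(["AB", "Ab", "aB", "ab"]))  # 0.0: no association
```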
  • 11. Single nucleotide variants: recap
    • International HapMap Project → can we divide the genome into groups of highly correlated SNPs that are generally inherited together (“LD bins”)?
    [Figure: number of tag SNPs required to capture common Phase II SNPs, based on genotyping over 3.1 million SNPs in 270 individuals from 4 geographically diverse populations (Frazer et al., Nature, 2007). Pairwise linkage disequilibrium (LD) r² (if r² = 1 → SNPs statistically indistinguishable)]
    • By genotyping the DNA sample of an individual with a “tagging” SNP from each LD bin, knowledge regarding 80% of SNPs with a MAF > 5% across the genome is gained (Frazer et al., Nature Rev. Genet., 2010).
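One simple way to form LD bins is a greedy loop. At each step, pick the SNP that captures the most not-yet-tagged SNPs at r² ≥ a threshold, make it the tag for that bin, and remove the bin. This sketch assumes the pairwise r² values are already computed. The 0.8 threshold, the data layout and the SNP names are illustrative assumptions. This is a simplification, not the exact procedure behind the HapMap tag-SNP figures.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily partition SNPs into LD bins, each represented by one tag SNP.

    snps: SNP identifiers in a fixed order (used to break ties).
    r2: dict mapping frozenset({snp_a, snp_b}) to pairwise r^2; missing pairs count as 0.
    Every SNP in a bin has r^2 >= threshold with that bin's tag.
    """
    def captured(tag, other):
        return tag == other or r2.get(frozenset((tag, other)), 0.0) >= threshold

    untagged = set(snps)
    bins = []
    while untagged:
        # Choose the untagged SNP that captures the most untagged SNPs.
        tag = max((s for s in snps if s in untagged),
                  key=lambda s: sum(captured(s, t) for t in untagged))
        members = {t for t in untagged if captured(tag, t)}
        bins.append((tag, sorted(members)))
        untagged -= members
    return bins


pairs = {frozenset(("snp1", "snp2")): 0.95,
         frozenset(("snp2", "snp3")): 0.90,
         frozenset(("snp1", "snp3")): 0.50}
print(greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], pairs))
```

In this example, snp2 tags snp1 and snp3 even though those two are only weakly correlated with each other. Genotyping snp2 alone recovers both, while the unlinked snp4 needs its own tag.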
  • 12. Querying human genetic variation: scan the entire genome (500,000 SNPs)
  • 13. Population stratification
    • Subdivision of a population into different ethnic groups with potentially different marker allele frequencies, and thus different disease prevalence
    • Principal component analysis reveals the SNP vectors explaining the largest variation in the data
    (From Sven Bergmann, UNIL)
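Below is a minimal NumPy sketch of PCA on a genotype matrix. It assumes genotypes coded as allele dosages (0, 1 or 2 copies of one allele), with individuals as rows. Each SNP is centred and scaled by √(2p(1−p)), similar to the normalization used in EIGENSTRAT-style analyses. Monomorphic SNPs are dropped. The resulting principal component scores are what PC1/PC2 scatter plots display. The two "groups" below are invented.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Principal component scores of individuals from a dosage matrix.

    genotypes: array-like, shape (individuals, SNPs), entries 0, 1 or 2.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # allele frequency of each SNP
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]      # PC scores per individual


# Two hypothetical groups with opposite allele frequencies at four SNPs.
group_a = [[2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1]]
group_b = [[0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 2]]
scores = genotype_pcs(group_a + group_b)
print(np.round(scores[:, 0], 2))  # PC1 separates the two groups
```

The sign of a principal component is arbitrary. Only the separation between clusters is meaningful.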
  • 14. Population stratification: ethnic groups cluster according to geographic distances
    [Figure: PC1 vs. PC2 scatter plots] (From Sven Bergmann, UNIL)
  • 15. Population stratification: PCA of the POPRES cohort (From Sven Bergmann, UNIL)
  • 16. Structural variants (Frazer et al., Nature Rev. Genet., 2010)
    A classic that opened the door to structural variant research: Sebat et al. Large-Scale Copy Number Polymorphism in the Human Genome. Science, 2004. It used the ROMA technique to detect copy number variants.
  • 17. Representational Oligonucleotide Microarray Analysis (ROMA)
    1) Genome digestion
    2) Ligation of adapters to the sticky ends, and PCR amplification
    3) After PCR, representations of the entire genome (restriction fragments) are amplified so that relative increases or decreases in copy number between the two genomes are accentuated, while equal copy number is preserved.
    4) Representations of the two genomes are labeled with different fluorophores and co-hybridized to a microarray with probes specific to restriction site locations across the entire human genome.
  • 18. Representational Oligonucleotide Microarray Analysis (ROMA): on average, individuals (20 tested) differed by 11 CNPs (average length = 465 kb) affecting 70 genes.
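In a two-colour comparison like ROMA, copy number differences show up as shifted log2(test/reference) intensity ratios over consecutive probes. This sketch flags runs of probes whose ratio crosses fixed thresholds. The ±0.3 thresholds, the three-probe minimum and the intensities are hypothetical. This simple run caller is an illustration, not the segmentation method used in the ROMA study.

```python
import math


def call_copy_number_changes(test, reference, gain=0.3, loss=-0.3, min_probes=3):
    """Return (state, first_probe, last_probe) for runs of shifted log2 ratios.

    test, reference: per-probe intensities in genome order for the two genomes.
    """
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    calls = []
    state, start = None, 0
    for i, ratio in enumerate(ratios + [0.0]):  # trailing 0.0 closes the final run
        current = "gain" if ratio > gain else "loss" if ratio < loss else None
        if current != state:
            if state is not None and i - start >= min_probes:
                calls.append((state, start, i - 1))
            state, start = current, i
    return calls


reference_intensity = [100] * 10
test_intensity = [100, 100, 200, 210, 190, 100, 100, 50, 48, 52]
print(call_copy_number_changes(test_intensity, reference_intensity))
```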
  • 19. Structural variants (SVs) (Frazer et al., Nature Rev. Genet., 2010)
    Our ability to detect SVs is still very poor (see later).
  • 20. Structural variants (SVs): fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008)
    • 1 million fosmid clones per individual
    • Both ends of each clone insert are sequenced → a pair of high-quality end sequences (~450 bp per sequence), termed an end-sequence pair (ESP)
    • Only SVs over 8 kb can be detected
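End-sequence pairs reveal SVs when their placement on the reference disagrees with the known clone insert size or orientation. The sketch below is simplified. The ~40 kb fosmid insert size, the 3 kb standard deviation and the 3-SD cut-off are all assumptions. A real analysis would also need to handle other orientation patterns and combine evidence across clones.

```python
from dataclasses import dataclass


@dataclass
class EndSequencePair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_esp(esp, mean_insert=40_000, sd_insert=3_000, n_sd=3):
    """Classify one end-sequence pair by how it maps to the reference.

    mean_insert / sd_insert describe the clone library (assumed values).
    """
    if esp.chrom1 != esp.chrom2:
        return "inter-chromosomal"
    if esp.strand1 == esp.strand2:
        return "inversion"   # ends of a normal clone map in opposite orientations
    span = abs(esp.pos2 - esp.pos1)
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"    # reference interval longer than the clone: sequence missing in the sample
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"   # reference interval shorter: extra sequence in the sample
    return "concordant"


print(classify_esp(EndSequencePair("chr1", 1_000, "+", "chr1", 61_000, "-")))  # deletion
```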
  • 21. Structural variants (SVs): fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008)
    • ~2,000 SVs that were experimentally verified
    [Figure: novel sequence, either in gaps (black) or not (orange)]
  • 22. Structural variants (SVs): fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008)
    • ~2,000 SVs that were experimentally verified
    • 50% of SVs were seen in >1 individual
    • ~50% lay outside regions of the genome previously annotated as structurally variant
    • 525 new insertion sequences
    • 20% of all genetic variants are SVs, but they cover >70% of nucleotide variation
    • SVs: between 9 and 25 Mb (~0.5-1% of the genome)
    • The majority of SVs are yet to be discovered
    [Figure: novel sequence, either in gaps (black) or not (orange)]
  • 23. Structural variants (SVs): fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008)
    [Figure: regions of increased SNV density]
  • 24. Structural variants and linkage disequilibrium (McCarroll et al., Nature Genet., 2008)
    • Most common, diallelic CNPs (with MAF greater than 5%) were perfectly captured (r² = 1.0) by at least one SNP tag from HapMap Phase II
    • Mean r² as a function of distance from a polymorphism is indistinguishable for SNPs and diallelic CNPs → common, diallelic CNPs are ancestral mutations
    → Common SVs are in LD with tagging SNPs
  • 25. Contribution of variants to phenotypes?
  • 26. Common versus rare
    “Common disease – common variant hypothesis” versus “common complex traits are the summation of low-frequency, high-penetrance variants”
    OR (odds ratio) or PAR (population attributable risk): a measure of the multifactorial inherited component of a disease
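Both measures can be computed from simple counts. This sketch computes an odds ratio from a 2×2 table of risk-allele carriers in cases and controls. It then computes the population attributable risk with Levin's formula, PAR = p(RR − 1) / (1 + p(RR − 1)), where p is the carrier frequency and RR the relative risk. The odds ratio approximates RR only when the disease is rare. All numbers are invented for illustration.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio from a 2x2 table of risk-allele carriers in cases and controls."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def population_attributable_risk(carrier_freq, relative_risk):
    """Levin's formula: PAR = p(RR - 1) / (1 + p(RR - 1))."""
    excess = carrier_freq * (relative_risk - 1)
    return excess / (1 + excess)


# Invented numbers: a common, low-effect variant versus a rare, high-penetrance one.
print(round(population_attributable_risk(0.5, 1.2), 3))    # 0.091
print(round(population_attributable_risk(0.001, 10), 3))   # 0.009
```

With these made-up values, the common variant of modest effect accounts for a larger share of cases in the population than the rare variant with a tenfold risk. This is one way to see why the two hypotheses on the slide make different predictions.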
  • 27. Whole genome association studies: how significant is this?
  • 28. Whole genome association studies: P-value
    Note: “genome-wide” is a misnomer
    • 20% of common SNPs are not, or only partially, tagged
    • Rare variants are not tagged at all
  • 29. Whole genome association studies: concept
    • Scan the entire genome (500,000 SNPs) and plot −log10(p) for each SNP
    • Identify local regions of interest; examine genes, SNP density, regulatory regions, etc.
    • Replicate the finding
    (From Sven Bergmann, UNIL)
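A single SNP in a case/control scan is often assessed with an allelic chi-square test. The genome-wide threshold must then account for the number of SNPs tested. This minimal sketch uses only the standard library and relies on the identity P(χ² > x) = erfc(√(x/2)) for one degree of freedom. The Bonferroni correction shown is one conservative option among several. The allele counts are invented.

```python
import math


def allelic_chi2_p(case_alleles, control_alleles):
    """P-value of a 1-df chi-square test on a 2x2 table of allele counts.

    case_alleles, control_alleles: (count of allele 1, count of allele 2).
    """
    table = [case_alleles, control_alleles]
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def bonferroni_threshold(alpha=0.05, n_tests=500_000):
    """Per-SNP significance level keeping the family-wise error rate at alpha."""
    return alpha / n_tests


p = allelic_chi2_p((700, 300), (500, 500))   # invented allele counts
print(-math.log10(p), p < bonferroni_threshold())
```

The value −log10(p) is the quantity on the y-axis of the genome scan. With 500,000 tests, the Bonferroni threshold is 10⁻⁷, i.e. a −log10(p) of 7.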
  • 30. Whole genome association studies: visualization
    Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007). McCarthy et al., Nature Rev. Genet., 2008
  • 31. Whole genome association studies: concept
    • Scan the entire genome (hundreds of thousands of SNPs) and plot −log10(p) for each SNP
    • Identify local regions of interest; examine genes, SNP density, regulatory regions, etc.
    • Replicate the finding
    (From Sven Bergmann, UNIL)
  • 32. Whole genome association studies: an avalanche of GWA studies
    • From 2006 → >220 studies reported to date
    • For over 80 phenotypes → 300 loci have been implicated
    • Most implicated loci were identified for the first time (no prior knowledge)
  • 33. Whole genome association studies: type 2 diabetes (T2D), an example (Frazer et al., Nat. Rev. Genet., 2010)
    • 18 genomic intervals, 4 of which contain previously implicated genes
    • Major message: the molecular diversity of T2D genes was not anticipated, thus: (patients with the same disease) ≠ (patients with the same underlying biological disorder)
  • 34. Whole genome association studies: overlap of genetic risk factor loci for common diseases (Frazer et al., Nat. Rev. Genet., 2010)
    • 15 loci are associated with two or more diseases (8 are shown)
    • Not necessarily the same impact (PTPN22: + for Crohn’s disease, − for other autoimmune diseases)
    • Different diseases may have similar molecular underpinnings:
      • Expected: autoimmune diseases (same clinical features)
      • Unexpected: e.g. GCKR in both triglyceride (TGC) levels and autoimmune disease
  • 35. Whole genome association studies: from association to molecular mechanism
    • Very difficult:
      • What are the precise variants associated with a trait?
      • If located in exons: easy. But outside exons, then what?
      • Most are located outside exons! (e.g. the 9p21 locus associated with myocardial infarction is located 150 kb from the nearest gene!)
      • They may have a regulatory function, i.e. control gene expression
    • Humans are heterozygous at more functional cis-regulatory sites than at amino acid positions, with 10,700 functional biallelic cis-regulatory polymorphisms in a typical human (Rockman and Wray, Mol. Biol. Evol., 2002: 19, 1991).
    • 34% of promoter polymorphisms (170 tested) significantly modulated reporter gene expression (>1.5-fold) (Hoogendoorn et al., Hum. Mol. Genet., 2003: 12, 2249).
    • Case study with the CC chemokine receptor 5 (CCR5), a major chemokine coreceptor of HIV-1 necessary for viral entry into cells:
      • G to A SNP of CCR5 at −2459 nt
      • CCR5 density: low (homozygous −2459GG), intermediate (GA) and highest (homozygous −2459AA) (Salkowitz et al., Clin. Immunol., 2003: 108, 234)
  • 36. Whole genome association studies: mapping eQTLs
    • Transcript abundance is a quantitative trait that can be mapped with considerable power → expression QTLs (eQTLs), shaped by both environment and genetics
    • Heritability (H²) = genetic variance over total trait variance, with 0 = no genetic effects and 1 = all variance is under genetic control
    Classic paper: Schadt et al., Nature, 2003. Genetics of gene expression surveyed in maize, mouse and man:
    • Liver tissues from 111 F2 mice (from a C57BL/6J × DBA/2J cross)
    • Microarray analysis of 23,574 genes: 7,861 significantly differentially expressed (either between the parental strains or in at least 10% of the F2 mice)
    • eQTLs identified (logarithm of the odds (LOD) score > 4.3, P < 0.00005) for 2,123 genes
    • These eQTLs explained 25% of the transcriptional variation of the corresponding genes
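The LOD threshold above can be made concrete with a single-marker test. Regress a transcript's expression on genotype dosage. Under normally distributed residuals, LOD = (n/2)·log10(RSS_null / RSS_marker), and for simple linear regression RSS_marker / RSS_null = 1 − R². This is a simplified stand-in for the genome-wide interval mapping used in F2 crosses. The data are hypothetical.

```python
import math


def eqtl_lod(dosages, expression):
    """Single-marker LOD score for an additive eQTL (least-squares regression).

    With normally distributed residuals, LOD = (n/2) * log10(RSS_null / RSS_marker),
    and RSS_marker / RSS_null = 1 - R^2 for simple linear regression.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        return 0.0  # no genotype or expression variation: nothing to map
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:
        return math.inf
    return (n / 2) * math.log10(1 / (1 - r2))


# Hypothetical expression values for six individuals with 0, 1 or 2 copies of an allele.
lod = eqtl_lod([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.1, 3.0, 2.9])
print(round(lod, 2), lod > 4.3)
```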
  • 37. Whole genome association studies: mapping eQTLs (Schadt et al., Nature, 2003)
    [Figure: % of eQTLs across 920 evenly spaced bins, each 2 cM wide]
    • Several hotspots (>1% of detected eQTLs located within a 4 cM interval)
    • 40% of genes with ≥ 1 eQTL (LOD > 3.0) had more than one eQTL, and close to 4% of such genes had more than three eQTLs → gene expression is a complex trait
  • 38. Whole genome association studies: mapping eQTLs (Schadt et al., Nature, 2003)
    Known polymorphisms between the two parental strains:
    • Overlap between a polymorphism and an eQTL → cis-acting transcriptional regulation
    For example:
    • The C5 gene: a 2 bp deletion in the coding region in DBA mice results in rapid transcript decay compared with B6. A LOD of 27.4 centred over the C5 gene on chromosome 2 is readily detected (black curve).
    • The Alad gene is present in 2 copies in DBA.
  • 39. Whole genome association studies: mapping eQTLs (Schadt et al., Nature, 2003). Combining clinical, gene expression and genetic factors
    • Classical QTLs for fat pad mass (FPM): 4 significant loci
    • Further analyses with subgroups: additional loci identified
    • Some QTLs only affect a subset of the F2 population, demonstrating the complexity underlying traits such as obesity
  • 40. Whole genome association studies: mapping eQTLs. Dixon et al., Nature Genet., 2007: A genome-wide association study of global gene expression
    • 206 families of British descent, using immortalized lymphoblastoid cell lines (LCLs) from 400 children (Affymetrix microarrays; 54,675 transcripts ≈ 20,599 genes)
    • ~15,000 with H² > 0.3
    • Gene Ontology descriptors for:
      • Response to unfolded protein (HSFs, chaperones)
      • Immune responses and apoptosis
      • Regulation of progression through the cell cycle
      • RNA processing and DNA repair
  • 41. Whole genome association studies: mapping eQTLs. Dixon et al., Nature Genet., 2007: A genome-wide association study of global gene expression
    • 206 families of British descent, using immortalized lymphoblastoid cell lines (LCLs) from 400 children (Affymetrix microarrays; 54,675 transcripts ≈ 20,599 genes)
    • Trans effects are weaker than those in cis
    • Nevertheless, significant trans associations were detected, e.g.: 1) for ~700 transcripts, the peak of association was on the same chromosome but >100 kb from the nearest transcribed gene; 2) for 10,382 transcripts, the peak of association was on a different chromosome
  • 42. Whole genome association studies: mapping eQTLs. Using eQTLs to better understand GWAS results (Libioulle et al., PLoS Genet., 2007): GWAS for Crohn’s disease
    [Figure: 1.25 Mb gene desert]
    • One of the neighboring genes, PTGER4, may be involved
    • Trace eQTLs in LCL data
    • Disease-associated polymorphisms may be regulating PTGER4 expression in cis, despite lying >250 kb away → more research needed, but likely a regulatory polymorphism
  • 43. Whole genome association studies: mapping eQTLs. We looked at SNPs, but what about structural variants? Stranger et al., Science, 2007: Relative Impact of Nucleotide and Copy Number Variation on Gene Expression Phenotypes
    • LCLs of 210 unrelated HapMap individuals from four populations
    • Copy number variants were identified via CGH against a common reference individual
    [Figure: SNP and CNV associations, each plotted from the probe associated with the linked gene]
    • SNPs and CNVs captured 83.6% and 17.7%, respectively, of the total detected genetic variation in gene expression
    • Associated SNPs lie close to their respective genes; less so for CNVs
    • Little overlap between SNP and CNV associations (only 20%)
    • Not “mere” gene dosage effects
  • 44. Whole genome association studies: how universal are GWAS findings? (Frazer et al., Nat. Rev. Genet., 2010)
    [Figure: LD across a locus associated with myocardial infarction; red = high pairwise SNP correlation; SNPs that efficiently tag one another (r² > 0.8) are connected]
    • Allele frequencies are different in different populations
    • LD patterns across loci that co-segregate with a causally associated variant may differ from population to population; LD is less strong in the African population → bottleneck principle
    • Control for population differences is essential in large studies
  • 45. Whole genome association studies: impact so far
    • There are no complex traits for which > 10% of the genetic variance is explained, e.g. T2D: 18 genetic variants together explain < 4% of the total trait liability
    • Sample size may compensate (increased statistical power). But… studies of lipid phenotypes involving >40,000 people still explain <10%, and some diseases have only a low number of affected individuals
    • Does the answer lie in structural variants? Most are still unmapped. But… they are likely in LD with common SNPs
    • Does the answer lie in rare variants? Possibly…
      • Rare variants are not in LD with tagging SNPs and have thus so far gone undetected (Amish study)
      • They can have very high penetrance
      • However, how can they be detected on a population-wide basis?
  • 46. Whole genome association studies: the power of whole-genome sequencing. Miller syndrome: an autosomal recessive genetic trait (Roach et al., Science, 2010)
    • Sequenced the genomes of 2 parents and 2 children, both children affected by Miller syndrome
    • Identified 3.7 million SNPs that varied within the family
    • Resequenced 34,000 candidate mutations → 28 de novo mutations
    • Narrowed down the candidates via the “rare” assumption and knowledge of recessive inheritance
    • Found one gene, dihydroorotate dehydrogenase (DHODH), known to be involved
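The “rare” and “recessive” filters can be sketched as a gene-level search in a quartet. A gene qualifies if both affected children are homozygous for a rare variant carried by both parents. It also qualifies if both children carry one rare variant from each parent (compound heterozygous). This is a simplified illustration, not the published pipeline. The 1% MAF cut-off, the dict layout and the gene names are hypothetical. Phasing, de novo events and sequencing errors are ignored.

```python
def recessive_candidate_genes(variants, max_maf=0.01):
    """Genes consistent with autosomal recessive inheritance in a family quartet.

    variants: iterable of dicts with keys "gene", "maf" and the alternate-allele
    count (0, 1 or 2) of each person: "mother", "father", "child1", "child2".
    Assumes both children are affected and both parents are unaffected carriers.
    """
    children = ("child1", "child2")
    homozygous, from_mother, from_father = set(), set(), set()
    for v in variants:
        if v["maf"] > max_maf:
            continue  # "rare" assumption: skip common variants
        gene = v["gene"]
        if v["mother"] == 1 and v["father"] == 1 and all(v[c] == 2 for c in children):
            homozygous.add(gene)
        elif all(v[c] == 1 for c in children):
            if v["mother"] == 1 and v["father"] == 0:
                from_mother.add(gene)
            elif v["father"] == 1 and v["mother"] == 0:
                from_father.add(gene)
    # Compound heterozygotes: each child carries one rare hit from each parent.
    return sorted(homozygous | (from_mother & from_father))


family_variants = [
    {"gene": "GENE_A", "maf": 0.001, "mother": 1, "father": 1, "child1": 2, "child2": 2},
    {"gene": "GENE_B", "maf": 0.002, "mother": 1, "father": 0, "child1": 1, "child2": 1},
    {"gene": "GENE_B", "maf": 0.003, "mother": 0, "father": 1, "child1": 1, "child2": 1},
    {"gene": "GENE_C", "maf": 0.200, "mother": 1, "father": 1, "child1": 2, "child2": 2},
    {"gene": "GENE_D", "maf": 0.001, "mother": 1, "father": 0, "child1": 1, "child2": 1},
]
print(recessive_candidate_genes(family_variants))  # ['GENE_A', 'GENE_B']
```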
  • 47. Entering the age of personalized medicine: toward the elucidation of each person’s genetic make-up
    Necessary for: 1) DNA-based risk assessment for common complex disease; 2) drug discovery (new implicated genes can be identified). But also to: 3) identify molecular signatures for disease diagnosis and prognosis. And for: 4) DNA-guided therapy and dose selection.
    A person’s genetic make-up significantly affects the efficacy of a drug:
    • Polymorphisms in the VKORC1 and CYP2C9 genes dictate the effective dose levels of the anticoagulant warfarin
    • Polymorphisms in the UGT1A1 gene correlate with increased toxicity of the anti-colon cancer drug irinotecan
    • Polymorphisms in the MTHFR gene are associated with increased toxicity of methotrexate, used to treat Crohn’s disease
    • Polymorphisms in the CYP2D6 gene dictate the probability of relapse in women with breast cancer treated with tamoxifen
  • 48. Entering the age of personalized medicine: the revolution of high-throughput sequencing, Illumina (Metzker et al., Nat. Rev. Genet., 2010)
    Solid-phase amplification: 1) initial priming and extending of the single-stranded, single-molecule template, and 2) bridge amplification of the immobilized template with immediately adjacent primers to form clusters.
  • 49. Entering the age of personalized medicine: from sequence to genome, mapping reads (Trapnell and Salzberg, Nat. Biotech., 2009)
    • Seed-based indexing: the read is split into four sequences of equal length (seeds). With 1 SNP, the other 3 seeds are intact; with 2 SNPs, the other 2 seeds are intact. Thus, at most 2 SNPs per read. Limitation: indexing takes up a huge amount of memory.
    • Burrows-Wheeler (BW) indexing: the index for the entire human genome fits into < 2 GB of memory and is 30 times faster than seed indexing, but it is also limited to 2 SNPs within one read.
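The seed idea can be sketched with a hash index of every reference substring of seed length. Each exact seed hit proposes a read start, which is then verified by counting mismatches over the whole read. By the pigeonhole argument on the slide, a read with at most two mismatches keeps at least two of its four seeds intact, so exact seed lookups cannot miss it. This sketch needs only one intact seed per candidate. The reference, the reads and the function names are hypothetical. The index holding every position illustrates why memory becomes the limitation.

```python
from collections import defaultdict


def build_seed_index(reference, seed_len):
    """Hash table from every length-seed_len substring of the reference to its positions."""
    index = defaultdict(list)
    for i in range(len(reference) - seed_len + 1):
        index[reference[i:i + seed_len]].append(i)
    return index


def map_read(read, reference, index, n_seeds=4, max_mismatches=2):
    """Return sorted (start, mismatches) hits with at most max_mismatches substitutions.

    The read is cut into n_seeds equal seeds; the index must have been built with
    seed_len == len(read) // n_seeds.
    """
    seed_len = len(read) // n_seeds
    hits = set()
    for k in range(n_seeds):
        seed = read[k * seed_len:(k + 1) * seed_len]
        for pos in index.get(seed, []):
            start = pos - k * seed_len  # where the whole read would begin
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


reference = "TTGACCGTAGGCTAACGTTAGCCATGGATC"
index = build_seed_index(reference, seed_len=4)
print(map_read("AGGCTAACGTTAGCCA", reference, index))  # [(8, 0)]
print(map_read("ATGCTACCGTTAGCCA", reference, index))  # [(8, 2)]
```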
  • 50. Entering the age of personalized medicine: Burrows-Wheeler transform (Wikipedia)
    The transform rearranges a string so that identical characters tend to cluster into runs of repeated characters, which are easier to compress.
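Below is a naive sketch of the Burrows-Wheeler transform and its inverse. It sorts all rotations of the text plus a unique end marker ("$") and keeps the last column. Inversion works by repeatedly prepending that column and re-sorting. This version is quadratic and meant only to show the idea. Production aligners build the transform from a suffix array and search it with an FM-index instead. The `run_lengths` helper is a hypothetical addition that counts character runs, showing the clustering that aids compression.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform by sorting all rotations (quadratic; illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Recover the text by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def run_lengths(s):
    """Number of runs of identical consecutive characters (fewer runs compress better)."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


print(bwt("banana"))  # annb$aa
```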
  • 51. Entering the age of personalized medicine: a first human genome project using HTS (Bentley et al., Nature, 2008)
    • Solexa technology
    • First: X chromosome
    • 204 million reads
    • Sampling of sequence fragments is close to random (GC content has a slight effect)
  • 52. Entering the age of personalized medicine: a first human genome project using HTS (Bentley et al., Nature, 2008)
    • 135 Gb of sequence (~4 billion paired 35-base reads) in 8 weeks
    • Approximate consumables cost: $250,000
    • 97% of the reads were aligned using MAQ
    • 99.9% of the human reference covered by ≥ 1 read, at 40.6X
    • 99% agreement with HapMap results!
  • 53. Entering the age of personalized medicine: more human genome projects (Snyder et al., G&D, 2010)
  • 54. Entering the age of personalized medicine: more human genome projects (Snyder et al., G&D, 2010)
  • 55. Entering the age of personalized medicine: more human genome projects (Snyder et al., G&D, 2010)
  • 56. Entering the age of personalized medicine: tackling the SV problem using HTS
    • Really difficult, and progress is limited.
    • Existing methods are based on two approaches:
      • Paired-end mapping (PEM)
      • Depth-of-coverage (DOC) approach
    Library preparation:
    • The ends of each fragment are tagged by a biotinylated (B) nucleotide
    • Circularization forms a junction between the two ends
    • The circularized DNA is randomly fragmented and the biotinylated junction fragments are recovered
    • Standard sequencing procedure thereafter
  • 57. Entering the age of personalized medicine: tackling the SV problem using HTS, paired-end mapping (Medvedev et al., Nature Meth., 2009)
  • 58. Entering the age of personalized medicine: tackling the SV problem using HTS, depth of coverage (DOC) (Snyder et al., G&D, 2010; Campbell et al., Nature Genet., 2008)
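Depth of coverage infers copy number from how many reads fall in each genomic window relative to the expected count. This minimal sketch assumes most windows are at normal diploid copy number, so the median count serves as the expected depth for two copies. Corrections for GC content and mappability, which practical DOC methods typically apply, are omitted. The window counts are hypothetical.

```python
from statistics import median


def depth_copy_number(window_counts, ploidy=2):
    """Integer copy-number estimate per window from read counts.

    Assumes most windows are at normal ploidy, so the median count is used as
    the expected depth for `ploidy` copies (no GC or mappability correction).
    """
    expected = median(window_counts)
    if expected == 0:
        raise ValueError("median depth is zero; cannot normalise")
    return [round(ploidy * count / expected) for count in window_counts]


counts = [100, 98, 102, 50, 49, 150, 101, 99]   # hypothetical reads per window
print(depth_copy_number(counts))  # [2, 2, 2, 1, 1, 3, 2, 2]
```

Windows 4 and 5 look like a one-copy deletion and window 6 like a duplication. Unlike paired-end mapping, depth alone cannot say where a duplicated copy was inserted.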
  • 59. Entering the age of personalized medicine: tackling the SV problem using HTS, state of the art (Snyder et al., G&D, 2010)