SNPs Presentation Cavalcanti Lab


Published in: Technology

  1. 1. SNPs: the HapMap and 1000 Genomes Projects. Joseph Replogle, Cavalcanti Lab Group, 5/25/2012
  2. 2. Understanding Human Genetic Variation Within and Among Populations
  3. 3. Types of Human Genetic Variation • Individual: de novo and rare variations • Population: variations that have become established within a population – Single Nucleotide Polymorphisms (SNPs): base pair substitutions • Transition: purine <-> purine (A<->G), pyrimidine <-> pyrimidine (C<->T) • Transversion: purine <-> pyrimidine • Common SNPs: ~1-5% or higher minor allele frequency (MAF) in major populations
  4. 4. Types of Human Genetic Variation (cont.) – Copy-Number Variations (CNVs): • insertions, deletions, duplications of DNA segments (>1kb) – Other Variations: • Structural: inversions • Repeats: microsatellites (STRs), minisatellites (VNTRs) • Frameshift mutations
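The transition/transversion split and the MAF threshold on this slide can be illustrated with a short sketch. The function names `classify_substitution` and `minor_allele_frequency` are hypothetical, and the MAF helper assumes a biallelic site with one observed allele per chromosome.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Label a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def minor_allele_frequency(alleles):
    """MAF at a biallelic site, given one allele per sampled chromosome."""
    if not alleles:
        raise ValueError("no alleles observed")
    counts = Counter(alleles)
    if len(counts) > 2:
        raise ValueError("site is not biallelic")
    if len(counts) < 2:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / len(alleles)
```

For example, one G among ten sampled chromosomes gives a MAF of 0.1, which would count as common under a 1-5% threshold.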
  5. 5. SNP Distribution throughout the Genome [Figure: Sachidanandam et al. 2001, annotated "HLA!"] • Genetic variability throughout the genome reflects function (among other factors)
  6. 6. Factors Affecting SNP Distribution • Intrinsic, Structural: mutation clusters due to recombination events and sequence context-specific effects [3,4] – a) Time to Most Recent Common Ancestor of genes in population influences SNPs (older genes -> more SNPs in population) – b) base composition, local recombination, gene density, chromatin structure, nucleosome position, replication timing (Lercher and Hurst 2002)
  7. 7. Factors Affecting SNP Distribution (cont.) • Functional: mutation clusters due to natural selection (examples include immunoglobulin genes) – a) balancing selection increases diversity – b) purifying and directional selection decrease diversity – c) transcriptional activity • Ascertainment bias: better characterization of SNPs around genes of interest [5]
  8. 8. Effects of Genetic Variation • Pathogenic and non-pathogenic heritable traits • Genetic variation reveals millions of years of human history – “One can think of selective pressures as natural, in vivo human experiments in which we can measure the response of human populations to unknown perturbations, and these alterations can inform the function of genes within a given locus.” Raj et al. 2012 – Understand the history of mutation, selection and recombination within the human genome
  9. 9. Potential Uses of SNP data. Ultimately, synergy of genomics and functional work will allow us to understand human traits and disease. • Association Mapping: Genome Wide Association (GWA) studies, Pharmacogenomics • Modeling Mendelian and Complex diseases • eQTL and functional genomics • Selection!
  10. 10. Selection: EHH and iHS • Extended Haplotype Homozygosity (EHH) • Integrated Haplotype Score (iHS) [Figure: chromosome 2, Voight et al. 2006]
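The slide names EHH without defining it. A minimal sketch, assuming the usual definition (the probability that two randomly chosen chromosomes carrying the same core allele are identical over the whole interval from the core to a given marker), is shown below. The function name `ehh` and the 0/1 string encoding of haplotypes are illustrative assumptions, not taken from the slides. iHS, which integrates EHH over distance and compares ancestral against derived alleles, is not covered here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, distance):
    """EHH from the core site out to core_index + distance (inclusive).

    haplotypes: equal-length strings, all carrying the same core allele.
    distance:   signed offset in markers (negative extends to the left).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core_index, core_index + distance))
    if lo < 0 or hi >= len(haplotypes[0]):
        raise ValueError("window extends past the haplotype ends")
    # Group chromosomes that are identical across the whole window.
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    # Fraction of chromosome pairs that fall into the same group.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)
```

EHH starts at 1 at the core and falls as the window widens and recombination or mutation breaks the shared haplotype apart; a slow decay around a high-frequency allele is the signal of interest.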
  11. 11. Selection of Lassa Fever Susceptibility Genes in YRI populations (Andersen et al. 2012)
  12. 12. eQTL SLE susceptibility locus (rs11755393; GWAS p = 2.20 x 10^-8) Positive Selection. Slide from Replogle and Raj
  13. 13. International HapMap Project • “to identify and catalog genetic similarities and differences in human beings” • Haplotype Map: SNPs (genotypes) at separate loci whose alleles are statistically associated due to limited genetic recombination (HapMap Project)
  14. 14. Linkage Disequilibrium (LD) • Alleles at different loci are not independent [Figure: haplotype frequencies (AB, Ab, aB, ab) against allele frequencies (fA, fa, fB, fb) under linkage equilibrium and under linkage disequilibrium. Image by Gil McVean]
  15. 15. Origin of LD • The mutation arises on a particular genetic background. • If the mutation increases in frequency, the associated haplotype will also increase in frequency. • Over time the association between the new mutation and linked mutations will decay by recombination. • Recombination is the only factor which decreases LD. • Factors Increasing LD: 1) Genetic Drift (stochastic sampling) 2) Selection 3) Non-Random Mating. Image modified from Gil McVean
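The comparison on this slide can be made concrete with the standard two-locus measures D = f(AB) - f(A)·f(B) and r² = D² / (f(A)·f(a)·f(B)·f(b)). The sketch below assumes phased haplotypes written as two-character strings such as "AB"; the function name `ld_stats` is hypothetical.

```python
def ld_stats(haplotypes, a="A", b="B"):
    """Return (D, r2) for two biallelic loci from phased haplotypes.

    haplotypes: sequence of two-character strings such as "AB" or "ab",
    where position 0 is the allele at locus 1 and position 1 at locus 2.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    f_a = sum(1 for h in haplotypes if h[0] == a) / n
    f_b = sum(1 for h in haplotypes if h[1] == b) / n
    f_ab = sum(1 for h in haplotypes if h[0] == a and h[1] == b) / n
    # Under linkage equilibrium f_ab equals f_a * f_b, so D is zero.
    d = f_ab - f_a * f_b
    denom = f_a * (1 - f_a) * f_b * (1 - f_b)
    # r2 is undefined when either locus is monomorphic in the sample.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2
```

A sample with only "AB" and "ab" haplotypes in equal numbers gives D = 0.25 and r² = 1 (complete LD), while equal numbers of all four haplotypes give D = 0 and r² = 0.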
  16. 16. Haplotype (HapMap Project) • ~10^7 common (MAF >1%) SNPs in the human genome • ‘tag SNPs’ allow for identification of an individual’s haplotypes • Estimated 300,000-600,000 tag SNPs in genome • Genotyping: testing tag SNPs • Sequencing: whole genome sequence
  17. 17. HapMap Populations • 270 total DNA samples • Yoruba in Ibadan, Nigeria (YRI) • Japanese in Tokyo, Japan (JPT) • Han Chinese in Beijing, China (CHB) • CEPH (Utah residents with ancestry from northern and western Europe) (CEU)
  18. 18. HapMap Methodology • Genotype individuals for several million SNPs – 1 SNP per 5kb or less – MAF >1% as estimated by TSC project, JSNP, dbSNP, and initial SNP map – Random shotgun sequencing to obtain additional SNPs – Coding and noncoding SNPs • Data analysis to identify LD and Haplotype maps • Tag SNPs are useful with haplotype and recombination map • Data available online in multiple formats l.en
  19. 19. HapMap Methodology (cont.) • Data analysis to identify LD and Haplotype maps • Tag SNPs are useful with haplotype and recombination map • Data available online in multiple formats ndex.html.en • Phase III data released 2009
  20. 20. Reference Genome? • Mosaic haploid DNA sequence • GRCh37
  21. 21. 1000 Genomes • “to find most genetic variants that have frequencies of at least 1% in the populations studied” • Low coverage sequencing of >2000 individuals, exome sequencing, trios • Characterization of SNPs and Structural Variants (INDELs)
  22. 22. 1000 Genomes Populations • Yoruba in Ibadan, Nigeria (YRI) • Japanese in Tokyo, Japan (JPT) • Han Chinese in Beijing, China (CHB) • CEPH (Utah residents with ancestry from northern and western Europe) (CEU) • Luhya in Webuye, Kenya (LWK) • Toscani in Italy (TSI) • Peruvians in Lima, Peru (PEL) • Mexican ancestry in Los Angeles, CA (MXL) • And many more!
  23. 23. “Low-Coverage” Sequencing • Sequencing: 1) DNA copies broken into short pieces 2) Each piece is sequenced (random pieces means most of genome is covered) 3) Sequenced fragments are aligned and joined to determine complete genome • 28X sequencing coverage necessary for complete genome • Low-coverage sequencing (4X coverage): many pieces of individual genomes are missed
  24. 24. 1000 Genomes Data • Latest release: – 1092 samples – SNP, indel, and large deletion – Autosomes and chrX – ~38.2 M SNPs from low coverage and exome sequencing • 1000genomes site has a link to an NCBI FTP site with their latest data
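To see why low coverage misses parts of an individual genome, one can use a simple idealized model that is not stated on the slide: if reads land uniformly at random, the number of reads covering a given base is approximately Poisson with mean equal to the coverage. The function name `prob_depth_at_most` is hypothetical, and real sequencing data deviate from this model (GC bias, repeats, mapping failures).

```python
import math


def prob_depth_at_most(coverage, k):
    """P(read depth <= k) at one base under a Poisson(coverage) model."""
    if coverage < 0 or k < 0:
        raise ValueError("coverage and k must be non-negative")
    return sum(
        math.exp(-coverage) * coverage ** i / math.factorial(i)
        for i in range(k + 1)
    )
```

Under this assumption, `prob_depth_at_most(4, 0)` is e^-4, or roughly 1.8% of bases with no read at all, and `prob_depth_at_most(4, 1)` is about 9% of bases seen at most once. At 28X both probabilities are negligible. Calling both alleles of a diploid site needs several reads, so the effective loss at 4X is larger than the zero-depth figure suggests.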
  25. 25. VCF file format • Variant Call Format 4.1: meta-info followed by header and data • tab-delimited text file • Compressed (.gz): zcat file.vcf.gz | grep -e '^#' -e SNP | bgzip -c > snps.vcf.gz • ant%20Call%20Format/vcf-variant-call-format- version-41
  26. 26. Columns in VCF format • CHROM: chromosome (no colons) • POS: numerical reference position, with the 1st base having position 1 (some variants have multiple pos records) • ID: semicolon-separated list of unique identifiers where available (ex. dbSNP rs number) • REF: reference base(s) A,C,G,T,N (case insensitive) for a given variant • ALT: comma-separated list of alternate non-reference alleles called on at least one of the samples • QUAL: phred-scaled quality score for the assertion made in ALT, i.e. -10log_10 prob(call in ALT is wrong) • FILTER: another quality measure; PASS if this position has passed all filters • INFO: semicolon-separated additional info; ex. AF (allele frequency), DB (dbSNP membership), VALIDATED
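The shell pipeline on this slide keeps header lines (those starting with `#`) and any record containing the text "SNP", presumably relying on an INFO tag such as `VT=SNP`. A rough Python equivalent using only the standard library is sketched below. The function name `filter_snp_records` is hypothetical, and the output is plain gzip rather than the BGZF blocks that `bgzip` writes, so it would not be indexable with tabix.

```python
import gzip


def filter_snp_records(in_path, out_path):
    """Copy header lines and lines containing 'SNP' from one .vcf.gz to another.

    Returns the number of lines written.
    """
    kept = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            # Same test as: grep -e '^#' -e SNP
            if line.startswith("#") or "SNP" in line:
                dst.write(line)
                kept += 1
    return kept
```

Like the `grep` version, this is a plain substring match, so a record that mentions "SNP" anywhere on the line is kept; a stricter filter would parse the INFO column.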
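The column descriptions above map directly onto a small parser. This is a minimal sketch for the eight fixed columns only: it ignores per-sample genotype columns, does not validate against the meta-info header, and leaves INFO values as strings. The names `parse_vcf_line` and `phred_to_error_prob` are hypothetical.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated columns")
    rec = dict(zip(VCF_COLUMNS, fields[:8]))
    rec["POS"] = int(rec["POS"])  # 1-based position
    # "." marks a missing value in VCF.
    rec["ID"] = [] if rec["ID"] == "." else rec["ID"].split(";")
    rec["ALT"] = [] if rec["ALT"] == "." else rec["ALT"].split(",")
    rec["QUAL"] = None if rec["QUAL"] == "." else float(rec["QUAL"])
    info = {}
    for item in rec["INFO"].split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags such as DB have no value
    rec["INFO"] = info
    return rec


def phred_to_error_prob(qual):
    """Invert QUAL = -10 * log10(P(call is wrong))."""
    return 10 ** (-qual / 10)
```

For instance, a QUAL of 30 corresponds to an error probability of about 0.001, and a flag such as `DB` in INFO is stored as `True`.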
  27. 27. Durbin et al. 2004
  28. 28. Interested? • Get Prof. Cavalcanti to buy Human Evolutionary Genetics: Origins, Peoples and Disease
  29. 29. References: 1. Sachidanandam R et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928-933. 2. Lercher MJ and Hurst LD (2002) Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet 18: 337-340. 3. Rogozin IB and Pavlov YI (2003) Theoretical analysis of mutational hotspots and their DNA sequence context specificity. Mutat Res 544(1): 65-85. 4. Ma X et al. (2012) Mutation Hot Spots in Yeast Caused by Long-Range Clustering of Homopolymeric Sequences. Cell Reports 1(1): 36-42. 5. Clark AG et al. (2005) Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15: 1496-1502. 6. Raj T et al. (2012) Alzheimer Disease Susceptibility Loci: Evidence for a Protein Network under Natural Selection. AJHG 90: 720-726. 7. Voight BF et al. (2006) A Map of Recent Positive Selection in the Human Genome. PLoS Biology 4(3): e72. 8. Andersen KG et al. (2012) Genome-wide scans provide evidence for positive selection of genes implicated in Lassa fever. Philos Trans R Soc Lond B Biol Sci 367(1590): 868-877. 9. Hapmap.org 10. McVean G (2004) Population Genetics of the Human Genome. Oxford Human Genome Lecture Series. 11. Gibbs RA et al. (2003) The International HapMap Project. Nature 426: 789-796. 12. 1000genomes.org 13. Durbin RM et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467(7319): 1061-1073.