SNPs: the HapMap and 1000
    Genomes Projects
       Joseph Replogle
     Cavalcanti Lab Group
          5/25/2012
Understanding Human Genetic Variation
   Within and Among Populations
Types of Human Genetic Variation
• Individual: de novo and rare variations
• Population: variations which persist as
  polymorphisms within a population
  – Single Nucleotide Polymorphisms (SNPs): base
    pair substitutions
     • Transition: purine <-> purine (A<->G), pyrimidine <->
       pyrimidine (C<->T)
     • Transversion: purine <-> pyrimidine
     • Common SNPs: minor allele frequency (MAF) of at least
       ~1-5% in major populations
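
The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the allele-count dictionary are assumptions, not part of the original slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def minor_allele_frequency(allele_counts: dict) -> float:
    """MAF of a biallelic SNP from allele counts, e.g. {'A': 180, 'G': 20}."""
    if len(allele_counts) != 2:
        raise ValueError("expected exactly two alleles")
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return min(allele_counts.values()) / total


if __name__ == "__main__":
    print(classify_substitution("A", "G"))
    print(classify_substitution("A", "C"))
    print(minor_allele_frequency({"A": 180, "G": 20}))
```

With 200 sampled chromosomes, 20 copies of the rarer allele give a MAF of 10%, which would count as common under either the 1% or the 5% cutoff.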
Types of Human Genetic Variation
            (cont.)
– Copy-Number Variations (CNVs):
   • insertions, deletions, duplications of DNA segments
     (>1kb)
– Other Variations:
   • Structural: inversions
   • Repeats: microsatellites (STRs), minisatellites (VNTRs)
   • Frameshift mutations
SNP Distribution throughout the Genome
[Figure: SNP distribution across the genome, with the HLA region flagged ("HLA!"). Sachidanandam et al. 2001]

• Genetic variability throughout the genome
  reflects function (among other factors)
Factors Affecting SNP Distribution
• Intrinsic, Structural: Mutation clusters due to
  recombination events and sequence context-specific
  effects [3,4]
  – a) Time to Most Recent Common Ancestor of genes in
    population influences SNPs (older genes -> more SNPs
    in population)
  – b) base composition, local recombination, gene
    density, chromatin structure, nucleosome position,
    replication timing

Lercher and Hurst 2002
Factors Affecting SNP Distribution
                 (cont.)
• Functional: mutation clusters due to natural
  selection (examples include immunoglobulin
  genes)
      a) balancing selection increases diversity
      b) purifying and directional selection
      decrease diversity
      c) transcriptional activity
• Ascertainment bias: better characterization of
  SNPs around genes of interest [5]
Effects of Genetic Variation
• Pathogenic and non-pathogenic heritable traits
• Genetic variation reveals millions of years of
  human history
  – “One can think of selective pressures as natural, in
    vivo human experiments in which we can measure the
    response of human populations to unknown
    perturbations, and these alterations can inform the
    function of genes within a given locus.” Raj et al. 2012
  – Understand the history of mutation, selection and
    recombination within the human genome
Potential Uses of SNP data
Ultimately, synergy of genomics and functional work
  will allow us to understand human traits and disease.

• Association Mapping: Genome Wide
  Association (GWA) studies,
  Pharmacogenomics
• Modeling Mendelian and Complex diseases
• eQTL and functional genomics
• Selection!
Selection: EHH and iHS
• Extended Haplotype Homozygosity (EHH)
• Integrated Haplotype Score (iHS)
[Figure: chromosome 2. Voight et al. 2006]
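
As a rough illustration of EHH: among chromosomes carrying the same core allele, EHH at a given distance is the probability that two chromosomes picked at random are identical across the whole interval from the core to that distance. The sketch below assumes phased haplotypes encoded as equal-length strings; the function name and the toy data are hypothetical. iHS builds on this by integrating EHH over distance for the ancestral and derived alleles and comparing the two, which is not shown here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, distance):
    """EHH over the window from core_index to core_index + distance.

    haplotypes: phased haplotypes (equal-length strings) that all carry
    the same allele at core_index. A negative distance extends leftwards.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core_index, core_index + distance))
    if lo < 0 or hi >= len(haplotypes[0]):
        raise ValueError("window extends past the haplotype ends")
    # Group chromosomes by their sequence across the window.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Identical pairs divided by all possible pairs.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


if __name__ == "__main__":
    # Toy data: four chromosomes sharing allele "1" at the core (index 0).
    haps = ["1000", "1000", "1010", "1011"]
    for d in range(4):
        print(d, ehh(haps, 0, d))
```

EHH starts at 1 at the core and falls as recombination and mutation break up the shared haplotype. An allele whose EHH stays high over an unusually long distance is a candidate for recent positive selection.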
Selection of Lassa Fever Susceptibility
             Genes in YRI populations




Andersen et al (2012)
eQTL
[Figure: SLE susceptibility locus (rs11755393; GWAS p = 2.20 x 10^-8); Positive Selection]

Slide from Replogle and Raj
International HapMap Project
• “to identify and catalog genetic
  similarities and differences in
  human beings”
• Haplotype Map: SNPs (genotypes)
  at separate loci whose alleles are
  statistically associated due to
  limited genetic recombination


                                       HapMap Project
Linkage Disequilibrium (LD)
• Alleles at different loci are not independent
[Figure: haplotypes AB, Ab, aB, ab plotted against allele frequencies fA, fa, fB, fb under linkage equilibrium and under linkage disequilibrium. Image by Gil McVean]
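
The contrast between the two cases can be put in numbers. Under linkage equilibrium the frequency of haplotype AB equals fA x fB, and the standard LD coefficient D measures the departure from that product. The sketch below computes D together with the usual normalised forms |D'| and r^2. The function name and example frequencies are illustrative assumptions.

```python
def ld_stats(f_ab: float, f_a: float, f_b: float):
    """Return (D, |D'|, r^2) for two biallelic loci.

    f_ab: frequency of haplotype AB
    f_a:  frequency of allele A at the first locus
    f_b:  frequency of allele B at the second locus
    """
    d = f_ab - f_a * f_b  # zero under linkage equilibrium
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(f_a * (1 - f_b), (1 - f_a) * f_b)
    else:
        d_max = min(f_a * f_b, (1 - f_a) * (1 - f_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = f_a * (1 - f_a) * f_b * (1 - f_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    print(ld_stats(0.25, 0.5, 0.5))  # linkage equilibrium
    print(ld_stats(0.50, 0.5, 0.5))  # A always travels with B
```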
Origin of LD
• The mutation arises on a particular genetic background
• If the mutation increases in frequency, the associated haplotype
  will also increase in frequency.
• Over time the association between the new mutation and linked
  mutations will decay by recombination
• Factors Increasing LD:
  1) Genetic Drift (stochastic sampling)
  2) Selection
  3) Non-Random Mating
• Recombination is the only factor which decreases LD.

Image modified from Gil McVean
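
The decay by recombination has a simple textbook form: with recombination fraction r between two loci, and assuming random mating with no drift or selection, D shrinks by a factor of (1 - r) each generation. A minimal sketch under those assumptions, with a hypothetical function name and parameter values:

```python
def ld_after(d0: float, r: float, generations: int) -> float:
    """Expected LD coefficient D after some generations of recombination.

    d0: initial LD coefficient
    r:  recombination fraction between the two loci (0 to 0.5)
    Assumes random mating and ignores drift, selection, and new mutation.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - r) ** generations


if __name__ == "__main__":
    # Tightly linked loci keep their association far longer than loose ones.
    for r in (0.0001, 0.01, 0.5):
        print(r, ld_after(0.25, r, 100))
```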
Haplotype
HapMap Project




      •   ~10^7 common (MAF >1%) SNPs in the human genome
      •   ‘tag SNPs’ allow for identification of an individual’s haplotypes
      •   Estimated 300,000-600,000 tag SNPs in genome
      •   Genotyping: testing tag SNPs
      •   Sequencing: whole genome sequence
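
One common way to pick tag SNPs is a greedy search over pairwise r^2 values: repeatedly choose the SNP that is in strong LD with the most SNPs not yet covered. The sketch below shows that generic heuristic, not necessarily the procedure HapMap used. The 0.8 threshold, the function name, and the SNP labels are all assumptions for illustration.

```python
def greedy_tag_snps(snps, pair_r2, threshold=0.8):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with some tag.

    snps:    list of SNP identifiers
    pair_r2: dict mapping frozenset({snp_a, snp_b}) to their r^2;
             missing pairs are treated as r^2 = 0
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")

    def r2(a, b):
        return 1.0 if a == b else pair_r2.get(frozenset((a, b)), 0.0)

    untagged = set(snps)
    tags = []
    while untagged:
        candidates = [s for s in snps if s not in tags]
        # Choose the candidate that captures the most untagged SNPs.
        best = max(
            candidates,
            key=lambda s: sum(r2(s, t) >= threshold for t in untagged),
        )
        tags.append(best)
        untagged -= {t for t in untagged if r2(best, t) >= threshold}
    return tags


if __name__ == "__main__":
    snps = ["snp1", "snp2", "snp3", "snp4"]  # hypothetical labels
    pair_r2 = {
        frozenset(("snp1", "snp2")): 0.90,
        frozenset(("snp1", "snp3")): 0.85,
        frozenset(("snp2", "snp3")): 0.70,
    }
    print(greedy_tag_snps(snps, pair_r2))
```

In the toy data, genotyping two of the four SNPs is enough to capture all of them at the chosen threshold, which is the saving that tag SNPs offer.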
HapMap Populations
•   270 total DNA samples
•   Yoruba in Ibadan, Nigeria (YRI)
•   Japanese in Tokyo, Japan (JPT)
•   Han Chinese in Beijing, China (CHB)
•   CEPH (Utah residents with ancestry from
    northern and western Europe) (CEU)
HapMap Methodology
• Genotype individuals for several million SNPs
   – 1 SNP per 5kb or less
   – MAF >1% as estimated by TSC project, JSNP, dbSNP, and
     initial SNP map
   – Random shotgun sequencing to obtain additional SNPs
   – Coding and noncoding SNPs
HapMap Methodology (cont.)
• Data analysis to identify LD and Haplotype
  maps
• Tag SNPs are useful with haplotype and
  recombination map
• Data available online in multiple formats
  http://hapmap.ncbi.nlm.nih.gov/downloads/index.html.en
• Phase III data released 2009
Reference Genome?
• Mosaic haploid DNA sequence
• GRCh37
1000 Genomes
• “to find most genetic variants that have
  frequencies of at least 1% in the populations
  studied”
• Low coverage sequencing of >2000
  individuals, exome sequencing, trios
• Characterization of SNPs, indels, and
  structural variants
1000 Genomes Populations
•   Yoruba in Ibadan, Nigeria (YRI)
•   Japanese in Tokyo, Japan (JPT)
•   Han Chinese in Beijing, China (CHB)
•   CEPH (Utah residents with ancestry from
    northern and western Europe) (CEU)
•   Luhya in Webuye, Kenya (LWK)
•   Toscani in Italy (TSI)
•   Peruvians in Lima, Peru (PEL)
•   Mexican ancestry in Los Angeles, CA (MXL)
•   And many more!
“Low-Coverage” Sequencing
• Sequencing:
1) DNA copies broken into short pieces
2) Each piece is sequenced (random pieces means
   most of genome is covered)
3) Sequenced fragments are aligned to the reference and joined
   to determine the complete genome
• About 28X sequencing coverage is needed to recover one
   individual's complete genome
• Low-coverage sequencing (4X coverage): many
   pieces of individual genomes are missed
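
To get a feel for why 4X coverage misses parts of a genome, one can use an idealised model in which the number of reads over any base follows a Poisson distribution with mean equal to the coverage. This is an assumption: real coverage is less even, so the figures below are optimistic. The function name is hypothetical.

```python
from math import exp, factorial


def prob_depth_below(coverage: float, min_reads: int) -> float:
    """P(a base is covered by fewer than min_reads reads).

    Assumes read depth per base is Poisson-distributed with mean `coverage`.
    """
    if coverage < 0 or min_reads < 0:
        raise ValueError("coverage and min_reads must be non-negative")
    return sum(
        exp(-coverage) * coverage ** k / factorial(k)
        for k in range(min_reads)
    )


if __name__ == "__main__":
    for cov in (4, 28):
        print(
            f"{cov}X: P(no reads) = {prob_depth_below(cov, 1):.3g}, "
            f"P(fewer than 3 reads) = {prob_depth_below(cov, 3):.3g}"
        )
```

Under this model roughly 2% of bases receive no reads at all at 4X, and about a quarter receive fewer than three. Calling both alleles at a heterozygous site needs reads from each chromosome, so the effective shortfall is larger still.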
1000 Genomes Data
• Latest release:
  – 1092 samples
  – SNP, indel, and large deletion
  – Autosomes and chrX
  – ~38.2 M SNPs from low coverage and exome
    sequencing
• The 1000genomes site links to an NCBI FTP
  site with their latest data
VCF file format
• Variant Call Format 4.1: meta-info followed by
  header and data
• tab-delimited text file
• Compressed .gz
zcat file.vcf.gz | grep -e '^#' -e 'SNP' | bgzip -c > snps.vcf.gz
• http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
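
The same filter can be sketched in Python using only the standard library. It keeps the meta-info and header lines (those starting with "#") plus any data line containing the pattern, as the grep command does. The function names and file paths are placeholders. Note also that Python's gzip module writes ordinary gzip rather than the blocked format bgzip produces, so the output would need recompressing with bgzip before indexing.

```python
import gzip


def keep_header_and_matches(lines, pattern="SNP"):
    """Yield header lines and data lines that contain `pattern`."""
    for line in lines:
        if line.startswith("#") or pattern in line:
            yield line


def filter_vcf(in_path, out_path, pattern="SNP"):
    """Stream a gzipped VCF through the filter without loading it whole."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(keep_header_and_matches(src, pattern))


if __name__ == "__main__":
    # Illustrative lines only, not real data.
    demo = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t50\tPASS\tVT=SNP\n",
        "1\t200\t.\tAT\tA\t50\tPASS\tVT=INDEL\n",
    ]
    for line in keep_header_and_matches(demo):
        print(line, end="")
```

Like the grep version, this is a plain substring match, so it assumes the file marks SNP records with the text "SNP" somewhere on the line.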
Columns in VCF format
• CHROM: chromosome (no colons)
• POS: numerical reference position, with the 1st base having
  position 1 (some variants have multiple pos records)
• ID: semi-colon separated list of unique identifiers where available
  (ex. dbSNP rs number)
• REF: reference base(s) A,C,G,T,N (case insensitive) for a given variant
• ALT: comma separated list of alternate non-reference alleles called
  on at least one of the samples.
• QUAL: phred-scaled quality score for the assertion made in
  ALT. i.e. -10log_10 prob(call in ALT is wrong)
• FILTER: another quality measure; PASS if this position has passed all
  filters
• INFO: semicolon-separated additional info; ex. AF (allele
  frequency), DB (dbSNP membership), VALIDATED
Durbin et al. 2010
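
A minimal sketch of how the eight fixed columns might be read from one tab-delimited data line, together with the conversion from a phred-scaled QUAL back to an error probability. The record type, function names, and example line are illustrative assumptions. Per-sample genotype columns are ignored.

```python
from typing import NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ids: list
    ref: str
    alts: list
    qual: Optional[float]
    filt: str
    info: dict


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the first eight columns of a VCF data line."""
    chrom, pos, id_, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Flags such as DB carry no value; store them as True.
            info_dict[key] = value if sep else True
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),  # 1-based position
        ids=[] if id_ == "." else id_.split(";"),
        ref=ref,
        alts=alt.split(","),
        qual=None if qual == "." else float(qual),
        filt=filt,
        info=info_dict,
    )


def phred_to_error_prob(qual: float) -> float:
    """Invert QUAL = -10 * log10(P(call is wrong))."""
    return 10 ** (-qual / 10)


if __name__ == "__main__":
    # Made-up record for illustration; "rs_example" is not a real identifier.
    rec = parse_vcf_line("20\t14370\trs_example\tG\tA,T\t30\tPASS\tAF=0.5;DB")
    print(rec)
    print(phred_to_error_prob(rec.qual))
```

A QUAL of 30 corresponds to a 1 in 1000 chance that the call in ALT is wrong.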
Interested?
• Get Prof. Cavalcanti to buy Human
  Evolutionary Genetics: Origins, Peoples and
  Disease
References
1.    Sachidanandam R et al. (2001) A map of human genome sequence variation containing 1.42 million single
      nucleotide polymorphisms. Nature 409: 928-933.
2.    Lercher MJ and Hurst LD (2002) Human SNP variability and mutation rate are higher in regions of high
      recombination. Trends Genet 18: 337-340.
3.    Rogozin IB and Pavlov YI (2003) Theoretical analysis of mutational hotspots and their DNA sequence context
      specificity. Mutat Res 544(1): 65-85.
4.    Ma X, et al. (2012) Mutation Hot Spots in Yeast Caused by Long-Range Clustering of Homopolymeric
      Sequences. Cell Reports 1(1): 36-42.
5.    Clark AG, et al. (2005) Ascertainment bias in studies of human genome-wide polymorphism. Genome Res
      15: 1496-1502.
6.    Raj T et al. (2012) Alzheimer Disease Susceptibility Loci: Evidence for a Protein Network under Natural Selection.
      AJHG 90: 720-726.
7.    Voight BF et al. (2006) A Map of Recent Positive Selection in the Human Genome. PLoS Biology 4(3): e72.
8.    Andersen KG et al. (2012) Genome-wide scans provide evidence for positive selection of genes implicated in
      Lassa fever. Philos Trans R Soc Lond B Biol Sci 367(1590): 868-877.
9.    Hapmap.org
10.   McVean, Gil (2004). Population Genetics of the Human Genome. Oxford Human Genome Lecture Series.
11.   Gibbs RA et al. (2003) The International HapMap Project. Nature 426: 789-796.
12.   1000genomes.org
13.   Durbin R M et al. (2010). A map of human genome variation from population-scale sequencing. Nature
      467(7319): 1061-1073.


  • 1. SNPs: the HapMap and 1000 Genomes Projects Joseph Replogle Cavalcanti Lab Group 5/25/2012
  • 2. Understanding Human Genetic Variation Within and Among Populations
  • 3. Types of Human Genetic Variation • Individual: de novo and rare variations • Population: variations which have become fixed within a population – Single Nucleotide Polymorphisms (SNPs): base pair substitutions • Transition: purine -> purine (A<->G), pyrimidine -> pyrimidine (C<->T) • Transversion: purine <-> pyrimidine • common ~1-5% minor allele frequency (MAF) in major populations
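The transition/transversion distinction and the minor allele frequency on slide 3 can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def minor_allele_frequency(allele_counts: dict) -> float:
    """MAF of a biallelic site: frequency of the less common allele."""
    if len(allele_counts) != 2:
        raise ValueError("this sketch assumes exactly two alleles")
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total
```

For instance, `classify_substitution("A", "G")` is a transition, and a site with counts `{"A": 95, "G": 5}` has a MAF of 5%, which sits at the upper end of the range the slide gives for common SNPs.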
  • 4. Types of Human Genetic Variation (cont.) – Copy-Number Variations (CNVs): • insertions, deletions, duplications of DNA segments (>1kb) – Other Variations: • Structural: inversions • Repeats: microsatellites (STRs), minisatellites (VNTRs) • Frameshift mutations
  • 5. SNP Distribution throughout the Genome Sachidanandam et al. 2001 [figure annotation: HLA!] • Genetic variability throughout the genome reflects function (among other factors)
  • 6. Factors Affecting SNP Distribution • Intrinsic, Structural: Mutation clusters due to recombination events and sequence context-specific effects [3,4] – a) Time to Most Recent Common Ancestor of genes in population influences SNPs (older genes -> more SNPs in population) – b) base composition, local recombination, gene density, chromatin structure, nucleosome position, replication timing Lercher and Hurst 2002
  • 7. Factors Affecting SNP Distribution (cont.) • Functional: mutation clusters due to natural selection (examples include immunoglobulin genes) a) balancing selection increases diversity b) purifying and directional selection decrease diversity c) transcriptional activity • Ascertainment bias: better characterization of SNPs around genes of interest [5]
  • 8. Effects of Genetic Variation • Pathogenic and non-pathogenic heritable traits • Genetic variation reveals millions of years of human history – “One can think of selective pressures as natural, in vivo human experiments in which we can measure the response of human populations to unknown perturbations, and these alterations can inform the function of genes within a given locus.” Raj et al. 2012 – Understand the history of mutation, selection and recombination within the human genome
  • 9. Potential Uses of SNP data Ultimately, synergy of genomics and functional work will allow us to understand human traits and disease. • Association Mapping: Genome Wide Association (GWA) studies, Pharmacogenomics • Modeling Mendelian and Complex diseases • eQTL and functional genomics • Selection!
  • 10. Selection: EHH and iHS • Extended Haplotype Homozygosity (EHH) • Integrated Haplotype Score (iHS) Chromosome 2 Voight et al. 2006
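A minimal sketch of how EHH might be computed, assuming phased haplotypes given as equal-length strings of allele codes. EHH at a given distance is taken here as the probability that two randomly chosen chromosomes carrying the core allele are identical from the core out to that distance. The function name and the toy data are hypothetical; iHS additionally integrates EHH over distance for both alleles and standardizes the result, which is not shown.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, distance):
    """EHH among carriers of core_allele, extended `distance` markers
    to the right of the core marker."""
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    end = core_index + distance + 1
    # Group carriers by their extended haplotype from the core to `end`.
    groups = Counter(tuple(h[core_index:end]) for h in carriers)
    # Fraction of carrier pairs that share the same extended haplotype.
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


# Toy data (hypothetical): six phased haplotypes over four markers.
toy_haplotypes = ["1000", "1000", "1001", "1011", "0110", "0101"]
ehh_by_distance = [ehh(toy_haplotypes, 0, "1", d) for d in range(4)]
```

In the toy data, EHH starts at 1 at the core and falls as the haplotypes diverge with distance. A slow decay around a high-frequency allele is the pattern read as a signature of recent positive selection.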
  • 11. Selection of Lassa Fever Susceptibility Genes in YRI populations Andersen et al. (2012)
  • 12. eQTL SLE susceptibility locus (rs11755393; GWAS p = 2.20 x 10^-8) Positive Selection Slide from Replogle and Raj
  • 13. International HapMap Project • “to identify and catalog genetic similarities and differences in human beings” • Haplotype Map: SNPs (genotypes) at separate loci whose alleles are statistically associated due to limited genetic recombination HapMap Project
  • 14. Linkage Disequilibrium (LD) • Alleles at different loci are not independent [Figure: two panels, "Linkage equilibrium" and "Linkage disequilibrium", showing the haplotypes AB, Ab, aB, ab against the allele frequencies fA, fa, fB, fb. Image by Gil McVean]
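The difference between linkage equilibrium and linkage disequilibrium can be quantified from the four haplotype frequencies. The sketch below computes the standard coefficient D = f(AB) - fA * fB and the squared correlation r^2; the function and argument names are hypothetical.

```python
def ld_stats(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, r2) for two biallelic loci from haplotype frequencies."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    f_A = f_AB + f_Ab  # frequency of allele A at the first locus
    f_B = f_AB + f_aB  # frequency of allele B at the second locus
    d = f_AB - f_A * f_B  # zero under linkage equilibrium
    denom = f_A * (1 - f_A) * f_B * (1 - f_B)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2
```

With all four haplotypes at frequency 0.25 the loci are in equilibrium (D = 0, r^2 = 0). With only AB and ab present at 0.5 each, the loci are in complete LD (D = 0.25, r^2 = 1).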
  • 15. Origin of LD • The mutation arises on a particular genetic background. • If the mutation increases in frequency, the associated haplotype will also increase in frequency. • Over time the association between the new mutation and linked mutations will decay by recombination. • Recombination is the only factor which decreases LD. • Factors Increasing LD: 1) Genetic Drift (stochastic sampling) 2) Selection 3) Non-Random Mating Image modified from Gil McVean
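The decay of LD by recombination can be sketched with the textbook recursion D_t = D_0 * (1 - c)^t, where c is the recombination fraction between the two loci per generation. This assumes random mating and ignores drift and selection, so it is an idealization rather than a description of real populations; the function name is hypothetical.

```python
def ld_after_generations(d0, recomb_fraction, generations):
    """Expected LD coefficient D after a number of generations.

    Assumes random mating, no drift and no selection (idealized model).
    """
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - recomb_fraction) ** generations
```

Unlinked loci (c = 0.5) lose half of their LD every generation, whereas tightly linked loci (small c) keep it for many generations, which is why nearby SNPs tend to be inherited together as haplotypes.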
  • 16. Haplotype HapMap Project • ~10^7 common (MAF >1%) SNPs in the human genome • ‘tag SNPs’ allow for identification of an individual’s haplotypes • Estimated 300,000-600,000 tag SNPs in genome • Genotyping: testing tag SNPs • Sequencing: whole genome sequence
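To make the idea of tag SNPs concrete, here is a sketch of a generic greedy selection: repeatedly pick the SNP that is in high LD with the most still-untagged SNPs. This is a simple set-cover heuristic written for illustration, not the HapMap project's own selection procedure, and the r^2 threshold of 0.8 and the toy matrix are assumptions.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs from a symmetric matrix of pairwise r^2 values.

    r2[i][j] is the r^2 between SNP i and SNP j. The threshold is an
    assumed cutoff for "well tagged", not a value from the slides.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        def covered_by(i):
            # SNPs still untagged that SNP i would capture (including itself).
            return {j for j in untagged if j == i or r2[i][j] >= threshold}

        # Ties are broken in favor of the lowest SNP index.
        best = max(sorted(untagged), key=lambda i: len(covered_by(i)))
        tags.append(best)
        untagged -= covered_by(best)
    return tags


# Toy matrix (hypothetical): SNPs 0-2 form one LD block, SNPs 3-4 another.
toy_r2 = [
    [1.00, 0.90, 0.85, 0.10, 0.00],
    [0.90, 1.00, 0.90, 0.10, 0.00],
    [0.85, 0.90, 1.00, 0.20, 0.10],
    [0.10, 0.10, 0.20, 1.00, 0.95],
    [0.00, 0.00, 0.10, 0.95, 1.00],
]
toy_tags = greedy_tag_snps(toy_r2)
```

In the toy example two tag SNPs, one per LD block, stand in for all five SNPs. This is the same economy that lets a few hundred thousand tag SNPs represent millions of common variants.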
  • 17. HapMap Populations • 270 total DNA samples • Yoruba in Ibadan, Nigeria (YRI) • Japanese in Tokyo, Japan (JPT) • Han Chinese in Beijing, China (CHB) • CEPH (Utah residents with ancestry from northern and western Europe) (CEU)
  • 18. HapMap Methodology • Genotype individuals for several million SNPs – 1 SNP per 5kb or less – MAF >1% as estimated by TSC project, JSNP, dbSNP, and initial SNP map – Random shotgun sequencing to obtain additional SNPs – Coding and noncoding SNPs • Data analysis to identify LD and Haplotype maps • Tag SNPs are useful with haplotype and recombination map • Data available online in multiple formats http://hapmap.ncbi.nlm.nih.gov/downloads/index.html.en
  • 19. HapMap Methodology (cont.) • Data analysis to identify LD and Haplotype maps • Tag SNPs are useful with haplotype and recombination map • Data available online in multiple formats http://hapmap.ncbi.nlm.nih.gov/downloads/index.html.en • Phase III data released 2009
  • 20. Reference Genome? • Mosaic haploid DNA sequence • GRCh37
  • 21. 1000 Genomes • “to find most genetic variants that have frequencies of at least 1% in the populations studied” • Low coverage sequencing of >2000 individuals, exome sequencing, trios • Characterization of SNPs and Structural Variants (INDELs)
  • 22. 1000 Genomes Populations • Yoruba in Ibadan, Nigeria (YRI) • Japanese in Tokyo, Japan (JPT) • Han Chinese in Beijing, China (CHB) • CEPH (Utah residents with ancestry from northern and western Europe) (CEU) • Luhya in Webuye, Kenya (LWK) • Toscani in Italy (TSI) • Peruvians in Lima, Peru (PEL) • Mexican ancestry in Los Angeles, CA (MXL) • And many more!
  • 23. “Low-Coverage” Sequencing • Sequencing: 1) DNA copies broken into short pieces 2) Each piece is sequenced (random pieces means most of genome is covered) 3) Sequenced fragments are aligned and joined to determine complete genome • 28X sequencing coverage necessary for complete genome • Low-coverage sequencing (4X coverage): many pieces of individual genomes are missed
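Why low coverage misses parts of an individual genome can be sketched with an idealized model in which reads land uniformly at random, so the depth at any base is approximately Poisson-distributed with mean equal to the coverage. Real sequencing data deviates from this (GC bias, repeats, mapping problems), so the numbers are rough illustrations only; the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of observing exactly k reads at a base."""
    return exp(-mean) * mean ** k / factorial(k)


def fraction_with_depth_below(min_depth, coverage):
    """Expected fraction of bases covered by fewer than min_depth reads,
    under the idealized Poisson model described above."""
    return sum(poisson_pmf(k, coverage) for k in range(min_depth))


# Fraction of bases with no reads at all, at 4X versus 28X coverage.
uncovered_4x = fraction_with_depth_below(1, 4.0)
uncovered_28x = fraction_with_depth_below(1, 28.0)
```

Under this model roughly 2% of bases receive no reads at 4X, and a much larger share receives too few reads to call a heterozygous genotype with confidence. This is the sense in which the slide says pieces of individual genomes are missed.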
  • 24. 1000 Genomes Data • Latest release: – 1092 samples – SNP, indel, and large deletion – Autosomes and chrX – ~38.2 M SNPs from low coverage and exome sequencing • 1000genomes site has a link to a NCBI FTP with their latest data
  • 25. VCF file format • Variant Call Format 4.1: meta-info followed by header and data • tab-delimited text file • Compressed (.gz): zcat file.vcf.gz | grep -e '^#' -e SNP | bgzip -c > snps.vcf.gz • http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
  • 26. Columns in VCF format • CHROM: chromosome (no colons) • POS: numerical reference position, with the 1st base having position 1 (some variants have multiple POS records) • ID: semicolon-separated list of unique identifiers where available (e.g. dbSNP rs number) • REF: reference base(s) A, C, G, T, N (case insensitive) for a given variant • ALT: comma-separated list of alternate non-reference alleles called on at least one of the samples • QUAL: phred-scaled quality score for the assertion made in ALT, i.e. -10 log_10 prob(call in ALT is wrong) • FILTER: another quality measure; PASS if this position has passed all filters • INFO: semicolon-separated additional info, e.g. AF (allele frequency), DB (dbSNP membership), VALIDATED
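As a rough illustration of the column layout, the following Python sketch splits one VCF data line into the eight fixed columns and converts a phred-scaled QUAL back to an error probability. It uses only the standard library and ignores genotype columns and header parsing; the function names are hypothetical, and the sample line is made up in the style of the VCF specification's examples.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the eight fixed columns of one tab-delimited VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated columns")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])  # 1-based reference position
    # "." marks a missing value in VCF.
    record["ID"] = [] if record["ID"] == "." else record["ID"].split(";")
    record["ALT"] = [] if record["ALT"] == "." else record["ALT"].split(",")
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    info = {}
    if record["INFO"] != ".":
        for item in record["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # flags such as DB have no value
    record["INFO"] = info
    return record


def phred_to_error_prob(qual):
    """Invert QUAL = -10 * log10(P(call is wrong))."""
    return 10 ** (-qual / 10)


# Illustrative data line (values are made up for this example).
sample = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB"
rec = parse_vcf_line(sample)
```

A QUAL of 30 corresponds to an estimated 1-in-1000 chance that the ALT call is wrong. INFO values are kept as strings here, so a field such as AF needs an explicit `float(...)` before numeric filtering.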
  • 28. Interested? • Get Prof. Cavalcanti to buy Human Evolutionary Genetics: Origins, Peoples and Disease
  • 29. References 1. Sachidanandam R et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928-933. 2. Lercher MJ and Hurst LD (2002) Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet 18: 337-340. 3. Rogozin IB and Pavlov YI (2003) Theoretical analysis of mutational hotspots and their DNA sequence context specificity. Mutat Res 544(1): 65-85. 4. Ma X et al. (2012) Mutation Hot Spots in Yeast Caused by Long-Range Clustering of Homopolymeric Sequences. Cell Reports 1(1): 36-42. 5. Clark AG et al. (2005) Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15: 1496-1502. 6. Raj T et al. (2012) Alzheimer Disease Susceptibility Loci: Evidence for a Protein Network under Natural Selection. AJHG 90: 720-726. 7. Voight BF et al. (2006) A Map of Recent Positive Selection in the Human Genome. PLoS Biology 4(3): e72. 8. Andersen KG et al. (2012) Genome-wide scans provide evidence for positive selection of genes implicated in Lassa fever. Philos Trans R Soc Lond B Biol Sci 367(1590): 868-877. 9. Hapmap.org 10. McVean G (2004) Population Genetics of the Human Genome. Oxford Human Genome Lecture Series. 11. Gibbs RA et al. (2003) The International HapMap Project. Nature 426: 789-796. 12. 1000genomes.org 13. Durbin RM et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467(7319): 1061-1073.