Insights into the functional effect of promoter-associated short tandem repeats in the human genome using targeted sequencing
Javier Quilez Oliete
Postdoctoral Fellow in Andrew Sharp’s Lab
Dept. of Genetics and Genomic Sciences
Icahn School of Medicine at Mount Sinai
javier.quilez@mssm.edu
Twitter: @jaquol
Tandem repeats (TRs) are stretches of DNA composed of ≥2 contiguous copies of a motif arranged in a head-to-tail pattern
Example: CAG CAG CAG CAG CAG CAG, a 3-bp motif (CAG) repeated 6 times; repeat length = 6 copies × 3 bp = 18 bp
▶ Repeat motif lengths range up to hundreds of Kbp
▶ 2–6 bp motifs: short tandem repeats (STRs), or microsatellites
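As a minimal sketch of the definition above (the function name is hypothetical and Python is used only for illustration), the number of head-to-tail copies and the resulting repeat length can be found by scanning a sequence for the longest uninterrupted run of a given motif:

```python
def longest_tandem_run(seq: str, motif: str) -> tuple[int, int]:
    """Return (copies, repeat_length_bp) for the longest head-to-tail run of motif in seq.

    Runs with fewer than 2 contiguous copies are not tandem repeats, so (0, 0)
    is returned in that case. Matching is exact and case-insensitive.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    for start in range(len(seq) - k + 1):
        copies, pos = 0, start
        # Extend the run one motif length at a time while copies keep matching.
        while seq[pos:pos + k] == motif:
            copies += 1
            pos += k
        best = max(best, copies)
    if best < 2:
        return 0, 0
    return best, best * k


print(longest_tandem_run("CAGCAGCAGCAGCAGCAG", "CAG"))  # (6, 18)
```

This exact-match scan ignores interrupted or imperfect repeats, which real STR annotation tools are designed to tolerate.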
Features of TRs
▶ Multi-allelic: a single locus can carry different numbers of motif copies in different individuals (illustrated with CAG repeats of different lengths in six individuals)
▶ High mutation rate (number of mutations per locus per generation):
  ▶ SNPs: ~10⁻⁸
  ▶ CNVs: ~10⁻⁶–10⁻⁴
  ▶ TRs: ~10⁻⁴–10⁻³
  Therefore, the mutation rate of TRs is several orders of magnitude higher than that of other forms of genetic variation
▶ Abundant: ~1M annotated TRs in the human genome
TRs represent an important source of genetic variation
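The rates above can be compared with simple arithmetic. The sketch below uses the lower bound of each quoted range and a deliberately naive count of new mutations per generation across ~1M loci, assuming every locus mutates independently at one shared rate; real TR mutation rates vary widely with motif and length, so the numbers are orders of magnitude only.

```python
import math

# Per-locus, per-generation mutation rates from the slide (lower bound of each range).
RATES = {"SNP": 1e-8, "CNV": 1e-6, "TR": 1e-4}


def orders_of_magnitude(higher: float, lower: float) -> int:
    """Rounded number of orders of magnitude separating two rates."""
    return round(math.log10(higher / lower))


def naive_new_mutations(n_loci: int, rate: float) -> float:
    """Expected new mutations per generation if all n_loci share one rate (naive assumption)."""
    return n_loci * rate


for other in ("SNP", "CNV"):
    print(f"TR vs {other}: ~{orders_of_magnitude(RATES['TR'], RATES[other])} orders of magnitude")

# ~1M annotated TRs at the low and high ends of the quoted TR range
print(naive_new_mutations(1_000_000, 1e-4), naive_new_mutations(1_000_000, 1e-3))
```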
Functional impacts of TRs remain relatively unexplored
▶ Growing evidence supports a functional impact of TRs:
  ▶ In many species, TRs are often located within the coding regions of genes with specific biological functions, including genes that confer beneficial phenotypes
  ▶ Several diseases are caused by repeat expansions (in humans, dogs and plants)
▶ TRs remain poorly studied because they have been:
  ▶ Considered mere “junk” DNA
  ▶ Technically difficult to characterize:
    ▶ Even with SNP arrays, aCGH and next-generation sequencing, TRs have been ignored in most studies (e.g., GWAS, the 1000 Genomes Project)
    ▶ Novel approaches for genotyping repetitive elements have appeared only recently (Gymrek et al. 2012; Highnam et al. 2013; Guilmatre et al. 2013; Brahmachary et al. 2014)
aCGH: array comparative genomic hybridization
GWAS: genome-wide association study
Gymrek et al. Genome Res. 2012 Jun;22(6):1154-62. doi: 10.1101/gr.135780.111
Highnam et al. Nucleic Acids Res. 2013 Jan 7;41(1):e32. doi: 10.1093/nar/gks981
Guilmatre et al. Hum Mutat. 2013 Sep;34(9):1304-11. doi: 10.1002/humu.22359
Brahmachary et al. PLoS Genet. 2014 Jun 19;10(6):e1004418. doi: 10.1371/journal.pgen.1004418
Hypothesis, aim and approach
▶ Hypothesis: STRs are important functional components of the genome with overlooked roles in phenotypic variation, including human disease
▶ Aim: identify functional STRs by searching for repeat-length variants that alter local gene expression and DNA methylation
▶ Approach:
  ▶ Characterized repeat-length variation in STRs located in gene promoters, which are more likely to have a cis effect on the activity of nearby genes
  ▶ Performed cis-association analysis to identify expression and methylation quantitative trait loci (eQTLs and mQTLs)
Characterizing repeat length variation in promoter-associated STRs
Targeted-sequencing methodology (Guilmatre et al. 2013), which overcomes technical difficulties in genotyping STRs
Capture: target STRs in gene promoters (within ±1 Kbp of the TSS)
▶ ~6,000 promoter-associated STRs
▶ 31% of RefSeq genes have a promoter-associated STR
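A minimal sketch of the ±1 Kbp promoter window, assuming STR and TSS coordinates come from the same assembly and that distance is measured from the STR midpoint (both assumptions; the `Gene`/`STR` classes, function names and coordinates are hypothetical, not the capture design's actual code). Distances are strand-aware, so negative values mean upstream of the TSS:

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int     # TSS coordinate (bp)
    strand: str  # "+" or "-"


@dataclass
class STR:
    chrom: str
    start: int
    end: int


def distance_to_tss(s: STR, g: Gene) -> int:
    """Signed distance (bp) from the STR midpoint to the TSS; negative means upstream."""
    mid = (s.start + s.end) // 2
    d = mid - g.tss
    # On the minus strand, upstream lies at higher coordinates, so flip the sign.
    return d if g.strand == "+" else -d


def promoter_strs(strs, genes, window=1000):
    """Return (gene name, STR, signed distance) for every STR within ±window bp of a TSS."""
    hits = []
    for g in genes:
        for s in strs:
            if s.chrom != g.chrom:
                continue
            d = distance_to_tss(s, g)
            if abs(d) <= window:
                hits.append((g.name, s, d))
    return hits


genes = [Gene("GENE_A", "chr1", 10_000, "+"), Gene("GENE_B", "chr1", 50_000, "-")]
strs = [STR("chr1", 9_500, 9_520), STR("chr1", 50_300, 50_330), STR("chr2", 10_000, 10_020)]
for name, s, d in promoter_strs(strs, genes):
    print(name, s, d)
```

The nested loop is quadratic; for genome-wide annotations a sorted index or interval tree would be the usual choice.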
Guilmatre et al. Hum Mutat. 2013 Sep;34(9):1304-11. doi: 10.1002/humu.22359
Sequencing: Illumina 100-bp reads, multiplexing 24 individuals per lane
▶ 120 HapMap individuals:
▶ 58 CEU (European)
▶ 62 YRI (African)
▶ Median coverage per STR: 47x
Genotyping: RepeatSeq (Highnam et al. 2013), which uses sequencing reads to call the two repeat-length alleles at each STR locus in each individual
Highnam et al. Nucleic Acids Res. 2013 Jan 7;41(1):e32. doi: 10.1093/nar/gks981
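RepeatSeq's own model is not reproduced here. Purely to illustrate what calling two repeat-length alleles per individual means, the sketch below makes a naive diploid call from the repeat lengths observed in individual reads; the function name and the thresholds `min_reads` and `het_fraction` are arbitrary, hypothetical choices:

```python
from collections import Counter


def call_diploid_lengths(read_lengths, min_reads=5, het_fraction=0.2):
    """Naive two-allele call from the repeat lengths (bp) observed in individual reads.

    Returns a sorted (allele1, allele2) tuple, or None if fewer than min_reads reads.
    A second allele is called only if it is supported by at least het_fraction of
    the reads; otherwise the genotype is called homozygous for the top length.
    """
    if len(read_lengths) < min_reads:
        return None
    ranked = Counter(read_lengths).most_common()
    top_length = ranked[0][0]
    if len(ranked) > 1:
        second_length, second_count = ranked[1]
        if second_count / len(read_lengths) >= het_fraction:
            return tuple(sorted((top_length, second_length)))
    return (top_length, top_length)


print(call_diploid_lengths([18] * 10 + [21] * 8))  # heterozygous 18/21
print(call_diploid_lengths([18] * 19 + [21]))      # one stray read: homozygous 18/18
```

A real caller would also model stutter and sequencing errors rather than relying on a fixed read fraction.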
eQTL and mQTL cis-association analysis
▶ We produced promoter-associated STR genotypes in 120 HapMap individuals
▶ The same individuals had previously been characterized for:
  ▶ Gene expression: Illumina RNA-seq (Montgomery et al. 2010; Pickrell et al. 2010)
  ▶ DNA methylation: Illumina 450K array (Moen et al. 2013)
Montgomery et al. Nature. 2010 Apr 1;464(7289):773-7. doi: 10.1038/nature08903
Pickrell et al. Nature. 2010 Apr 1;464(7289):768-72. doi: 10.1038/nature08872
Moen et al. Genetics. 2013 Aug;194(4):987-96. doi: 10.1534/genetics.113.151381
▶ Correlation of STR genotypes with:
  ▶ Transcript expression
  ▶ Promoter CpG methylation
▶ Calculated in CEU and YRI individuals separately
Examples of candidate eQTL and mQTL
[Figure x-axis: STR length (bp)]
▶ STR variation correlates with NFE2L1 expression in YRI
[Figure x-axis: STR length (bp)]
▶ STR variation in CEU correlates with methylation in the FBLN5 promoter
Sharing between CEU and YRI individuals
[Venn diagram: eQTLs, CEU vs. YRI]
▶ In each population, STRs showing correlations with corrected p < 0.05 were scored as eQTLs
[Venn diagram: mQTLs, CEU vs. YRI]
▶ Limited power due to small sample size? Only 40–60 individuals per population were available for the correlations
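The slides do not state which multiple-testing correction produced the corrected p-values. As one common option, shown here only as an assumption (Bonferroni would be another), Benjamini–Hochberg adjusted p-values can be computed and then thresholded at 0.05:

```python
def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four STR-gene tests
nominal = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(nominal)
print([round(p, 4) for p in adjusted])
print([p < 0.05 for p in adjusted])  # STRs that would be scored as QTLs
```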
Overlap between eQTL and mQTL
▶ STRs scored as eQTLs/mQTLs in either of the two populations
▶ ~97% of eQTLs were also scored as mQTLs
▶ ~⅓ of mQTLs were also scored as eQTLs
[Venn diagram: eQTL vs. mQTL]
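The two overlap percentages differ because they use different denominators (all eQTLs vs. all mQTLs). A small sketch with hypothetical STR identifiers, pooling calls across CEU and YRI as described above:

```python
def qtl_union(*per_population):
    """STRs scored as QTLs in any population."""
    return set().union(*per_population)


def overlap_fractions(eqtls, mqtls):
    """Return (fraction of eQTLs also scored as mQTLs, fraction of mQTLs also scored as eQTLs)."""
    shared = eqtls & mqtls
    return len(shared) / len(eqtls), len(shared) / len(mqtls)


# Hypothetical STR identifiers per population (CEU, YRI)
eqtls = qtl_union({"STR1"}, {"STR2"})
mqtls = qtl_union({"STR1", "STR3"}, {"STR2", "STR4", "STR5", "STR6"})
print(overlap_fractions(eqtls, mqtls))
```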
Genomic features of genetic variants operating as QTLs
▶ Previous work (Stranger et al. 2007) indicates that SNPs acting as eQTLs are located very close to the transcription start site (TSS) of their target gene
Modified from Stranger et al. Nat Genet. 2007 Oct;39(10):1217-24
[Figure axis: statistical significance]
▶ To gain confidence in our candidate eQTL/mQTL STRs, we analyzed their distance relative to their target genes and CpG sites
Enrichment of candidate eQTLs/mQTLs within ±1 Kbp of their target
Gene expression
Nominal p<0.05 in CEU or YRI
Bin −1 to 0 Kbp: empirical p = 0.022 (randomization test of distances)
DNA methylation
Bin 0 to 1 Kbp: empirical p < 0.001
[Figure x-axis: STR distance to methylation probe (Kbp)]
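The extra slides define enrichment as the difference in relative frequencies between the significant and non-significant distance distributions in a given bin. A minimal sketch of that statistic with a label-shuffling randomization test follows; the half-open bin convention, the number of permutations, the one-sided comparison and all names and values are assumptions, not details taken from the analysis:

```python
import random


def bin_enrichment(distances, significant, lo, hi):
    """Fraction of significant STRs in the bin [lo, hi) minus the same fraction for
    non-significant STRs (distances in bp)."""
    def frac_in_bin(ds):
        return sum(lo <= d < hi for d in ds) / len(ds)

    sig = [d for d, s in zip(distances, significant) if s]
    non = [d for d, s in zip(distances, significant) if not s]
    return frac_in_bin(sig) - frac_in_bin(non)


def randomization_test(distances, significant, lo, hi, n_perm=1000, seed=0):
    """Return (observed enrichment, one-sided empirical p) by shuffling significance labels."""
    rng = random.Random(seed)
    observed = bin_enrichment(distances, significant, lo, hi)
    labels = list(significant)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the number of significant STRs fixed
        if bin_enrichment(distances, labels, lo, hi) >= observed:
            as_extreme += 1
    # +1 in numerator and denominator counts the observed labelling as one permutation
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical distances (bp) to the target TSS and significance calls
distances = [-500] * 10 + [5_000] * 10
significant = [True] * 10 + [False] * 10
print(randomization_test(distances, significant, lo=-1_000, hi=0))
```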
Genomic features of genetic variants operating as QTLs
▶ Previous work (Degner et al. 2012) showed that thousands of SNPs affect nearby (~200–300 bp) chromatin accessibility, acting as DNase I sensitivity QTLs (dsQTLs)
Modified from Degner et al. Nature. 2012 Feb 5;482(7385):390-4. doi: 10.1038/nature10808
▶ Overlap between SNPs acting as dsQTLs and SNPs altering gene expression (eQTLs)
Overlap of candidate eQTLs/mQTLs with regulatory elements
▶ High-quality genome-wide maps of regulatory elements (inferred in HapMap lymphoblastoid cell lines):
  ▶ ~1M transcription factor binding sites (TFBS) (http://centipede.uchicago.edu)
  ▶ ~200K DNase I hypersensitive sites (DHS) (ENCODE)
Enrichment of candidate eQTLs/mQTLs in regulatory elements
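The enrichment analysis rests on asking whether each STR overlaps an annotated TFBS or DHS. A minimal interval-overlap sketch on a single chromosome, assuming half-open [start, end) coordinates (the class, function names and intervals are hypothetical); comparing the overlapping fraction between candidate QTL STRs and all other STRs gives a crude enrichment measure:

```python
import bisect


class IntervalSet:
    """Merged, sorted half-open [start, end) intervals on one chromosome."""

    def __init__(self, intervals):
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
            else:
                merged.append([start, end])
        self.merged = merged
        self.starts = [s for s, _ in merged]

    def overlaps(self, start, end):
        """True if [start, end) overlaps any stored interval."""
        # Index of the first stored interval starting at or after `end`.
        i = bisect.bisect_left(self.starts, end)
        # After merging, ends increase with starts, so only the previous interval can reach `start`.
        return i > 0 and self.merged[i - 1][1] > start


def fraction_overlapping(strs, elements):
    """Fraction of (start, end) STR intervals overlapping at least one element."""
    return sum(elements.overlaps(s, e) for s, e in strs) / len(strs)


dhs = IntervalSet([(100, 200), (150, 300), (1_000, 1_100)])  # hypothetical DHS
candidate_qtls = [(120, 130), (1_050, 1_060)]
other_strs = [(120, 130), (500, 510), (2_000, 2_010), (3_000, 3_010)]
print(fraction_overlapping(candidate_qtls, dhs), fraction_overlapping(other_strs, dhs))
```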
Overlooked (STR) variation?
▶ Millions of SNPs have been used in genome-wide association studies (GWAS) to map functional genetic variants (e.g., for disease susceptibility or height)
▶ Variants identified by GWAS explain only a small proportion of phenotypic variance → missing heritability…?
▶ GWAS rely on high linkage disequilibrium (LD) between genotyped SNPs and the functional variants to be identified
▶ Variants in low LD with genotyped SNPs may be overlooked by SNP-based approaches...
▶ To determine whether STR variation can be effectively tagged by SNP data, we studied LD between STR and SNP loci
LD analysis between STRs and nearby SNPs
▶ HapMap Phase II SNP genotypes for the same 120 individuals
▶ Phased STR and SNP alleles with BEAGLE
▶ Measured LD as the correlation (r²) between STR and SNP alleles (pairs <250 Kbp apart)
▶ For each STR, retained the maximum r² (i.e., the best tagging by a nearby SNP)
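How LD was computed for multi-allelic STRs is not spelled out on the slide. One plausible approach, sketched here as an assumption, treats each STR allele as a binary marker on the phased haplotypes, computes r² = D² / (pA(1 − pA) pB(1 − pB)) against each SNP within 250 Kbp, and keeps the maximum (function names and data are hypothetical):

```python
def r_squared(x, y):
    """r² between two 0/1 allele indicators across phased haplotypes."""
    n = len(x)
    p_a, p_b = sum(x) / n, sum(y) / n
    p_ab = sum(xi * yi for xi, yi in zip(x, y)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # one locus is monomorphic: r² undefined, treated as untagged
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def best_tag_r2(str_haplotypes, snps, str_pos, max_dist=250_000):
    """Maximum r² between any STR allele and any SNP closer than max_dist bp.

    str_haplotypes: STR repeat length on each phased haplotype.
    snps: list of (position, [0/1 SNP allele on each haplotype, same order]).
    """
    best = 0.0
    for allele in set(str_haplotypes):
        x = [int(h == allele) for h in str_haplotypes]  # this STR allele vs all others
        for pos, snp_alleles in snps:
            if abs(pos - str_pos) < max_dist:
                best = max(best, r_squared(x, snp_alleles))
    return best


haplotypes = [18, 18, 21, 21]  # hypothetical phased STR alleles (bp)
print(best_tag_r2(haplotypes, [(1_000, [0, 0, 1, 1])], str_pos=0))
```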
▶ Sharp decay in LD with increasing STR diversity, especially in the YRI population
▶ A similar pattern is seen for copy number variants
▶ Africans also show lower LD between SNP variants
▶ Variants with r² ≥ 0.8 are typically considered tagged
▶ Median r² = 0.14
▶ Only 11% of STRs have r² ≥ 0.8
▶ Median r² = 0.30
▶ STR variants are poorly tagged by nearby SNPs
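Given the per-STR maximum r² values, the two summary statistics reported above (median r² and the fraction of STRs tagged at r² ≥ 0.8) reduce to a few lines. A sketch with hypothetical input values:

```python
import statistics


def tagging_summary(max_r2_per_str, threshold=0.8):
    """Median of the per-STR maximum r² and the fraction of STRs tagged at r² >= threshold."""
    median_r2 = statistics.median(max_r2_per_str)
    tagged = sum(r2 >= threshold for r2 in max_r2_per_str) / len(max_r2_per_str)
    return median_r2, tagged


# Hypothetical best-tag r² values for five STRs
print(tagging_summary([0.05, 0.10, 0.14, 0.30, 0.90]))
```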
Summary
▶ This is the first systematic attempt to assign biological significance to STR variation in the human genome
  ▶ Through targeted sequencing and genotyping of ~6,000 promoter-associated STRs in 120 individuals
▶ Our results suggest that there are potentially thousands of STR variants that exert functional effects via alterations of local gene expression or epigenetics
▶ Conventional SNP-based mapping approaches are most likely blind to potential functional STR variants, as STR variation is poorly tagged by nearby SNPs
▶ Therefore, studies that specifically genotype STR variants are required to fully ascertain functional variation in the genome
Acknowledgements
Icahn School of Medicine at Mount Sinai
Andrew Sharp
Bharati Jadhav
Chloe Tessereau
Corey Watson
Daniel Ho
Kakit Cheung
Mafalda Barbosa
Ricky Joshi
(Paras Garg)
(Audrey Guilmatre)
Virginia Tech
David Mittelman
Gareth Highnam
Research Funding
NHGRI R01-HG006696
NIDA R01-DA033660
NICHD R03-HD073731
March of Dimes Grant 6-FY13-92
Extra slides...
Additional information
▶ Range of STR motif sizes: 1–28 bp
▶ Enrichment of candidate eQTLs/mQTLs within ±1 Kbp of their target:
▶ enrichment (red line): calculated as the difference in relative frequencies between the
significant and non-significant distribution in a given distance bin...
20150115_JQO_NYAPopulationGenomics

  • 1. Insights into the functional effect of promoter-associated short tandem repeats in the human genome using targeted sequencing
    Javier Quilez Oliete, Postdoctoral Fellow in the Andrew Sharp’s Lab, Dept. of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai
    javier.quilez@mssm.edu
    Twitter: @jaquol
  • 2. Tandem repeats (TRs) are stretches of DNA composed of ≥2 contiguous copies of a motif arranged in a head-to-tail pattern
    Example: CAG CAG CAG CAG CAG CAG (3-bp motif CAG; repeat length = 6 copies × 3-bp motif = 18 bp)
    ▶ Repeat motif lengths range up to hundreds of Kbp
    ▶ Motifs of 2–6 bp: short tandem repeats (STRs) or microsatellites
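The repeat-length arithmetic on this slide (copies × motif length) can be sketched in a few lines of Python. The function names are hypothetical, and the sketch counts only perfect, uninterrupted copies; it is an illustration of the definition, not a real STR caller.

```python
def tandem_repeat_length(seq: str, motif: str) -> tuple[int, int]:
    """Return (copies, repeat_length_bp) for the longest head-to-tail run of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    for start in range(len(seq) - k + 1):
        copies, pos = 0, start
        # Extend while the next k bases repeat the motif exactly (no mismatches allowed).
        while seq[pos:pos + k] == motif:
            copies += 1
            pos += k
        best = max(best, copies)
    return best, best * k


def is_tandem_repeat(seq: str, motif: str) -> bool:
    """A TR needs at least 2 contiguous copies of the motif."""
    copies, _ = tandem_repeat_length(seq, motif)
    return copies >= 2


copies, length = tandem_repeat_length("TTCAGCAGCAGCAGCAGCAGAT", "CAG")
print(copies, length)  # 6 18
```

Real STR annotations also tolerate interrupted or imperfect copies, which this exact-match loop deliberately ignores.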
  • 3. Features of TRs
    Multi-allelic: the same CAG repeat carries a different number of copies in different individuals (six individuals illustrated)
    High mutation rate. Number of mutations per locus per generation:
    ▶ SNPs: ~10⁻⁸
    ▶ CNVs: ~10⁻⁶–10⁻⁴
    ▶ TRs: ~10⁻⁴–10⁻³
    Therefore, the TR mutation rate is several orders of magnitude higher than that of other forms of genetic variation
    TRs represent an important source of genetic variation
  • 5. Features of TRs
    Multi-allelic: the same CAG repeat carries a different number of copies in different individuals (six individuals illustrated)
    High mutation rate. Number of mutations per locus per generation:
    ▶ SNPs: ~10⁻⁸
    ▶ CNVs: ~10⁻⁶–10⁻⁴
    ▶ TRs: ~10⁻⁴–10⁻³
    The mutation rate of TRs is several orders of magnitude higher than that of other forms of genetic variation
    Abundant: ~1M annotated TRs in the human genome
    TRs represent an important source of genetic variation
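The rate ranges above can be compared on a log scale. This Python sketch (helper names are illustrative) computes how many orders of magnitude separate the TR range from the SNP and CNV ranges, using only the approximate values quoted on the slide.

```python
import math

# Approximate per-locus, per-generation mutation rates from the slide, as (low, high) bounds.
RATES = {
    "SNP": (1e-8, 1e-8),
    "CNV": (1e-6, 1e-4),
    "TR": (1e-4, 1e-3),
}


def orders_of_magnitude(rate_a: float, rate_b: float) -> float:
    """Number of powers of ten by which rate_a exceeds rate_b (negative if it is lower)."""
    return math.log10(rate_a / rate_b)


for other in ("SNP", "CNV"):
    # Smallest gap: lowest TR rate vs highest rate of the other class; largest gap: the reverse.
    smallest = orders_of_magnitude(RATES["TR"][0], RATES[other][1])
    largest = orders_of_magnitude(RATES["TR"][1], RATES[other][0])
    print(f"TR vs {other}: {smallest:.0f} to {largest:.0f} orders of magnitude")
```

With these values the gap is 4 to 5 orders of magnitude relative to SNPs and 0 to 3 relative to CNVs, because the TR and CNV ranges meet at ~10⁻⁴.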
  • 7. The functional impact of TRs remains relatively unexplored
    ▶ Growing evidence supports a functional impact of TRs:
      ▶ In many species, TRs are often located within the coding regions of genes with specific biological functions, including genes that confer beneficial phenotypes
      ▶ Several diseases are caused by repeat expansions (in humans, dogs and plants)
    ▶ TRs remain poorly studied because:
      ▶ They were long considered mere “junk” DNA
      ▶ They are technically difficult to characterize: even with SNP arrays, aCGH and next-generation sequencing, TRs are ignored in most studies (GWAS, 1000 Genomes)
    ▶ Only recently have novel approaches for genotyping repetitive elements been developed (Gymrek et al. 2012; Highnam et al. 2013; Guilmatre et al. 2013; Brahmachary et al. 2014)
    aCGH: array comparative genomic hybridization; GWAS: genome-wide association study
    Gymrek et al. Genome Res. 2012 Jun;22(6):1154-62. doi: 10.1101/gr.135780.111
    Highnam et al. Nucleic Acids Res. 2013 Jan 7;41(1):e32. doi: 10.1093/nar/gks981
    Guilmatre et al. Hum Mutat. 2013 Sep;34(9):1304-11. doi: 10.1002/humu.22359
    Brahmachary et al. PLoS Genet. 2014 Jun 19;10(6):e1004418. doi: 10.1371/journal.pgen.1004418
  • 8. Hypothesis, aim and approach
    ▶ Hypothesis: STRs are important functional components of the genome with overlooked roles in phenotypic variation, including human disease
    ▶ Aim: identify functional STRs by searching for repeat length variants that alter local gene expression and DNA methylation
    ▶ Approach:
      ▶ Characterized repeat length variation in STRs located in gene promoters, which are more likely to have a cis effect on the activity of nearby genes
      ▶ Performed cis-association analysis to identify expression (eQTL) and methylation (mQTL) quantitative trait loci
  • 9. Characterizing repeat length variation in promoter-associated STRs
    Targeted-sequencing methodology (Guilmatre et al. 2013), which overcomes the technical difficulties of genotyping STRs
    Capture: target STRs in gene promoters (within ±1 Kbp of the TSS)
    ▶ ~6,000 promoter-associated STRs
    ▶ 31% of RefSeq genes have a promoter-associated STR
    Guilmatre et al. Hum Mutat. 2013 Sep;34(9):1304-11. doi: 10.1002/humu.22359
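Selecting STRs "within ±1 Kbp of the TSS" is an interval-overlap filter. The Python sketch below is a minimal illustration under stated assumptions: 0-based half-open coordinates, an STR counts if any of its bases overlaps the window, and the names (`Interval`, `promoter_window`, `promoter_strs`) are hypothetical rather than taken from the capture design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def promoter_window(chrom: str, tss: int, flank: int = 1000) -> Interval:
    """Window of ±flank bp around a TSS; symmetric, so gene strand does not matter here."""
    return Interval(chrom, max(0, tss - flank), tss + flank + 1)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def promoter_strs(strs, tss_list, flank=1000):
    """Keep STRs overlapping any ±flank window; tss_list holds (chrom, tss) pairs."""
    windows = [promoter_window(chrom, tss, flank) for chrom, tss in tss_list]
    # Brute force O(n_strs * n_tss); fine for a sketch, use an interval index at genome scale.
    return [s for s in strs if any(overlaps(s, w) for w in windows)]


strs = [
    Interval("chr1", 10_500, 10_518),  # ~500 bp from the toy TSS below
    Interval("chr1", 20_000, 20_030),  # ~9 Kbp away
    Interval("chr2", 10_900, 10_912),  # other chromosome
]
print(promoter_strs(strs, [("chr1", 11_000)]))
```

Only the first toy STR is kept. A genome-wide version would index the windows per chromosome instead of scanning them all.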
  • 10. Characterizing repeat length variation in promoter-associated STRs
    Targeted-sequencing methodology (Guilmatre et al. 2013), which overcomes the technical difficulties of genotyping STRs
    Sequencing: Illumina 100-bp reads, 24 individuals multiplexed per lane
    ▶ 120 HapMap individuals:
      ▶ 58 CEU (European)
      ▶ 62 YRI (African)
    ▶ Median coverage per STR: 47x
    Guilmatre et al. Hum Mutat. 2013 Sep;34(9):1304-11. doi: 10.1002/humu.22359
  • 11. Characterizing repeat length variation in promoter-associated STRs
    Targeted-sequencing methodology (Guilmatre et al. 2013), which overcomes the technical difficulties of genotyping STRs
    Genotyping: RepeatSeq (Highnam et al. 2013), which uses sequencing reads to call the two repeat length alleles at each STR locus in each individual
    Guilmatre et al. Hum Mutat. 2013 Sep;34(9):1304-11. doi: 10.1002/humu.22359
    Highnam et al. Nucleic Acids Res. 2013 Jan 7;41(1):e32. doi: 10.1093/nar/gks981
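To show what "calling the two repeat length alleles" means, here is a deliberately naive Python stand-in. It is NOT RepeatSeq's algorithm: RepeatSeq uses its own statistical model. The function name and the thresholds `min_reads` and `min_allele_frac` are hypothetical. Given the repeat length observed in each read spanning an STR, it takes the two best-supported lengths and calls a homozygote when the second length has too little support.

```python
from collections import Counter


def call_str_genotype(read_lengths, min_reads=5, min_allele_frac=0.2):
    """Naive diploid STR genotype from per-read repeat lengths (bp).

    Returns a sorted (allele1, allele2) tuple, or None if coverage is too low.
    Thresholds are illustrative assumptions, not RepeatSeq parameters.
    """
    if len(read_lengths) < min_reads:
        return None
    counts = Counter(read_lengths).most_common()
    top_len, _ = counts[0]
    if len(counts) > 1:
        second_len, second_n = counts[1]
        # Treat the second length as a real allele only if enough reads support it;
        # otherwise assume it is a sequencing or PCR stutter error.
        if second_n / len(read_lengths) >= min_allele_frac:
            return tuple(sorted((top_len, second_len)))
    return (top_len, top_len)


print(call_str_genotype([18] * 10 + [21] * 8 + [15]))
```

Reads reporting 18 bp and 21 bp with one stray 15-bp read are called heterozygous 18/21. A production caller would also model stutter and sequencing error explicitly.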
  • 12. eQTL and mQTL cis-association analysis
    ▶ We generated promoter-associated STR genotypes in 120 HapMap individuals
    ▶ The same individuals were previously characterized for gene expression and DNA methylation:
      ▶ Gene expression: Illumina RNA-seq (Montgomery et al. 2010; Pickrell et al. 2010)
      ▶ DNA methylation: Illumina 450K array (Moen et al. 2013)
    Montgomery et al. Nature. 2010 Apr 1;464(7289):773-7. doi: 10.1038/nature08903
    Pickrell et al. Nature. 2010 Apr 1;464(7289):768-72. doi: 10.1038/nature08872
    Moen et al. Genetics. 2013 Aug;194(4):987-96. doi: 10.1534/genetics.113.151381
  • 13. eQTL and mQTL cis-association analysis
    ▶ We generated promoter-associated STR genotypes in 120 HapMap individuals
    ▶ The same individuals were previously characterized for gene expression and DNA methylation
    ▶ Correlation of STR genotypes with:
      ▶ Transcript expression
  • 14. eQTL and mQTL cis-association analysis
    ▶ We generated promoter-associated STR genotypes in 120 HapMap individuals
    ▶ The same individuals were previously characterized for gene expression and DNA methylation
    ▶ Correlation of STR genotypes with:
      ▶ Transcript expression
    Calculated in CEU and YRI individuals separately
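The slides do not specify the correlation statistic, how a two-allele genotype was encoded as a number, or how p-values were obtained. The Python sketch below fills those gaps with labeled assumptions. It uses Spearman's rank correlation, encodes each genotype as the sum of its two allele lengths (the hypothetical helper `str_dosage`), and computes a permutation p-value. Run it within one population at a time, since the association was calculated separately in CEU and YRI.

```python
import random
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable (e.g. monomorphic STR) carries no signal
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for Spearman's rho between x and y."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def str_dosage(genotype):
    """Hypothetical encoding: sum of the two allele lengths (bp) of a diploid STR genotype."""
    return sum(genotype)


def cis_association(genotypes, phenotype):
    """genotypes: per-individual (allele1, allele2); phenotype: matching expression or methylation values."""
    x = [str_dosage(g) for g in genotypes]
    return spearman(x, phenotype), permutation_pvalue(x, phenotype)
```

The same function applies to promoter CpG methylation by passing methylation values as `phenotype`. Other encodings, such as the mean or the longer allele, are equally plausible.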
  • 16. eQTL and mQTL cis-association analysis
    ▶ We generated promoter-associated STR genotypes in 120 HapMap individuals
    ▶ The same individuals were previously characterized for gene expression and DNA methylation
    ▶ Correlation of STR genotypes with:
      ▶ Promoter CpG methylation
  • 17. eQTL and mQTL cis-association analysis
    ▶ We generated promoter-associated STR genotypes in 120 HapMap individuals
    ▶ The same individuals were previously characterized for gene expression and DNA methylation
    ▶ Correlation of STR genotypes with:
      ▶ Promoter CpG methylation
    Calculated in CEU and YRI individuals separately
  • 19. Examples of candidate eQTLs and mQTLs
    ▶ STR variation correlates with NFE2L1 expression in YRI (plot x-axis: STR length (bp))
  • 20. Examples of candidate eQTLs and mQTLs
    ▶ STR variation correlates with NFE2L1 expression in YRI
    ▶ STR variation in CEU correlates with methylation in the FBLN5 promoter
    (plot x-axes: STR length (bp))
  • 21. Sharing between CEU and YRI individuals
    Figure: eQTLs in CEU vs. YRI
    ▶ In each population, STRs whose correlation had a corrected p < 0.05 were scored as eQTLs
  • 22. Sharing between CEU and YRI individuals
    Figures: eQTLs and mQTLs in CEU vs. YRI
    ▶ In each population, STRs whose correlation had a corrected p < 0.05 were scored as eQTLs
  • 23. Sharing between CEU and YRI individuals
    Figures: eQTLs and mQTLs in CEU vs. YRI
    ▶ In each population, STRs whose correlation had a corrected p < 0.05 were scored as eQTLs
    ▶ Limited power due to small sample size? Only 40–60 individuals per population were available for the correlations
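The slides do not say which multiple-testing correction produced the "corrected p". As one common option, offered as an assumption rather than as the method used, this Python sketch applies Benjamini–Hochberg adjustment to the per-STR p-values of one population and keeps STRs with adjusted p < 0.05. The helper names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(str_ids, pvalues, alpha=0.05):
    """IDs of STRs whose adjusted p-value is below alpha (apply per population)."""
    adjusted = bh_adjust(pvalues)
    return {s for s, q in zip(str_ids, adjusted) if q < alpha}


print(significant(["a", "b"], [0.04, 0.9]))
```

The toy call prints an empty set: a nominal p of 0.04 no longer passes once two tests are accounted for. A stricter Bonferroni correction would simply multiply each p-value by the number of tests.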
  • 24. Overlap between eQTLs and mQTLs
    ▶ STRs scored as eQTL/mQTL in either of the two populations
    ▶ ~97% of eQTLs also scored as mQTLs
    ▶ ~⅓ of mQTLs also scored as eQTLs
    (Venn diagram: eQTL, mQTL)
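The two percentages on this slide share a numerator (STRs that are both eQTLs and mQTLs) but use different denominators. This Python sketch makes that explicit. The function name and toy STR IDs are hypothetical, and an STR counts if it was scored in either population.

```python
def qtl_overlap(eqtl_by_pop, mqtl_by_pop):
    """Share of eQTLs that are also mQTLs, and share of mQTLs that are also eQTLs.

    Each argument maps a population label (e.g. "CEU", "YRI") to a set of STR IDs;
    the sets are unioned so that an STR scored in either population counts.
    """
    eqtls = set().union(*eqtl_by_pop.values())
    mqtls = set().union(*mqtl_by_pop.values())
    shared = eqtls & mqtls
    return len(shared) / len(eqtls), len(shared) / len(mqtls)


# Toy IDs for illustration only, not data from the study.
eqtl_toy = {"CEU": {"s1", "s2"}, "YRI": {"s2", "s3"}}
mqtl_toy = {"CEU": {"s1", "s2", "s4", "s5"}, "YRI": {"s3", "s6", "s7", "s8", "s9"}}
print(qtl_overlap(eqtl_toy, mqtl_toy))
```

Because the numerator is shared, the slide's ~97% and ~⅓ together imply roughly three times as many mQTLs as eQTLs (0.97 · #eQTL ≈ ⅓ · #mQTL).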
  • 25. Genomic features of genetic variants operating as QTLs
    ▶ Previous work (Stranger et al. 2007) indicates that SNPs acting as eQTLs are located very close to the gene transcription start site (TSS)
    Figure modified from Stranger et al. Nat Genet. 2007 Oct;39(10):1217-24 (y-axis: statistical significance)
  • 26. Genomic features of genetic variants operating as QTLs
    ▶ Previous work (Stranger et al. 2007) indicates that SNPs acting as eQTLs are located very close to the gene transcription start site (TSS)
    ▶ To gain confidence in our candidate eQTL/mQTL STRs, we analyzed their distance relative to their target genes and CpG sites
    Figure modified from Stranger et al. Nat Genet. 2007 Oct;39(10):1217-24 (y-axis: statistical significance)
  • 27. Enrichment of candidate eQTLs/mQTLs within ±1 Kbp of their target
    Gene expression
  • 28. Enrichment of candidate eQTLs/mQTLs within ±1 Kbp of their target
    Gene expression: STRs with nominal p < 0.05 in CEU or YRI
  • 29. Enrichment of candidate eQTLs/mQTLs within ±1 Kbp of their target
    Gene expression: bin from −1 to 0 Kbp, empirical p = 0.022 (randomization test of distances)
  • 30. Enrichment of candidate eQTLs/mQTLs within ±1 Kbp of their target
    Gene expression: bin from −1 to 0 Kbp, empirical p = 0.022 (randomization test of distances)
    DNA methylation: bin from 0 to 1 Kbp, empirical p < 0.001 (x-axis: STR distance to methylation probe (Kbp))
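The exact null model behind the "randomization test of distances" is not described. This Python sketch shows one plausible version, offered as an assumption. It counts how many significant STR–target distances fall in a bin, repeatedly draws equally sized random subsets from all tested STR–target distances, and reports the fraction of draws with at least as many in the bin. Names and parameters are hypothetical.

```python
import random


def count_in_bin(distances, lo, hi):
    """Number of distances d with lo <= d < hi."""
    return sum(lo <= d < hi for d in distances)


def randomization_pvalue(sig_distances, all_distances, lo, hi, n_iter=10_000, seed=1):
    """Empirical p-value that significant distances are enriched in the bin [lo, hi).

    Assumed null: draw len(sig_distances) distances without replacement from all
    tested STR-target pairs and count how many fall in the same bin.
    """
    rng = random.Random(seed)
    observed = count_in_bin(sig_distances, lo, hi)
    k = len(sig_distances)
    hits = 0
    for _ in range(n_iter):
        null_sample = rng.sample(all_distances, k)
        if count_in_bin(null_sample, lo, hi) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from exactly zero.
    return (hits + 1) / (n_iter + 1)
```

For example, `randomization_pvalue(sig, all_pairs, -1.0, 0.0)` would test the −1 to 0 Kbp bin, with distances in Kbp. Shuffling STR–target assignments instead of subsampling distances would be an equally reasonable null.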
  • 31. Genomic features of genetic variants operating as QTLs
    ▶ Previous work (Degner et al. 2012) showed that thousands of SNPs affect nearby (~200–300 bp) chromatin accessibility (DNase I sensitivity QTLs, dsQTLs)
    Figure modified from Degner et al. Nature. 2012 Feb 5;482(7385):390-4. doi: 10.1038/nature10808
  • 32. Genomic features of genetic variants operating as QTLs
    ▶ Previous work (Degner et al. 2012) showed that thousands of SNPs affect nearby (~200–300 bp) chromatin accessibility (DNase I sensitivity QTLs, dsQTLs)
    ▶ SNPs acting as dsQTLs overlap with SNPs that alter gene expression (eQTLs)
    Figure modified from Degner et al. Nature. 2012 Feb 5;482(7385):390-4. doi: 10.1038/nature10808
  • 33. Overlap of candidate eQTLs/mQTLs with regulatory elements
    ▶ High-quality genome-wide maps of regulatory elements (inferred in HapMap lymphoblastoid cell lines):
      ▶ ~1M transcription factor binding sites (TFBS) (http://centipede.uchicago.edu)
      ▶ ~200K DNase I hypersensitive sites (DHS) (ENCODE)
  • 34. Enrichment of candidate eQTLs/mQTLs in regulatory elements
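The slides do not show how enrichment in TFBS and DHS was quantified. This Python sketch takes one simple approach, stated as an assumption. It computes the fraction of candidate QTL STRs that overlap any regulatory interval, divides it by the same fraction among background (tested) STRs, and reports the ratio as fold enrichment. Coordinates are 0-based half-open `(chrom, start, end)` tuples, and all names are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(intervals):
    """Merge (chrom, start, end) regulatory intervals per chromosome into sorted, disjoint runs."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)  # overlapping or touching: extend
            else:
                merged.append([s, e])
        index[chrom] = ([m[0] for m in merged], [m[1] for m in merged])
    return index


def overlaps_any(index, chrom, start, end):
    """True if [start, end) overlaps any indexed interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_left(starts, end) - 1  # rightmost interval starting before `end`
    return i >= 0 and ends[i] > start


def overlap_fraction(index, strs):
    return sum(overlaps_any(index, *s) for s in strs) / len(strs)


def fold_enrichment(index, qtl_strs, background_strs):
    """Overlap rate among QTL STRs divided by overlap rate among background STRs."""
    return overlap_fraction(index, qtl_strs) / overlap_fraction(index, background_strs)
```

Merging first matters. With disjoint, sorted intervals, checking only the rightmost interval that starts before the query end is enough. A significance test could reuse a randomization scheme or a Fisher's exact test on the 2×2 overlap counts.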
  • 37. Overlooked (STR) variation?
    ▶ Millions of SNPs have been used in genome-wide association studies (GWAS) to map functional genetic variants (e.g. for disease susceptibility or height)
    ▶ Variants identified in GWAS explain only a small proportion of phenotypic variance → missing heritability?
    ▶ GWAS rely on high linkage disequilibrium (LD) between genotyped SNPs and the functional variants to be identified
    ▶ Variants in low LD may be overlooked by SNP-based approaches
    ▶ To determine whether STR variation can be effectively tagged by SNP data, we studied LD between STR and SNP loci
  • 38. LD analysis between STRs and nearby SNPs
    ▶ HapMap Phase II SNP genotypes for the same 120 individuals
    ▶ Phased STR and SNP alleles with BEAGLE
    ▶ Measured LD as the correlation (r²) between STR and SNP alleles (pairs < 250 Kbp apart)
    ▶ For each STR, retained the maximum r² (i.e. the best tagging by a nearby SNP)
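The LD steps above can be sketched on phased haplotypes (two per individual after BEAGLE phasing). This Python sketch rests on one loud assumption: each STR allele is encoded by its repeat length in bp, and LD is the squared Pearson correlation with the SNP allele (0/1) across haplotypes. Multi-allelic LD can be defined in other ways, and the helper names are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length allele vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic locus: LD undefined, treated here as no tagging
    return sxy * sxy / (sxx * syy)


def best_tag_r2(str_pos, str_haps, snps, max_dist=250_000):
    """Maximum r^2 between one STR and any SNP closer than max_dist bp.

    str_haps: STR allele length (bp) on each phased haplotype.
    snps: list of (position, [0/1 allele per haplotype]) in the same haplotype order.
    Returns 0.0 if no SNP lies within max_dist.
    """
    best = 0.0
    for pos, alleles in snps:
        if abs(pos - str_pos) < max_dist:
            best = max(best, r_squared(str_haps, alleles))
    return best


print(best_tag_r2(1000, [18, 18, 21, 21], [(11_000, [0, 0, 1, 1])]))
```

A biallelic SNP can perfectly track at most a split of the STR alleles into two groups. A highly multi-allelic STR therefore tends to have lower maximum r² than a SNP-to-SNP pair, in line with the decay with STR diversity reported on the following slides.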
  • 41. LD analysis between STRs and nearby SNPs
    ▶ Sharp decay in LD with STR diversity, especially in the YRI population
    ▶ Similar pattern seen for copy number variants
    ▶ Africans also show lower LD between SNP variants
  • 42. LD analysis between STRs and nearby SNPs
    Figure annotations: variants with r² ≥ 0.8 are typically considered tagged; median r² = 0.14; only 11% of STRs with r² ≥ 0.8; median r² = 0.30
    ▶ Sharp decay in LD with STR diversity, especially in the YRI population
    ▶ Similar pattern seen for copy number variants
    ▶ Africans also show lower LD between SNP variants
    ▶ STR variants are poorly tagged by nearby SNPs
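The summary figures on this slide (a median best-tag r² and the share of STRs reaching the conventional r² ≥ 0.8 tagging threshold) reduce to a short computation over each STR's maximum r². A Python sketch follows; the function name is hypothetical and the input values are toy numbers, not study data.

```python
from statistics import median


def tagging_summary(best_r2, threshold=0.8):
    """Median best-tag r^2 and fraction of STRs considered tagged (r^2 >= threshold)."""
    tagged = sum(r >= threshold for r in best_r2)
    return median(best_r2), tagged / len(best_r2)


print(tagging_summary([0.1, 0.2, 0.9, 0.05, 0.85]))  # toy values
```

Computing this separately per population is what allows CEU and YRI to be compared.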
  • 43. Summary
    ▶ This is the first systematic attempt to assign biological significance to STR variation in the human genome
      ▶ Through targeted sequencing and genotyping of ~6,000 promoter-associated STRs in 120 individuals
    ▶ Our results suggest that there are potentially thousands of STR variants that exert functional effects via alterations of local gene expression or epigenetics
    ▶ Conventional SNP-based mapping approaches are most likely blind to potentially functional STR variants, as STR variation is poorly tagged by nearby SNPs
    ▶ Therefore, studies that specifically genotype STR variants are required to fully ascertain functional variation in the genome
  • 44. Acknowledgements
    Icahn School of Medicine at Mount Sinai: Andrew Sharp, Bharati Jadhav, Chloe Tessereau, Corey Watson, Daniel Ho, Kakit Cheung, Mafalda Barbosa, Ricky Joshi, (Paras Garg), (Audrey Guilmatre)
    Virginia Tech: David Mittelman, Gareth Highnam
    Research funding: NHGRI R01-HG006696; NIDA R01-DA033660; NICHD R03-HD073731; March of Dimes Grant 6-FY13-92
  • 46. Additional information
    ▶ Range of STR motif sizes: 1–28 bp
    ▶ Enrichment of candidate eQTLs/mQTLs within ±1 Kbp of their target:
      ▶ enrichment (red line): calculated as the difference in relative frequencies between the significant and non-significant distributions in a given distance bin...
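The enrichment definition above (significant minus non-significant relative frequency per distance bin) can be written directly in Python. Two details are assumptions because the note is truncated. Bins are half-open `[edge_i, edge_{i+1})`, and each relative frequency is divided by the total number of distances, including any that fall outside the binned range. Function names are hypothetical.

```python
import bisect


def relative_frequencies(distances, edges):
    """Fraction of all distances falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for d in distances:
        i = bisect.bisect_right(edges, d) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    # Assumption: divide by all distances, including any outside the binned range.
    return [c / len(distances) for c in counts]


def enrichment_per_bin(sig_distances, nonsig_distances, edges):
    """Difference in relative frequency (significant minus non-significant) per distance bin."""
    sig = relative_frequencies(sig_distances, edges)
    nonsig = relative_frequencies(nonsig_distances, edges)
    return [a - b for a, b in zip(sig, nonsig)]


edges = [-2, -1, 0, 1, 2]  # Kbp bins around the target
print(enrichment_per_bin([-0.5, -0.5, 0.5, 1.5], [-1.5, -0.5, 0.5, 1.5], edges))
```

A positive value marks a bin where significant STRs are over-represented relative to non-significant ones. In this toy call that is the −1 to 0 Kbp bin.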