Yaniv Erlich, 7/15/15, @erlichya
We know quite a lot about genetic variations…
Intro Genotyping STRs eSTRs eSTR or eSNPs? eSTRs in diseases
Expression STRs (eSTRs)
3. Yaniv Erlich7/15/15 @erlichya
What about Short Tandem Repeats (STRs)?
CTCAATACAAGTCTAACAGCAGCAGCAGCAGCAGCAGCAGCAGTTGATGAAC
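To show what makes the sequence above an STR, here is a toy detector that finds the longest run of a 1 to 6 bp motif repeated back to back. It is a minimal regex-based sketch written for this example (the function name `longest_tandem_repeat` is hypothetical), not the algorithm used by lobSTR or HipSTR.

```python
import re

SEQ = "CTCAATACAAGTCTAACAGCAGCAGCAGCAGCAGCAGCAGCAGTTGATGAAC"

def longest_tandem_repeat(seq, max_period=6, min_copies=3):
    """Return (start, motif, copies) for the longest exact tandem repeat, or None.

    Toy detector (hypothetical helper): for each motif length k, a regex matches
    a k-mer followed by at least `min_copies - 1` exact copies of itself.
    Ties on span length favor the shorter motif, so CAGCAG... is reported as CAG.
    """
    best = None
    for k in range(1, max_period + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            span = len(m.group(0))
            key = (span, -k)  # longer span wins; shorter motif breaks ties
            if best is None or key > best[0]:
                best = (key, m.start(), m.group(1), span // k)
    if best is None:
        return None
    _, start, motif, copies = best
    return start, motif, copies

print(longest_tandem_repeat(SEQ))
```

Real STR callers also tolerate impure repeats and sequencing errors, which this exact-match sketch does not.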
Short Tandem Repeats
• 1% of the human genome
• Fast mutation rates
• Multiple Mendelian diseases
• Evolvability
[Figure: STR-associated disorders: Huntington disease, Fragile X, OPMD, synpolydactyly, ataxias (10 types), HFG syndrome, holoprosencephaly, pseudoachondroplasia, myotonic dystrophy, cleidocranial dysplasia, ALS-FTD]
lobSTR: a whole-genome solution for STR genotyping
New: HipSTR
• HipSTR: Haplotype-based imputation, phasing and genotyping of STRs
• Major improvements:
– Learns locus-specific stutter models
– Physical phasing
– Imputes missing STR genotypes or provides priors based on SNPs
– Reports not only the length of an STR allele but also its sequence
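To make "stutter model" concrete, here is a minimal sketch of a stutter-aware diploid genotype caller. The model is an assumption chosen for illustration: a read reports the true allele with probability 1 - up - down, and otherwise gains (up) or loses (down) k >= 1 repeat units, with k geometric. The parameter values are made up; HipSTR learns locus-specific parameters from the data, and this code is not its implementation.

```python
import math
from itertools import combinations_with_replacement

def read_prob(obs, allele, up=0.05, down=0.1, rho=0.9):
    """P(read reports `obs` repeat units | true allele has `allele` units).

    Assumed stutter model (illustrative values): no stutter with prob 1-up-down;
    otherwise the read gains or loses k >= 1 units, k ~ Geometric(rho).
    """
    delta = obs - allele
    if delta == 0:
        return 1.0 - up - down
    k = abs(delta)
    geom = rho * (1.0 - rho) ** (k - 1)
    return (up if delta > 0 else down) * geom

def genotype_log_likelihood(reads, a1, a2, **params):
    """Each read comes from either allele with probability 1/2."""
    return sum(
        math.log(0.5 * read_prob(obs, a1, **params) + 0.5 * read_prob(obs, a2, **params))
        for obs in reads
    )

def call_genotype(reads, candidates, **params):
    """Maximum-likelihood unordered genotype over the candidate allele lengths."""
    return max(
        combinations_with_replacement(sorted(candidates), 2),
        key=lambda g: genotype_log_likelihood(reads, g[0], g[1], **params),
    )

# Toy reads (repeat units): two true alleles plus a few stutter reads
reads = [10] * 8 + [12] * 7 + [9, 11]
print(call_genotype(reads, {9, 10, 11, 12}))  # → (10, 12)
```

The stutter reads at 9 and 11 units are explained as errors rather than extra alleles because the model assigns them a small but nonzero probability.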
HipSTR: solving homoplasy
• HipSTR can now correctly distinguish STR alleles that have identical lengths but different sequences (homoplasy).
• Real example:
– Length-based genotype: -4/-4 (length difference from the reference, in bp)
– HipSTR genotype: (AGAT)8(ACAT)9 / (AGAT)10(ACAT)7
• HipSTR is available at https://github.com/tfwillems/HipSTR
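The real example above can be checked directly: expanding both alleles shows they have the same length but different sequences. The `expand` helper and its parser for the `(MOTIF)count` notation are hypothetical, written only for this sketch.

```python
import re

def expand(allele):
    """Expand run-length notation such as '(AGAT)8(ACAT)9' into a DNA sequence."""
    return "".join(
        motif * int(count)
        for motif, count in re.findall(r"\(([ACGT]+)\)(\d+)", allele)
    )

a1 = expand("(AGAT)8(ACAT)9")
a2 = expand("(AGAT)10(ACAT)7")

# A length-only caller sees two identical alleles...
print(len(a1), len(a2))  # → 68 68
# ...but the sequences differ at the two repeat units that switch from AGAT to ACAT.
print(sum(x != y for x, y in zip(a1, a2)))  # → 2
```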
Capillary-based validation
• The Simons Genome Diversity Project sequenced 280 individuals to 30x coverage
• For 105 of these samples, ~300 Marshfield STRs were genotyped using capillary electrophoresis
• The lengths of the sequencing-based STR genotypes were compared to the capillary PCR product sizes to assess accuracy
• R² = 0.987
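The slide does not say exactly how R² was computed. Assuming it is the squared Pearson correlation between sequencing-based allele lengths and capillary product sizes, a sketch looks like this. Capillary products include primers and flanking sequence, so their sizes are offset from the STR lengths by a roughly constant amount; the squared correlation is unaffected by such a constant shift. The numbers below are hypothetical toy values, not data from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical toy values: STR lengths from sequencing (bp) and the matching
# capillary product sizes, offset by ~120 bp of primer and flanking sequence.
seq_lengths = [20, 24, 24, 28, 32, 36]
cap_sizes = [140, 144, 145, 148, 152, 155]
print(round(r_squared(seq_lengths, cap_sizes), 3))
```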
Analyzing STRs in the 1000 Genomes Project
Good allele frequency spectrum for 90% of the STRs in the genome.
Summary of the 1000 Genomes analysis
About 100,000 of the STRs in your genome are different from those of the person next to you…
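One way to reason about a number like this: if each STR locus is in Hardy-Weinberg equilibrium with known allele frequencies, the probability that two unrelated people carry different genotypes at that locus is one minus the sum of squared genotype frequencies, and summing over loci gives the expected number of differing STRs. This sketch assumes HWE and independent loci, uses toy frequencies, and is not how the 1000 Genomes figure was obtained.

```python
from itertools import combinations_with_replacement

def genotype_freqs(allele_freqs):
    """Hardy-Weinberg genotype frequencies from {allele_length: frequency}."""
    freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        freqs[(a, b)] = p * p if a == b else 2 * p * q
    return freqs

def prob_genotypes_differ(allele_freqs):
    """P(two random individuals have different genotypes at one locus)."""
    return 1.0 - sum(f * f for f in genotype_freqs(allele_freqs).values())

def expected_differing_loci(loci):
    """Sum the per-locus probabilities over a list of allele-frequency dicts."""
    return sum(prob_genotypes_differ(af) for af in loci)

# Hypothetical loci, keyed by allele length in repeat units
loci = [{10: 0.5, 12: 0.5}, {8: 1.0}, {14: 0.7, 15: 0.2, 16: 0.1}]
print(expected_differing_loci(loci))
```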
But does normal STR variation have phenotypic consequences?
Expression STRs (eSTRs): single-gene studies in humans
[Figure: expression vs. number of repeats for PIG3 (Contente et al., 2002), NOS2A (Warpeha et al., 1999), EGFR (Gebhardt et al., 1999) and MMP9 (Shimajiri et al., 1999)]
But we want a genome-wide analysis!
Analysis pipeline
• 384 samples with STR calls (X) and RNA-seq expression (Y)
• Regression tests for STRs within ±100 kb of transcripts: ~190,000 [gene × STR] tests, plus negative controls
• H0: effect = 0; H1: effect ≠ 0
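The pipeline above can be sketched as two steps: pair each gene with the STRs inside its ±100 kb window, then regress expression on STR dosage and test H0: effect = 0. The input layout, the helper names, and the use of a normal approximation for the p-value are assumptions for this sketch; the data are simulated, not study data.

```python
import math
import random

WINDOW_BP = 100_000  # cis window from the slide: +/-100 kb around each transcript

def cis_pairs(genes, strs, window=WINDOW_BP):
    """Yield (gene_id, str_id) pairs whose STR lies within `window` bp of the transcript.

    Hypothetical layout: genes = {gene_id: (chrom, start, end)},
    strs = {str_id: (chrom, position)}.
    """
    for gene_id, (gchrom, gstart, gend) in genes.items():
        for str_id, (schrom, pos) in strs.items():
            if schrom == gchrom and gstart - window <= pos <= gend + window:
                yield gene_id, str_id

def regress(x, y):
    """OLS of expression y on STR dosage x, with intercept.

    Returns (slope, two-sided p-value for H0: slope = 0). The p-value uses a
    normal approximation to the t distribution, reasonable for n in the hundreds.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(slope / se) / math.sqrt(2))
    return slope, p

# Simulated example: 384 samples; x is the mean of the two STR allele lengths
# (relative to the reference), and expression has a true slope of 0.5.
rng = random.Random(0)
x = [(rng.randint(-2, 2) + rng.randint(-2, 2)) / 2 for _ in range(384)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
print(regress(x, y))
```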
Genome-wide survey of eSTRs in humans
[Figure: QQ plot of observed vs. expected -log10 p-values under the null]
• 2,060 eSTRs
• Negative controls follow the null; the eSTR tests show signal
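A QQ plot like this compares sorted observed p-values with uniform quantiles, and "negative controls follow the null" means their points stay on the diagonal. The sketch below computes the plot coordinates and, as an extra calibration summary not shown on the slide, the genomic inflation factor λ (about 1 for well-calibrated tests). It assumes all p-values are strictly between 0 and 1.

```python
import math
from statistics import NormalDist, median

def qq_points(pvalues):
    """(expected, observed) -log10 p-value pairs for a QQ plot.

    Expected values are the uniform quantiles (i - 0.5) / n under the null.
    """
    n = len(pvalues)
    return [(-math.log10((i - 0.5) / n), -math.log10(p))
            for i, p in enumerate(sorted(pvalues), start=1)]

def genomic_inflation(pvalues):
    """lambda_GC: median 1-df chi-square statistic divided by its null median."""
    nd = NormalDist()
    chisq = [nd.inv_cdf(1 - p / 2) ** 2 for p in pvalues]
    return median(chisq) / nd.inv_cdf(0.75) ** 2

# Perfectly uniform p-values lie on the diagonal and give lambda = 1
null_p = [(i - 0.5) / 999 for i in range(1, 1000)]
print(genomic_inflation(null_p))
```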
Replication
• Orthogonal populations and an orthogonal expression assay (arrays; data from Stranger et al., PLoS Genetics, 2012)
• 83% of eSTRs showed the same direction of effect (N = 822; p < 10^-93)
• The effects were also highly correlated (R = 0.73; p < 10^-140)
[Figure: effect size from RNA-seq vs. effect size from arrays]
Most of the eSTRs are replicable.
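Direction-of-effect concordance can be tested with a simple sign test: if an eSTR did not replicate, its RNA-seq and array effects would agree in sign with probability 1/2. The slide does not state which test produced its p-value, so this one-sided binomial sketch is an assumption and need not reproduce the reported number.

```python
from math import comb

def sign_test_pvalue(n_same, n_total):
    """One-sided binomial tail P(X >= n_same) for X ~ Binomial(n_total, 1/2).

    Assumes independent eSTRs, each with a 1/2 chance of sign agreement under the null.
    Uses exact integer arithmetic until the final division.
    """
    tail = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total

n = 822
k = round(0.83 * n)  # 83% concordant, as reported on the slide
print(k, sign_test_pvalue(k, n))
```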
SNPs or STRs?
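[Diagram: gene, STR, TF and SNP, contrasting causality (the STR affects the gene) with tagging (the STR only tags a causal SNP)]
Tagging: biologically, not very interesting.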
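Decomposing variation
Model: $Y \sim X_{\mathrm{STR}} + X_{B}$, with variance components $h^2_{\mathrm{STR}}$ (eSTR) and $h^2_{b}$ (background common variants)
Simulations of negative controls (no eSTR contribution) with simulated SNP-eQTLs
[Figure: estimated $h^2_{\mathrm{STR}}$ and $h^2_{b}$ in the simulations]
Take-home message: the LMM is calibrated.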
LMM results on real data
Linear mixed model (LMM) variance decomposition for all genes: eSTR vs. all common variants on the haplotype
eSTRs account for 10%–15% of the cis heritability of gene expression captured by common variants.
Regressing conditioned on the best SNP
[Figure: expression vs. mean STR allele, stratified by best-SNP genotype (AA, AB, BB)]
Null hypothesis (tagging): random slopes
[Diagram: gene, STR, TF; causality vs. tagging]
Regressing conditioned on the best SNP
[Figure: expression vs. mean STR allele, stratified by best-SNP genotype (AA, AB, BB)]
Alternative hypothesis (causality?): slopes in the same direction as the original association
[Diagram: gene, STR, TF; causality? vs. tagging]
Regressing conditioned on the best SNP
75% of conditioned effects were in the same direction as the unconditioned effects (p < 10^-108)
[Figure: conditioned effect vs. unconditioned effect]
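The conditioning test compares the overall STR-expression slope with slopes fitted separately within each genotype class of the best SNP (AA, AB, BB). Under tagging the within-class slopes are random; under causality they keep the sign of the original association. This is a minimal sketch of that comparison on simulated data with a causal STR effect; the helper names and simulation parameters are hypothetical.

```python
import random
from collections import defaultdict

def slope(x, y):
    """OLS slope of y on x, with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slopes_within_snp_genotypes(str_dosage, snp_genotype, expression):
    """Regress expression on mean STR allele separately within each SNP genotype class."""
    groups = defaultdict(lambda: ([], []))
    for x, g, y in zip(str_dosage, snp_genotype, expression):
        groups[g][0].append(x)
        groups[g][1].append(y)
    return {g: slope(xs, ys) for g, (xs, ys) in sorted(groups.items())}

# Simulated data: the STR is in LD with the SNP and also has its own causal effect.
rng = random.Random(1)
snp = [rng.choice(["AA", "AB", "BB"]) for _ in range(2000)]
dose = {"AA": 0, "AB": 1, "BB": 2}
strs = [dose[g] + rng.gauss(0, 1) for g in snp]
expr = [1.0 * s + 0.5 * dose[g] + rng.gauss(0, 0.5) for s, g in zip(strs, snp)]

overall = slope(strs, expr)
within = slopes_within_snp_genotypes(strs, snp, expr)
same_direction = all((b > 0) == (overall > 0) for b in within.values())
print(overall, within, same_direction)
```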
Evidence for function of eSTRs: conservation
[Figure: PhyloP conservation score (×10^-3) around eSTRs for windows of ±1000, ±500, ±250, ±100 and ±50 bp; p-values across windows: p<7%, p<3%, p<0.1%, p<0.1%, p<1%]
eSTRs are significantly enriched in more conserved regions.
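The slide does not say which test produced these p-values. One common way to test enrichment is a permutation test: draw random STR sets of the same size and ask how often their mean conservation score is at least that of the eSTRs. In this sketch each STR is assumed to have one score (for example, mean PhyloP within a window); the function and its inputs are hypothetical.

```python
import random

def permutation_pvalue(estr_scores, background_scores, n_perm=1000, seed=0):
    """One-sided permutation p-value that eSTRs have a higher mean score.

    Random sets of the same size are drawn from the pooled scores; the p-value
    is (1 + #draws with mean >= observed) / (1 + n_perm).
    """
    rng = random.Random(seed)
    pooled = list(estr_scores) + list(background_scores)
    k = len(estr_scores)
    observed = sum(estr_scores) / k
    hits = sum(sum(rng.sample(pooled, k)) / k >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Toy scores: 5 highly conserved "eSTRs" vs. 50 unconserved background STRs
print(permutation_pvalue([2.0] * 5, [0.0] * 50))
```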
Co-localization with functional elements
• eSTRs co-localize with histone signatures in ENCODE LCLs (peak-shift test, p < 0.01)
• But maybe these signatures are created by nearby causal variants?
• Null model (peak shifting): Trynka, bioRxiv, 2015
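The intuition behind a peak-shifting null can be sketched as follows: shift every annotation peak by the same offset, which keeps peak sizes and spacing but breaks their exact alignment with the variants, then compare the shifted overlap counts with the observed one. This is a simplified illustration in that spirit, not the exact procedure of Trynka (bioRxiv, 2015); the offsets and helper names are assumptions.

```python
def count_overlaps(peaks, positions, offset=0):
    """Number of positions inside any peak after shifting all peaks by `offset` bp."""
    shifted = [(start + offset, end + offset) for start, end in peaks]
    return sum(any(s <= pos <= e for s, e in shifted) for pos in positions)

def peak_shift_pvalue(peaks, positions, max_shift=5000, step=500):
    """(1 + #shifted overlap counts >= observed) / (1 + #shifts)."""
    observed = count_overlaps(peaks, positions)
    offsets = [o for o in range(-max_shift, max_shift + 1, step) if o != 0]
    null = [count_overlaps(peaks, positions, o) for o in offsets]
    return (1 + sum(c >= observed for c in null)) / (1 + len(null))

# Toy example: every hypothetical eSTR sits in the middle of a 200 bp peak
peaks = [(i * 10_000, i * 10_000 + 200) for i in range(1, 50)]
estrs = [i * 10_000 + 100 for i in range(1, 50)]
print(peak_shift_pvalue(peaks, estrs))
```

With 20 shifts, the smallest attainable p-value is 1/21; a real analysis would use many more shifts.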
A potential role of eSTRs in human diseases
• Associated the 2,060 eSTRs with 31 phenotypes in ~1,300 individuals from UK10K
• FDR < 10%
• Example: diastolic blood pressure (CLCC1, DIP2B)
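Testing 2,060 eSTRs against 31 phenotypes gives 2,060 × 31 = 63,860 tests, so results are filtered by false discovery rate. The slide does not name the FDR procedure; assuming the standard Benjamini-Hochberg step-up rule, a sketch is below.

```python
def benjamini_hochberg(pvalues, q=0.10):
    """Indices of tests declared significant at FDR q (Benjamini-Hochberg step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            n_reject = rank  # largest rank passing its threshold; all smaller ranks are rejected too
    return sorted(order[:n_reject])

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2, 0.5]))  # → [0, 1, 2]
```

Because the rule is step-up, a test can be rejected even if it fails its own threshold, as long as a larger p-value further down the ranking passes.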
A potential role of eSTRs in human diseases
Name        | Symbol   | P-value  | Phenotype      | Class
4:9955416   | SLC2A9   | 3.49E-08 | Uric acid      | Metabolic function
10:27124545 | Abi1     | 4.61E-07 | Phosphate      | Metabolic function
17:44048491 | KIAA1267 | 6.86E-06 | FEV1/FVC ratio | Pulmonary function
16:473880   | DECR2    | 2.51E-05 | ApoA1          | Metabolic function
1:109393265 | CLCC1    | 2.89E-05 | Diastolic BP   | Blood pressure
6:20195837  | MBOAT1   | 3.26E-05 | Albumin        | Metabolic function
1:110516300 | FAM40A   | 5.07E-05 | Urea           | Metabolic function
12:51036810 | DIP2B    | 1.02E-04 | Diastolic BP   | Blood pressure
Summary
The first genome-wide expression STR analysis:
1. Over 2,000 eSTRs in the discovery set.
2. Replication across independent platforms and populations.
3. eSTRs account for 10–15% of the cis-heritability explained by common variants.
4. Functional evidence.
5. eSTRs are associated with human phenotypes.
How much heritability do GWAS miss by not analyzing repetitive elements?
Acknowledgements
Team eSTR:
Melissa Gymrek
Thomas Willems
Dina Zielinski
Stoyan Georgiev
Barak Marcus
Alkes Price
Mark Daly
Jonathan Pritchard
Funding:
Burroughs Wellcome Career Award
National Institute of Justice
Outline
Yaniv Erlich, 7/12/12: Towards a population-scale map of STR variations
• lobSTR: Profiling STR variations from WGS data
• STR variations across 2,500 datasets: Preliminary results
[Figure: STR heterozygosity (0.0–1.0) across 1000 Genomes populations: All, CEU, GBR, FIN, IBS, YRI, LWK, ACB, ASW, CHB, CDX, CHS, JPT, KHV]
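The figure does not say whether it plots observed or expected heterozygosity, so this sketch computes both for a single STR locus in one population: expected heterozygosity as 1 - Σ p_i² over allele frequencies, and observed heterozygosity as the fraction of individuals with two different alleles. The toy genotypes are hypothetical.

```python
from collections import Counter

def expected_heterozygosity(alleles):
    """1 - sum(p_i^2) over the STR allele frequencies observed in a population."""
    counts = Counter(alleles)
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two STR alleles differ in length."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

# Hypothetical genotypes (repeat units) for four individuals at one locus
genotypes = [(10, 12), (10, 10), (12, 14), (11, 11)]
alleles = [a for g in genotypes for a in g]
print(expected_heterozygosity(alleles), observed_heterozygosity(genotypes))
```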
The End