5. ~11,022 genes from 20,634 probes
Number of probes per gene symbol:
Probes   # of Genes
1        10,528
2        469
3        23
4        2
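The tally above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the counts on the slide; the name `genes_by_probe_count` is an illustrative choice, not something from the original analysis.

```python
# Counts copied from the slide: probes per gene symbol -> number of genes.
genes_by_probe_count = {1: 10_528, 2: 469, 3: 23, 4: 2}

# Total number of distinct gene symbols represented.
total_genes = sum(genes_by_probe_count.values())

# Total number of probes that map to one of those gene symbols.
probes_with_symbol = sum(k * n for k, n in genes_by_probe_count.items())

print(total_genes)         # 11022
print(probes_with_symbol)  # 11543
```

The table therefore accounts for 11,543 of the 20,634 probes; the slide does not say how the remaining probes were handled.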
[Figure: gene structure showing exons and introns, with alternative splicing]
6. General structure of our group presentations
• 3 subgroups
– Gene expression alone (Renaud Tissier)
– Genetics of gene expression (August Blackburn)
– Genetics of gene expression and phenotype (Heather Cordell)
7. Aims
• Understand the correlation structure of
expression of 1000s of genes across
individuals, pedigrees
• … and their relationship to phenotype (SBP)
8. Data used
• All probes; all individuals; no phenotype
(Gallaugher –P3)
• WGCNA to get 14K probes; 82 individuals with
SBP>75% at all 4 time points in real data
(Gadaleta)
• 25% most heritable probes (4.9K, Göring et al.,
2007); 5 largest pedigrees (2,5,6,8,10) (n=276);
SBP visit 1, rep 1 of simulated (Tissier –P6)
• External data (HaemAtlas, DAVID; Gallaugher &
Tissier)
• No-one used genotypes or WGS (yet)
10. Gallaugher
• T-, B- lymphocyte and monocyte counts vary
between people, are heritable, and thus may
confound genetic mapping of eQTL
• Principal component analysis to identify
variation in gene expression between people
• Determine if PCs are associated with variables
(age, sex, BP, HT, medication, pedigree)
11. Peds 5, 6, and 8 significantly different for PC2 (p < 10^-3)
12. Estimate proportion of cells for each individual
using sorted cell expression data (HaemAtlas)
[Figure: cytotoxic T cell (Tc) proportion plotted against helper T cell (Th) proportion]
• Gross outlier from ped #8 for both Tc and Th (? acute infection)
Conclusions:
• Variation in gene expression in PBMC could be incorporated into genetic analysis to improve power
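The idea of estimating cell proportions from sorted-cell expression profiles can be illustrated with a deliberately small example. The slide does not state which estimation method was used, so this is only a sketch under stated assumptions: two cell types, expression on a linear (non-log) scale, and a mixture that is a weighted average of the two signature profiles. The function name and all numbers are hypothetical; a real analysis would use many marker genes and more cell types (for example with non-negative least squares).

```python
def estimate_proportion(mixture, sig_a, sig_b):
    """Least-squares estimate of the fraction of cell type A in a two-type mixture.

    Assumes mixture[g] ~ p * sig_a[g] + (1 - p) * sig_b[g] on a linear
    (non-log) expression scale, with 0 <= p <= 1.
    """
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    resid = [m - b for m, b in zip(mixture, sig_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("signatures are identical; proportion is not identifiable")
    p = sum(r * d for r, d in zip(resid, diff)) / denom
    # Clip to [0, 1]; for a one-parameter problem this is the constrained optimum.
    return min(1.0, max(0.0, p))


# Hypothetical signature profiles for three marker genes (made-up numbers).
cytotoxic_t = [10.0, 1.0, 5.0]
helper_t = [2.0, 8.0, 5.0]
# A sample constructed as 30% cytotoxic T and 70% helper T.
sample = [0.3 * a + 0.7 * b for a, b in zip(cytotoxic_t, helper_t)]
print(round(estimate_proportion(sample, cytotoxic_t, helper_t), 3))  # 0.3
```

An individual whose estimated proportions sit far from the rest of the cohort would show up as an outlier in the same way as the ped #8 sample described on the slide.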
19. GAW DATA ANALYSIS
[Figure: data-reduction workflow on the samples × probes matrix. WGCNA leaves fewer probes; the SBP @75% filter leaves fewer samples.]
Gadaleta
20. GENERAL IDEA
[Figure: annotated minimization problem with terms labeled response, gene matrix, covariance matrix (association), and penalty (sparsity)]
Gadaleta
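The LASSO named in the results follows the general "minimize a fit term plus a sparsity penalty" pattern annotated on this slide. The exact objective used by Gadaleta is not recoverable from the slide text, so the following is only the textbook LASSO form. The notation is assumed: $y$ is the response, $X$ the gene matrix, $\beta$ the association coefficients, $n$ the number of samples, and $\lambda$ the penalty weight. How the covariance matrix enters the original formulation is not shown here.

```latex
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
```

Larger values of $\lambda$ drive more coefficients to exactly zero, which is the sparsity the slide refers to.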
21. RESULTS / CONCLUSION
No significant gene networks detected in cases with SBP>75%
Small number of samples vs. high number of covariates
Computational burden of LASSO too high
Gadaleta
22. Sub-group Conclusions
• Complex correlation structure of gene expression
(Gallaugher)
– Different for specific pedigrees; outlier
– Biological (rare variants) or technical (mixed cells, batch effects,
acute illness)
• Only 1 gene (DUSP1) was in the answers (Tissier)
– Meta-analysis across pedigrees can be more robust for filtering
than correcting for family structure
• High-dimensional data needs larger sample sizes and
controls (Gadaleta)
– Differential network analysis
23. General structure of our group presentations
• 3 subgroups
– Gene expression alone (Renaud Tissier)
– Genetics of gene expression (August Blackburn)
– Genetics of gene expression and phenotype (Heather Cordell)
24. Identifying Genetic Contribution to Gene Expression
• All used pedigree genotype and expression data
• Cis-eQTL regions genetic architecture (Cantor, 3 genes
with high eQTL LODs (Göring 2007), Imputed genotype
dosages)
• Allele Specific Binding filters potential regulatory SNPs
(Peralta – P4, ENCODE, Imputed genotype dosages)
• Replication of reported epistatic interactions (candidate
SNPs (Hemani, 2014), GWAS)
• Haplotype specific gene expression estimates (Blackburn
– P2, RFSs identified using HIPster, GWAS data)
26. Independent Associations (SOLAR-MGA; alpha = 5e-8)
Gene Name   Probe_id        Original LOD   Bp range   # SNPs conditioned on   # sig SNPs   Min p-val
TIMM10      GI_6912707-S    37             12120      0, 1                    89           1.6e-66, 9.9e-86
RPL14       GI_16753224-S   34             14582      0                       29           3.8e-124
LR8         GI_21361500-S   43             19100      0, 1, 2                 29, 14, 1    9.2e-83, 2.1e-22, 1.1e-11
Conclusions:
• Multiple independent SNPs contribute to single eQTL regions
• Number of independent cis eQTL associations varies with the
level of significance and software used
27. Identifying Genetic Contribution to Gene Expression
• All used pedigree genotype and expression data
• Cis-eQTL regions genetic architecture (Cantor, Siegmund,
3 genes with high eQTL LODs (Göring 2007), Imputed
genotype dosages)
• Allele Specific Binding (ASB) filters potential regulatory
SNPs (Peralta – P4, ENCODE, Imputed genotype dosages)
• Replication of reported epistatic interactions (candidate
SNPs (Hemani, 2014), GWAS)
• Haplotype specific gene expression estimates (Blackburn
– P2, RFSs identified using HIPster, GWAS data)
29. 10,552 ASB SNPs used to build the covariance
kernel
Null model
10k simulated phenotypes
0.15 < h2r < 0.25
0.01 < afreq < 0.50
Significant eQTL signals obtained for the 2 ASB based covariance kernels used
Peralta P4
30. Peralta – P4
• ASB is a biologically meaningful filter for the prioritization
of non-coding variation
– can be used to prioritize non-coding variants based on potential
regulatory function
• ASB correlates with gene expression levels
– cis-ASB accounts for 53-83% of the variation in neighboring gene
expression
• Segregation of ASB in pedigrees can act as a background
noise filter
– known biases in ASB prediction can be incorporated as weights
into the correlation kernel to improve signal to noise ratio
31. Identifying Genetic Contribution to Gene Expression
• All used pedigree genotype and expression data
• Cis-eQTL regions genetic architecture (Cantor, Siegmund, 3
genes with high eQTL LODs (Göring 2007), Imputed genotype
dosages)
• Allele Specific Binding filters potential regulatory SNPs (Peralta –
P4, ENCODE, Imputed genotype dosages)
• Replication of reported epistatic interactions (Howey,
candidate SNPs, GWAS data)
– Hemani et al. Detection and replication of epistasis influencing
transcription in humans. Nature. 2014 508:249–253.
• Haplotype specific gene expression estimates (Blackburn – P2,
RFSs identified using HIPster, GWAS data)
33. Howey Conclusions
• SNP-SNP interactions associated with gene
expression showed combined evidence of
replication, p-value = 0.007
• Expression data is argued to give higher power
for detecting association. This replication
exercise seems to reflect this
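A single "combined evidence" p-value is typically obtained by pooling the per-interaction replication p-values. The slide does not say which combination method Howey used, so the sketch below shows Fisher's method purely as one standard option. The function name and the example p-values are hypothetical, and the method assumes the individual tests are independent.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the global null (k = number of tests).
    """
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one p-value in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic divided by 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical replication p-values for three SNP-SNP interactions.
combined = fisher_combined_pvalue([0.04, 0.20, 0.01])
print(combined)
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the implementation.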
34. Identifying Genetic Contribution to Gene Expression
• All used pedigree genotype and expression data
• Cis-eQTL regions genetic architecture (Cantor, Siegmund,
3 genes with high eQTL LODs (Göring 2007), Imputed
genotype dosages)
• Allele Specific Binding filters potential regulatory SNPs
(Peralta – P4, ENCODE, Imputed genotype dosages)
• Replication of reported epistatic interactions (candidate
SNPs (Hemani, 2014), GWAS)
• Haplotype specific gene expression estimates (Blackburn
– P2, RFSs identified using HIPster, GWAS data)
35. Blackburn – P2
• Aim: To estimate haplotype-specific gene
expression levels and identify differences
• Methods:
– Phased genotypes / IBD structure using HIPster.
Identified recombination free segments (RFS).
– Haplotype specific estimates generated using EM
– Differences between haplotypes assessed using LRT
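The final step, a likelihood ratio test (LRT) for a difference between haplotypes, can be sketched in a heavily simplified setting. The assumptions here are mine, not the original method: the EM step is omitted, each observation is treated as belonging to exactly one of two haplotype groups, and expression is modeled as normal with a common variance. The function name is hypothetical, and relatedness between individuals is ignored.

```python
import math


def lrt_two_group_means(group_a, group_b):
    """Likelihood ratio test for a difference in mean between two groups.

    Normal model with a common variance. Null: one shared mean.
    Alternative: one mean per group. Returns (statistic, p_value), with the
    p-value taken from a chi-square distribution with 1 degree of freedom.
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)

    def rss(values, mean):
        return sum((v - mean) ** 2 for v in values)

    rss_null = rss(pooled, sum(pooled) / n)
    rss_alt = rss(group_a, sum(group_a) / len(group_a)) + rss(
        group_b, sum(group_b) / len(group_b)
    )
    if rss_alt == 0:
        raise ValueError("no within-group variation; the test is undefined")
    # 2 * (loglik_alt - loglik_null) simplifies to n * ln(RSS_null / RSS_alt).
    stat = max(n * math.log(rss_null / rss_alt), 0.0)
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical expression values carried on two different haplotypes.
stat, p_value = lrt_two_group_means([0.0, 0.1, -0.1, 0.05], [2.0, 2.1, 1.9, 2.05])
print(stat, p_value)
```

Running one such test per gene and segment yields the kind of p-value collection that is then summarized across all tests.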
37. Haplotype differences (Blackburn)
• Null simulation adheres to uniform
distribution
• 542 of 8624 tests significant (q<0.1)
[Figure: histogram of haplotype-specific cis-eQTL p-values (x-axis: p, y-axis: frequency); pi0=0.725]
[Figure: density of PTGS2 expression; label T2DG0800492_1]
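The q-value threshold and the pi0 estimate reported on this slide come from false discovery rate methodology. The sketch below shows the basic calculations under simplifying assumptions: pi0 is estimated at a single tuning value `lam` (published implementations usually smooth over a grid of values), and q-values are the step-up FDR adjustment scaled by pi0. Function names are hypothetical, and nothing here is claimed to match the software used in the original analysis.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true nulls at a single lambda."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def qvalues(pvalues, pi0=1.0):
    """q-values: step-up FDR adjustment scaled by pi0 (pi0=1 gives Benjamini-Hochberg)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


# Evenly spread p-values, as expected under the null: pi0 should be 1.
null_like = [(i + 0.5) / 100 for i in range(100)]
print(estimate_pi0(null_like))  # 1.0
```

A pi0 below 1, such as the 0.725 on the slide, indicates an excess of small p-values relative to the uniform distribution expected under the null.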
38. Methods Adjusting for Non-Independence
Due to Relatedness
• Theoretical kinship matrix
– Variance component (Peralta, SOLAR)
– Eigensimplification (Blackburn & Cantor, SOLAR-MGA)
• Empirical kinship matrix
– Linear mixed model (Cantor, FaST-LMM & Howey,
GEMMA)
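A theoretical kinship matrix is derived from the pedigree structure alone, without genotypes. The following is a minimal sketch of the classic recursive calculation. It is not the code used by SOLAR or any other package named here, and the function name and input layout are assumptions. Variance component models typically use twice the kinship coefficient as the expected genetic relatedness between two individuals.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients from a pedigree.

    `pedigree` is a list of (individual, father, mother) tuples in which parents
    appear before their children and founders have father = mother = None.
    Returns a dict mapping (i, j) to the kinship coefficient phi_ij.
    """
    phi = {}
    seen = []
    for ind, father, mother in pedigree:
        founder = father is None or mother is None
        for other in seen:
            if founder:
                value = 0.0  # founders are assumed unrelated to earlier individuals
            else:
                value = 0.5 * (phi[(father, other)] + phi[(mother, other)])
            phi[(ind, other)] = phi[(other, ind)] = value
        # Self-kinship is 1/2 for non-inbred individuals, larger if parents are related.
        phi[(ind, ind)] = 0.5 if founder else 0.5 * (1.0 + phi[(father, mother)])
        seen.append(ind)
    return phi


# Hypothetical nuclear family: two founders and two children.
family = [
    ("dad", None, None),
    ("mom", None, None),
    ("c1", "dad", "mom"),
    ("c2", "dad", "mom"),
]
phi = kinship_matrix(family)
print(phi[("c1", "c2")])  # 0.25
```

An empirical kinship matrix replaces these pedigree-derived values with estimates computed from genome-wide genotypes.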
39. Advantages of Pedigrees
• Permit identification of recombination free
segments (RFS, Blackburn)
• True allele specific binding (ASB) signals will
segregate (Peralta)
40. Sub-group Conclusions
• Biological information from allele specific binding
can be used to filter potentially functional
regulatory SNPs
• Multiple independent signals are observed at
eQTL
• Epistasis
• Expression varies between haplotypes
• Genetic architecture of gene expression is
complex (duh!)
41. General structure of our group presentations
• 3 subgroups
– Gene expression alone (Renaud Tissier)
– Genetics of gene expression (August Blackburn)
– Genetics of gene expression and phenotype (Heather Cordell)
43. Aims
• Pitsillides modeled gene expression as the primary outcome
– Also looked for enrichment of GWAS results (GWAS for SBP or
DBP) in SNPs associated with expression
• Three papers tried to model phenotypes as the primary
outcome
– Radkowski (P5) tried to model future HT using expression
– Tong investigated whether using E+G did better than using E or G
alone
– Ainsworth (P1) fitted causal models for relationship between G, E
and P
44. Expression data
• 2 papers used individual expression variables as predictors
• 1 paper used individual expression variables as outcomes
– All expression variables, with SNPs located in same genetic
region used as predictors
• 1 paper used both individual expression variables and a
clustered summary measure (from WGCNA)
– Both as outcomes and predictors
45. Genetic/sample Data
• Two papers used WGS
– Tong collapsed variants (common and rare) within genes, used
142 unrelated individuals from families
– Pitsillides used common SNPs, used all individuals in families
• Ainsworth used GWAS (common SNPs), all individuals in
families
• Radkowski did not use genetic data
– Used 340 family members without baseline HT or HT at first visit
• All used real SBP, DBP, HT
46. Pedigree relationships
• Ainsworth & Pitsillides used linear mixed models
when modeling SNPs as predictors (for family
data)
• Tong used unrelated individuals
• Two papers ignored family relationships
– When relating E to P (Ainsworth & Radkowski)
– Or when doing causal modeling (Ainsworth)
47. Methods
• Linear mixed models: lmekin and FaST-LMM
• Unrelated individuals (Tong)
– Non-parametric weighted U statistics
– Models similarities in genotype (burden), gene expression and phenotype
• Causal modeling: structural equation models (SEM) and Bayesian
Unified Framework (BUF) (Ainsworth)
– Applied to a set of filtered variables for G, E, P
• Predicting future HT (Radkowski)
– Calculated slope of regression of BP on time-point
– Multiple regression of slope on gene expression (with/without adjustment
for medication effect)
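The first step of the Radkowski approach, summarizing each person's blood pressure trajectory as the slope of BP on time-point, amounts to a simple least-squares fit. This is a minimal sketch with a hypothetical function name and made-up readings; the second-stage multiple regression of slope on gene expression and the medication adjustment are not shown.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical SBP readings (mmHg) at four visits for one person.
visits = [1, 2, 3, 4]
sbp = [120.0, 125.0, 130.0, 135.0]
print(ols_slope(visits, sbp))  # 5.0
```

Each individual then contributes one slope value, which serves as the outcome in the regression on gene expression.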
48. Results
• No p values reached statistical significance (once multiple
testing taken into account)
– Probably due to low power
– Nevertheless all papers presented their “top findings”
• Incorporation of both G and E improved significance of
association test (compared to G or E alone) (Tong)
• Adjustment for effect of medication gave a larger number of
“significant” results than non-adjustment (Radkowski)
• SEM and BUF implicated very similar causal models (Ainsworth)
52. Causal modeling (Ainsworth)
• SEM always implicated either model (b) or (d)
– Model (d) was not considered by BUF, model (f) was implicated
instead
• Generally good agreement between SEM and BUF
53. Sub-group Conclusions
• Top results show no replication of previous findings
– Different (Mexican-American) population?
– Low power?
• Lots of different ways to consider gene expression data
– Incorporate directly into analysis of G and P (e.g. to improve
power)
– Use directly as outcome
– As predictor of (future) phenotype
– To infer causal relationships
54. Group-wide Conclusions
• Documented complexity of gene expression
– One gene at a time vs. multiple genes
simultaneously
– Multiple alleles contribute to a single eQTL region
• Power
– High for genotype -> expression (inc. epistasis)
– Low for genotype/expression -> phenotype
– Pedigrees present challenges, but can be useful