Gene expression group presentation at GAW 19

Gene expression group
• Fearless leader: Rita Cantor

General structure of our group presentations
• 3 subgroups
– Gene expression alone (Renaud Tissier)
– Genetics of gene expression (August Blackburn)
– Genetics of gene expression and phenotype
(Heather Cordell)

Biological/technical background

[Figure: cell-type diagram; proportions shown: 5%, 30%, 1%, 60%, 2%; labels: Cytotoxic, Helper]
~11,022 genes from 20,634 probes
# of probes per gene symbol
Probes # of Genes
1 10,528
2 469
3 23
4 2
[Figure: gene structure with exon and intron labels; alternative splicing]

Aims
• Understand the correlation structure of
expression of 1000s of genes across
individuals, pedigrees
• … and their relationship to phenotype (SBP)

Data used
• All probes; all individuals; no phenotype
(Gallaugher –P3)
• WGCNA to get 14K probes; 82 individuals with
SBP>75% at all 4 time points in real data
(Gadaleta)
• 25% most heritable probes (4.9K, Göring et al.,
2007); 5 largest pedigrees (2,5,6,8,10) (n=276);
SBP visit 1, rep 1 of simulated (Tissier –P6)
• External data (HaemAtlas, DAVID; Gallaugher &
Tissier)
• No-one used genotypes or WGS (yet)

Methods
• Principal Components (Gallaugher –P3 &
Tissier – P6)
• Lasso regression (Gadaleta)
• WGCNA: weighted gene co-expression
network analysis (Gadaleta & Tissier)
• Meta-analysis across pedigrees (Tissier)
• Gene enrichment (Tissier)
• Linear Mixed Models (Tissier & Gallaugher)

Gallaugher
• T-, B- lymphocyte and monocyte counts vary
between people, are heritable, and thus may
confound genetic mapping of eQTL
• Principal Component analysis to identify
variation in gene expression between people
• Determine if PCs associated with variables
(age, sex, BP, HT, medication, pedigree)
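
The PCA step above can be sketched as follows. This is a minimal illustration on simulated data, not the analysis Gallaugher ran: the function names, the toy matrix, and the use of a one-way ANOVA R^2 to relate a PC to pedigree are all assumptions made for the example.

```python
import numpy as np

def principal_components(expr, n_pcs=2):
    """Return PC scores (individuals x n_pcs) and variance fractions.

    expr: individuals x probes matrix of expression values.
    """
    centered = expr - expr.mean(axis=0)            # center each probe
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]              # PC scores per individual
    var_frac = (s ** 2) / np.sum(s ** 2)           # variance explained per PC
    return scores, var_frac[:n_pcs]

def r_squared_by_group(pc, groups):
    """Fraction of a PC's variance explained by a categorical variable
    (one-way ANOVA R^2), e.g. pedigree membership."""
    pc = np.asarray(pc, dtype=float)
    groups = np.asarray(groups)
    total = np.sum((pc - pc.mean()) ** 2)
    between = sum(
        np.sum(groups == g) * (pc[groups == g].mean() - pc.mean()) ** 2
        for g in np.unique(groups)
    )
    return between / total

# Toy data (simulated, not GAW19): 60 individuals, 200 probes, 3 pedigrees,
# with a pedigree-specific shift on the first 50 probes.
rng = np.random.default_rng(0)
pedigree = np.repeat([5, 6, 8], 20)
expr = rng.normal(size=(60, 200))
expr[pedigree == 8, :50] += 2.0

scores, var_frac = principal_components(expr)
r2_pc1 = r_squared_by_group(scores[:, 0], pedigree)
print(var_frac, r2_pc1)
```

A continuous variable such as age or BP would be related to a PC by ordinary regression instead of the grouped R^2 used here.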

Peds 5, 6, and 8 significantly different for PC2 (p < 10^-3)

Estimate proportion of cells for each individual
using sorted cell expression data (HaemAtlas)
[Figure: cytotoxic T cell proportion plotted against helper T cell proportion]
Gross outlier from ped #8 for both Tc and Th
? Acute infection
Conclusions:
Variation in gene expression in PBMC could be incorporated into genetic analysis to improve power

WGCNA: Weighted Gene Co-expression Network Analysis
(Tissier & Gadaleta)

[Figure: 5K genes grouped into gene clusters]
Tissier

NETWORK CONSTRUCTION
Gadaleta
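
As a rough illustration of network construction in WGCNA, the sketch below builds an unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^beta, and sums it into a per-gene connectivity. The power beta = 6, the function name, and the simulated data are assumptions for the example; the slides do not state the settings used by Tissier or Gadaleta.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr: individuals x genes matrix; beta is an assumed soft-threshold power.
    """
    cor = np.corrcoef(expr, rowvar=False)   # genes x genes correlation
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)              # ignore self-connections
    return adj

# Toy data (simulated): genes 0-2 share a common signal,
# genes 3-5 are independent noise.
rng = np.random.default_rng(1)
shared = rng.normal(size=(100, 1))
expr = np.hstack([shared + 0.3 * rng.normal(size=(100, 3)),
                  rng.normal(size=(100, 3))])

adj = soft_threshold_adjacency(expr)
connectivity = adj.sum(axis=1)              # per-gene connectivity
print(np.round(connectivity, 3))
```

Raising correlations to a power pushes weak correlations toward zero while keeping strong ones, so the co-expressed genes stand out in the connectivity vector.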

GAW DATA ANALYSIS
[Diagram: samples x probes data reduced to fewer probes (WGCNA) and fewer samples (SBP @75%)]
Gadaleta

GENERAL IDEA
[Equation diagram: minimization over a response, a gene matrix and a covariance matrix (association), with a penalty term (sparsity)]
Gadaleta
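
To make the sparsity penalty concrete, here is a minimal sketch of a plain lasso fitted by coordinate descent. It is a simplified stand-in, not Gadaleta's model: it omits the covariance (association) matrix shown on the slide, and the objective scaling, penalty value, and simulated data are assumptions for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X: samples x genes (columns assumed centered), y: centered response.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes gene j
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

# Toy data (simulated): 80 samples, 100 genes, 3 genes with true effects.
rng = np.random.default_rng(2)
n, p = 80, 100
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
true_beta = np.zeros(p)
true_beta[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.1 * rng.normal(size=n)
y -= y.mean()

beta_hat = lasso_coordinate_descent(X, y, lam=0.3)
selected = np.flatnonzero(beta_hat)
print(selected)
```

Each pass costs one sweep over all genes, which hints at why the computational burden grows quickly with thousands of probes and repeated fits over a grid of penalties.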

RESULTS / CONCLUSION
No significant gene networks detected in cases with SBP>75%
Small number of samples vs. high number of covariates
Computational burden of LASSO too high
Gadaleta

Sub-group Conclusions
• Complex correlation structure of gene expression
(Gallaugher)
– Different for specific pedigrees; outlier
– Biological (rare variants) or technical (mixed cells, batch effects,
acute illness)
• Only 1 gene (DUSP1) was in the answers (Tissier)
– Meta-analysis across pedigrees can be more robust for filtering
than correcting for family structure
• High-dimensional data needs larger sample sizes and
controls (Gadaleta)
– Differential network analysis

Identifying Genetic contribution to Gene
Expression
• All used pedigree genotype and expression data
• Cis-eQTL regions genetic architecture (Cantor, 3 genes
with high eQTL LODs (Göring 2007), Imputed genotype
dosages)
• Allele Specific Binding filters potential regulatory SNPs
(Peralta – P4, ENCODE, Imputed genotype dosages)
• Replication of reported epistatic interactions (candidate
SNPs (Hemani, 2014), GWAS)
• Haplotype specific gene expression estimates (Blackburn
– P2, RFSs identified using HIPster, GWAS data)

Independent Associations for 3 Genes with best eQTL
(LODs 37-43): alpha = 0.05
Enumeration of Independent Signals by Software
Columns, in order: Gene; # SNPs conditioned on; FaST-LMM # significant SNPs; FaST-LMM minimum p-value; SOLAR-MGA # significant SNPs; SOLAR-MGA minimum p-value
Rows with a single count and p-value come from one program only.
TIMM10 0 25 2.9e-68 24 1.6e-66
TIMM10 1 23 2.2e-87 23 9.9e-86
TIMM10 2 10 5.0e-07 10 1.9e-07
TIMM10 3 2 0.03
TIMM10 4 1 0.04
RPL14 0 73 1.5e-128 74 3.80e-124
RPL14 1 29 0.001 29 0.0009
RPL14 2 13 0.006 13 0.003
RPL14 3 11 0.006 4 0.01
RPL14 4 1 0.02 2 0.03
RPL14 5 1 0.04
LR8 0 67 3.6e-86 65 9.2e-83
LR8 1 39 2.2e-24 55 2.1e-22
LR8 2 47 1.1e-11
LR8 3 46 0.0001
LR8 4 37 0.0001
LR8 5 40 0.0003
LR8 6 23 0.0004
LR8 7 14 0.00002
LR8 8 14 0.003
LR8 9 8 0.002

Independent Associations
SOLAR-MGA; alpha = 5e-8
Gene name  Probe_id       Original LOD  Bp range  # SNPs conditioned on  # sig SNPs  Min p-val
TIMM10     GI_6912707-S   37            12120     0                      8           1.6e-66
TIMM10     GI_6912707-S   37            12120     1                      9           9.9e-86
RPL14      GI_16753224-S  34            14582     0                      29          3.8e-124
LR8        GI_21361500-S  43            19100     0                      29          9.2e-83
LR8        GI_21361500-S  43            19100     1                      14          2.1e-22
LR8        GI_21361500-S  43            19100     2                      1           1.1e-11
Conclusions:
• Multiple independent SNPs contribute to single eQTL regions
• Number of independent cis eQTL associations varies with the
level of significance and software used
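
Counting independent signals by conditioning can be sketched as a forward procedure: test every SNP, add the most significant one as a covariate, and repeat until nothing passes alpha. The sketch below is a simplification under stated assumptions: it uses ordinary least squares with a normal approximation for the p-value and ignores relatedness, whereas the analyses above used FaST-LMM and SOLAR-MGA to account for family structure. All names and the simulated data are hypothetical.

```python
import math
import numpy as np

def snp_p_value(y, snp, covariates):
    """Two-sided p-value for one SNP in an OLS fit of y on [1, covariates, snp].

    Uses a normal approximation to the t statistic (an assumption that is
    reasonable for large samples) and ignores relatedness.
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + covariates + [snp])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    # pinv avoids a crash if two SNPs happen to be collinear
    cov = sigma2 * np.linalg.pinv(design.T @ design)
    z = coef[-1] / math.sqrt(cov[-1, -1])
    return math.erfc(abs(z) / math.sqrt(2))

def independent_signals(y, dosages, alpha=0.05):
    """Forward conditioning: add the top SNP as a covariate until none pass alpha."""
    selected = []
    while len(selected) < dosages.shape[1]:
        covs = [dosages[:, j] for j in selected]
        candidates = [j for j in range(dosages.shape[1]) if j not in selected]
        pvals = {j: snp_p_value(y, dosages[:, j], covs) for j in candidates}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
    return selected

# Toy data (simulated): 300 individuals, 20 SNPs, two with true effects.
rng = np.random.default_rng(3)
dosages = rng.binomial(2, 0.3, size=(300, 20)).astype(float)
y = 1.5 * dosages[:, 2] + 1.0 * dosages[:, 7] + rng.normal(size=300)

selected = independent_signals(y, dosages, alpha=5e-8)
print(selected)
```

Rerunning with a looser alpha such as 0.05 would typically admit more SNPs, which mirrors the dependence on significance level noted in the conclusions.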

Expression
• Cis-eQTL regions genetic architecture (Cantor, Siegmund,
3 genes with high eQTL LODs (Göring 2007), Imputed
genotype dosages)
• Allele Specific Binding (ASB) filters potential regulatory
SNPs (Peralta - P4, ENCODE, Imputed genotype dosages)

http://www.genome.duke.edu/labs/crawford/images/dnase.gif
http://www.discoveryandinnovation.com/BIOL202/notes/lecture18.html
Peralta P4
ENCODE

10,552 ASB SNPs used to build the covariance
kernel
Null model
10k simulated phenotypes
0.15 < h2r < 0.25
0.01 < afreq < 0.50
Significant eQTL signals obtained for the 2 ASB based covariance kernels used
Peralta P4

Peralta – P4
• ASB is a biologically meaningful filter for the prioritization
of non-coding variation
– can be used to prioritize non-coding variants based on potential
regulatory function
• ASB correlates with gene expression levels
– cis-ASB accounts for 53-83% of the variation in neighboring gene
expression
• Segregation of ASB in pedigrees can act as a background
noise filter
– known biases in ASB prediction can be incorporated as weights
into the correlation kernel to improve signal to noise ratio

Expression
• Cis-eQTL regions genetic architecture (Cantor, Siegmund, 3
genes with high eQTL LODs (Göring 2007), Imputed genotype
dosages)
• Allele Specific Binding filters potential regulatory SNPs (Peralta –
P4, ENCODE, Imputed genotype dosages)
• Replication of reported epistatic interactions (Howey,
candidate SNPs, GWAS data)
– Hemani et al. Detection and replication of epistasis influencing
transcription in humans. Nature. 2014 508:249–253.
• Haplotype specific gene expression estimates (Blackburn – P2,
RFSs identified using HIPster, GWAS data)

Evidence for replication of epistasis (Howey)

Howey Conclusions
• SNP-SNP interactions associated with gene
expressions showed combined evidence of
replication, p-value = 0.007
• Expression data is argued to give higher power
for detecting association. This replication
exercise seems to reflect this

Expression
• Cis-eQTL regions genetic architecture (Cantor, Siegmund,
3 genes with high eQTL LODs (Göring 2007), Imputed
genotype dosages)
• Allele Specific Binding filters potential regulatory SNPs
(Peralta – P4, ENCODE, Imputed genotype dosages)

Blackburn – P2
• Aim: To estimate haplotype-specific gene
expression levels and identify differences
• Methods:
– Phased genotypes / IBD structure using HIPster.
Identified recombination free segments (RFS).
– Haplotype specific estimates generated using EM
– Differences between haplotypes assessed using LRT

Recombination free segments (RFS)
Blackburn
[Figure: histogram of Recombination Free Segment Lengths; x-axis: length in bases; y-axis: frequency]

Haplotype differences (Blackburn)
• Null simulation adheres to uniform
distribution
• 542 of 8624 tests significant (q<0.1)
[Figure: histogram of haplotype specific cis-eQTL p-values; x-axis: p; y-axis: frequency; pi0 = 0.725]
[Figure: PTGS2 expression density; x-axis: expression; y-axis: density; label: T2DG0800492_1]
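
The q-value threshold and pi0 estimate on this slide point to a Storey-type false discovery rate procedure, although the slides do not say how they were computed. The sketch below is one simple version, assuming a single tuning value lambda = 0.5 for pi0 and pi0-scaled Benjamini-Hochberg adjustment; the function names and the example p-values are made up for illustration.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the null proportion at a single lambda."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / ((1.0 - lam) * m))

def q_values(pvalues, pi0=None):
    """q-values as pi0-scaled Benjamini-Hochberg adjusted p-values."""
    m = len(pvalues)
    if pi0 is None:
        pi0 = estimate_pi0(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q

# Made-up p-values for illustration only.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.27, 0.45, 0.62, 0.74, 0.88, 0.95]
pi0 = estimate_pi0(pvalues)
q = q_values(pvalues, pi0)
n_significant = sum(1 for value in q if value < 0.1)
print(pi0, n_significant)
```

A pi0 below 1 means some tests are estimated to be non-null, which loosens the adjustment relative to plain Benjamini-Hochberg.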

Methods Adjusting for Non-Independence
Due to Relatedness
• Theoretical kinship matrix
– Variance component (Peralta, SOLAR)
– Eigensimplification (Blackburn & Cantor, SOLAR-MGA)
• Empirical kinship matrix
– Linear mixed model (Cantor, FaST-LMM & Howey,
GEMMA)
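
For reference, a theoretical kinship matrix is computed from the pedigree alone. The sketch below uses the standard recursion (kinship of a child with anyone listed earlier is the average of the parents' kinships with that person). It assumes parents are listed before their children and treats anyone with a missing parent as a founder; the input format is hypothetical and is not that of SOLAR or any other package named above.

```python
def kinship_matrix(pedigree):
    """Theoretical kinship coefficients from a pedigree.

    pedigree: list of (id, father, mother) with parents listed before their
    children; founders have father = mother = None.
    """
    ids = [person for person, _, _ in pedigree]
    index = {person: k for k, person in enumerate(ids)}
    n = len(ids)
    phi = [[0.0] * n for _ in range(n)]
    for person, father, mother in pedigree:
        i = index[person]
        if father is None or mother is None:
            # Founder: unrelated to everyone listed so far
            phi[i][i] = 0.5
            continue
        f, m = index[father], index[mother]
        for j in range(i):
            phi[i][j] = phi[j][i] = 0.5 * (phi[f][j] + phi[m][j])
        phi[i][i] = 0.5 * (1.0 + phi[f][m])
    return ids, phi

# Nuclear family: two founders and two children.
family = [
    ("F", None, None),
    ("M", None, None),
    ("C1", "F", "M"),
    ("C2", "F", "M"),
]
ids, phi = kinship_matrix(family)
for person, row in zip(ids, phi):
    print(person, row)
```

Twice this matrix gives the expected relationship matrix used as the covariance structure in variance component models; an empirical kinship matrix replaces these expectations with estimates from genotypes.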

Advantages of Pedigrees
• Permit identification of recombination free
segments (RFS, Blackburn)
• True allele specific binding (ASB) signals will
segregate (Peralta)

• Biological information from allele specific binding
can be used to filter potentially functional
regulatory SNPs
• Multiple independent signals are observed at
eQTL
• Epistasis
• Expression varies between haplotypes
• Genetic architecture of gene expression is
complex (duh!)

Expression Phenotype
(SBP,DBP,HT)
• With (3 papers) or without (1 paper) use of
genotype data

Aims
• Pitsillides modeled gene expression as the primary outcome
– Also looked for enrichment of GWAS results (GWAS for SBP or
DBP) in SNPs associated with expression
• Three papers tried to model phenotypes as the primary
outcome
– Radkowski (P5) tried to model future HT using expression
– Tong investigated whether using E+G did better than using E or G
alone
– Ainsworth (P1) fitted causal models for relationship between G, E
and P

Expression data
• 2 papers used individual expression variables as predictors
• 1 paper used individual expression variables as outcomes
– All expression variables, with SNPs located in same genetic
region used as predictors
• 1 paper used both individual expression variables and a
clustered summary measure (from WGCNA)
– Both as outcomes and predictors

Genetic/sample Data
• Two papers used WGS
– Tong collapsed variants (common and rare) within genes, used
142 unrelated individuals from families
– Pitsillides used common SNPs, used all individuals in families
• Ainsworth used GWAS (common SNPs), all individuals in
families
• Radkowski did not use genetic data
– Used 340 family members without baseline HT or HT at first visit
• All used real SBP, DBP, HT

Pedigree relationships
• Ainsworth & Pitsillides used linear mixed models
when modeling SNPs as predictors (for family
data)
• Tong used unrelated individuals
• Two papers ignored family relationships
– When relating E to P (Ainsworth & Radkowski)
– Or when doing causal modeling (Ainsworth)

Methods
• Linear mixed models: lmekin and FaST-LMM
• Unrelated individuals (Tong)
– Non-parametric weighted U statistics
– Models similarities in genotype (burden), gene expression and phenotype
• Causal modeling: structural equation models (SEM) and Bayesian
Unified Framework (BUF) (Ainsworth)
– Applied to a set of filtered variables for G, E, P
• Predicting future HT (Radkowski)
– Calculated slope of regression of BP on time-point
– Multiple regression of slope on gene expression (with/without adjustment
for medication effect)
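
The two-stage approach described for Radkowski (a per-individual slope of BP on time-point, then regression of that slope on expression) can be sketched as below. This is an illustration on simulated, noise-free data under stated assumptions: complete visits for everyone, ordinary least squares with no family adjustment, and hypothetical function and variable names.

```python
import numpy as np

def bp_slopes(bp, times):
    """Per-individual least-squares slope of BP on time point.

    bp: individuals x visits matrix (no missing visits assumed);
    times: visit times, one per column of bp.
    """
    t = np.asarray(times, dtype=float)
    t_centered = t - t.mean()
    bp_centered = bp - bp.mean(axis=1, keepdims=True)
    return bp_centered @ t_centered / (t_centered @ t_centered)

def fit_slope_model(slopes, expression, medication=None):
    """OLS of BP slope on expression columns, optionally adjusting for medication."""
    n = len(slopes)
    columns = [np.ones(n), expression]
    if medication is not None:
        columns.append(medication)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, slopes, rcond=None)
    return coef  # intercept, expression effects, then medication effect

# Toy data (simulated, noise-free): 50 individuals, 4 visits, 2 genes.
rng = np.random.default_rng(4)
times = [1, 2, 3, 4]
expression = rng.normal(size=(50, 2))
medication = rng.integers(0, 2, size=50).astype(float)
true_slopes = 0.5 + 1.0 * expression[:, 0] - 0.8 * medication
bp = 120.0 + np.outer(true_slopes, times)

slopes = bp_slopes(bp, times)
coef = fit_slope_model(slopes, expression, medication)
print(np.round(coef, 3))
```

Dropping the `medication` argument gives the unadjusted fit, so the two analyses can be compared gene by gene.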

Results
• No p values reached statistical significance (once multiple
testing taken into account)
– Probably due to low power
– Nevertheless all papers presented their “top findings”
• Incorporation of both G and E improved significance of
association test (compared to G or E alone) (Tong)
• Adjustment for effect of medication gave a larger number of
“significant” results than non-adjustment (Radkowski)
• SEM and BUF implicated very similar causal models (Ainsworth)

Tong results
Table 1. Top 5 genes associated with SBP, DBP and HTN

Causal modeling (Ainsworth)
• SEM always implicated either model (b) or (d)
– Model (d) was not considered by BUF, model (f) was implicated
instead
• Generally good agreement between SEM and BUF

• Top results show no replication of previous findings
– Different (Mexican-American) population?
– Low power?
• Lots of different ways to consider gene expression data
– Incorporate directly into analysis of G and P (e.g. to improve
power)
– Use directly as outcome
– As predictor of (future) phenotype
– To infer causal relationships

Group-wide Conclusions
• Documented complexity of gene expression
– One gene at a time vs. multiple genes
simultaneously
– Multiple alleles contribute to a single eQTL region
• Power
– High for genotype -> expression (inc. epistasis)
– Low for genotype/expression -> phenotype
– Pedigrees present challenges, but can be useful
