The United Kingdom Brain Expression Consortium (UKBEC) studies the mechanisms that regulate gene expression in the human brain. To that end, it builds regulation models based on (1) expression quantitative trait loci, (2) allele-specific expression and (3) co-expression networks. With the consortium's first data release, braineac.org was created to share results with the research community. Braineac is a database of gene expression and its regulation in 10 brain regions, based on samples from 134 neuropathologically normal individuals collected by the Medical Research Council (MRC) Sudden Death Brain and Tissue Bank, Edinburgh, UK. Gene expression profiling was based on Affymetrix Human Exon 1.0 ST Arrays. Genotyping was performed with the Illumina Infinium Omni1-Quad BeadChip and the Immunochip. We will introduce this resource, currently hosted at Universidad de Murcia, and explain how it can help researchers working on brain diseases. In the second part of the talk, we will focus on the creation of the second release of the Braineac resource, based on RNA-seq technology, which allows a genome-wide study of regulation mechanisms.
Bioinformatics applied to the study of gene expression control in the human brain
1. Juan A. Botía
Institute of Neurology, University College London, UK
Facultad de Informática, Universidad de Murcia, Spain
Algorithmic approaches for the construction of gene co-expression networks from control brain tissue samples
mRNA RNA-seq substantia nigra and putamen brain co-expression networks on the UKBEC project to study Parkinson's disease
03/04/17, Conferencias de Investigación para Posgrado, Fac. Informática, UCM
2. The central dogma of biology
Source: Wikipedia
We use pre-mRNA
3. Chapter I. The dataset
4. Braineac v2, RNA-seq based, focused on Parkinson's disease
• Affects 1% to 2% of the population older than 65 years
• Symptoms: resting tremor, bradykinesia, rigidity and impaired ability to initiate and sustain movements
• The hallmark of the disease is the progressive loss of dopaminergic neurons, mainly in the substantia nigra
5. Chapter II. The computational model
6. Network analysis: aprioristic versus free approaches
7. Are networks something more than a fancy graph and nice plots? Yes, they are!
• Can be used to identify the active pathways in specific samples (cases vs. controls)
• Describe subsystems (i.e. cell types)
• Identify candidate genes (GBA)
8. To create networks we need to estimate links between genes
9. From gene expression to gene co-expression networks
TREM2 forms a receptor signaling complex with TYROBP, which triggers the activation of immune responses in macrophages and dendritic cells, and the functional polymorphism of TREM2 is reported to be associated with neurodegenerative disorders such as Alzheimer's disease (AD).
11. From gene expression to gene co-expression networks (continued)
TYROBP, TREM2: 0.76
13. But before reaching that
• Scale-free topology assumption
– The degree distribution $p(k)$ of a network follows a power law, so $p(k) \sim k^{-\gamma}$
– Evidence supports this for many organisms ($\gamma$ is approx. 2.2)
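One simple way to probe the scale-free assumption is to regress log p(k) on log k and read the exponent off the slope. The sketch below is illustrative only: the function name and the toy degree list are assumptions, and it skips the degree binning that real analyses usually apply.

```python
import math
from collections import Counter

def power_law_fit(degrees):
    """Least-squares fit of log10 p(k) against log10 k.

    Returns (gamma, r_squared), where gamma is minus the fitted slope.
    Nodes with degree 0 are ignored because log(0) is undefined.
    """
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c / total) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared

# Toy degree list (assumption): the number of nodes with degree k is 400 / k^2.
toy_degrees = [1] * 400 + [2] * 100 + [4] * 25 + [5] * 16
gamma, r2 = power_law_fit(toy_degrees)
print(f"gamma = {gamma:.2f}, R^2 = {r2:.3f}")
```

An R^2 close to 1 with a negative slope is consistent with a power law; it does not prove one.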
14. But before reaching that (& 2)
• Modularity assumption
– The clustering coefficient of organisms, $C_i = 2n / (k_i (k_i - 1))$, with $n$ the number of direct links connecting the $k_i$ nearest neighbours of the $i$-th node, suggests strong modular organization
– Evidence suggests the clustering coefficient is higher than expected in scale-free topology (SFT) networks
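The coefficient $C_i$ can be computed directly from an adjacency structure. This is a minimal sketch on a hypothetical four-node graph; the function name and the convention of returning 0 for nodes with fewer than two neighbours are assumptions.

```python
def clustering_coefficient(adj, i):
    """C_i = 2n / (k_i (k_i - 1)), with n the links among the k_i neighbours of i."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention used here: C_i is undefined below two neighbours
    # Count each neighbour pair once (u < v) and keep it if the pair is linked.
    n = sum(1 for u in neighbours for v in neighbours if u < v and v in adj[u])
    return 2 * n / (k * (k - 1))

# Toy undirected graph as adjacency sets: a triangle (a, b, c) plus a pendant node d.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coefficients = {node: clustering_coefficient(toy_graph, node) for node in toy_graph}
print(coefficients)
```

Node a has three neighbours with one link among them, so its coefficient is 1/3, while b and c sit in a closed triangle and reach 1.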
15. But before reaching that (& 3)
• Hierarchies solve this apparent dilemma
16. Chapter III. The problem
17. Our main focus: Parkinson's disease
• Affects 1% to 2% of the population older than 65 years
• Symptoms: resting tremor, bradykinesia, rigidity and impaired ability to initiate and sustain movements
• The hallmark of the disease is the progressive loss of dopaminergic neurons, mainly in the substantia nigra
[Figure: excitatory and inhibitory pathways, with the substantia nigra pars compacta labelled; brain regions most typically affected by adult-onset disease]
18. Steps on the pipeline and their outcomes
Step 1: RPKM exonic gene quantification and CQN normalization. Outcome: 33670 Ensembl genes
Step 2: RPKM-CQN > 0.2 & missingness < 70%. Outcome: approx. 19K genes, two datasets
Step 3: Data correction for sex, age and 7/8 PEER axes. Outcome: two corrected datasets
Step 4: WGCNA "signed" network construction. Outcome: SNIG and PUTM networks and gene module assignment
Step 5: k-means optimization of module partitions. Outcome: modified gene module assignment for SNIG and PUTM
Step 6: Network assessment. Outcome: quality metrics for networks and gene partitions
Step 7: Within-tissue and between-tissues subsystem characterization. Outcome: functional characterization, correlation with traits, gene function prediction
20. A measure of similarity between genes, values in [0,1]
From similarity to adjacency, hard thresholding
From similarity to adjacency, soft thresholding
From adjacency to TOM values
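The chain from similarity to adjacency to topological overlap (TOM) can be sketched in a few functions. This is a plain-Python illustration, not the WGCNA implementation: it assumes the signed similarity (1 + cor) / 2, the usual TOM formula (shared neighbours plus the direct link, divided by the smaller connectivity plus one minus the direct link), and a toy expression matrix; the threshold 0.85 and the power 12 are arbitrary example values.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))  # guard against rounding just outside [-1, 1]

def signed_similarity(expr):
    """s_ij = (1 + cor_ij) / 2, so every value lies in [0, 1]."""
    genes = range(len(expr))
    return [[(1 + pearson(expr[i], expr[j])) / 2 for j in genes] for i in genes]

def hard_adjacency(sim, tau):
    """Hard thresholding: a link exists (1) or not (0)."""
    return [[1.0 if s >= tau else 0.0 for s in row] for row in sim]

def soft_adjacency(sim, beta):
    """Soft thresholding: a_ij = s_ij ** beta keeps a continuous weight."""
    return [[s ** beta for s in row] for row in sim]

def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1."""
    g = len(adj)
    k = [sum(adj[i][u] for u in range(g) if u != i) for i in range(g)]
    tom = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(g) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Toy data (assumption): 4 genes measured in 5 samples.
expr = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10.5],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4],
]
sim = signed_similarity(expr)
hard = hard_adjacency(sim, tau=0.85)
soft = soft_adjacency(sim, beta=12)
tom_matrix = topological_overlap(soft)
for row in tom_matrix:
    print([round(v, 3) for v in row])
```

Soft thresholding keeps weak links as small weights instead of discarding them, which is why the TOM step can still use them as evidence of shared neighbours.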
21. From TOM values to clusters by 1-TOM as a distance
Complete linkage hierarchical approach for clustering
Summarisation based on eigenvalue
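A small sketch of complete-linkage clustering with 1 - TOM as the distance. It assumes a hypothetical 5-gene TOM matrix and cuts the tree at a fixed number of clusters for simplicity; that fixed cut is an assumption of this example, not necessarily how modules are detected in the talk.

```python
def complete_linkage_clusters(dist, n_clusters):
    """Agglomerative clustering where the distance between two clusters is
    the largest pairwise distance between their members (complete linkage).

    dist: symmetric matrix of dissimilarities, here 1 - TOM.
    Merging stops when n_clusters groups remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical TOM matrix: genes 0-2 overlap strongly, as do genes 3-4.
toy_tom = [
    [1.0, 0.8, 0.7, 0.1, 0.2],
    [0.8, 1.0, 0.9, 0.2, 0.1],
    [0.7, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.6],
    [0.2, 0.1, 0.1, 0.6, 1.0],
]
toy_dist = [[1.0 - v for v in row] for row in toy_tom]
modules = complete_linkage_clusters(toy_dist, n_clusters=2)
print(modules)
```

Swapping `max` for `min` in the cluster distance gives single linkage, which is one way to see how strongly the result depends on the linkage choice.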
22. Why do we need an optimization process for WGCNA?
• Hierarchical clustering results are highly variable depending on linkage (max/complete, min/single, average linkage)
• Module membership (MM) of a gene g is the correlation between g and the 1st PC of the module's gene expression (the module eigengene)
• Hierarchical clustering does not guarantee that every gene is in its best module according to MM
• Previous approaches are based on reassigning some or all genes
• The k-means algorithm helps to find a better partition, in which genes are (hopefully) assigned to a module in a more natural way
23. A k-means heuristic: how does it work?
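A minimal sketch of a k-means style refinement, under stated assumptions: the initial partition comes from hierarchical clustering, each module is summarised by the mean profile of its genes (a simple stand-in for the module eigengene), and each gene moves to the module whose summary it correlates with most. The function names, toy data and iteration cap are illustrative and not taken from the talk.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def module_centroids(expr, labels, k):
    """Mean expression profile per module (stand-in for the module eigengene)."""
    centroids = []
    for m in range(k):
        members = [expr[g] for g in range(len(expr)) if labels[g] == m]
        centroids.append([sum(col) / len(members) for col in zip(*members)])
    return centroids

def refine_partition(expr, labels, max_iter=20):
    """Reassign genes to their most correlated module until nothing changes."""
    labels = list(labels)
    k = max(labels) + 1
    for _ in range(max_iter):
        centroids = module_centroids(expr, labels, k)
        new_labels = [
            max(range(k), key=lambda m: pearson(gene, centroids[m]))
            for gene in expr
        ]
        # Stop on convergence, or if a module would end up empty.
        if new_labels == labels or len(set(new_labels)) < k:
            break
        labels = new_labels
    return labels

# Toy data (assumption): genes 0-2 rise across samples, genes 3-5 alternate.
expr = [
    [1, 2, 3, 4, 5, 6],
    [2, 3, 4, 5, 6, 8],
    [1, 3, 2, 4, 6, 5],
    [5, 1, 5, 1, 5, 1],
    [6, 2, 5, 1, 6, 2],
    [4, 1, 6, 2, 5, 1],
]
initial = [0, 0, 1, 1, 1, 1]  # gene 2 starts in the wrong module
refined = refine_partition(expr, initial)
print(refined)
```

Starting from the hierarchical partition rather than random centroids keeps the number of modules fixed and makes the refinement a local correction of the initial assignment.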
30. Outline of the optimization
Accepted in BMC Systems Biology
31. Chapter IV. The results
32. What we get from the optimization
• More accurate partition construction
• Better function annotation for modules
• Better cell marker enrichment
• More preserved modules across similar tissues
33. How to assess the accuracy of a co-expression network
• Cluster-driven validation: are the gene groups good according to a given index?
• Data-driven validation by replication: same tissue or similar tissue; same network model or different network model
• Biology: does my module make sense? Functional characterization
35. Replication in GTEx GNAT networks for Substantia Nigra
[Figure: two bar charts of Mantel fold per module, with modules named by colour and bars annotated with significance stars and module sizes. Panel titles: "Mantel fold SNIG GTEx coexpression within" and "Mantel fold SNIG microarray binary between".]
36. Replication in GTEx GNAT networks for Putamen
[Figure: two bar charts of Mantel fold per module, with modules named by colour and bars annotated with significance stars and module sizes. Panel titles: "Mantel fold PUTM GTEx coexpression within" and "Mantel fold PUTM GTEx binary between".]
38. Assignment of biological function to modules with gProfiler
• Based on GO (BP, MF, CC) and gProfileR
• Fisher's exact test and Bonferroni-corrected p-values
• What should we expect?
– Normal cell processes like respiration, cell development, immune function
– But also brain-related terms (hopefully movement disorders, signalling) in some of the modules
• What should we consider when looking for enrichment?
– GO is not a closed-world ontology
– Something not found doesn't imply it doesn't exist
– Genes can play new roles
– Groups of genes can have new functions
– It is possible to find modules with no GO annotation that are still valid
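The enrichment test behind such annotations can be sketched as a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) followed by a Bonferroni correction. This is a standalone illustration and not gProfileR's code: the function names, term labels and gene counts are hypothetical.

```python
from math import comb

def enrichment_p_value(module_hits, module_size, term_size, universe_size):
    """One-sided Fisher's exact test.

    Probability of seeing at least module_hits annotated genes in a module of
    module_size genes, when term_size of the universe_size genes carry the term.
    """
    upper = min(module_size, term_size)
    total = comb(universe_size, module_size)
    tail = sum(
        comb(term_size, x) * comb(universe_size - term_size, module_size - x)
        for x in range(module_hits, upper + 1)
    )
    return tail / total

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical example: a 5-gene module in a 20-gene universe, tested against
# three terms that each annotate 4 genes; the module contains 3, 1 and 0 of them.
terms = ["term_A", "term_B", "term_C"]
hits = [3, 1, 0]
p_values = [enrichment_p_value(h, 5, 4, 20) for h in hits]
adjusted = bonferroni(p_values)
for term, p, p_adj in zip(terms, p_values, adjusted):
    print(f"{term}: p = {p:.4f}, Bonferroni p = {p_adj:.4f}")
```

Bonferroni is conservative, so a module with no term surviving the correction is not thereby shown to lack biological function.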
39. Significant similarities in practically all modules
This is a tabular view of significant agreements (Fisher's exact test) on genes between modules from the two tissues
41. Subsystems: cell type & function
• Neuron cells, synapse/NADH
• Microglia cells, immune system
• Nucleus, transcription
• Neuron, astrocytes & microglia cell types, response to stimulus
• Endothelial cell type, cell division
• Oligodendrocytes cell type, synapse & ion transport
• Mitochondrion
• Cytosolic ribosome
• Ubiquitin
42. Lessons learned
• The default WGCNA can be improved to get more coherent gene groups
• Network analysis reveals
– cell-specific subsystems in putamen and substantia nigra
– interesting differences between the two tissues at the subsystem level
Ongoing work
• Models to explain the differences between subsystems
• Function prediction for non-coding species and intergenic regions