GENERAL PIPELINE OF
TRANSCRIPTOMICS ANALYSIS
(Based on micro-array experiments)
SANTY MARQUES-LADEIRA
GENERAL PIPELINE OF ANALYSIS
Raw data -> Quality Control -> Controlled raw data -> Normalisation -> Normalised data -> Quality Control -> Controlled normalised data -> Annotation -> Annotated normalised data -> Differential Expression -> Diff. expressed genes -> Clustering -> Diff. expressed modules
PIPELINE OF ANALYSIS
NORMALISATION STEP
Steps of Robust Multi-array Average (= RMA)
1) Background correction
2) Normalisation (across arrays)
3) Probe level intensity calculation
4) Probe set summarisation
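As an illustration of step 2, here is a minimal sketch of normalisation across arrays. It assumes quantile normalisation, the scheme usually associated with RMA; the function name and the toy intensities are hypothetical, and background correction and summarisation are not shown.

```python
from statistics import mean


def quantile_normalise(arrays):
    """Quantile-normalise equally long intensity lists (one list per array)."""
    n_probes = len(arrays[0])
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(column) for column in zip(*sorted_arrays)]
    normalised = []
    for a in arrays:
        # Order of the probes within this array (ties broken by position).
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            # Each probe receives the mean intensity of its rank.
            out[idx] = rank_means[rank]
        normalised.append(out)
    return normalised


# Toy example with two arrays of three probes (hypothetical values).
result = quantile_normalise([[5, 2, 3], [4, 1, 6]])
```

After this step every array has the same distribution of values, which is what the box-plots of the quality control step are expected to show.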
Other normalisation algorithms
A few examples
1) MAS 5.0
2) gcRMA
3) Li-Wong
…
Comparing them can take a long time :
- implement the different algorithms
- find the best way to compare them
PIPELINE OF ANALYSIS
QUALITY CONTROL STEP
3 Main tests :
- Distance matrices
- Box-plots
- MA plots (M = log-ratio, A = mean average)
[Figures: each test shown before and after normalisation.]
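For the MA plots, the two coordinates of each probe can be computed as below. This is a minimal sketch assuming the usual definitions (M is the log2 ratio of two arrays, A is their mean log2 intensity); the function name and the toy intensities are hypothetical.

```python
import math


def ma_values(x, y):
    """Return the M and A values per probe for two arrays of positive intensities."""
    # M: difference of log2 intensities (log-ratio).
    m = [math.log2(a) - math.log2(b) for a, b in zip(x, y)]
    # A: mean of the two log2 intensities.
    a_mean = [(math.log2(a) + math.log2(b)) / 2 for a, b in zip(x, y)]
    return m, a_mean


# Toy example (hypothetical intensities for two probes).
m, a = ma_values([8, 4], [2, 4])
```

On well-normalised data the M values are expected to scatter around 0 over the whole range of A.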
PIPELINE OF ANALYSIS
ANNOTATION STEP
Step 1 :
Annotate Transcript Cluster Probes
GENERAL CASE :
Transcript ID : 00 000 000 + Gene Accession : NM… // Gene Name // Description
ELSE :
Transcript ID : 00 000 000 + Gene Accession : - - - or NA + mRNA Accession : NM…
EXCEPTIONS :
Transcript ID : 00 000 000 + Gene Accession : - - - or NA + mRNA Accession : ?? // NONCODE // Linc…
Transcript ID : 00 000 000 + Gene Accession : - - - or NA + mRNA Accession : ?? // NONCODE // ??
KEEP ONLY Category : Main or Rescue
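The rule above can be sketched as a small fallback function. All field names ("category", "gene_accession", "mrna_accession") and the accession values are hypothetical, and the NONCODE exceptions are not treated specially here: the mRNA accession is returned as found.

```python
MISSING = {None, "", "---", "- - -", "NA"}


def annotate(record):
    """Pick an accession for one transcript-cluster record (a dict)."""
    # KEEP ONLY Category : Main or Rescue
    if str(record.get("category", "")).lower() not in {"main", "rescue"}:
        return None
    # General case: the gene accession is available.
    gene = record.get("gene_accession")
    if gene not in MISSING:
        return gene
    # Else: fall back to the mRNA accession.
    mrna = record.get("mrna_accession")
    if mrna not in MISSING:
        return mrna
    return None


# Hypothetical records, for illustration only.
general = annotate({"category": "main", "gene_accession": "NM_000001", "mrna_accession": "---"})
fallback = annotate({"category": "rescue", "gene_accession": "---", "mrna_accession": "NM_000002"})
dropped = annotate({"category": "control", "gene_accession": "NM_000003"})
```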
Step 2 :
From TC probes to genes
- “single gene”
- NA
- “multiple probes” : maximum signal, verification (iso-forms …)
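A minimal sketch of the "multiple probes" case, assuming that "maximum signal" means keeping, for each gene, the probe with the highest mean expression. The probe identifiers and values are hypothetical, and dropping unannotated (NA) probes is an assumption of this sketch.

```python
from statistics import mean


def collapse_to_genes(probe_to_gene, expression):
    """Return, for each gene, the probe with the highest mean signal."""
    best = {}
    for probe, gene in probe_to_gene.items():
        if gene is None:
            # Unannotated probe (NA): dropped in this sketch.
            continue
        signal = mean(expression[probe])
        if gene not in best or signal > best[gene][1]:
            best[gene] = (probe, signal)
    return {gene: probe for gene, (probe, _) in best.items()}


# Hypothetical probes: two for TLR3, one for IRF3, one without annotation.
probe_to_gene = {"tc1": "TLR3", "tc2": "TLR3", "tc3": "IRF3", "tc4": None}
expression = {"tc1": [5.0, 5.2], "tc2": [7.1, 6.9], "tc3": [4.0, 4.1], "tc4": [9.0, 9.0]}
selected = collapse_to_genes(probe_to_gene, expression)
```

The verification step (iso-forms …) is a manual check and is not covered here.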
PIPELINE OF ANALYSIS
DIFFERENTIAL EXPRESSION STEP
Limma : Linear Models for Micro-Array & RNA-seq
- Limma is a package for the analysis of gene expression data arising from microarray or RNA-Seq technologies.
- A core capability is the use of linear models to assess differential expression in the context of multi-factor designed experiments.
- Limma provides the ability to analyse comparisons between many RNA targets simultaneously.
- It has features that make the analyses stable even for experiments with a small number of arrays.
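To make the linear-model idea concrete, here is a minimal sketch of the simplest design, one gene compared between two groups. It computes an ordinary pooled-variance t-statistic; it is not limma's moderated statistic, which additionally shrinks the gene-wise variances and is what gives stability with few arrays. The function name and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def two_group_fit(treated, control):
    """Return (log2 fold-change, ordinary t-statistic) for one gene.

    Inputs are log2 expression values; each group needs at least two
    values and the pooled variance must be non-zero.
    """
    n1, n2 = len(treated), len(control)
    # With a two-group design the fitted coefficient is the difference of means.
    log_fc = mean(treated) - mean(control)
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return log_fc, log_fc / standard_error


# Toy example (hypothetical log2 values, three arrays per group).
log_fc, t_stat = two_group_fit([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
```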
Selection of DE threshold :
Volcano-plots
[Figure: volcano plot "HIV1AZTvsNS on genes" ; x-axis : log2(Fold-Change), from −3 to 3 ; y-axis : −log10(adjusted P-value), from 0 to 8 ; threshold lines at 1% and 5% ; DOWN and UP regions.]
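The selection read off a volcano plot can be sketched as follows. The adjustment method is not stated in the slides; this sketch assumes the Benjamini-Hochberg procedure, and the fold-change cut-off of 1 (in log2 units) is a hypothetical default.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log_fc, adj_p, fc_cut=1.0, p_cut=0.05):
    """Label one gene as UP, DOWN or NS (not selected)."""
    if adj_p < p_cut and log_fc >= fc_cut:
        return "UP"
    if adj_p < p_cut and log_fc <= -fc_cut:
        return "DOWN"
    return "NS"


# Toy example (hypothetical p-values and log2 fold-changes).
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
labels = [classify(fc, p) for fc, p in zip([1.5, 0.2, -2.0, -1.2], adjusted)]
```

Setting `p_cut=0.01` corresponds to the stricter 1% line of the plot.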
Representation of DE probes / genes :
Heat-maps
- derived from the heatplot() function.
- several parameters have to be adapted manually to each case.
- Optimise the function to detect the best parameters automatically from the data matrix.
[Figure: heat-map of 30 genes (TLR10, TLR8, AIM2, TLR3, IRAK2, TICAM1, IKBKG, TRAF3, IKBKB, NFKB1, NFKB2, TANK, TRAF6, IRAK4, IFI16, MYD88, MB21D1, TLR7, AZI2, TMEM173, TLR6, TBK1, IKBKE, TLR4, IRAK3, TLR2, TLR5, ZBP1, IRAK1, IRF3) across 27 samples (names combine BDCA1 / BDCA3 / pDC, CTRL / JK / R5 and D1 to D3) ; colour key from −3 to 3.]
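A colour key running from −3 to 3 suggests that each gene (row) is centred and scaled before plotting, with extreme values clipped to the key range. That reading is an assumption, not something the slide states; under it, the transformation of one row can be sketched as:

```python
from statistics import mean, stdev


def row_zscores(row, limit=3.0):
    """Centre and scale one row, then clip the z-scores to [-limit, limit]."""
    mu = mean(row)
    sd = stdev(row)
    if sd == 0:
        # Constant row: no variation to display.
        return [0.0] * len(row)
    return [max(-limit, min(limit, (value - mu) / sd)) for value in row]


# Toy example (hypothetical expression values of one gene).
scaled = row_zscores([1.0, 2.0, 3.0])
```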
PIPELINE OF ANALYSIS
CLUSTERING STEP
General overview :
- Connectivity-based clustering = Hierarchical clustering
- Centroid-based clustering = K-means
- Distribution-based clustering = Expectation-Maximisation (EM)
- “Network”-based clustering = Weighted correlation network analysis = WGCNA
- Artificial neural network = SOM
- Density-based clustering = DBSCAN
Connectivity-based clustering
Hierarchical clustering
- Hierarchical clustering (= HCA) seeks to build a hierarchy of clusters.
Strategies for hierarchical clustering generally fall into two types:
- Agglomerative or "bottom up" approach = each observation starts
in its own cluster, and pairs of clusters are merged as one moves up
the hierarchy.
- Divisive or "top down" approach = all observations start in one
cluster, and splits are performed recursively as one moves down the
hierarchy.
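The agglomerative ("bottom up") strategy can be sketched in a few lines. This is a minimal illustration with Euclidean distance and average linkage, both chosen here as assumptions; the points are hypothetical and the implementation favours readability over speed.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Bottom-up clustering; returns clusters as lists of point indices."""
    # Each observation starts in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average linkage: mean pairwise distance between two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy example: two well-separated pairs of points (hypothetical values).
groups = agglomerative([(0, 0), (0, 1), (10, 10), (10, 11)], n_clusters=2)
```

Recording the order and height of the merges, instead of stopping at a fixed number of clusters, gives the dendrogram.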
[Figure: hierarchical clustering of the ordered list of DE genes, columns A to H.]
Centroid-based clustering
K-means
[Figure: two plots of the within-groups sum of squares (WGSS) against the number of clusters (2 to 14), used to choose the number of clusters ; K-means clustering of the ordered list of DE genes, columns A to H.]
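The WGSS curve used to choose the number of clusters can be reproduced with a small K-means implementation. This is a minimal sketch: the starting centroids are simply the first k points (an assumption made to keep the example deterministic), and the data are hypothetical.

```python
from math import dist


def kmeans(points, k, n_iter=50):
    """Return (centroids, within-groups sum of squares) after K-means."""
    centroids = [tuple(points[i]) for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: each centroid moves to the mean of its group.
        updated = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centroids[c]
            for c, group in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    wgss = sum(min(dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, wgss


# Toy example (hypothetical points): WGSS for k = 1 and k = 2.
data = [(0, 0), (0, 1), (10, 10), (10, 11)]
wgss_by_k = {k: kmeans(data, k)[1] for k in (1, 2)}
```

Plotting `wgss_by_k` for increasing k and looking for the point where the decrease flattens gives the kind of curve shown in the figure.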
Centroid-based clustering
SOM
- Similar to the K-means algorithm.
- The centroids have a predetermined topographic ordering relationship.
- At each update, the centroids near the selected one are also updated.
[Figure: standard SOM algorithm.]
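One update of a self-organising map can be sketched as below. The sketch assumes a one-dimensional chain of centroids and a Gaussian neighbourhood function; the learning rate, radius and values are hypothetical.

```python
from math import dist, exp


def som_step(centroids, sample, rate=0.5, radius=1.0):
    """Apply one SOM update for one sample; returns the new centroid list."""
    # The winner is the centroid closest to the sample, as in K-means.
    winner = min(range(len(centroids)), key=lambda i: dist(centroids[i], sample))
    updated = []
    for i, centroid in enumerate(centroids):
        # Neighbours of the winner on the chain are pulled too, with a
        # weight that decays with their distance to the winner.
        weight = exp(-((i - winner) ** 2) / (2 * radius ** 2))
        updated.append(tuple(c + rate * weight * (s - c) for c, s in zip(centroid, sample)))
    return updated


# Toy example: three centroids on a chain, one sample at 0 (hypothetical).
before = [(0.0,), (1.0,), (2.0,)]
after = som_step(before, (0.0,))
```

In a full run the learning rate and the radius usually shrink over time, so the map settles progressively.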
Distribution-based clustering
EM
- Iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
- An expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters.
- A maximisation (M) step, which computes parameters maximising the expected log-likelihood found on the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
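The alternation of E and M steps can be illustrated on the simplest clustering model, a mixture of two one-dimensional Gaussians. This is a minimal sketch; the starting values, the variance floor and the data are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, var):
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def em_two_gaussians(data, mu=(0.0, 1.0), var=(1.0, 1.0), weight=(0.5, 0.5), n_iter=100):
    """Fit a two-component 1-D Gaussian mixture; returns (means, variances, weights)."""
    mu, var, weight = list(mu), list(var), list(weight)
    for _ in range(n_iter):
        # E step: probability that each point belongs to each component,
        # given the current parameters (the latent variable is the component).
        resp = []
        for x in data:
            dens = [weight[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: parameters maximising the expected log-likelihood.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(spread, 1e-6)  # floor to avoid a collapsed component
            weight[k] = nk / len(data)
    return mu, var, weight


# Toy example: two groups of values around 0 and 5 (hypothetical).
values = [0.0, 0.2, -0.2, 0.1, -0.1, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_two_gaussians(values)
```

Each point can then be assigned to the component with the highest responsibility, which is how the mixture is used as a clustering.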
[Figure: EM clustering of the ordered list of DE genes, columns A to H ; two plots of BIC against the number of components (2 to 8) for the models EII, VII, EEI, VEI, EVI, VVI, EEE, EEV, VEV and VVV.]
Density-based clustering
DBSCAN
- Finds a number of clusters starting from the estimated density distribution of the corresponding nodes.
- 2 parameters : a radius R, and M, the minimum number of points that have to be found within R to be considered part of the same cluster.
[Figure legend: A = CORE points ; B & C = DENSITY-REACHABLE points ; N = NOISE point]
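The roles of the radius R and the minimum number of points M can be seen in a compact implementation. This is a minimal sketch with Euclidean distance; a point counts itself among its neighbours, and the data are hypothetical.

```python
from math import dist

NOISE = -1


def dbscan(points, radius, min_points):
    """Return one label per point: a cluster number, or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= radius]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # density-reachable border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_points:
                # j is itself a core point: its neighbours are reachable too.
                queue.extend(reachable)
    return labels


# Toy example: four close points and one isolated point (hypothetical).
labels = dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], radius=1.5, min_points=3)
```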
“Network”-based clustering
WGCNA
- WGCNA clusters datasets by quantifying not only the correlations between individual pairs of genes, but also the extent to which these genes share the same neighbours.
- The WGCNA R package can be used through 2 methodologies, which we compared before choosing the “N-step” methodology :
  - Manual selection of parameters
  - Noise taken into account
  - Merging analysis
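The two quantities mentioned above, pairwise correlation and shared neighbours, can be sketched as follows. The sketch assumes an unsigned network, where the adjacency is the absolute Pearson correlation raised to a soft-threshold power, and the usual topological overlap formula; the power of 6 and the expression profiles are hypothetical, and this is not the code of the WGCNA R package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def adjacency(profiles, beta=6):
    """Unsigned soft-thresholded adjacency matrix, with a zero diagonal."""
    n = len(profiles)
    return [
        [abs(pearson(profiles[i], profiles[j])) ** beta if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """Overlap of two genes: direct link plus links through shared neighbours."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(connectivity[i], connectivity[j]) + 1 - adj[i][j])
    return tom


# Toy example: three gene profiles over four samples (hypothetical values).
adj = adjacency([[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]])
tom = topological_overlap(adj)
```

Modules are then typically obtained by hierarchical clustering on the dissimilarity 1 − overlap.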
[Figure: dendrogram (hclust, "average" ; height from 0.5 to 1.0) with the Dynamic Tree Cut and Merging Dynamic module assignments ; two bar plots over Modules 0 to 7, one for Merging Dynamic (scale 0 to 400) and one for Dynamic Tree Cut (scale 0 to 300).]
PIPELINE OF ANALYSIS
SUPPLEMENTAL STEP
[Figure: the general pipeline, extended with two “Specific Analysis” steps.]
SUPPLEMENTAL STEPS
VENN DIAGRAMS
Specific analysis :
Venn-n comparison
INPUT
Lists of genes & data matrix
ALGORITHM
Step 1 : Calculate all possible comparisons (2^n for n lists)
Step 2 : Determine the membership of all the genes
Step 3 : For each possible comparison
- Find the corresponding genes
- Organise them by hierarchical clustering (HC) of their expression values
OUTPUT
List of genes organised by comparison & sub-organised by HC
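Steps 1 to 3 can be sketched by computing, for every gene, the set of lists it belongs to. The list names and their contents are hypothetical, and the sub-ordering by hierarchical clustering is left out of this sketch.

```python
def venn_regions(gene_lists):
    """Map each membership pattern (tuple of list names) to its genes.

    With n lists there are 2^n possible patterns; the empty pattern
    (genes in no list) never appears here, so at most 2^n - 1 are returned.
    """
    regions = {}
    all_genes = set().union(*gene_lists.values())
    for gene in sorted(all_genes):
        # Membership of this gene across all the lists.
        pattern = tuple(name for name, genes in gene_lists.items() if gene in genes)
        regions.setdefault(pattern, []).append(gene)
    return regions


# Toy example with three hypothetical lists of DE genes.
regions = venn_regions({
    "A": {"TLR3", "TLR7", "IRF3"},
    "B": {"TLR7", "IRF3", "MYD88"},
    "C": {"IRF3"},
})
```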
[Figure: heat-map of the DE genes organised by Venn comparison ; normalised micro-array expression values (colour key from −3 to 3) for BDCA1, BDCA3 and pDC under CTRL, JK and R5, replicates 1 to 3.]
SUPPLEMENTAL STEPS
CORRELATION
Specific analysis :
X-Y plots (correlation)
[Figure: scatter plot of log2(Fold-Change) in one condition against log2(Fold-Change) in another, here HIV1vsNS log2(FC) and HIV1+AZTvsNS log2(FC), both axes from −2 to 4.]
SUPPLEMENTAL STEPS
ENRICHMENT ANALYSIS
Functional Enrichment analysis
Database for Annotation, Visualisation & Integrated Discovery (DAVID)
Some numbers about DAVID :
- More than 16 000 citations
- Around 40 Nature-branded citations
- Daily usage : 1200 gene lists / sublists & 400 unique researchers
[Figure: chart by year, 2004 to 2014 ; y-axis from 0 to 4000.]
Functional Enrichment analysis
Gene Set Enrichment Analysis
1) Get a ranked list L of all the genes on the chip, based on a chosen measure (expression, FC…).
2) For each gene set S : find the location of each gene s in S within L.
3) Generate an enrichment score ES for S based on a running-sum statistic : “reward” the presence of s toward the top or bottom of L.
4) Analyse the significance of this Kolmogorov-Smirnov type statistic by permutation testing.
5) Multiple hypothesis testing (MHT) : error control for multiple S’s using the estimated false discovery rate (FDR).
[Figures: ranked list of genes between A and B ; running sum along L.]
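The running-sum statistic and its permutation test can be sketched as follows. This is the unweighted, Kolmogorov-Smirnov-like form of the score, and the permutation shuffles the gene order; both are simplifying assumptions of this sketch rather than a description of the GSEA software. Gene names are hypothetical, and the gene set must overlap the ranked list without covering it entirely.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of the running sum along the ranked list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    running, best = 0.0, 0.0
    for hit in hits:
        # Reward a gene of the set, penalise any other gene.
        running += 1 / n_hit if hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Share of shuffled lists whose |ES| reaches the observed |ES|."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    genes = list(ranked_genes)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(genes)
        if abs(enrichment_score(genes, gene_set)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Toy example: a set concentrated at the top of a ranked list of 10 genes.
ranked = list("abcdefghij")
es_top = enrichment_score(ranked, {"a", "b", "c"})
es_bottom = enrichment_score(ranked, {"h", "i", "j"})
p_top = permutation_p_value(ranked, {"a", "b", "c"}, n_perm=200)
```

With many gene sets, the resulting p-values still need the FDR control of step 5.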
SUPPLEMENTAL STEPS
TF ANALYSIS
TF Binding Site analysis
oPOSSUM-3
Input :
- Gene list or gene sequences
- 116 TF binding site motifs in ChIP-Seq JASPAR db (2010)
- Selection of distances up and/or down the gene coding sequence
Processing :
- Find specific sequences up and/or down a gene sequence
- Compute over-represented motifs
- Comparison to TF binding sequences (Z and F scores)
Output :
- List of TFs hierarchically organised by Z & F score
- Automatic representation of F or Z score vs %GC content
[Figure: oPOSSUM-3 output ; Fisher score (0 to 5) against TF profile %GC composition (0.0 to 0.8) ; labelled TFs : CEBPA, ELF5, FEV, GABPA, IRF1, Klf4, MZF1_5-13, NF-kappaB, NFATC2, RELA, RUNX1, SPI1, STAT1, Stat3, Tcfcp2l1.]
TF Binding Site analysis
Other tools
Databases :
- TRANSFAC database v7.4 through GSEA (615 gene sets)
- JASPAR Core database (205 non-redundant motifs)
- UCSC database (690 gene sets) based on data from ENCODE TFBS ChIP-seq production groups (from 2007 to 2012)
Tools :
- HOMER : TRANSFAC database + own motifs database through ChIP-Seq analysis
- GSEA : using TRANSFAC db on enriched motifs in ranked list of genes
- DAVID : using UCSC database
- oPOSSUM : using JASPAR (116 Transcription Factors)
PIPELINE OF ANALYSIS
NETWORK STEP
General overview :
- Reverse engineering models :
  - Algorithm for the Reconstruction of Accurate Cellular Networks = ARACNe
  - Tool for Inferring Network of Genes = TINGe
- Other models :
  - Functional networks = Cytoscape software (BiNGO / ClueGO plugins)
Data networks
ARACNe
- A reverse engineering method specifically designed to scale up to the complexity of regulatory networks in mammalian cells.
- ARACNe defines an edge as an irreducible statistical dependency between gene expression profiles that cannot be explained as an artifact of other statistical dependencies in the network.
- An edge is likely to identify direct regulatory interactions mediated by a transcription factor binding to a target gene's promoter region, although other types of interactions may also be identified.
Data networks
TINGe
- Reverse engineers genome-scale gene networks from a large number of expression profiles, based on mutual information (MI), data processing inequality (DPI) and permutation testing to assess the statistical significance of each inferred edge.
- TINGe can be used for directly constructing high-quality networks, or it can be used as a component along with other types of data in building probabilistic networks.
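The data processing inequality step named above can be sketched on a small table of mutual information values: in every fully connected triplet of genes, the weakest edge is treated as indirect and removed. This is a minimal sketch; the MI values and gene names are hypothetical, MI estimation and permutation testing are not shown, and tolerance parameters used by real tools are left out.

```python
from itertools import combinations


def dpi_prune(mi, threshold=0.0):
    """Drop the weakest edge of every triangle.

    mi maps frozenset({gene1, gene2}) to a mutual information value.
    """
    # Keep only the edges above the significance threshold.
    edges = {edge: value for edge, value in mi.items() if value > threshold}
    genes = sorted(set().union(*edges))
    removed = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        # Every triangle is examined, even if one of its edges is already
        # marked for removal by another triangle.
        if all(edge in edges for edge in triangle):
            removed.add(min(triangle, key=lambda edge: edges[edge]))
    return {edge: value for edge, value in edges.items() if edge not in removed}


# Toy example: TF -> G1 -> G2 chain with a weak, indirect TF-G2 edge.
mi_values = {
    frozenset(("TF", "G1")): 0.9,
    frozenset(("G1", "G2")): 0.8,
    frozenset(("TF", "G2")): 0.2,
    frozenset(("G2", "G3")): 0.5,
}
network = dpi_prune(mi_values)
```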
Functional networks
Cytoscape
- Several tools can build them, e.g. DAVID and Cytoscape. They link modules of genes by their functional annotation enrichment.
- They rely on annotations and not on the expression values of the experiment. They can return some annotations that carry no information for a specific biological question.
[Figure: example of network via ClueGO on a list of differentially expressed genes.]
THANKS FOR READING !

Transcriptomics Pipeline Analysis

  • 1. GENERAL PIPELINE OF TRANSCRIPTOMICS ANALYSIS (Based on micro-arrays experiments) SANTY MARQUES-LADEIRA
  • 2. GENERAL PIPELINE OF ANALYSIS Quality Control Raw data Controlled Raw data Normalised data Controlled Normalised data Normalisation Quality Control Annotated Normalised data Diff. Expressed genes Diff. Expressed modules Differential Expression Clustering Annotation
  • 3. PIPELINE OF ANALYSIS NORMALISATION STEP Quality Control Raw data Controlled Raw data Normalised data Controlled Normalised data Normalisation Quality Control Annotated Normalised data Diff. Expressed genes Diff. Expressed modules Differential Expression Clustering Annotation
  • 4. PIPELINE OF ANALYSIS NORMALISATION STEP 5 Steps of Robust Multi-array Average (= RMA) 1) Background correction 2) Normalisation (across arrays) 3) Probe level intensity calculation 4) Probe set summarisation
  • 5. Other normalisation algorithms A few examples 1) MAS 5.0 2) gcRMA 3) Li Wang … Can be long to evaluate : - implement the different algorithms - find the best way to compare them PIPELINE OF ANALYSIS NORMALISATION STEP
  • 6. PIPELINE OF ANALYSIS QUALITY CONTROL STEP Quality Control Raw data Controlled Raw data Normalised data Controlled Normalised data Normalisation Quality Control Annotated Normalised data Diff. Expressed genes Diff. Expressed modules Differential Expression Clustering Annotation
  • 7. Before Normalisation After Normalisation PIPELINE OF ANALYSIS QUALITY CONTROL STEP 3 Main tests : Matrix distances & Box-plots
  • 8. PIPELINE OF ANALYSIS QUALITY CONTROL STEP 3 Main tests : Mean-Average plots Before Normalisation After Normalisation
  • 9. PIPELINE OF ANALYSIS ANNOTATION STEP Quality Control Raw data Controlled Raw data Normalised data Controlled Normalised data Normalisation Quality Control Annotated Normalised data Diff. Expressed genes Diff. Expressed modules Differential Expression Clustering Annotation
  • 10. PIPELINE OF ANALYSIS ANNOTATION STEP Step 1 : Annotate Transcript Cluster Probes GENERAL CASE : Transc ID : 00 000 000 + Gene Accession : NM… // Gene Name // Description ELSE : Transc ID : 00 000 000 + Gene Accession : - - - or NA + mRNAAccession : NM… EXCEPTIONS : Transc ID : 00 000 000 + Gene Accession : - - - or NA + mRNAAccession : ?? // NONCODE // Linc… Transc ID : 00 000 000 + Gene Accession : - - - or NA + mRNAAccession : ?? // NONCODE // ?? KEEP ONLY Category : Main or Rescue
  • 11. PIPELINE OF ANALYSIS ANNOTATION STEP Step 2 : From TC probes to genes “single gene” NA “multiple probes” maximum signal verification (iso-forms …)
  • 12. PIPELINE OF ANALYSIS DIFFERENTIAL EXPRESSION STEP Quality Control Raw data Controlled Raw data Normalised data Controlled Normalised data Normalisation Quality Control Annotated Normalised data Diff. Expressed genes Diff. Expressed modules Differential Expression Clustering Annotation
  • 13. - Limma is a package for the analysis of gene expression data arising from microarray or RNA-Seq technologies. - A core capability is the use of linear models to assess differential expression in the context of multi-factor designed experiments. - Limma provides the ability to analyse comparisons between many RNA targets simultaneously. - It has features that make the analyses stable even for experiments with small number of arrays. PIPELINE OF ANALYSIS DIFFERENTIAL EXPRESSION STEP Linear Models for Micro-Array & RNA-seq
  • 14. PIPELINE OF ANALYSIS DIFFERENTIAL EXPRESSION STEP Selection of DE threshold : Volcano-plots −3 −2 −1 0 1 2 3 02468 Volcano plot HIV1AZTvsNS on genes log2 (Fold−Change) −log10(adj−P−Value) log10of adjustedPvalue log2 of Fold-Change 1% 5% UPDOWN
  • 15. PIPELINE OF ANALYSIS DIFFERENTIAL EXPRESSION STEP Representation of DE probes / genes : Heat-maps - derived from heatplot() function. - several parameters to adapt manually to each case. Optimise function to detect best parameters automatically from data matrix. BDCA1_CTRL_D1 BDCA1_CTRL_D2 BDCA1_CTRL_D3 BDCA1_JK_D1 BDCA1_JK_D2 BDCA1_JK_D3 BDCA1_R5_D1 BDCA1_R5_D2 BDCA1_R5_D3 BDCA3_CTRL_D1 BDCA3_CTRL_D2 BDCA3_CTRL_D3 BDCA3_JK_D1 BDCA3_JK_D2 BDCA3_JK_D3 BDCA3_R5_D1 BDCA3_R5_D2 BDCA3_R5_D3 pDC_CTRL_D1 pDC_CTRL_D2 pDC_CTRL_D3 pDC_JK_D1 pDC_JK_D2 pDC_JK_D3 pDC_R5_D1 pDC_R5_D2 pDC_R5_D3 TLR10 TLR8 AIM2 TLR3 IRAK2 TICAM1 IKBKG TRAF3 IKBKB NFKB1 NFKB2 TANK TRAF6 IRAK4 IFI16 MYD88 MB21D1 TLR7 AZI2 TMEM173 TLR6 TBK1 IKBKE TLR4 IRAK3 TLR2 TLR5 ZBP1 IRAK1 IRF3 −3 −2 −1 0 1 2 3 Value Color Key
  • 16. PIPELINE OF ANALYSIS CLUSTERING STEP Quality Control Raw data Controlled Raw data Normalized data Controlled Normalized data Normalisation Quality Control Annotated Normalized data Diff. Expressed genes Diff. Expressed modules Differential Expression Clustering Annotation
  • 17. PIPELINE OF ANALYSIS CLUSTERING STEP General overview : - Connectivity based clustering = Hierarchical clustering - Centroid-based clustering - K-means - Distribution-based clustering = Expectation-Maximisation (EM) - “Network”-based clustering = Weighted correlation network analysis = WGCNA - Artificial neural network = SOM - Density-based clustering = DBSCAN
  • 18. CLUSTERING HIERARCHICAL CLUSTERING Connectivity-based clustering : Hierarchical clustering - Hierarchical clustering (= HCA) seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types: - Agglomerative or "bottom up" approach = each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. - Divisive or "top down" approach = all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
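The agglomerative ("bottom up") strategy can be sketched as follows. This is a minimal illustration in plain Python, assuming Euclidean distance and average linkage; the function names and the toy expression profiles are hypothetical, and a real analysis would rely on an optimised library routine.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative(points, n_clusters):
    """Start with one cluster per observation and repeatedly merge the two
    closest clusters until n_clusters remain.
    Returns a list of clusters, each a sorted list of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])  # merge b into a
        del clusters[b]                                  # b > a, so index a is unaffected
    return clusters

# Toy expression profiles (hypothetical values)
profiles = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0)]
result = agglomerative(profiles, 3)
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, would give the full hierarchy that a dendrogram displays.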
  • 19. CLUSTERING HIERARCHICAL CLUSTERING Connectivity-based clustering : Hierarchical clustering [Figure: ordered list of DE genes, clusters A to H]
  • 20. CLUSTERING K-MEANS Centroid-based clustering : K-means [Figure: two plots of the within-groups sum of squares (WGSS) against the number of clusters (2 to 14); ordered list of DE genes, clusters A to H]
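The WGSS-versus-number-of-clusters curve on this slide can be reproduced in outline as below. This is a sketch under assumptions: plain Lloyd's algorithm with centroids initialised from randomly sampled data points, and toy 2-D data invented for illustration; none of the names come from the slides.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial centroids drawn from the data
    for _ in range(n_iter):
        labels = assign(points, centroids)     # assignment step
        new_centroids = []
        for c in range(k):                     # update step
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

def wgss(points, centroids, labels):
    """Within-groups sum of squares."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))

# Toy data (hypothetical): three loose groups of three points
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (9.0, 1.0), (9.1, 0.9), (8.9, 1.2)]
elbow = {k: wgss(points, *kmeans(points, k)) for k in range(1, 6)}
```

Plotting `elbow` against k and looking for the bend in the curve is one common way to choose the number of clusters.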
  • 21. CLUSTERING SOM Centroid-based clustering : SOM - Similar to the K-means algorithm. - The centroids used have a predetermined topographic ordering relationship. - At each update step, centroids near the updated one are also updated. [Figure: standard SOM algorithm]
  • 22. CLUSTERING EM Distribution-based clustering : EM - Iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. - An expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters. - A maximisation (M) step, which computes parameters maximising the expected log-likelihood found on the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
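The alternation of E and M steps can be illustrated on the simplest case, a two-component one-dimensional Gaussian mixture, where the latent variable is the component each point belongs to. This is a sketch under that assumption; the function names, starting values and toy data are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, mu, var, weight, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture.
    mu, var and weight are 2-element lists holding the starting values."""
    mu, var, weight = list(mu), list(var), list(weight)
    for _ in range(n_iter):
        # E step: posterior probability of each component for each point
        resp = []
        for x in data:
            dens = [weight[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: parameters maximising the expected log-likelihood
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            # small floor keeps the variance from collapsing to zero
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk, 1e-6)
    return mu, var, weight

# Toy data (hypothetical): two groups centred near -2 and 3
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.9, 3.1, 3.0, 3.2, 2.8]
mu, var, weight = em_two_gaussians(data, mu=[-1.0, 1.0], var=[1.0, 1.0], weight=[0.5, 0.5])
```

Model-selection criteria such as BIC can then be compared across fits with different numbers of components.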
  • 23. CLUSTERING EM Distribution-based clustering : EM [Figure: two plots of BIC against the number of components (2 to 8) for the models EII, VII, EEI, VEI, EVI, VVI, EEE, EEV, VEV, VVV; ordered list of DE genes, clusters A to H]
  • 24. CLUSTERING DBSCAN Density-based clustering : DBSCAN - Finds a number of clusters starting from the estimated density distribution of the corresponding nodes. - 2 parameters : a radius R, and the minimum number of points M that must be found within R for the points to be considered part of the same cluster. [Figure legend: A = CORE points; B & C = DENSITY-REACHABLE points; N = NOISE point]
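The roles of the two parameters, and of core, density-reachable and noise points, can be seen in a compact sketch of the procedure. This is an illustrative plain-Python version, not an optimised implementation; the names `radius` and `min_points` stand for R and M, and the toy coordinates are hypothetical.

```python
import math

def region_query(points, i, radius):
    """Indices of all points within `radius` of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= radius]

def dbscan(points, radius, min_points):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = region_query(points, i, radius)
        if len(neighbours) < min_points:
            labels[i] = NOISE            # may later become density-reachable
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # reachable from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, radius)
            if len(j_neighbours) >= min_points:   # j is also a core point: expand
                queue.extend(j_neighbours)
    return labels

# Toy data (hypothetical): two dense squares and one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, radius=1.5, min_points=3)
```

Unlike K-means, the number of clusters is not given in advance; it follows from the density of the data and the two parameters.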
  • 25. CLUSTERING WGCNA “Network”-based clustering : WGCNA - WGCNA clusters datasets by quantifying not only the correlations between individual pairs of genes, but also the extent to which these genes share the same neighbours. - The WGCNA R package can be used through 2 methodologies, which we compared; we chose the “N-step” methodology : - Manual selection of parameters - Noise taken into account - Merging analysis
  • 26. CLUSTERING WGCNA “Network”-based clustering : WGCNA [Figure: gene dendrogram, hclust (*, "average"), height 0.5 to 1.0; module assignment (Module 0 to Module 7) from Dynamic Tree Cut and from Merging Dynamic]
  • 28. SUPPLEMENTAL STEPS VENN DIAGRAMS Specific analysis : Venn-n comparison INPUT : Lists of genes & data matrix. ALGORITHM : Step 1 : Calculate all 2^n possible comparisons. Step 2 : Determine the membership of all the genes. Step 3 : For each possible comparison - Find the corresponding genes - Organise them by HC using their expression values. OUTPUT : List of genes organised by comparison & sub-organised by HC.
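Steps 1 to 3 of the Venn-n comparison can be sketched as below, leaving out the final sub-ordering by hierarchical clustering. The function name, the list names A, B, C and their contents are hypothetical; only the idea of enumerating the 2^n membership patterns comes from the slide.

```python
from itertools import product

def venn_regions(gene_lists):
    """Map each membership pattern (one boolean per list) to its genes.
    gene_lists: dict mapping a list name to an iterable of gene names."""
    names = sorted(gene_lists)
    sets = {n: set(gene_lists[n]) for n in names}
    all_genes = set().union(*sets.values())
    # Step 1: the 2^n possible membership patterns
    regions = {pattern: [] for pattern in product([False, True], repeat=len(names))}
    # Steps 2 and 3: compute each gene's membership and file it under its pattern
    for gene in sorted(all_genes):
        pattern = tuple(gene in sets[n] for n in names)
        regions[pattern].append(gene)
    return names, regions

# Toy DE gene lists (hypothetical contents)
lists = {
    "A": ["TLR3", "TLR7", "IRF3"],
    "B": ["TLR7", "IRF3", "MYD88"],
    "C": ["IRF3", "TBK1"],
}
names, regions = venn_regions(lists)
```

The pattern `(False, False, False)` is always empty here because only genes present in at least one list are considered; each non-empty region could then be reordered by clustering the matching rows of the data matrix.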
  • 29. SUPPLEMENTAL STEPS VENN DIAGRAMS Specific analysis : Venn-n comparison [Figure: DE genes against samples; normalised micro-array expression values on a scale from −3 to 3; columns: BDCA1, BDCA3 and pDC, each under CTRL, JK and R5, replicates 1 to 3]
  • 30. SUPPLEMENTAL STEPS CORRELATION Specific analysis : X-Y plots (correlation) [Figure: scatter plot with axes from −2 to 4; HIV1 vs NS log2(FC) against HIV1+AZT vs NS log2(FC); in general, log2(Fold-Change[condition 1]) against log2(Fold-Change[condition 2])]
  • 31. SUPPLEMENTAL STEPS ENRICHMENT ANALYSIS Functional Enrichment analysis : Database for Annotation, Visualisation & Integrated Discovery (DAVID) Some numbers about DAVID : - More than 16 000 citations - Around 40 Nature-branded citations - Daily usage : 1200 gene lists / sublists & 400 unique researchers [Figure: chart with values from 0 to 4000 over the years 2004 to 2014]
  • 32. SUPPLEMENTAL STEPS ENRICHMENT ANALYSIS Functional Enrichment analysis : Gene Set Enrichment Analysis - Get a ranked list L of all the genes on the chip based on a chosen measure (expression, FC…). - For each gene set S : find the location of each gene s in S within L. - Generate an enrichment score ES for S based on a running-sum statistic : “reward” the presence of s toward the top or bottom of L. - Analyse the significance of this Kolmogorov-Smirnov type statistic by permutation testing. - Multiple hypothesis testing (MHT) error control for multiple S’s using the estimated false discovery rate (FDR). [Figure: ranked list of genes (A, B) and running sum along L]
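The running-sum statistic and its permutation test can be sketched as follows. This is a simplified, unweighted variant written for illustration: every hit counts equally, the permutation shuffles the gene order, and the FDR correction across gene sets is left out. The function names and the toy ranked list are hypothetical.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Walk down the ranked list, stepping up at each gene in the set and down
    otherwise; return the signed largest deviation from zero, scaled to [-1, 1]."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = best = 0
    for is_hit in hits:
        # integer steps (n_miss up, n_hit down) keep the arithmetic exact
        running += n_miss if is_hit else -n_hit
        if abs(running) > abs(best):
            best = running
    return best / (n_hit * n_miss)

def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Fraction of random gene orderings whose |ES| reaches the observed |ES|."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    shuffled = list(ranked_genes)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= observed:
            count += 1
    return count / n_perm

# Toy ranked list (hypothetical): g1 is the most up-regulated gene
ranked = ["g%d" % i for i in range(1, 11)]
es_top = enrichment_score(ranked, {"g1", "g2", "g3"})
es_bottom = enrichment_score(ranked, {"g8", "g9", "g10"})
p_top = permutation_p_value(ranked, {"g1", "g2", "g3"})
```

A positive ES indicates a set concentrated toward the top of L and a negative ES a set concentrated toward the bottom.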
  • 33. SUPPLEMENTAL STEPS TF ANALYSIS TF Binding Site analysis : oPOSSUM-3 Input : - Gene list or gene sequences - 116 TF binding site motifs in ChIP-Seq JASPAR db (2010) - Selection of distances upstream and/or downstream of the gene coding sequence Processing : - Find specific sequences upstream and/or downstream of a gene sequence - Compute over-represented motifs - Comparison to TF binding sequences (Z and F score) Output : - List of TFs hierarchically organised by Z & F score - Automatic representation of F or Z score vs %GC content
  • 34. SUPPLEMENTAL STEPS TF ANALYSIS TF Binding Site analysis : oPOSSUM-3 [Figure: TF profile; Fisher score (0 to 5) against %GC composition (0.0 to 0.8); labelled TFs: CEBPA, ELF5, FEV, GABPA, IRF1, Klf4, MZF1_5−13, NF−kappaB, NFATC2, RELA, RUNX1, SPI1, STAT1, Stat3, Tcfcp2l1]
  • 35. SUPPLEMENTAL STEPS TF ANALYSIS TF Binding Site analysis : Other tools Databases : - TRANSFAC database v7.4 through GSEA (615 gene sets) - JASPAR Core database (205 non-redundant motifs) - UCSC database (690 gene sets) based on data from ENCODE TFBS ChIP-seq production groups (from 2007 to 2012) Tools : - HOMER : TRANSFAC database + own motifs database through ChIP-Seq analysis - GSEA : using TRANSFAC db on enriched motifs in ranked list of genes - DAVID : using UCSC database - oPOSSUM : using JASPAR (116 Transcription Factors)
  • 36. PIPELINE OF ANALYSIS NETWORK STEP General overview : - Reverse engineering models : - Algorithm for the Reconstruction of Accurate Cellular Networks = ARACNe - Tool for Inferring Network of Genes = TINGe - Other models : - Functional networks = Cytoscape software (BiNGO / ClueGO plugins)
  • 37. PIPELINE OF ANALYSIS NETWORK STEP Data networks : ARACNe - ARACNe is a reverse engineering method specifically designed to scale up to the complexity of regulatory networks in mammalian cells. - ARACNe defines an edge as an irreducible statistical dependency between gene expression profiles that cannot be explained as an artifact of other statistical dependencies in the network. - An edge is likely to identify direct regulatory interactions mediated by a transcription factor binding to a target gene's promoter region, although other types of interactions may also be identified.
  • 38. PIPELINE OF ANALYSIS NETWORK STEP Data networks : TINGe - TINGe reverse engineers genome-scale gene networks from a large number of expression profiles, based on mutual information (MI), the data processing inequality (DPI) and permutation testing to assess the statistical significance of each inferred edge. - TINGe can be used for directly constructing high-quality networks, or it can be used as a component along with other types of data in building probabilistic networks.
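The two ingredients named here, mutual information and the data processing inequality, can be illustrated with a small sketch. It is not the ARACNe or TINGe implementation: MI is estimated naively on already discretised profiles, the permutation test is omitted, and the gene names and MI values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """MI (in nats) between two equally long sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def dpi_prune(mi, threshold=0.0):
    """mi: dict mapping frozenset({gene_a, gene_b}) to an MI value.
    Keep edges above `threshold`; then, in every fully connected triplet,
    mark the weakest edge as indirect and drop all marked edges at the end."""
    edges = {e: v for e, v in mi.items() if v > threshold}
    genes = sorted(set().union(*edges)) if edges else []
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(e in edges for e in triangle):
            to_remove.add(min(triangle, key=lambda e: edges[e]))
    return {e: v for e, v in edges.items() if e not in to_remove}

# Toy chain X -> Y -> Z (hypothetical MI values): X-Z is the indirect edge
mi = {
    frozenset(("X", "Y")): 0.9,
    frozenset(("Y", "Z")): 0.8,
    frozenset(("X", "Z")): 0.4,
}
network = dpi_prune(mi)
```

In a full pipeline the `threshold` would come from the permutation test on MI values rather than being fixed by hand.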
  • 39. PIPELINE OF ANALYSIS NETWORK STEP Functional networks : Cytoscape - Several web tools or software packages can build functional networks, e.g. DAVID and Cytoscape. They link modules of genes by their functional annotation enrichment. - They rely on annotations and not on the expression values of the experiment, so they can return some annotations that carry no information for a specific biological question. [Figure: example of a network built via ClueGO on a list of differentially expressed genes]