Correcting bias and variation in
small RNA sequencing for optimal
(microRNA) biomarker discovery
and validation in cardio-metabolic
(and renal) disease
Christos Argyropoulos MD, PhD, FASN
Department of Internal Medicine
Division of Nephrology
University of New Mexico Health Sciences Center
Overview
• Models of sequence counts in short RNA-seq
experiments
• Estimating and controlling for bias in small RNA-seq
experiments
• Statistical approaches to analyzing differential
expression
• MicroRNA regulation – a control theory perspective
• MicroRNAs as biomarkers in diabetes, renal and
cardiometabolic disease
• Leveraging our approach for optimal biomarker
discovery
Signals in short RNA-seq
data
Building a model from first principles
Background
• Short RNA-seq data are becoming more and more
abundant
• There is poor reproducibility of findings between
and within research groups
• Systematic measurement biases confound findings
• Systematic variation → relatively stable within protocols
• Systematic variation → unpredictable between different
protocols and platforms
• Statistical methods may be used to explore and
address such biases
• Existing approaches are phenomenological descriptions:
• what do model parameters stand for?
• how can one best use these models?
Building a model from first
principles
• Establish testable predictions that may be verified
in existing datasets
• Establish correspondence between model
parameters and experimental steps
• Use this model to understand and correct
systematic and random bias in short RNA-seq
• Embed the model into more general frameworks
for applications:
• Epidemiological
• Biomarker discovery and validation
• Medical diagnostics
The short RNA-seq experiment
The vendor’s view The biochemist’s view
https://doi.org/10.1093/nar/gkt1021
http://www.genomics.hk/SamllRna.htm
http://www.geospiza.com/Products/SmallRNA.shtml
[Figure: hierarchy of quantities in the short RNA-seq experiment]
• $X_1, X_2, \dots, X_n$: abundance in original preparation
• $\Lambda_1, \Lambda_2, \dots, \Lambda_n$: abundance in adapted (ligated) sample
• $B_1, B_2, \dots, B_n$: abundance in PCR amplified library
• $Y_1, Y_2, \dots, Y_n$: abundance in capture probes
• $L_1^N, L_2^N, \dots, L_n^N$: abundance of counts in fastq files
Parameters linking the steps: ligation efficiency ($f_i$), number of PCR cycles ($N$), PCR efficiency ($q_i$), probability of capture ($s_i$), number of probes ($K$), library dilution factor ($d$), probability of signal generation ($r$), probability of sequence generation ($p_i$)
Conceptual model of the short RNA-seq
experiment (this is what we will talk about)
Modeling the qPCR amplification reaction
• Statistics of PCR amplification
• Branching (Galton-Watson) process
• GW distribution only available implicitly i.e.
through simulation
• Large scale simulations to derive
approximation to the GW process
• PCR literature, GW theory, martingale arguments → candidate distributions
• Information theory arguments used to compute
distance between GW samples and the
approximate distributions
• A (truncated) Normal distribution derived at the
end
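The branching-process view of PCR can be illustrated with a small simulation. This is only a sketch under stated assumptions: every molecule is duplicated independently with the same probability (`efficiency`) in every cycle, and the function names, starting copy number and cycle count are illustrative choices, not values from the talk. It shows why the distribution is easy to sample but has a simple closed form only for its mean.

```python
import random

def simulate_pcr(x0, efficiency, cycles, rng):
    """One Galton-Watson trajectory: in every cycle each molecule is
    duplicated independently with probability `efficiency`."""
    molecules = x0
    for _ in range(cycles):
        copies = sum(1 for _ in range(molecules) if rng.random() < efficiency)
        molecules += copies
    return molecules

def expected_yield(x0, efficiency, cycles):
    """Mean of the branching process: x0 * (1 + efficiency) ** cycles."""
    return x0 * (1.0 + efficiency) ** cycles

def sample_mean_and_variance(x0, efficiency, cycles, n_sim, seed=1):
    """Monte Carlo estimate of the mean and variance of the final yield."""
    rng = random.Random(seed)
    draws = [simulate_pcr(x0, efficiency, cycles, rng) for _ in range(n_sim)]
    mean = sum(draws) / n_sim
    var = sum((d - mean) ** 2 for d in draws) / (n_sim - 1)
    return mean, var
```

Comparing `sample_mean_and_variance(5, 0.8, 8, 1000)` with `expected_yield(5, 0.8, 8)` checks the simulation against the known mean; the spread around that mean is the random PCR variation that the approximating distribution has to capture.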
Flattening the hierarchy through
marginalization
Integrate sources of variations out of the
model:
1. library sequence depth variation
2. PCR amplification
Final statistical model is about absolute
counts
• Direct modeling ≠ % of counts
• Limit of approximation encompasses all possible
sample compositions
• The result is a truncated Normal-Poisson mixture
distribution (approximated via a Negative
Binomial or Linear Quadratic Gaussian family)
Model implements a Linear-Quadratic (LQ)
mean-variance relationship
Distributional Regression for RNA-
seq data
LQ relationship between mean ($\mu$) and variance ($\sigma^2_{LQ}$):
$\sigma^2_{LQ} = \mu(1 + \phi\mu)$
• The variance and the mean
have to be modelled
concurrently
• Unless variance is modelled → inconsistent statistics → small
(overoptimistic) p values
• Realm of distributional
regression models (GAMLSS –
Generalized Additive Models
for Location, Scale and Shape)
• One can re-use existing SW
frameworks to fit such models
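The linear-quadratic relationship can be checked against the usual Negative Binomial parameterization. The sketch below assumes the standard (size, probability) form of the Negative Binomial with size = 1/φ; the function names are illustrative and are not part of any fitting package.

```python
def lq_variance(mu, phi):
    """Linear-quadratic variance: sigma^2 = mu * (1 + phi * mu)."""
    return mu * (1.0 + phi * mu)

def nb_size_prob(mu, phi):
    """Negative Binomial (size, prob) that has mean mu and dispersion phi."""
    size = 1.0 / phi
    prob = size / (size + mu)
    return size, prob

def nb_variance(size, prob):
    """Variance of a Negative Binomial in the (size, prob) form."""
    return size * (1.0 - prob) / prob ** 2

# Example: mean count of 200 with dispersion 0.05
size, prob = nb_size_prob(200.0, 0.05)
print(lq_variance(200.0, 0.05), nb_variance(size, prob))
```

Both expressions give the same variance, which is why a Negative Binomial fit with a modelled dispersion implements the LQ mean-variance relationship.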
Validating model(s) with synthetic
mixes of known composition
• Allow one to test the “backbone” of the model
without worrying about the adequacy of the
modeling of biology
• Sequencing of equimolar mixes:
• Explore and model systematic bias in the same protocol
• Sequencing of dilution series or non-equimolar
mixes:
• “Dose-response” curve of the bias
• Examination of “debiasing” approaches for the ability to
uncover the truth
• Model may also be used to analyze the
performance of differential expression algorithms
Testable predictions: mean and variance linear
quadratic relationships in public RNA-seq data
Linear Quadratic Relationship in the
legacy datasets of the Galas group
Estimating and
Correcting for Ligase Bias
At the corner of Biochemistry and Mathematics
Enzymatic mechanism of RNA ligation
• The kinetics of RNA ligation were investigated thoroughly
in the 1970s and early 1980s
• The intermolecular reaction is relevant to RNA-seq
• The mechanism involves three, fully reversible, steps that
obey ping-pong ordered kinetics and are subject to
substrate inhibition
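$$E + ATP \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} E \cdot ATP \underset{k_{-1a}}{\overset{k_{1a}}{\rightleftharpoons}} E\text{-}AMP + PP_i$$
$$E\text{-}AMP + D \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} E \cdot App\text{-}D \underset{k_{-2a}}{\overset{k_{2a}}{\rightleftharpoons}} E + App\text{-}D$$
$$E \cdot App\text{-}D + A \underset{k_{-3}}{\overset{k_3}{\rightleftharpoons}} E \cdot App\text{-}D \cdot A \underset{k_{-3a}}{\overset{k_{3a}}{\rightleftharpoons}} AMP + E + AD$$
→ Bias in RNA ligation was noted in these early investigations, and the enzyme was
never used as a tool in synthetic chemistry, as solid-phase methods took off in the 1980s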
Kinetic analysis of ligase reaction
velocity in RNA-seq protocols
• Existing protocols include abundant cofactors (in sharp
contrast to the experiments of the 1970s)
   • Drive the reaction to the right
   • Rate-limiting single-step reaction instead of a three-step one
   • Substrate preference (bias in reaction yields) is not eliminated
• Multi-substrate inhibition from all biosample sequences
available for ligation
   • Analytical series approximation for ratios of random variables
• Ligase operates in the first-order domain of Michaelis-Menten kinetics
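$$V_i = \frac{V_i^{max}\,\frac{X_i}{K_M^i}}{1 + \sum_j \frac{X_j}{K_M^j}} \approx \frac{V_i^{max}\,\frac{X_i}{K_M^i}}{1 + n\,\frac{E[X]}{E[K_M]}} = \frac{V_i^{max}\,\frac{X_i}{K_M^i}}{1 + \frac{C_{Total}(0)}{E[K_M]}} \approx V_i^{max}\,\frac{X_i}{K_M^i}$$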
Testable model predictions about
ligase bias in RNA-seq experiments
Mathematical expression
• $X_i \left(1 - \exp\left(-\frac{V_i^{max}}{K_M^i}\, T_R\right)\right) = X_i f_i^{5'}$
• $\Lambda_i = X_i f_i^{5'} f_i^{3'} = X_i f_i$
Implications for ligase bias
• Concentration
independence
• Sample composition
independence
• Transferable within
experiments done with the
same protocol
• Protocol dependent
(reaction velocity
incorporates concentration
of cofactors and enzyme)
• Sequence equimolar mixes to
derive empirical correction
factors for ligase bias
• Apply those to biological
samples (“offsets” in
distributional regression) to
eliminate bias
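A minimal sketch of the two ideas on this slide: the ligated fraction implied by first-order kinetics, and empirical correction factors learned from an equimolar mix. It assumes a simplification: factors are computed as log ratios to the geometric mean and applied by direct division, whereas the talk applies them as offsets inside a distributional regression. All function names and counts are hypothetical.

```python
import math

def ligation_fraction(vmax_over_km, reaction_time):
    """f = 1 - exp(-(Vmax / KM) * T_R): fraction of a sequence that is ligated."""
    return 1.0 - math.exp(-vmax_over_km * reaction_time)

def log_bias_factors(equimolar_counts):
    """Log bias of each sequence relative to the geometric mean of an equimolar run."""
    logs = [math.log(c) for c in equimolar_counts]
    center = sum(logs) / len(logs)
    return [value - center for value in logs]

def apply_correction(counts, factors):
    """Divide each count by its bias factor (the offset on the log scale)."""
    return [c / math.exp(f) for c, f in zip(counts, factors)]

# Hypothetical equimolar run: equal input, unequal counts
calibration = [100.0, 400.0, 25.0]
factors = log_bias_factors(calibration)

# Hypothetical run of the same mix at 10x lower input
diluted = [10.0, 40.0, 2.5]
print(apply_correction(diluted, factors))
```

Because `f` depends on the rate constants and reaction time but not on the amount of input, the same factors flatten the diluted run. This is the concentration independence that the model predicts.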
There is substantial
variation in raw
sequence counts
from equimolar
mixes
Application of bias factors virtually
eliminates ligase bias
Monte Carlo Cross Validation in 3 equimolar datasets: randomly split the
dataset into learning and testing subsets, learn the correction factor on the
learning subset and apply it to correct the estimates of the testing subset. Repeat N times
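The cross-validation loop can be sketched as follows. This is an illustration on synthetic data, not the analysis behind the figure: the runs, the bias spread (1.0 on the log scale) and the run-to-run noise (0.1) are invented, and the error metric is the RMSE of log counts around their run mean.

```python
import math
import random

def log_bias(runs):
    """Per-sequence log bias: mean log count over runs, centered on the overall mean."""
    n_seq = len(runs[0])
    mean_log = [sum(math.log(r[i]) for r in runs) / len(runs) for i in range(n_seq)]
    center = sum(mean_log) / n_seq
    return [m - center for m in mean_log]

def rmse_log(run, factors):
    """RMSE of bias-corrected log counts around their mean (0 for a flat equimolar run)."""
    logs = [math.log(c) - f for c, f in zip(run, factors)]
    center = sum(logs) / len(logs)
    return math.sqrt(sum((v - center) ** 2 for v in logs) / len(logs))

def mccv(runs, n_splits, rng, train_fraction=0.5):
    """Monte Carlo cross validation: learn factors on a random subset, score the rest."""
    before, after = [], []
    no_correction = [0.0] * len(runs[0])
    for _ in range(n_splits):
        shuffled = runs[:]
        rng.shuffle(shuffled)
        k = max(1, int(len(shuffled) * train_fraction))
        learning, testing = shuffled[:k], shuffled[k:]
        factors = log_bias(learning)
        for run in testing:
            before.append(rmse_log(run, no_correction))
            after.append(rmse_log(run, factors))
    return sum(before) / len(before), sum(after) / len(after)

def synthetic_runs(n_runs, n_seq, rng):
    """Invented equimolar runs: fixed per-sequence bias plus small run-to-run noise."""
    true_bias = [rng.gauss(0.0, 1.0) for _ in range(n_seq)]
    return [[1000.0 * math.exp(b + rng.gauss(0.0, 0.1)) for b in true_bias]
            for _ in range(n_runs)]
```

Calling `mccv(synthetic_runs(6, 50, rng), 20, rng)` with `rng = random.Random(1)` returns the average RMSE before and after correction on held-out runs; the second number is smaller whenever the bias is reproducible between runs.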
Empirical factors nearly eliminate bias
between equimolar datasets with 10x
different input (Galas Lab legacy datasets)
Bias factors in public non-equimolar
short-RNA seq datasets
Design of Validation Experiments
What has been established?
• Moderate
concentration
independence
• Ability to nearly
eliminate bias over at
least two orders of
magnitude
• Legacy
platforms/experiments
What needs to be proven?
• Concentration
independence over >2
orders of magnitude
• Sample composition
independence
• Recovery of differential
expression measures
• Any value relative to
existing approaches?
Validation Experiments
Collaboration between PNRI (Galas Lab) and UNM (DoIM)
The largest, single protocol, technical series to date (GSE93399)
Experimental Group                                        Dilution   N
miRExplore (972 short RNAs)                               1:10       10
286 miRNAs                                                1:1        8
286 miRNAs                                                1:10       8
286 miRNAs                                                1:100      8
286 miRNAs                                                1:1000     8
Ratiometric Series A (descending): mix of                            8
   286 subpool A (1:1), 286 subpool B (1:10),
   286 subpool C (1:100), 286 subpool D (1:1000)
Ratiometric Series B (ascending): mix of                             8
   286 subpool A (1:1000), 286 subpool B (1:100),
   286 subpool C (1:10), 286 subpool D (1:1)
Total: 7 groups (58 sequenced x 2 = 116)
Empirical bias correction over 3 orders
of magnitude in equimolar datasets
RMSE reduction: 77%-90% (input in calibration run differs by up to x10 from
target), 54%-67% otherwise
Empirical factors reduce bias by
nearly 60% in non-equimolar series
Bias correction recovers
expression profile patterns
Bias Correction in Heterogeneous
Samples
• Correction factors remove
~55% of bias between
equimolar samples
• ~ 70% of RNAs have
expression within two fold
from the mean (from 23%)
• Bias reduction is ~40% in
ratiometric series
• ~63% of RNAs have
expression within x2 from
the mean (from 33%)
Differential Expression
When more is less, and simplest is the best
Our proposal for a model of
differential expression (DE) changes
Statistical formulation and
assumptions
$\log \mu_{i,j,k} = \alpha + \Delta_k + m_{i,0} + \delta_{i,k}$
$m_{i,0} \sim Normal(0, \sigma^2_{\mu 0})$
$\delta_{i,k} \sim Normal(0, \sigma^2_k)$
(similar model for variance)
1. Expression in reference state is not
of prime scientific interest (can
omit correction for bias)
2. Technical sources of variation (PCR
efficiency, library sampling) of
much smaller magnitude than
biological variability
Parameter interpretation
and context of use
• Accommodates global
and sequence specific DE
changes
• Flexible modeling of
referent (global level and
variation around it)
• Still models counts
• No incorporation of
library specific factors
(model is un-normalized)
Existing Models for RNA-seq experiments
• Number of reads in sample j, assigned to species i ($K_{i,j}$)
• Assumed to follow a negative binomial distribution: $K_{i,j} \sim NB(\mu_{i,j}, \sigma^2_{i,j})$
• Mean: $\mu_{i,j} = m_{i,j} \times s_j$, where $s_j$ is the common scale (coverage of the library, sequence depth)
• Variance: $\sigma^2_{i,j} = \mu_{i,j} + a\mu^2_{i,j}$ (edgeR [1]) or $\sigma^2_{i,j} = \mu_{i,j} + s_j^2 f(m_{i,j})$ (DESeq [2])
• Experimental effects (model for differential expression analysis): $\log(m_{i,j})$ is the sum of a term indexed $(i,0)$, the miRNA expression in the control group, and a term indexed $(i,1)$, the miRNA expression in the experimental group
[1] Biostatistics 2008, 9:321-32
[2] Genome Biology 2010, 11:R106
Comparison of proposed approach
against existing methods
“We” (gamlss)
• Uses the NB or the LQNO
• LQ relation between mean and
variance
• Variance and mean parameters
are estimated simultaneously
• Explicit count based modeling
• Un-normalized
• Shrinkage via random effects
modeling
• Derived from first principles (a
generative probability model)
“They” (edgeR/DESeq2 etc)
• NB or the linear model
• LQ or flexible relation between
mean & variance
• Two stage procedure to
estimate parameters
• Models counts as % of a given
library depth
• Normalized (% sum to one)
• Shrinkage via random effects
modeling
• Ad hoc, phenomenological
probability model
Scenarios of differential expression
to assess method performance
• Clustered, symmetric differential expression
1. fraction of overexpressed sequences is equal to that of the
underexpressed
2. no change in global expression
over and underexpressed RNAs are present in equal numbers
and exhibit same degree of DE
• Asymmetric, clustered differential expression
1. Fraction of overexpressed sequences ≠ underexpressed
Drives global expression change to one direction
• Global Change: all RNAs exhibit a variable but consistent
directional change of expression
• No change
All scenarios implemented through the validation datasets
The GAMLSS has smaller RMSE than 10
popular workflows for DE analysis
• Performance
benefit seen under
scenarios of
asymmetric,
clustered
differential
expression changes
• When DE are
(nearly) symmetric,
many other
methods have
similar
performance
Existing methods cannot detect global,
directional differential expression
Algorithm
performance
in the
absence of
differential
expression
GAMLSS demonstrates the optimal
balance between False Omission and False
Discovery Rates
ROC Curve Analysis FDR and FOR
What did we just find out about
algorithms for DE analysis?
• Proposed method (GAMLSS) is the top performer:
• Symmetric, clustered, DE changes
• Asymmetric clustered, DE changes
• Asymmetric global, DE changes
• No DE change
Optimal balance between FDR and FOR
• Existing methods introduce moderate – to – severe bias
• force the overall DE to sum to zero (what goes up must be
accompanied by something that goes down)
• Voom/limma somewhat more resilient, near identical
performance to GAMLSS under symmetric DE
These patterns have not been seen before, because no one to
date has generated datasets with known composition/DE
Why do existing methods fail to deliver?
• Existing models for RNA-seq analysis, e.g.
DESeq and edgeR, can be derived from first
principles as approximations
• RNA-seq counts as % of library depth
• Valid for dilute samples, not dominated by a
few RNA species
• Library size depth and modeling counts as %
(a relic of the SAGE era) may be a disastrous
distraction
• Parameterization constrains DE over all RNAs
included in the analysis to sum to zero
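The effect of modeling counts as a percentage of library depth can be seen in a toy calculation. Plain proportion scaling is used here as a stand-in for library-size normalization; the actual packages use more robust scaling factors, so this is a sketch of the mechanism rather than a reproduction of their output. All counts are invented.

```python
import math

def log2_fc_absolute(reference, experimental):
    """Log2 fold change computed on absolute counts."""
    return [math.log2(e / r) for r, e in zip(reference, experimental)]

def log2_fc_proportions(reference, experimental):
    """Log2 fold change computed after scaling each library to proportions."""
    ref_total, exp_total = sum(reference), sum(experimental)
    return [math.log2((e / exp_total) / (r / ref_total))
            for r, e in zip(reference, experimental)]

# Global change: every RNA doubles
reference = [100.0, 200.0, 300.0, 400.0]
doubled = [2.0 * c for c in reference]
print(log2_fc_absolute(reference, doubled))     # all 1.0
print(log2_fc_proportions(reference, doubled))  # all 0.0: the change disappears

# Asymmetric change: one RNA goes up 4x, the rest are unchanged
flat = [100.0, 100.0, 100.0, 100.0]
one_up = [400.0, 100.0, 100.0, 100.0]
print(log2_fc_absolute(flat, one_up))
print(log2_fc_proportions(flat, one_up))
```

In the asymmetric case the unchanged RNAs acquire negative fold changes on the proportion scale: what goes up is accompanied by something that appears to go down, even though nothing did.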
Practical implications for experimentalists
(not using GAMLSS)
• Any change to the population of RNAs modelled (e.g.
filtering)→ different DE values from the same dataset
• Both type M (degree of DE changes) and type S (label
an over-expressed sequence to be under-expressed &
vice-versa) errors
• Up to 25% of estimated DE changes may be of the wrong direction
• Up to 100% of estimated DE changes may be of the wrong
magnitude
• RNA-seq findings will fail to validate against qPCR
• Reputation of RNA-seq as a semiquantitative technique of
poor reproducibility is due to statistical methodology
MicroRNA regulation
A control theory perspective
microRNA biology & therapeutic applications
http://www.nature.com/nature/journal/v469/n7330/fig_tab/nature09783_F1.html
http://www.nature.com/nature/journal/v469/n7330/full/nature09783.html
Control In Biological Systems Is Many-
To-Many, Cooperative And Patterned
Feala JD, et al. PLoS ONE 7(1): e29374. (2012)
Riba A et al PLoS Comput Biol 10(2): e1003490.
(2014)
Bipartite Control Network Topologies miRNA – Transcription Factor circuits
Feed Forward Loop: master
control layout in many natural
and artificial control systems
How do we control things?
Predictably simple
(open loop)
Error Correcting
(feedback)
Model based
(feed forward)
Feed forward control
• Control element responds to a change in the
environment in a predefined manner
• Based on prediction of plant (“what is being
controlled”) behavior (requires model of the
system)
• Can react before error actually occurs (stabilizing
the system, e.g. cerebellum control of balance)
• Benefits: reduced hysteresis, increased accuracy,
cost-efficiency, lower “wear-tear”
Practical implications
• miRNAs function as master controllers in FFLs
• biology is intrinsically NOT model free
• miRNA profiling reveals the “plant” dynamics of
complex biological processes
• Emerging data suggest that sequence variation may underlie
(dys-)regulation
• miRNA associations are by definition causal to some
aspects of a particular phenotype
• “a priori plausible” biomarkers
• direct therapeutic implications
• Examination of the “plant” (targets) may have
implications for microRNA research
• Context for the interpretation of microRNA changes
• “Stronger” biomarker signatures
microRNAs are rational candidates for
exploring paradigm shifts in biology
• Ubiquity-conservation
• Breadth & width of regulation (>60% of genes)
• Context-specificity (“meta-controller”)
• Master Controllers in Feed Forward Loops
These arguments are not disease-area specific (e.g. they apply
equally well to cancer or even psychiatric disease)
MicroRNAs as
biomarkers
Renal, Diabetes and Cardiometabolic Disease
• 8-10% of the population suffer from diabetes
• 20-30% of patients with diabetes will develop evidence of
diabetic chronic kidney disease (DKD/CKD)
• DKD progresses in stages of increasing proteinuria
• 50% of patients with overt nephropathy will develop End
Stage Renal Disease (ESRD) within 10 years
• The end result: diabetic nephropathy is the leading cause
of ESRD requiring dialysis or kidney transplantation,
accounting for 40% of cases
Facts, figures and the natural history of
cardiometabolic and renal disease in diabetes
• DKD is costly:
• 40-50% of the $44B Medicare expenditures for CKD
• 40-50% of the $50B total healthcare costs for ESRD
• DKD is lethal (>50% of these deaths are cardiac)
• Current therapies reduce risk by 30%
• Many of the things we tried to stabilize renal function AND
improve cardiovascular disease failed miserably in trials
• A paradigm change in our understanding of DKD is
warranted => We posit that miRNAs will trigger this shift
• This improvement will likely spread to other areas given the biology
of cardiovascular disease (“extreme phenotype”)
There is a significant unmet need for therapies that
stabilize progression and reduce death rates in
patients with diabetic kidney disease
US population [1]
            No Diabetes   Diabetes
No CKD      7.7%          11.5%
CKD         17.2%         31.1%
[1] Afkarian et al J Am Soc Nephrol. 2013 Feb;24(2):302-8
[Figure: Dialysis Mortality. % surviving (40 to 100) versus time (0 to 60 months) for GN and DM]
Why bother with microRNAs in DKD?
Heart & Vessels
• Angiogenesis
• Vascular inflammation
• Atherosclerosis
• LVH
• Vascular tone
• Endothelial dysfunction
Kidney
• Water homeostasis
• Osmoregulation
• Calcium sensing
• Sodium, potassium,
acid base handling
• Renin production
• Renal development
• Renal senescence
• EMT
• Collagen production
Diabetes
• Insulin synthesis and
secretion
• Peripheral tissue
sensitivity
• Hepatic glucose
production
• Inflammatory gene
expression
microRNAs as Minimally Invasive
Biomarkers : a metrological argument
Advantages of microRNAs
Circulating microRNAs
•More stable in circulation than
mRNAs
•High expression level and low
complexity compared to mRNA
•Tissue specific expression
•Availability of analytical platforms
Keep getting cheaper over time
•Sequence conservation
Allows translation of clinical
associations to animal models
Allows translation of animal
models to clinical applications
Cortez et al Nat Rev Clin Oncol. Jun 7, 2011; 8(8): 467–477.
Targets of differentially expressed miRNAs in early and late stages of DN map to overlapping pathways
Columns: Pathway; MA vs. NA (P-value, Fraction); Overt vs. Normal (P-value, Fraction)
Signal Transduction
Signaling by SCF-KIT 0.006 18/76 0.001 41/76
Signaling by Insulin receptor 0.009 23/109 <0.001 65/109
Signaling by NGF 0.016 38/212 <0.001 119/212
Signaling by Rho GTPases 0.024 24/125 <0.001 71/125
Signaling by ERBB4 0.027 16/76 <0.001 45/76
Signaling by ERBB2 0.035 19/97 <0.001 59/97
Signaling by PDGF 0.040 22/118 <0.001 67/118
Signaling by VEGF 0.041 4/11
Signaling by EGFR 0.044 20/106 <0.001 64/106
Downstream signaling of activated FGFR 0.038 19/98 <0.001 61/98
Signaling by BMP 0.001 16/23
Signaling by TGFβ 0.004 11/15
DAG and IP3 signaling 0.010 20/31
PIP3 activates AKT signaling 0.020 15/26
RAF/MAP kinase cascade 0.031 7/10
Signaling by Notch 0.036 13/23
Interaction of integrin α5β3 with fibrillin 0.044 2/3
Interaction of integrin α5β3 with von Willebrand factor 0.044 2/3
Integrin cell surface interactions 0.024 40/85
Cell-Cell Communication 0.009 57/122
Cell Cycle
G0 and early G1 0.040 12/21
Leveraging the RNA-seq
analytical methodology
To boldly go where no one has gone before
(but many have tried)
Goals of a microRNA research
program in cardiometabolic, renal and
diabetes diseases
• Use carefully designed case-control, before-
after, randomized controlled trials, and n-of-
1 trials for the following goals:
1. Personalized medicine applications
(diagnosis/prognosis/precision medicine)
2. Biomarker discovery (e.g. to aid trials)
3. Novel Therapeutics
Animal
Models
Clinical
Associations
Clinical
Interventions
A microRNA driven discovery process
Biomarker
Discovery
Mechanistic
Insights
Therapeutics
Clinical Science, Bioinformatics, Systems
Biology Driven “Reverse Translation”
Translational
Science
Evidence
Based
Medicine
Basic Science
Ingredients for success of a microRNA
regulation discovery program
Requires open-ended platforms (RNA-seq)
o Especially for kidney disease due to intrarenal RNA editing
Requires unbiased quantification between groups of
patients (differential expression analysis)
Requires unbiased and accurate quantification in
the absence of a controlled comparison (diagnostics
– bias correction)
Proposed approach: GAMLSS for RNA-seq satisfies
requirements better than all currently used methods
Measurement in clinical diagnostics
What we want to happen
• Measurement is reproducible
• Measurement shows minimal inter-individual variation
• Measurement shows minimal intra-individual variation
[Figure: January and June readings. Patients 1 and 2 (Condition A) read 10, 10 and Patients 3 and 4 (Condition B) read 15, 15 at both time points]
What actually happens
• Measurement is non-reproducible
• Measurement shows high inter-individual variation
• Measurement shows high intra-individual variation
[Figure: January and June readings. Patients 1 to 4 read 10,18; 13,10; 15,10; 15,18 at one time point and 10,12; 15,14; 18,11; 14,19 at the other, so several condition assignments become uncertain (“? Condition”)]
• Understand and control for the sources of variation
• Use calibration sets as references
• A measurement is instrument specific
• Global reference standards (role for highly competent
labs that maintain the standards)
• Context of use:
• Detector (“out-of-limits” readings)
• Control (“track the course”)
Lessons from clinical chemistry labs
• Use GAMLSS as the prime analytical tool to analyze short
RNA-seq data as it correctly represents all sources of
variation and can use calibration (equimolar) runs
• Combine this with a protocol that experimentally controls
variation (e.g. 4N protocol of the Galas Lab)
Measurement in experimental samples
What we want to happen
• Measurement is reproducible
• Measurement shows no variation
• Certain of the difference
[Figure: RUN1 and RUN2 both give Condition A = 10, 10, … , 10 and Condition B = 15, 15, … , 15, so B > A in both runs]
What actually happens
• Measurement is non-reproducible
• Measurement shows high variation
• Uncertain of the difference
[Figure: one run gives Condition A = 11, 7, … , 10 and Condition B = 8, 19, … , 26 (B > A); the other gives Condition A = 120, 90, … , 130 and Condition B = 150, 60, … , 20 (B < A)]
• Use GAMLSS as the prime analytical tool to analyze short RNA-seq data
as it optimizes discovery/omission rates & exhibits the least bias
• BUT what do these correctly/unbiasedly assessed DE changes mean?
Understanding the context for
differential expression changes
• A list of de-regulated targets will not by itself
support the microRNA discovery process
• Need some context to interpret changes and guide
further research
• This context is provided by analysis of microRNA
targets
• We have proposed and applied a formal target
analysis methodology in our early diabetic
nephropathy investigations
Formal Target Analysis: A Biochemical
Primer
1. Hill plot: $\mathrm{logit}(\theta) = \log\frac{\theta}{1-\theta} = \log L - \log K_d$
2. Fold change between two states: $\frac{L_E}{L_R} = 2^{\log_2 FC}$
3. Change in binding between the two states: $\mathrm{logit}(\theta_E) - \mathrm{logit}(\theta_R) = \log(OR) = \log 2 \cdot \log_2 FC$
4. Means and standard errors for the fold changes can be synthesized
using random effects meta-analysis
5. Integration of fold changes from different experiments
• Use GAMLSS as the prime analytical tool to analyze differential expression in short
RNA-seq data as it achieves the smallest error among algorithms
http://www.pdg.cnb.uam.es/cursos/BioInfo2002/pages/Farmac/Comput_Lab/Guia_Glaxo/chap3b.html
The 1st grade approach to target analysis
Heuristic Argument: count the number of miRNAs with small p values
• Total Score (TS)= # of differentially expressed miRNAs predicted to
bind to a given target
• Regulation Score (RS)= # over-expressed- # under-expressed
miRNAs predicted to bind to a given target
TS      RS        Interpretation
Low     -, 0, +   Low signal-to-noise ratio
High    -         Target likely disinhibited
High    0         Target likely neutrally modulated
High    +         Target likely inhibited
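The two scores are simple counts, as the following sketch shows. The miRNA names, the direction encoding (+1 over-expressed, -1 under-expressed) and the `high_ts_cutoff` parameter are illustrative assumptions; the slide only distinguishes "Low" from "High" total scores without giving a threshold.

```python
def target_scores(de_direction, predicted_binders):
    """Return (Total Score, Regulation Score) for one target.

    de_direction: dict mapping each differentially expressed miRNA to +1 or -1.
    predicted_binders: miRNAs predicted to bind the target.
    """
    hits = [de_direction[m] for m in predicted_binders if m in de_direction]
    total_score = len(hits)        # number of DE miRNAs that bind the target
    regulation_score = sum(hits)   # over-expressed minus under-expressed
    return total_score, regulation_score

def interpret(total_score, regulation_score, high_ts_cutoff):
    """Read the scores as in the table (the cutoff is a hypothetical parameter)."""
    if total_score < high_ts_cutoff:
        return "low signal-to-noise ratio"
    if regulation_score < 0:
        return "target likely disinhibited"
    if regulation_score > 0:
        return "target likely inhibited"
    return "target likely neutrally modulated"

# Hypothetical example: three of the four predicted binders are DE
de = {"miR-a": 1, "miR-b": 1, "miR-c": -1, "miR-d": 1}
ts, rs = target_scores(de, ["miR-a", "miR-b", "miR-c", "miR-x"])
print(ts, rs, interpret(ts, rs, high_ts_cutoff=3))
```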
• Use GAMLSS as the prime analytical tool to analyze
putative targets of differentially expressed microRNAs as it
achieves the optimal balance between FDR/FOR
Target Analysis for PDGF-Beta in patients with overt diabetic kidney disease (DKD)
Target Gene: PDGFB
[Forest plot: Odds Ratio (Expression Ratio) per miRNA, axis from 0.2 to 150]

Study                 TE      seTE     OR     95%-CI           W(fixed)  W(random)
hsa-let-7a-5p         0.80    0.5893   2.23   [0.70; 7.09]     2.8%      2.8%
hsa-let-7b-5p        -0.46    0.5709   0.63   [0.21; 1.93]     3.0%      3.0%
hsa-let-7c           -0.30    0.5681   0.74   [0.24; 2.26]     3.1%      3.1%
hsa-let-7d-5p         0.61    0.6348   1.85   [0.53; 6.42]     2.5%      2.5%
hsa-let-7e-5p         0.22    0.5636   1.24   [0.41; 3.75]     3.1%      3.1%
hsa-let-7f-5p         0.32    0.5604   1.38   [0.46; 4.14]     3.2%      3.2%
hsa-let-7g-5p         0.71    0.6051   2.04   [0.62; 6.66]     2.7%      2.7%
hsa-let-7i-5p         0.45    0.6479   1.56   [0.44; 5.56]     2.4%      2.4%
hsa-miR-106a-5p       0.37    0.5721   1.45   [0.47; 4.46]     3.0%      3.0%
hsa-miR-106b-5p       0.37    0.6578   1.45   [0.40; 5.26]     2.3%      2.3%
hsa-miR-122-5p       -0.06    0.5752   0.94   [0.31; 2.91]     3.0%      3.0%
hsa-miR-1224-3p       1.52    0.7148   4.56   [1.12; 18.50]    1.9%      1.9%
hsa-miR-134           0.44    0.5414   1.56   [0.54; 4.50]     3.4%      3.4%
hsa-miR-140-3p        0.08    0.6300   1.09   [0.32; 3.73]     2.5%      2.5%
hsa-miR-17-5p         0.51    0.6882   1.67   [0.43; 6.42]     2.1%      2.1%
hsa-miR-1909-3p       0.32    0.5286   1.38   [0.49; 3.88]     3.5%      3.5%
hsa-miR-1913          0.83    0.5430   2.28   [0.79; 6.62]     3.4%      3.4%
hsa-miR-204-5p        0.43    0.5450   1.54   [0.53; 4.48]     3.3%      3.3%
hsa-miR-20a-5p       -0.12    0.5736   0.89   [0.29; 2.73]     3.0%      3.0%
hsa-miR-20b-5p        0.33    0.7984   1.39   [0.29; 6.63]     1.6%      1.6%
hsa-miR-2110          0.09    0.5451   1.09   [0.38; 3.18]     3.3%      3.3%
hsa-miR-2113          0.55    0.7309   1.74   [0.41; 7.28]     1.9%      1.9%
hsa-miR-324-3p        0.07    0.5503   1.07   [0.36; 3.14]     3.3%      3.3%
hsa-miR-329           0.14    0.5424   1.15   [0.40; 3.34]     3.4%      3.4%
hsa-miR-335-5p        1.78    0.6324   5.90   [1.71; 20.39]    2.5%      2.5%
hsa-miR-342-3p       -0.10    0.5810   0.90   [0.29; 2.81]     2.9%      2.9%
hsa-miR-361-3p        0.05    0.5991   1.05   [0.32; 3.39]     2.8%      2.8%
hsa-miR-450b-3p       0.74    0.6166   2.10   [0.63; 7.03]     2.6%      2.6%
hsa-miR-491-5p        0.60    0.6992   1.81   [0.46; 7.14]     2.0%      2.0%
hsa-miR-501-5p       -0.08    0.7341   0.93   [0.22; 3.90]     1.8%      1.8%
hsa-miR-545-3p       -0.01    0.7830   0.99   [0.21; 4.60]     1.6%      1.6%
hsa-miR-558           0.27    0.5398   1.31   [0.45; 3.76]     3.4%      3.4%
hsa-miR-603          -0.64    0.5310   0.53   [0.19; 1.50]     3.5%      3.5%
hsa-miR-608           0.11    0.7424   1.12   [0.26; 4.79]     1.8%      1.8%
hsa-miR-663b         -0.41    0.8823   0.66   [0.12; 3.74]     1.3%      1.3%
hsa-miR-765           0.72    0.5878   2.06   [0.65; 6.50]     2.9%      2.9%
hsa-miR-93-5p        -0.12    0.5416   0.89   [0.31; 2.57]     3.4%      3.4%
Fixed effect model                     1.33   [1.09; 1.61]     100%      --
Random effects model                   1.33   [1.09; 1.61]     --        100%
Heterogeneity: I-squared=0%, tau-squared=0, p=0.9656
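The pooled estimate in the forest plot can be approximated from its TE (log odds ratio) and seTE columns with standard inverse-variance weighting. The sketch below uses the DerSimonian-Laird estimator for the between-miRNA variance; that choice is an assumption, since the slide does not say which estimator produced the random effects row, and the function name `pool` is illustrative.

```python
import math

# (TE, seTE) pairs for the 37 miRNAs listed in the PDGFB forest plot
EFFECTS = [
    (0.80, 0.5893), (-0.46, 0.5709), (-0.30, 0.5681), (0.61, 0.6348),
    (0.22, 0.5636), (0.32, 0.5604), (0.71, 0.6051), (0.45, 0.6479),
    (0.37, 0.5721), (0.37, 0.6578), (-0.06, 0.5752), (1.52, 0.7148),
    (0.44, 0.5414), (0.08, 0.6300), (0.51, 0.6882), (0.32, 0.5286),
    (0.83, 0.5430), (0.43, 0.5450), (-0.12, 0.5736), (0.33, 0.7984),
    (0.09, 0.5451), (0.55, 0.7309), (0.07, 0.5503), (0.14, 0.5424),
    (1.78, 0.6324), (-0.10, 0.5810), (0.05, 0.5991), (0.74, 0.6166),
    (0.60, 0.6992), (-0.08, 0.7341), (-0.01, 0.7830), (0.27, 0.5398),
    (-0.64, 0.5310), (0.11, 0.7424), (-0.41, 0.8823), (0.72, 0.5878),
    (-0.12, 0.5416),
]

def pool(effects):
    """Fixed and random effects (DerSimonian-Laird) inverse-variance pooling."""
    weights = [1.0 / se ** 2 for _, se in effects]
    total = sum(weights)
    fixed = sum(w * te for w, (te, _) in zip(weights, effects)) / total
    # Cochran's Q and the method-of-moments between-study variance
    q = sum(w * (te - fixed) ** 2 for w, (te, _) in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1.0 / (se ** 2 + tau2) for _, se in effects]
    re_total = sum(re_weights)
    pooled = sum(w * te for w, (te, _) in zip(re_weights, effects)) / re_total
    se_pooled = math.sqrt(1.0 / re_total)
    return {
        "tau2": tau2,
        "odds_ratio": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
    }

print(pool(EFFECTS))
```

With no detectable heterogeneity the random effects weights reduce to the fixed effect weights, which is consistent with the identical W(fixed) and W(random) columns.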
Target analysis of the NFE2L2/Nrf2 pathway in DKD
Target analysis of the TGF-beta pathway in DKD
To boldly go where no one has gone before….
Methodological
• Extend the model to account
for abundance dependent
variations in PCR efficiency
• Incorporate target analysis
into count analysis
• Estimate ligase bias from the
sequence (computationally
derived correction factors)
microRNA biomarkers projects
• COMPASS: a community disease
detection program focusing on
diabetes and CKD in rural New
Mexico
• MIRROR-Transplant: metabolic
and immunological factors
contributing to kidney transplant
failure
• DIDIT: randomized controlled
trial to preserve urine
production in patients starting
dialysis
• Potential areas for collaboration
in the NIH biorepository?
Summary
• A generative probability model for the counts of
short RNA-seq measurements was developed
• This model may be used to estimate and substantially
correct for the presence of ligase bias
• It performs better (smaller error,
optimal balance of false discoveries and omissions)
than competing methodologies
• Can be used to power “personalized” medicine
applications or experimental state comparisons
• Formal target analysis to guide further research
(“reverse-translation”)
Acknowledgements
• This work could not have been completed without
the collaboration of the Galas Lab at PNRI
David Galas: provided a friendly ear that had the
patience to listen, comment and risk time and
funds for the experiments
Alton Etheridge: pushed for extensive sequencing
and resequencing and carried out all the validation
experiments
Nikita Sakhanenko: had the patience to be our
software tester, validator and GEO submitter
• This work would not have started without John P
(Nick) Johnson (University of Pittsburgh) who
kicked me into the area about 8 years ago
https://bitbucket.org/chrisarg/rnaseqgamlss
?
Backup
Building the model from first principles
• Establish statistical distributions OR deterministic
relationships that “bind” together the quantities in
successive steps
• There is a “competitive qPCR” experiment beating inside
each RNA-seq dataset → random
• Ligase bias is reproducible → deterministic/systematic
• Apply marginalization (integration) operations to
“flatten” the hierarchy
• Derive the exact distributions (or the limits of
approximation) for a statistical model that directly
represents the quantity of interest
• Relate model parameters to quantities of interest
(absolute/relative quantification)
Facts about the distribution of
RNA-seq data
• Established relationships between distributions
that were first explored in the 1920-1930s
• Rare biomedical applications in the 1940s
• Theoretical work in the early 1960s
• Lead goes cold due to failure to conceptualize
practical applications after the 1960s
• Extremely involved expressions involving special
functions of mathematical physics (parabolic
cylinder functions) → numerical complexities will
hinder attempts to use them as-is in applications
Rediscovering a Negative Binomial
parameterization and introducing a new
Gaussian Generalized Linear Normal Family
• Large scale numerical
simulations (>500,000) to
establish approximations for
the RNA-seq distribution
• Arbitrary precision libraries in
python in multicore machines
• Low precision – but
acceptable for statistical
computations
• Both approximations
implement a LQ relationship
between the mean and
variance
• Inferences are largely the
same (shown in synthetic
mixes)
Two equivalent views of measures of differential expression:
Fold Change and Probability of Over-Expression
• The GLM approach (limma,
DESeq/DESeq2, gamlss ) yield
measures of differential
expression for microarrays,
RNA-Seq or qPCR experiments
• These are estimates of fold
changes (signal) and their
associated standard errors
(noise)
• They can be converted to
probability estimates(= 𝒑)
about the signal being >0
(overexpressed) v.s. <0
• The standard error of 𝑝 is given
by 𝑝(1 − 𝑝)
[Figure: normal density of the estimated fold change; x-axis: fold change, from -2 to 4]
Fold Change = 1.0, SE=1.0, shaded area
(=1.0-pnorm(0,FC,SE) in R) yields
probability of overexpression
Computing probability of differential
expression (pDE) in R
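For readers without R at hand, the same tail area can be computed with the Python standard library. This sketch mirrors the expression `1.0-pnorm(0,FC,SE)` from the caption; the function name is illustrative, and the estimate is presumably on a log scale, since zero marks no change.

```python
from statistics import NormalDist

def prob_overexpression(fold_change, standard_error):
    """P(signal > 0) under a Normal(fold_change, standard_error) approximation."""
    return 1.0 - NormalDist(mu=fold_change, sigma=standard_error).cdf(0.0)

# The caption's example: estimate = 1.0, SE = 1.0
print(round(prob_overexpression(1.0, 1.0), 4))  # 0.8413
```

An estimate of exactly zero gives a probability of 0.5, i.e. no evidence for either direction of change.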
Why do we need two views of the
same data?
The FC View
• Absolute and relative
quantification are possible
• Fold changes in one
miRNA are directly
comparable against each
other
• Fold changes are
comparable between and
within techniques
• Type I and II statistical
errors
The pDE View
• Only relative
quantification is possible
• Platforms provide
evidence for directional
changes in expression
• Type M and S errors
• Provides input to Systems
Biology tools (e.g
Boolean Networks)
• Experimental work in late 19th century to discover the physiological
basis of coagulation (“prothrombin”)
• Development of different versions of the “Prothrombin Time”:
investigations in hemophilia, post-op bleeding & liver disease
(1930s-1950s): derived the normal range and ranges associated
with specific deficits
• Pre-analytical considerations throughout the 1950s (and even
today)
• In the 70s PT was used to monitor and dose warfarin in the clinic
• Classical studies in the 70-80s demonstrate high inter, intra and
analytic variability (despite > 30 years of standardization)
• WHO proposed to standardize the test in the mid 1980s through
the use of the INR (international normalized ratio)
Solid measurements for thinning one’s blood:
the history of the PT test
http://www.clinchem.org/content/51/3/553.full
http://circ.ahajournals.org/content/19/1/92.full.pdf
Thromb Haemost. 1985 Feb 18;53(1):155-6.
The cautionary story of the INR
Normalization procedure
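• $\mathrm{INR} = \left(\dfrac{PT_{\mathrm{patient}}}{PT_{\mathrm{normal}}}\right)^{\mathrm{ISI}}$
• $PT_{\mathrm{normal}}$: geometric mean of 20 patients
• $\mathrm{ISI} = \dfrac{\log(\text{calibrator INR})}{\log\left(PT_{\mathrm{calibrator}} / PT_{\mathrm{normal}}\right)}$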
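The normalization procedure above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, the ISI expression assumes the INR relation is applied to a calibrator with an assigned INR, and the numbers in the usage lines are made up for demonstration, not taken from any dataset.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean, as used to define PT_normal from reference samples."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def inr(pt_patient: float, pt_normal: float, isi: float) -> float:
    """INR = (PT_patient / PT_normal) ** ISI."""
    return (pt_patient / pt_normal) ** isi


def isi_from_calibrator(calibrator_inr: float, pt_calibrator: float,
                        pt_normal: float) -> float:
    """ISI that makes the calibrator's PT reproduce its assigned INR."""
    return math.log(calibrator_inr) / math.log(pt_calibrator / pt_normal)


# Illustrative (made-up) numbers: reference PTs in seconds.
pt_normal = geometric_mean([11.5, 12.0, 12.5, 12.2])
isi = isi_from_calibrator(calibrator_inr=3.0, pt_calibrator=30.0,
                          pt_normal=pt_normal)
print(f"PT_normal = {pt_normal:.2f} s, ISI = {isi:.3f}")
print(f"INR for a 24 s patient PT = {inr(24.0, pt_normal, isi):.2f}")
```

Each choice of method, instrument and calibrator set changes `pt_normal` and `isi`, which is how the sources of variation listed on this slide propagate into the reported INR.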
Sources of variation
• Different methods to measure
the PT
• Different instruments that
implement each method
• Different calibrator sets for
each instrument!
http://www.who.int/bloodproducts/publications/WHO_TRS_889_A3.pdf
http://www.clinchem.org/content/56/10/1618.full
http://www.clinchem.org/content/51/3/553.full
Statistics Of Biological Regulatory Networks
Feala JD, et al. PLoS ONE 7(1): e29374. (2012)
Pathophysiology of the cardiorenal syndrome
http://www.kdigo.org/meetings_events/pdf/KDIGO%20CVD%20Controversy%20Rpt.pdf

Correcting bias and variation in small RNA sequencing for optimal (microRNA) biomarker discovery and validation in cardio-metabolic (and renal) disease

  • 1. Correcting bias and variation in small RNA sequencing for optimal (microRNA) biomarker discovery and validation in cardio-metabolic (and renal) disease Christos Argyropoulos MD, PhD, FASN Department of Internal Medicine Division of Nephrology University of New Mexico Health Sciences Center
  • 2. Overview • Models of sequence counts in short RNA-seq experiments • Estimating and controlling for bias in small RNA-seq experiments • Statistical approaches to analyzing differential expression • MicroRNA regulation – a control theory perspective • MicroRNAs as biomarkers in diabetes, renal and cardiometabolic disease • Leveraging our approach for optimal biomarker discovery
  • 3. Signals in short RNA-seq data Building a model from first principles
  • 4. Background • Short RNA-seq data are becoming more and more abundant • There is poor reproducibility of findings between and within research groups • Systematic measurement biases confound findings • Systematic variation → relatively stable within protocols • Systematic variation → unpredictable between different protocols and platforms • Statistical methods may be used to explore and address such biases • Existing approaches are phenomenological descriptions: • what do model parameters stand for? • how can one best use these models?
  • 5. Building a model from first principles • Establish testable predictions that may be verified in existing datasets • Establish correspondence between model parameters and experimental steps • Use this model to understand and correct systematic and random bias in short RNA-seq • Embed the model into more general frameworks for applications: • Epidemiological • Biomarker discovery and validation • Medical diagnostics
  • 6. The short RNA-seq experiment The vendor’s view The biochemist’s view https://doi.org/10.1093/nar/gkt1021 http://www.genomics.hk/SamllRna.htm http://www.geospiza.com/Products/SmallRNA.shtml
  • 7. Conceptual model of the short RNA-seq experiment (this is what we will talk about) • Quantities at successive steps: $X_1, X_2, \ldots, X_n$ (abundance in original preparation) → $\Lambda_1, \Lambda_2, \ldots, \Lambda_n$ (abundance in adapted (ligated) sample) → $L_1^N, L_2^N, \ldots, L_n^N$ (abundance in PCR amplified library) → $B_1, B_2, \ldots, B_n$ (abundance in capture probes) → $Y_1, Y_2, \ldots, Y_n$ (abundance of counts in fastq files) • Parameters: ligation efficiency $f_i$; number of PCR cycles $N$; PCR efficiency $q_i$; probability of capture $s_i$; number of probes $K$; library dilution factor $d$; probability of signal generation $r$; probability of sequence generation $p_i$
  • 8. Modeling the qPCR amplification reaction • Statistics of PCR amplification • Branching (Galton-Watson) process • GW distribution only available implicitly, i.e. through simulation • Large scale simulations to derive approximation to the GW process • PCR literature, GW theory, martingale arguments → candidate distributions • Information theory arguments used to compute distance between GW samples and the approximate distributions • A (truncated) Normal distribution derived at the end
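
The branching-process view of PCR on slide 8 can be sketched with a small simulation. This is only an illustration under stated assumptions, not the simulation code used in the talk: each molecule is assumed to be copied independently with a fixed per-cycle probability `efficiency`, and the function name, starting copy number, cycle count and efficiency are all hypothetical.

```python
import numpy as np

def simulate_pcr(n0, cycles, efficiency, rng):
    """Galton-Watson sketch of PCR: in each cycle every molecule is
    duplicated independently with probability `efficiency`."""
    n = n0
    for _ in range(cycles):
        n = n + rng.binomial(n, efficiency)  # successful copies are added to the pool
    return n

rng = np.random.default_rng(0)
n0, cycles, q = 100, 10, 0.8  # hypothetical values chosen for illustration
sims = np.array([simulate_pcr(n0, cycles, q, rng) for _ in range(2000)])

# Under this model the expected copy number after N cycles is n0 * (1 + q)^N.
expected_mean = n0 * (1 + q) ** cycles
print("simulated mean:", sims.mean(), "theoretical mean:", expected_mean)
print("coefficient of variation:", sims.std() / sims.mean())
```

A histogram of `sims` gives a feel for why a (truncated) Normal approximation is a plausible candidate when the starting copy number is not tiny.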
  • 9. Flattening the hierarchy through marginalization Integrate sources of variation out of the model: 1. library sequence depth variation 2. PCR amplification Final statistical model is about absolute counts • Direct modeling ≠ % of counts • Limit of approximation encompasses all possible sample compositions • This is a truncated Normal Poisson mixture distribution (approximated via a Negative Binomial or Linear Quadratic Gaussian family) Model implements a Linear-Quadratic (LQ) mean-variance relationship
  • 10. Distributional Regression for RNA-seq data LQ relationship between mean ($\mu$) and variance ($\sigma_{LQ}^2$): $\sigma_{LQ}^2 = \mu(1 + \phi\mu)$ • The variance and the mean have to be modelled concurrently • Unless variance is modelled → inconsistent statistics → small (overoptimistic) p values • Realm of distributional regression models (GAMLSS – Generalized Additive Models for Location, Scale and Shape) • One can re-use existing software frameworks to fit such models
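
The linear-quadratic relationship on slide 10 can be checked numerically for the Negative Binomial case. The sketch below assumes the common parameterization in which the NB "size" equals $1/\phi$; the values of `mu` and `phi` are arbitrary illustrations and the helper name `lq_variance` is hypothetical.

```python
import numpy as np

def lq_variance(mu, phi):
    """Linear-quadratic variance: sigma^2 = mu * (1 + phi * mu)."""
    return mu * (1.0 + phi * mu)

rng = np.random.default_rng(1)
mu, phi = 50.0, 0.2  # illustrative mean and dispersion

# NumPy parameterizes the NB by (n, p); n = 1/phi and p = n / (n + mu)
# give mean mu and variance mu * (1 + phi * mu).
size = 1.0 / phi
p = size / (size + mu)
counts = rng.negative_binomial(size, p, size=200_000)

print("sample mean:", counts.mean(), "target:", mu)
print("sample variance:", counts.var(), "LQ prediction:", lq_variance(mu, phi))
```

As $\phi \to 0$ the variance approaches the Poisson value $\mu$, which is why ignoring $\phi$ understates uncertainty for abundant species.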
  • 11. Validating model(s) with synthetic mixes of known composition • Allow one to test the “backbone” of the model without worrying about the adequacy of the modeling of biology • Sequencing of equimolar mixes: • Explore and model systematic bias in the same protocol • Sequencing of dilution series or non-equimolar mixes: • “Dose-response” curve of the bias • Examination of “debiasing” approaches for the ability to uncover the truth • Model may also be used to analyze the performance of differential expression algorithms
  • 12. Testable predictions: mean and variance linear quadratic relationships in public RNA-seq data
  • 13. Linear Quadratic Relationship in the legacy datasets of the Galas group
  • 14. Estimating and Correcting for Ligase Bias At the corner of Biochemistry and Mathematics
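  • 15. Enzymatic mechanism of RNA ligation • The kinetics of RNA ligation were investigated thoroughly in the 1970s and early 1980s • The intermolecular reaction is relevant to RNA-seq • The mechanism involves three, fully reversible, steps that obey ping-pong ordered kinetics and are subject to substrate inhibition: (1) $E + ATP \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} E \cdot ATP \underset{k_{-1a}}{\overset{k_{1a}}{\rightleftharpoons}} E\text{-}AMP + PP_i$ (2) $E\text{-}AMP + D \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} E \cdot App\text{-}D \underset{k_{-2a}}{\overset{k_{2a}}{\rightleftharpoons}} E + App\text{-}D$ (3) $E \cdot App\text{-}D + A \underset{k_{-3}}{\overset{k_3}{\rightleftharpoons}} E \cdot App\text{-}D \cdot A \underset{k_{-3a}}{\overset{k_{3a}}{\rightleftharpoons}} AMP + E + AD$ • Bias in RNA ligation was noted in these early investigations and the enzyme was never used as a tool in synthetic chemistry, as solid phase methods took off in the 1980s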
  • 16. Kinetic analysis of ligase reaction velocity in RNA-seq protocols • Existing protocols include abundant cofactors (sharp contrast to the experiments in the 1970s): drive the reaction to the right; rate limiting single step reaction instead of a tri-step one; substrate preference (bias in reaction yields) is not eliminated • Multi-substrate inhibition from all biosample sequences available for ligation: analytical series approximation for ratios of random variables • Ligase operates in the 1st order domain of Michaelis-Menten kinetics: $V_i = \dfrac{V_i^{max}\, X_i / K_M^i}{1 + \sum_j X_j / K_M^j} \approx \dfrac{V_i^{max}\, X_i / K_M^i}{1 + n\,E[X]/E[K_M]} = \dfrac{V_i^{max}\, X_i / K_M^i}{1 + C_{Total}(0)/E[K_M]} \approx V_i^{max}\, \dfrac{X_i}{K_M^i}$
  • 17. Testable model predictions about ligase bias in RNA-seq experiments Mathematical expression • $X_i\left(1 - \exp\left(-\dfrac{V_i^{max}}{K_M^i} T_R\right)\right) = X_i f_i^{5'}$ • $\Lambda_i = X_i f_i^{5'} f_i^{3'} = X_i f_i$ Implications for ligase bias • Concentration independence • Sample composition independence • Transferable within experiments done with the same protocol • Protocol dependent (reaction velocity incorporates concentration of cofactors and enzyme) • Sequence equimolar mixes to derive empirical correction factors for ligase bias • Apply those to biological samples (“offsets” in distributional regression) to eliminate bias
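
The last two bullets of slide 17 (derive empirical correction factors from equimolar mixes, then apply them to other samples) can be sketched as follows. This is a simplified stand-in rather than the GAMLSS offset machinery: factors are estimated as centered log mean counts per species and applied by division, which on the log scale corresponds to subtracting an offset. The function names and the toy counts are hypothetical.

```python
import numpy as np

def bias_factors(equimolar_counts):
    """Estimate log-scale bias factors from an equimolar mix.

    equimolar_counts: array of shape (libraries, species). In an equimolar
    mix every species should yield the same count, so deviations of the
    per-species log mean from the overall log mean are attributed to bias.
    """
    log_mean = np.log(equimolar_counts.mean(axis=0))
    return log_mean - log_mean.mean()

def correct(counts, log_factors):
    """Remove the estimated bias (equivalent to a log-scale offset)."""
    return counts / np.exp(log_factors)

# Toy equimolar data: three species with relative ligation yields 0.5, 1 and 2.
equimolar = 1000.0 * np.array([[0.5, 1.0, 2.0],
                               [0.5, 1.0, 2.0]])
log_f = bias_factors(equimolar)
corrected = correct(equimolar[0], log_f)
print("estimated factors:", np.exp(log_f))
print("corrected counts:", corrected)
```

In a regression setting the same `log_f` values would enter the linear predictor for the mean as fixed offsets instead of being divided out.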
  • 18. There is substantial variation in raw sequence counts from equimolar mixes
  • 19. Application of bias factors virtually eliminates ligase bias Monte Carlo Cross Validation in 3 equimolar datasets: randomly split the dataset into learning and testing subsets, learn the correction factor on the learning subset and apply it to correct the estimates of the testing subset. Repeat N times
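
The Monte Carlo cross validation loop described on slide 19 can be sketched on simulated data. Everything here is an assumption made for illustration: counts are Poisson with a species-specific log-normal bias, the error metric is the RMSE of log counts around each library's mean, and the number of repetitions is arbitrary. Real equimolar datasets would replace the simulated `counts` array.

```python
import numpy as np

rng = np.random.default_rng(2)
n_lib, n_rna = 12, 50
true_log_bias = rng.normal(0.0, 1.5, n_rna)          # hypothetical ligase bias
counts = rng.poisson(500 * np.exp(true_log_bias), size=(n_lib, n_rna)) + 1  # +1 avoids log(0)

def rmse_log(c):
    """RMSE of log counts around the library mean (0 for a perfect equimolar readout)."""
    logc = np.log(c)
    return np.sqrt(np.mean((logc - logc.mean(axis=1, keepdims=True)) ** 2))

rmse_raw, rmse_corrected = [], []
for _ in range(200):                                  # repeat N times
    perm = rng.permutation(n_lib)
    learn, test = perm[: n_lib // 2], perm[n_lib // 2:]
    log_f = np.log(counts[learn].mean(axis=0))        # learn factors on the learning subset
    log_f -= log_f.mean()
    rmse_raw.append(rmse_log(counts[test]))
    rmse_corrected.append(rmse_log(counts[test] / np.exp(log_f)))  # apply to the testing subset

print("mean RMSE before correction:", np.mean(rmse_raw))
print("mean RMSE after correction:", np.mean(rmse_corrected))
```

Because the factors are never evaluated on the libraries they were learned from, the reduction in RMSE reflects how well they transfer rather than how well they fit.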
  • 20. Empirical factors nearly eliminate bias between equimolar datasets with 10x different input (Galas Lab legacy datasets)
  • 21. Bias factors in public non-equimolar short-RNA seq datasets
  • 22. Design of Validation Experiments What has been established? • Moderate concentration independence • Ability to nearly eliminate bias over at least two orders of magnitude • Legacy platforms/experiments What needs to be proven? • Concentration independence over >2 orders of magnitude • Sample composition independence • Recovery of differential expression measures • Any value relative to existing approaches?
  • 23. Validation Experiments Collaboration between PNRI (Galas Lab) and UNM (DoIM) The largest, single protocol, technical series to date (GSE93399) Experimental Group / Dilution / N: miRExplore (972 short RNAs) / 1:10 / 10; 286 miRNAs / 1:1 / 8; 286 miRNAs / 1:10 / 8; 286 miRNAs / 1:100 / 8; 286 miRNAs / 1:1000 / 8; Ratio Metric Series A (descending) / mix of 286 subpool A (1:1), 286 subpool B (1:10), 286 subpool C (1:100), 286 subpool D (1:1000) / 8; Ratio Metric Series B (ascending) / mix of 286 subpool A (1:1000), 286 subpool B (1:100), 286 subpool C (1:10), 286 subpool D (1:1) / 8; Total: 7 groups (58 sequenced x 2 = 116)
  • 24. Empirical bias correction over 3 orders of magnitude in equimolar datasets RMSE reduction: 77%-90% (input in calibration run differs by up to x10 from target), 54%-67% otherwise
  • 25. Empirical factors reduce bias by nearly 60% in non-equimolar series
  • 27. Bias Correction in Heterogeneous Samples • Correction factors remove ~55% of bias between equimolar samples • ~ 70% of RNAs have expression within two fold from the mean (from 23%) • Bias reduction is ~40% in ratiometric series • ~63% of RNAs have expression within x2 from the mean (from 33%)
  • 28. Differential Expression When more is less, and simplest is the best
  • 29. Our proposal for a model of differential expression (DE) changes Statistical formulation and assumptions: $\log \mu_{i,j,k} = \alpha + \Delta_k + m_{i,0} + \delta_{i,k}$, with $m_{i,0} \sim Normal(0, \sigma_{\mu_0}^2)$ and $\delta_{i,k} \sim Normal(0, \sigma_k^2)$ (similar model for variance) 1. Expression in reference state is not of prime scientific interest (can omit correction for bias) 2. Technical sources of variation (PCR efficiency, library sampling) of much smaller magnitude than biological variability Parameter interpretation and context of use • Accommodates global and sequence specific DE changes • Flexible modeling of referent (global level and variation around it) • Still models counts • No incorporation of library specific factors (model is un-normalized)
  • 30. Existing Models for RNA-seq experiments • Number of reads in sample j, assigned to species i ($K_{i,j}$) • Assumed to follow a negative binomial distribution: $K_{i,j} \sim NB(\mu_{i,j}, \sigma_{i,j}^2)$ • Mean: $\mu_{i,j} = m_{i,j} \times s_j$, where $s_j$ is a common scale (coverage of the library, sequence depth) • Variance: $\sigma_{i,j}^2 = \mu_{i,j} + a\mu_{i,j}^2$ (edgeR [1]) or $\sigma_{i,j}^2 = \mu_{i,j} + s_j^2 f(m_{i,j})$ (DESeq [2]) • Model for differential expression analysis: $\log(m_{i,j})$ is the sum of a term indexed $i,0$ (miRNA expression in the control group) and a term indexed $i,1$ (experimental effects: miRNA expression in the experimental group) [1] Biostatistics 2008, 9:321-32 [2] Genome Biology 2010, 11:R106
  • 31. Comparison of proposed approach against existing methods “We” (gamlss) • Uses the NB or the LQNO • LQ relation between mean and variance • Variance and mean parameters are estimated simultaneously • Explicit count based modeling • Un-normalized • Shrinkage via random effects modeling • Derived from first principles (a generative probability model) “They” (edgeR/DESeq2 etc) • NB or the linear model • LQ or flexible relation between mean & variance • Two stage procedure to estimate parameters • Models counts as % of a given library depth • Normalized (% sum to one) • Shrinkage via random effects modeling • Ad hoc, phenomenological probability model
  • 32. Scenarios of differential expression to assess method performance • Clustered, symmetric differential expression 1. fraction of overexpressed sequences is equal to that of the underexpressed 2. no change in global expression: over- and underexpressed RNAs are present in equal numbers and exhibit the same degree of DE • Asymmetric, clustered differential expression 1. Fraction of overexpressed sequences ≠ underexpressed: drives global expression change to one direction • Global Change: all RNAs exhibit a variable but consistent directional change of expression • No change All scenarios implemented through the validation datasets
  • 33. The GAMLSS has smaller RMSE than 10 popular workflows for DE analysis • Performance benefit seen under scenarios of asymmetric, clustered differential expression changes • When DE are (nearly) symmetric, many other methods have similar performance
  • 34. Existing methods cannot detect global, directional differential expression
  • 36. GAMLSS demonstrates the optimal balance between False Omission and False Discovery Rates ROC Curve Analysis FDR and FOR
  • 37. What did we just find out about algorithms for DE analysis? • Proposed method (GAMLSS) is the top performer: • Symmetric, clustered DE changes • Asymmetric, clustered DE changes • Asymmetric, global DE changes • No DE change Optimal balance between FDR and FOR • Existing methods introduce moderate-to-severe bias • force the overall DE to sum to zero (what goes up must be accompanied by something that goes down) • Voom/limma somewhat more resilient, near identical performance to GAMLSS under symmetric DE These patterns have not been seen before, because no one to date has generated datasets with known composition/DE
  • 38. Why do existing methods fail to deliver? • Existing models for RNA-seq analysis, e.g. DESeq and edgeR, can be derived from 1st principles as approximations • RNA-seq counts as % of library depth • Valid for dilute samples, not dominated by a few RNA species • Library size depth and modeling counts as % (a relic of the SAGE era) may be a disastrous distraction • Parameterization constrains DE over all RNAs included in the analysis to sum to zero
  • 39. Practical implications for experimentalists (not using GAMLSS) • Any change to the population of RNAs modelled (e.g. filtering)→ different DE values from the same dataset • Both type M (degree of DE changes) and type S (label an over-expressed sequence to be under-expressed & vice-versa) errors • Up to 25% of estimated DE changes may be of the wrong direction • Up to 100% of estimated DE changes may be of the wrong magnitude • RNA-seq findings will fail to validate against qPCR • Reputation of RNA-seq as a semiquantitative technique of poor reproducibility is due to statistical methodology
  • 40. MicroRNA regulation A control theory perspective
  • 41. microRNA biology & therapeutic applications http://www.nature.com/nature/journal/v469/n7330/fig_tab/nature09783_F1.html http://www.nature.com/nature/journal/v469/n7330/full/nature09783.html
  • 42. Control In Biological Systems Is Many- To-Many, Cooperative And Patterned Feala JD, et al. PLoS ONE 7(1): e29374. (2012) Riba A et al PLoS Comput Biol 10(2): e1003490. (2014) Bipartite Control Network Topologies miRNA – Transcription Factor circuits Feed Forward Loop: master control layout in many natural and artificial control systems
  • 43. How do we control things? Predictably simple (open loop) Error Correcting (feedback) Model based (feed forward)
  • 44. Feed forward control • Control element responds to a change in the environment in a predefined manner • Based on prediction of plant (“what is being controlled”) behavior (requires model of the system) • Can react before error actually occurs (stabilizing the system, e.g. cerebellum control of balance) • Benefits: reduced hysteresis, increased accuracy, cost-efficiency, lower “wear-tear”
  • 45. Practical implications • miRNAs function as master controllers in FFLs • biology is intrinsically NOT model free • miRNA profiling reveals the “plant” dynamics of complex biological processes • Emerging data suggest that sequence variation may underlie (dys-)regulation • miRNA associations are by definition causal to some aspects of a particular phenotype • “a priori plausible” biomarkers • direct therapeutic implications • Examination of the “plant” (targets) may have implications for microRNA research • Context for the interpretation of microRNA changes • “Stronger” biomarker signatures
  • 46. microRNAs are rational candidates for exploring paradigm shifts in biology • Ubiquity-conservation • Breadth & width of regulation (>60% of genes) • Context-specificity (“meta-controller”) • Master Controllers in Feed Forward Loops These arguments are not disease area specific (e.g. they apply equally well to cancer or even psychiatric disease)
  • 47. MicroRNAs as biomarkers Renal, Diabetes and Cardiometabolic Disease
  • 48. • 8-10% of the population suffer from diabetes • 20-30% of patients with diabetes will develop evidence of diabetic chronic kidney disease (DKD/CKD) • DKD progresses in stages of increasing proteinuria • 50% of patients with overt nephropathy will develop End Stage Renal Disease (ESRD) within 10 years • The end result: Diabetic nephropathy is the leading cause of ESRD, requiring dialysis or kidney transplantation accounting for 40% of cases Facts, figures and the natural history of cardiometabolic and renal disease in diabetes
  • 49. There is a significant unmet need for therapies that stabilize progression and reduce death rates in patients with diabetic kidney disease • DKD is costly: • 40-50% of the $44B Medicare expenditures for CKD • 40-50% of the $50B total healthcare costs for ESRD • DKD is lethal (>50% of these deaths are cardiac) • Current therapies reduce risk by 30% • Many of the things we tried to stabilize renal function AND improve cardiovascular disease failed miserably in trials • A paradigm change in our understanding of DKD is warranted => We posit that miRNAs will trigger this shift • This improvement will likely spread to other areas given the biology of cardiovascular disease (“extreme phenotype”) US population [1]: No CKD: 7.7% (no diabetes), 11.5% (diabetes); CKD: 17.2% (no diabetes), 31.1% (diabetes) [Figure: Dialysis Mortality, % surviving vs. time (months), GN vs. DM] [1] Afkarian et al J Am Soc Nephrol. 2013 Feb;24(2):302-8
  • 50. Why bother with microRNAs in DKD? Heart & Vessels • Angiogenesis • Vascular inflammation • Atherosclerosis • LVH • Vascular tone • Endothelial dysfunction Kidney • Water homeostasis • Osmoregulation • Calcium sensing • Sodium, potassium, acid base handling • Renin production • Renal development • Renal senescence • EMT • Collagen production Diabetes • Insulin synthesis and secretion • Peripheral tissue sensitivity • Hepatic glucose production • Inflammatory gene expression
  • 51. microRNAs as Minimally Invasive Biomarkers : a metrological argument Advantages of microRNAs Circulating microRNAs •More stable in circulation than mRNAs •High expression level and low complexity compared to mRNA •Tissue specific expression •Availability of analytical platforms Keep getting cheaper over time •Sequence conservation Allows translation of clinical associations to animal models Allows translation of animal models to clinical applications Cortez et al Nat Rev Clin Oncol. Jun 7, 2011; 8(8): 467–477.
  • 52.
  • 53.
  • 54. Targets of differentially expressed miRNAs in early and late stages of DN map to overlapping pathways. Columns: Pathway; MA vs. NA (P-value, Fraction); Overt vs. Normal (P-value, Fraction). Signal Transduction: Signaling by SCF-KIT 0.006 18/76 0.001 41/76; Signaling by Insulin receptor 0.009 23/109 <0.001 65/109; Signaling by NGF 0.016 38/212 <0.001 119/212; Signaling by Rho GTPases 0.024 24/125 <0.001 71/125; Signaling by ERBB4 0.027 16/76 <0.001 45/76; Signaling by ERBB2 0.035 19/97 <0.001 59/97; Signaling by PDGF 0.040 22/118 <0.001 67/118; Signaling by VEGF 0.041 4/11; Signaling by EGFR 0.044 20/106 <0.001 64/106; Downstream signaling of activated FGFR 0.038 19/98 <0.001 61/98; Signaling by BMP 0.001 16/23; Signaling by TGFβ 0.004 11/15; DAG and IP3 signaling 0.010 20/31; PIP3 activates AKT signaling 0.020 15/26; RAF/MAP kinase cascade 0.031 7/10; Signaling by Notch 0.036 13/23; Interaction of integrin α5β3 with fibrillin 0.044 2/3; Interaction of integrin α5β3 with von Willebrand factor 0.044 2/3; Integrin cell surface interactions 0.024 40/85; Cell-Cell Communication 0.009 57/122; Cell Cycle G0 and early G1 0.040 12/21
  • 55. Leveraging the RNA-seq analytical methodology To boldly go where no one has gone before (but many have tried)
  • 56. Goals of a microRNA research program in cardiometabolic, renal and diabetes diseases • Use carefully designed case-control, before- after, randomized controlled trials, and n-of- 1 trials for the following goals: 1. Personalized medicine applications (diagnosis/prognosis/precision medicine) 2. Biomarker discovery (e.g. to aid trials) 3. Novel Therapeutics
  • 57. Animal Models Clinical Associations Clinical Interventions A microRNA driven discovery process Biomarker Discovery Mechanistic Insights Therapeutics Clinical Science, Bioinformatics, Systems Biology Driven “Reverse Translation” Translational Science Evidence Based Medicine Basic Science
  • 58. Ingredients for success of a microRNA regulation discovery program Requires open-ended platforms (RNA-seq) o Especially for kidney disease due to intrarenal RNA editing Requires unbiased quantification between groups of patients (differential expression analysis) Requires unbiased and accurate quantification in the absence of a controlled comparison (diagnostics – bias correction) Proposed approach: GAMLSS for RNA-seq satisfies requirements better than all currently used methods
  • 59. Measurement in clinical diagnostics • What we want to happen: measurement is reproducible, shows minimal inter-individual variation and minimal intra-individual variation [Figure: paired January/June readings; Patients 1 and 2 (Condition A) read 10, 10 and Patients 3 and 4 (Condition B) read 15, 15 in every panel] • What actually happens: measurement is non-reproducible, shows high inter-individual variation and high intra-individual variation [Figure: paired January/June readings such as 10, 18; 13, 10; 15, 10; 15, 18 and 10, 12; 15, 14; 18, 11; 14, 19 for Patients 1 to 4, leaving the assignment to Condition A or Condition B uncertain]
  • 60. • Understand and control for the sources of variation • Use calibration sets as references • A measurement is instrument specific • Global reference standards (role for highly competent labs that maintain the standards) • Context of use: • Detector (“out-of-limits” readings) • Control (“track the course”) Lessons from clinical chemistry labs • Use GAMLSS as the prime analytical tool to analyze short RNA-seq data as it correctly represents all sources of variation and can use calibration (equimolar) runs • Combine this with a protocol that experimentally controls variation (e.g. 4N protocol of the Galas Lab)
  • 61. Measurement in experimental samples • What we want to happen: in both RUN1 and RUN2, Condition A reads 10, 10, … , 10 and Condition B reads 15, 15, … , 15, so B > A; certain of the difference; measurement is reproducible; measurement shows no variation • What actually happens: in RUN1, Condition A reads 11, 7, … , 10 and Condition B reads 8, 19, … , 26 (B > A); in RUN2, Condition A reads 120, 90, … , 130 and Condition B reads 150, 60, … , 20 (B < A); uncertain of the difference; measurement is non-reproducible; measurement shows high variation • Use GAMLSS as the prime analytical tool to analyze short RNA-seq data as it optimizes discovery/omission rates & exhibits the least bias • BUT what do these correctly/unbiasedly assessed DE changes mean?
  • 62. Understanding the context for differential expression changes • A list of de-regulated targets will not by itself support the microRNA discovery process • Need some context to interpret changes and guide further research • This context is provided by analysis of microRNA targets • We have proposed and applied a formal target analysis methodology in our early diabetic nephropathy investigations
  • 63. Formal Target Analysis: A Biochemical Primer 1. Hill plot: $\log\dfrac{\theta}{1-\theta} = \operatorname{logit}(\theta) = \log L - \log K_d$ (with $\theta$ the fraction of target bound at miRNA level $L$) 2. Fold change between two states: $FC = \log_2 RE = \log_2\dfrac{L_2}{L_1}$ 3. Change in binding between the two states: $\operatorname{logit}(\theta_2) - \operatorname{logit}(\theta_1) = \log(OR) = \log(RE) = FC \cdot \log 2$ 4. Means and standard errors for the fold changes can be synthesized using random effects meta-analysis 5. Integration of fold changes from different experiments • Use GAMLSS as the prime analytical tool to analyze differential expression in short RNA-seq data as it achieves the smallest error among algorithms http://www.pdg.cnb.uam.es/cursos/BioInfo2002/pages/Farmac/Comput_Lab/Guia_Glaxo/chap3b.html
  • 64. The 1st grade approach to target analysis Heuristic Argument: count the number of miRNAs with small p values • Total Score (TS) = # of differentially expressed miRNAs predicted to bind to a given target • Regulation Score (RS) = # over-expressed − # under-expressed miRNAs predicted to bind to a given target • Interpretation: TS low (any RS) → low signal to noise ratio; TS high and RS negative → target likely disinhibited; TS high and RS = 0 → target likely neutrally modulated; TS high and RS positive → target likely inhibited • Use GAMLSS as the prime analytical tool to analyze putative targets of differentially expressed microRNAs as it achieves the optimal balance between FDR/FOR
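
The two scores defined on slide 64 amount to simple counting, sketched below. The miRNA and gene names, the direction coding (+1 for over-expressed, −1 for under-expressed) and the function name are hypothetical; in practice the target sets would come from a target prediction resource.

```python
def target_scores(de_mirnas, targets_of):
    """Compute (TS, RS) per target gene.

    de_mirnas:  dict mapping a differentially expressed miRNA to +1
                (over-expressed) or -1 (under-expressed).
    targets_of: dict mapping a miRNA to the set of genes it is predicted to bind.
    TS counts the DE miRNAs hitting a gene; RS is # over-expressed minus
    # under-expressed among them.
    """
    scores = {}
    for mirna, direction in de_mirnas.items():
        for gene in targets_of.get(mirna, ()):
            ts, rs = scores.get(gene, (0, 0))
            scores[gene] = (ts + 1, rs + direction)
    return scores

# Hypothetical example inputs
de = {"miR-A": +1, "miR-B": +1, "miR-C": -1}
predicted = {
    "miR-A": {"GENE1", "GENE2"},
    "miR-B": {"GENE1"},
    "miR-C": {"GENE1", "GENE3"},
}
scores = target_scores(de, predicted)
for gene in sorted(scores):
    print(gene, "TS=%d RS=%+d" % scores[gene])
```

Here `GENE1` is targeted by three DE miRNAs with a net positive RS, so under the slide's heuristic it would be read as likely inhibited.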
  • 65. Target Analysis for PDGF-Beta in patients with overt diabetic kidney disease (DKD). Target Gene: PDGFB. Forest plot (x axis: Odds Ratio / Expression Ratio). Heterogeneity: I-squared=0%, tau-squared=0, p=0.9656
    Study | TE | seTE | OR | 95%-CI | W(fixed) | W(random)
    hsa-let-7a-5p | 0.80 | 0.5893 | 2.23 | [0.70; 7.09] | 2.8% | 2.8%
    hsa-let-7b-5p | -0.46 | 0.5709 | 0.63 | [0.21; 1.93] | 3.0% | 3.0%
    hsa-let-7c | -0.30 | 0.5681 | 0.74 | [0.24; 2.26] | 3.1% | 3.1%
    hsa-let-7d-5p | 0.61 | 0.6348 | 1.85 | [0.53; 6.42] | 2.5% | 2.5%
    hsa-let-7e-5p | 0.22 | 0.5636 | 1.24 | [0.41; 3.75] | 3.1% | 3.1%
    hsa-let-7f-5p | 0.32 | 0.5604 | 1.38 | [0.46; 4.14] | 3.2% | 3.2%
    hsa-let-7g-5p | 0.71 | 0.6051 | 2.04 | [0.62; 6.66] | 2.7% | 2.7%
    hsa-let-7i-5p | 0.45 | 0.6479 | 1.56 | [0.44; 5.56] | 2.4% | 2.4%
    hsa-miR-106a-5p | 0.37 | 0.5721 | 1.45 | [0.47; 4.46] | 3.0% | 3.0%
    hsa-miR-106b-5p | 0.37 | 0.6578 | 1.45 | [0.40; 5.26] | 2.3% | 2.3%
    hsa-miR-122-5p | -0.06 | 0.5752 | 0.94 | [0.31; 2.91] | 3.0% | 3.0%
    hsa-miR-1224-3p | 1.52 | 0.7148 | 4.56 | [1.12; 18.50] | 1.9% | 1.9%
    hsa-miR-134 | 0.44 | 0.5414 | 1.56 | [0.54; 4.50] | 3.4% | 3.4%
    hsa-miR-140-3p | 0.08 | 0.6300 | 1.09 | [0.32; 3.73] | 2.5% | 2.5%
    hsa-miR-17-5p | 0.51 | 0.6882 | 1.67 | [0.43; 6.42] | 2.1% | 2.1%
    hsa-miR-1909-3p | 0.32 | 0.5286 | 1.38 | [0.49; 3.88] | 3.5% | 3.5%
    hsa-miR-1913 | 0.83 | 0.5430 | 2.28 | [0.79; 6.62] | 3.4% | 3.4%
    hsa-miR-204-5p | 0.43 | 0.5450 | 1.54 | [0.53; 4.48] | 3.3% | 3.3%
    hsa-miR-20a-5p | -0.12 | 0.5736 | 0.89 | [0.29; 2.73] | 3.0% | 3.0%
    hsa-miR-20b-5p | 0.33 | 0.7984 | 1.39 | [0.29; 6.63] | 1.6% | 1.6%
    hsa-miR-2110 | 0.09 | 0.5451 | 1.09 | [0.38; 3.18] | 3.3% | 3.3%
    hsa-miR-2113 | 0.55 | 0.7309 | 1.74 | [0.41; 7.28] | 1.9% | 1.9%
    hsa-miR-324-3p | 0.07 | 0.5503 | 1.07 | [0.36; 3.14] | 3.3% | 3.3%
    hsa-miR-329 | 0.14 | 0.5424 | 1.15 | [0.40; 3.34] | 3.4% | 3.4%
    hsa-miR-335-5p | 1.78 | 0.6324 | 5.90 | [1.71; 20.39] | 2.5% | 2.5%
    hsa-miR-342-3p | -0.10 | 0.5810 | 0.90 | [0.29; 2.81] | 2.9% | 2.9%
    hsa-miR-361-3p | 0.05 | 0.5991 | 1.05 | [0.32; 3.39] | 2.8% | 2.8%
    hsa-miR-450b-3p | 0.74 | 0.6166 | 2.10 | [0.63; 7.03] | 2.6% | 2.6%
    hsa-miR-491-5p | 0.60 | 0.6992 | 1.81 | [0.46; 7.14] | 2.0% | 2.0%
    hsa-miR-501-5p | -0.08 | 0.7341 | 0.93 | [0.22; 3.90] | 1.8% | 1.8%
    hsa-miR-545-3p | -0.01 | 0.7830 | 0.99 | [0.21; 4.60] | 1.6% | 1.6%
    hsa-miR-558 | 0.27 | 0.5398 | 1.31 | [0.45; 3.76] | 3.4% | 3.4%
    hsa-miR-603 | -0.64 | 0.5310 | 0.53 | [0.19; 1.50] | 3.5% | 3.5%
    hsa-miR-608 | 0.11 | 0.7424 | 1.12 | [0.26; 4.79] | 1.8% | 1.8%
    hsa-miR-663b | -0.41 | 0.8823 | 0.66 | [0.12; 3.74] | 1.3% | 1.3%
    hsa-miR-765 | 0.72 | 0.5878 | 2.06 | [0.65; 6.50] | 2.9% | 2.9%
    hsa-miR-93-5p | -0.12 | 0.5416 | 0.89 | [0.31; 2.57] | 3.4% | 3.4%
    Fixed effect model | | | 1.33 | [1.09; 1.61] | 100% | --
    Random effects model | | | 1.33 | [1.09; 1.61] | -- | 100%
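
The pooling behind the forest plot on slide 65 can be sketched with an inverse-variance fixed effect estimate and a DerSimonian-Laird random effects estimate. This is an assumed, generic implementation, not necessarily the software or estimator used for the slide. The example feeds in only the first five TE/seTE pairs reported on the slide, so its output is not expected to match the pooled odds ratio shown there.

```python
import numpy as np

def pool(te, se):
    """Inverse-variance fixed effect and DerSimonian-Laird random effects pooling.

    te: per-miRNA log odds ratios (treatment effects); se: their standard errors.
    Returns (fixed, se_fixed, random, se_random, tau2).
    """
    te, se = np.asarray(te, float), np.asarray(se, float)
    w = 1.0 / se**2
    fixed = np.sum(w * te) / np.sum(w)
    q = np.sum(w * (te - fixed) ** 2)            # Cochran's Q
    df = len(te) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                # between-study variance, floored at 0
    w_re = 1.0 / (se**2 + tau2)
    random = np.sum(w_re * te) / np.sum(w_re)
    return fixed, 1.0 / np.sqrt(w.sum()), random, 1.0 / np.sqrt(w_re.sum()), tau2

# First five rows of the slide's TE / seTE columns (a subset, for illustration only)
te = [0.80, -0.46, -0.30, 0.61, 0.22]
se = [0.5893, 0.5709, 0.5681, 0.6348, 0.5636]
fixed, se_f, random, se_r, tau2 = pool(te, se)
lo, hi = np.exp(fixed - 1.96 * se_f), np.exp(fixed + 1.96 * se_f)
print("fixed-effect OR: %.2f [%.2f; %.2f], tau^2 = %.3f" % (np.exp(fixed), lo, hi, tau2))
```

When the estimated between-study variance is zero, as reported on the slide (tau-squared = 0), the fixed and random effects estimates coincide.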
  • 68. To boldly go where no one has gone before…. Methodological • Extend the model to account for abundance dependent variations in PCR efficiency • Incorporate target analysis into count analysis • Estimate ligase bias from the sequence (computationally derived correction factors) microRNA biomarkers projects • COMPASS: a community disease detection program focusing on diabetes and CKD in rural New Mexico • MIRROR-Transplant: metabolic and immunological factors contributing to kidney transplant failure • DIDIT: randomized controlled trial to preserve urine production in patients starting dialysis • Potential areas for collaboration in the NIH biorepository?
  • 69. Summary • A generative probability model for the counts of short RNA-seq measurements was developed • This model may be used to estimate and substantially correct for the presence of ligase bias • It achieves better performance (smaller error, optimal balance of false discoveries and omissions) than competing methodologies • Can be used to power “personalized” medicine applications or experimental state comparisons • Formal target analysis to guide further research (“reverse-translation”)
  • 70. Acknowledgements • This work could not have been completed without the collaboration of the Galas Lab at PNRI David Galas: provided a friendly ear that had the patience to listen, comment and risk time and funds for the experiments Alton Etheridge: pushed for extensive sequencing and resequencing and carried out all the validation experiments Nikita Sakhanenko: had the patience to be our software tester, validator and GEO submitter • This work would not have started without John P (Nick) Johnson (University of Pittsburgh) who kicked me into the area about 8 years ago https://bitbucket.org/chrisarg/rnaseqgamlss
  • 71. ?
  • 73. Building the model from first principles • Establish statistical distributions OR deterministic relationships that “bind” together the quantities in successive steps • There is a “competitive qPCR” experiment beating inside each RNA-seq dataset → random • Ligase bias is reproducible → deterministic/systematic • Apply marginalization (integration) operations to “flatten” the hierarchy • Derive the exact distributions (or the limits of approximation) for a statistical model that directly represents the quantity of interest • Relate model parameters to quantities of interest (absolute/relative quantification)
  • 74. Facts about the distribution of RNA-seq data • Established relationships between distributions that were first explored in the 1920-1930s • Rare biomedical applications in the 1940s • Theoretical work in the early 1960s • Lead goes cold due to failure to conceptualize practical applications after the 1960s • Extremely involved expressions involving special functions of mathematical physics (parabolic cylinder functions) → numerical complexities will hinder attempts to use them as-is in applications
  • 75. Rediscovering a Negative Binomial parameterization and introducing a new Gaussian Generalized Linear Normal Family • Large scale numerical simulations (>500,000) to establish approximations for the RNA-seq distribution • Arbitrary precision libraries in python in multicore machines • Low precision – but acceptable for statistical computations • Both approximations implement a LQ relationship between the mean and variance • Inferences are largely the same (shown in synthetic mixes)
  • 76. Two equivalent views of measures of differential expression: Fold Change and Probability of Over-Expression • GLM approaches (limma, DESeq/DESeq2, gamlss) yield measures of differential expression for microarrays, RNA-Seq or qPCR experiments • These are estimates of fold changes (signal) and their associated standard errors (noise) • They can be converted to probability estimates ($p$) about the signal being >0 (overexpressed) vs. <0 • The standard error of $p$ is given by $p(1 - p)$ • [Figure: Computing probability of differential expression (pDE) in R. Density of the estimated fold change (x-axis: Fold Change) for Fold Change = 1.0, SE = 1.0; the shaded area, 1.0-pnorm(0,FC,SE) in R, yields the probability of overexpression.]
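The conversion on slide 76 can be sketched in Python as a counterpart to the R expression `1.0-pnorm(0,FC,SE)`. This is a sketch under the slide's own assumptions: the estimate is treated as normally distributed and sits on the scale where 0 means no change. The function names `normal_cdf` and `prob_overexpression` are illustrative, not part of any package named on the slide.

```python
import math


def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Cumulative distribution function of a normal(mean, sd) variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_overexpression(fc: float, se: float) -> float:
    """Probability that the true signal is > 0, given estimate fc and standard error se.

    Mirrors the R expression 1.0 - pnorm(0, FC, SE) quoted on the slide.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    return 1.0 - normal_cdf(0.0, mean=fc, sd=se)


if __name__ == "__main__":
    # The slide's example: Fold Change = 1.0, SE = 1.0
    print(round(prob_overexpression(1.0, 1.0), 4))
```

An estimate of exactly 0 gives a probability of 0.5 regardless of the standard error, which is the "no directional evidence" case.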
  • 77. Why do we need two views of the same data? The FC View: • Absolute, relative quantification is possible • Fold changes in one miRNA are directly comparable against each other • Fold changes are comparable between and within techniques • Type I and II statistical errors The pDE View: • Only relative, relative quantification is possible • Platforms provide evidence for directional changes in expression • Type M and S errors • Provides input to Systems Biology tools (e.g. Boolean Networks)
  • 78. Solid measurements for thinning one’s blood: the history of the PT test • Experimental work in the late 19th century to discover the physiological basis of coagulation (“prothrombin”) • Development of different versions of the “Prothrombin Time”: investigations in hemophilia, post-op bleeding & liver disease (1930s-1950s): derived the normal range and ranges associated with specific deficits • Pre-analytical considerations throughout the 1950s (and even today) • In the 1970s PT was used to monitor and dose warfarin in the clinic • Classical studies in the 1970s-80s demonstrate high inter-individual, intra-individual and analytic variability (despite > 30 years of standardization) • WHO proposed to standardize the test in the mid 1980s through the use of the INR (international normalized ratio) http://www.clinchem.org/content/51/3/553.full http://circ.ahajournals.org/content/19/1/92.full.pdf Thromb Haemost. 1985 Feb 18;53(1):155-6.
  • 79. The cautionary story of the INR Normalization procedure • $INR = \left( \frac{PT_{patient}}{PT_{normal}} \right)^{ISI}$ • $PT_{normal}$: geometric mean of 20 patients • $ISI = \frac{\log(\text{calibrator } INR)}{\log\left( PT_{calibrator} / PT_{normal} \right)}$ Sources of variation • Different methods to measure the PT • Different instruments that implement each method • Different calibrator sets for each instrument! http://www.who.int/bloodproducts/publications/WHO_TRS_889_A3.pdf http://www.clinchem.org/content/56/10/1618.full http://www.clinchem.org/content/51/3/553.full
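The two formulas on slide 79 can be sketched as follows. This is a minimal illustration: it assumes a single calibrator with a certified INR (real calibrations may use several calibrators) and reads the ISI formula as log(calibrator INR) divided by log(PT_calibrator / PT_normal), the form consistent with the INR definition. The function names and all numeric inputs in the usage section are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean, used for PT_normal (the slide uses 20 patients)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def isi_from_calibrator(calibrator_inr, pt_calibrator, pt_normal):
    """ISI = log(calibrator INR) / log(PT_calibrator / PT_normal).

    Assumption: one calibrator with a known (certified) INR.
    """
    return math.log(calibrator_inr) / math.log(pt_calibrator / pt_normal)


def inr(pt_patient, pt_normal, isi):
    """INR = (PT_patient / PT_normal) ** ISI."""
    return (pt_patient / pt_normal) ** isi


if __name__ == "__main__":
    # Hypothetical prothrombin times in seconds, for illustration only.
    pt_normal = geometric_mean([11.5, 12.0, 12.5, 12.2, 11.8])
    isi = isi_from_calibrator(calibrator_inr=3.0, pt_calibrator=26.0, pt_normal=pt_normal)
    print(inr(pt_patient=20.0, pt_normal=pt_normal, isi=isi))
```

On the log scale the same computation reads log(INR) = ISI * (log PT_patient - log PT_normal): subtract a reference, then rescale by an assay-specific factor.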
  • 80. Statistics Of Biological Regulatory Networks Feala JD, et al. PLoS ONE 7(1): e29374. (2012)
  • 81. Pathophysiology of the cardiorenal syndrome http://www.kdigo.org/meetings_events/pdf/KDIGO%20CVD%20Controversy%20Rpt.pdf

Editor's Notes

  1. FFL = Feed Forward Loop (=cerebellar circuits) Common Statistical Properties in many gene regulatory pathways
  2. “Understanding biology by reverse engineering the control”
  3. Comment that among those with ESRD one may find an approximately equal proportion of patients with T1D and T2D, even though T2D is more frequent than T1D. The reason for this discrepancy is that overt nephropathy develops less frequently in patients with T2D, many of whom will die from macrovascular complications before their kidney disease progresses. Ref for this slide: Diabetes Care January 2004 vol. 27 no. suppl 1 s79-s83 http://care.diabetesjournals.org/content/27/suppl_1/s79.full
  4. To put things in context, five-year survival rates: all cancers: 68%; breast cancer: 72% (stage III), 22% (stage IV); colon cancer: 53% (stage IIIC), 11% (stage IV); prostate cancer: 28% (distant).
  5. Analysis of enriched terms in REACTOME (Table) suggests that the predicted miRNA targets map to distinct pathways involving growth factor signaling, apoptosis, immunity, substrate metabolism, transmembrane transport and certain non-kidney related terms. Furthermore, the identified pathways overlapped considerably between the comparisons of patients with overt nephropathy and normals, and of follow-up vs. baseline samples from MA patients. In the comparisons within baseline and follow-up MA samples we found only a few (<80) targets mapping to annotated REACTOME pathways, thus precluding a meaningful assessment with this structured vocabulary.
  6. This slide shows a clinical diagnostics scenario in which we would like to use a microRNA as a biomarker for a clinical condition. In contrast to experimental samples, clinical diagnostic samples work in an “inverse” mode and are used in regulated environments. In particular, clinical biomarkers are used to infer the presence of a clinical condition based (conditional) on the actual measurement obtained. Threshold for diagnosis: 14 to distinguish condition A from B.
  7. Shows hypothetical repeated measurements in two experimental or clinical conditions. In the left panel, there is no variation and the underlying pattern is visible to the naked eye without any “statistical” assistance. In the right panel, noise makes it difficult to discern the patterns; the measurement is imperfect (in this case measurements from condition B are generated from distributions with higher means than A).
  8. When the TS is low, the targeted gene receives input from a small number of DE miRNAs on the basis of the expression data, so it would seem that such targets are not a high priority for validation, or rather that the effects of miRNA on mRNA would be difficult to sort out (low signal to noise ratio). On the other hand, a high TS suggests that miRNAs potentially play an important role in modulating the expression of that target. In such a case the RS may allow a semi-quantitative assessment of the direction and the magnitude of the modulation. Negative: target is likely “dis-inhibited” (expression may go up). Positive: target is likely inhibited (expression may go down). Zero: the modulation is likely neutral. Caveat 1: In a biological fluid such as urine, which integrates the microRNA signatures from diverse cellular populations in the kidney and the genitourinary tract, one may get a zero RS as a result of a positive RS in one cellular population and a negative one in another. Caveat 2: In the context of transcriptional/translational regulation, the mRNA response anticipated from the miRNA expression profile may be in the same direction as the measured mRNA levels, or even in the opposite one. In the former case the miRNA pattern enhances the mRNA response, while in the latter case it opposes (counter-balances) it.
  9. This figure shows the meta-analysis for the target PDGF Beta. This factor has been shown to be expressed in biopsies from human patients with overt diabetic nephropathy, so it can be used as a benchmark of the proposed methodology. Out of the 68 predicted miRNAs targeting PDGFB, 37 were represented in our urinary profiles. It can be seen that the majority of these miRNAs do not exhibit a directional change (they are scattered around the vertical line of no effect, OR=1), and only two stand out. When the evidence is synthesized, using techniques and software for clinical trial analysis, the odds ratio is >1, suggesting that miRNAs are working towards inhibiting the gene. As the evidence from clinical biopsy material is that this particular factor is upregulated in diabetic kidneys, the miRNA pattern is probably providing a counter-balancing influence to a (?transcriptional) response. Reference for the PDGF-beta: Langman et al. Over-expression of platelet-derived growth factor in human diabetic nephropathy. Nephrol Dial Transplant (2003) 18: 1392–1396 http://ndt.oxfordjournals.org/content/18/7/1392.full.pdf
  10. The fold change view makes a rather strong assumption about our ability to quantify relative changes in miRNA expression, e.g. one equivalent to assuming the same efficiency in a qPCR experiment. In particular, it assumes that absolute quantification is indeed possible (e.g. platforms are like car speedometers, so the acceleration (difference in velocity) in mph/sec read from one is directly comparable to the same reading given by another speedometer). Hence microRNA changes are comparable within experiments (for different miRNAs) and between experiments (for the same miRNA). The statistical errors implicit in this view are of Type I (calling a signal different from zero when in fact it is not) and Type II (calling a signal zero when it is not). The pDE view only admits the possibility of a relative, relative quantification, i.e. we can only infer directions of changes but not their absolute magnitude. When comparing data between and within platforms, only the direction of the change is important. This is a weaker assumption than the one made by the adopters of the FC view. To make the difference explicit, consider the Delta-Ct values in PCR; unless the probes have the same efficiency (so that we can convert Delta-Cts to FC), the only thing we can say is that a larger Delta-Ct corresponds to a larger change in expression than a smaller one. It follows that the types of errors in this view are Type M (larger vs. smaller) and Type S (calling an over-expressed miRNA under-expressed and vice versa). Though one may consider the pDE limiting, it is in fact
  11. In this section we will examine a cautionary story about a test widely used to monitor anti-coagulation. This test (called INR: International Normalized Ratio) came about as an attempt to standardize measurements of the Prothrombin Time, a test used to monitor the vitamin K dependent coagulation factors. The PT was initially developed as a way to study the coagulation pathways (“research tool”), in particular post-operative thrombotic events and bleeding, hemophilia and liver disease.
  12. Normalization for the INR is based on subtracting (in logarithmic scale) the geometric mean of a “normal” sample and then scaling the result with a “fudge” factor that represents the technical variability/bias of the assay used. This factor (ISI) is determined by calibration against an adopted standard maintained by the World Health Organization; the ISI is derived from calibrator sets with known (certified) INRs and normal plasma. The formula and the process are directly analogous to various normalization approaches that have been applied to expression profiles so far. Despite the rigorous international calibration effort, assaying the same sample by different methods (lines in the graph) still yields different values. Considering the “hard” thresholds and tight ranges required in clinical practice, this performance is not adequate, necessitating frequent measurements to ensure that the patient is maintained within range. There are many sources of variation, which can be investigated with repeated measurements of the same individual. For the INR, a normalization procedure developed 30 years ago for a test that has been widely used for the last 80 years (!) and is run in CLIA-certified environments, the analytical imprecision is still of the same magnitude as the between- and within-individual sources of variability.