Robust Biomarker Selection from RT-qPCR Data
using Statistical Consensus Criteria
Jodi Lapidus, Director, OHSU Biostatistics & Design Program
Jack Wiedrick, Staff Biostatistician, OHSU Biostatistics & Design Program
NIH/NCATS ExRNA Seminar Series, 07-Dec-2017
BACKGROUND
• Are micro-RNAs (miRNAs) expressed in human
body fluids, and if so, can they be used to
distinguish healthy from diseased patients?
• Quantification of miRNA expression can be done
using off-the-shelf quantitative reverse transcription
polymerase chain reaction (RT-qPCR) arrays
• NIH funding divides biomarker experiments into
UH2 (discovery) and UH3 (validation) phases
BACKGROUND
• UH2
 Discover miRNA-based biomarkers in human body fluids
 Experimentation to identify plausible and robust candidates from a large panel of potential markers
 Need to select a small set of markers while retaining and prioritizing candidates with promising clinical utility
• UH3
 Validation using independent sample sets
 Targeted testing on the reduced list of candidates
 Elucidate associations with clinical characteristics
What do we mean by "robust"? Markers are robust if they're not sensitive to irrelevant features in the samples.
This is the real problem: we need to make selection decisions based on small numbers of samples, where irrelevant features will often be very prominent.
A MOTIVATING SCENARIO
• Two clinical populations: Alzheimer's Disease (AD)
patients versus age/sex-matched non-AD controls
 Are there biofluid markers that distinguish them?
 If so, how do we discover which ones?
Approach: measure all samples using a standard RT-qPCR panel assay and look for miRNAs showing group differences.
A MOTIVATING SCENARIO
PHASE 1: DISCOVERY — Large set of candidate markers (Which ones are promising? How do they seem to perform?)
PHASE 2: VERIFICATION/VALIDATION — Small set of promising markers (Do they predict as well as hoped? Are they feasible for screening?)
A MOTIVATING SCENARIO
TaqMan® TLDA Cards for miRNA: 1 sample x 377 probes/card x 2 cards, x #{subjects}; 3 internal standards per card are used to align the cards for a subject.
CONSIDERATIONS
1. Only one measured value per probe per sample
2. Some yield no value… why? (No expression? Really low expression? Assay failure?)
3. Some are untrustworthy (weak amplification, etc.)
4. Many probes, but most are likely to be unimportant
A MOTIVATING SCENARIO
A given experiment may vary on some or all
of these considerations, but…
the statistical pipeline we describe can
be applied to any RT-qPCR experiment
designed to select biomarkers.
THE PROBLEM
• We want a robust selection methodology that:
 emphasizes the predictors that matter and discounts the ones that don't
 characterizes the associations within the larger context of uncertainty about prediction model validity
 generates realistic and testable expectations for how well the predictors can actually predict
Our selection pipeline uses statistical consensus methodology to mitigate the risk of false discoveries by focusing attention on reliable and robust signaling.
WHY DO WE NEED A "ROBUST" PIPELINE?
• Standard methods (e.g. p-values from t-tests) give good answers, but to the wrong questions
 "statistically significant" doesn't mean "predictive" (p < 0.01)
 Are the means significantly different? Sure. If you take a single measurement, can you guess whether it's a case? No.
 a t-test assumes the values are correct, but some aren't (CENSORING)
 At some point we stop counting cycles because noise dominates the signal at high resolutions.
 If amplification hasn't occurred yet, there may still be expression, but all we know about the cycle time is that it has to be longer than the maximum number of cycles we attempted.
 one-at-a-time tests can miss important parts of the story (INTERACTION)
 miRNA#2 is strongly correlated with the outcome, but also correlated with miRNA#1.
 Ignoring miRNA#2 leads to the conclusion that miRNA#1 is uninformative about the outcome (or maybe just a hint of negative association).
 But at any given level of miRNA#2, increased miRNA#1 is linked to a significant increase in the outcome, as the sketch below demonstrates.
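This interaction pitfall is easy to reproduce in a few lines. Below is a minimal simulation sketch in Python (all effect sizes are hypothetical): a marginal test on miRNA#1 shows only a hint of negative association, while a model adjusting for miRNA#2 recovers its strong positive effect.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# miRNA#2 drives the outcome and is negatively correlated with miRNA#1
mir2 = rng.normal(size=n)
mir1 = -0.9 * mir2 + 0.3 * rng.normal(size=n)
outcome = 1.0 * mir1 + 1.1 * mir2 + 0.5 * rng.normal(size=n)

# One-at-a-time look: miRNA#1 appears uninformative (or faintly negative)
r, p = stats.pearsonr(mir1, outcome)
print(f"marginal r = {r:+.2f}, p = {p:.3f}")

# Joint look: adjusted for miRNA#2, miRNA#1 has a strong positive effect
fit = sm.OLS(outcome, sm.add_constant(np.column_stack([mir1, mir2]))).fit()
print(f"adjusted slope for miRNA#1 = {fit.params[1]:+.2f}, p = {fit.pvalues[1]:.1e}")
```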
STEPS FOR ROBUST SELECTION
1. VISUALIZE RAW DATA — be on the lookout for batch artifacts
and process noise and filter appropriately
2. NORMALIZE & TRANSFORM — encode sources of technical
noise and model their effects before beginning selection
3. FILTER UNSUITABLE TARGETS — if they don't assay well on
the technology, we can't use them as biomarkers anyway
4. SELECT USING MULTIPLE STATISTICAL METHODS —
different looks give a robust assessment of biomarker validity
5. CROSS-VALIDATE AND RANK — get expectations for
independent validation and prioritize markers accordingly
6. VALIDATE! — verify that the markers behave as expected in an
independent sample set and look for covariate influences
Steps 1–5 make up the UH2 (discovery) phase; step 6 is the UH3 (validation) phase.
1. VISUALIZE RAW DATA
Be on the lookout for batch artifacts and process noise
• Find process heterogeneities and failures
• Remove candidates with poor assay performance
• Determine assay quality/detection limits (e.g. cycle time censoring threshold)
Our UH2 study considered 754 candidate miRNAs, and 343 (45%) of those targets could be excluded on assay quality grounds alone.
2. NORMALIZE & TRANSFORM
Encode sources of technical noise and model their effects
• Negative controls should be uniform within:
 processing batch (e.g. reagent lot)
 measurement batch (e.g. assay plate)
 fixed instrument settings
• Model and remove these effects if they differ
[Figure: U6 control distributions compared across batches]
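As a sketch of what "model and remove these effects" can look like in the simplest case: if a control such as U6 sits at different levels in different processing batches, each batch can be re-centered on its own control median. The long-format table and column names here are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per well
df = pd.DataFrame({
    "batch":  ["A", "A", "A", "B", "B", "B"],
    "target": ["U6", "miR-1", "miR-2", "U6", "miR-1", "miR-2"],
    "ct":     [24.0, 30.5, 33.1, 26.0, 32.4, 35.2],
})

# Per-batch level of the control target alone
u6_median = df[df.target == "U6"].groupby("batch")["ct"].median()

# Shift every well so all batches share a common control level
offset = u6_median - u6_median.median()
df["ct_norm"] = df["ct"] - df["batch"].map(offset)
print(df)
```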
2. NORMALIZE & TRANSFORM
• Summarize replicates by median (robust center)
• Transform cycle times to an expression scale:
 higher numbers mean more expression
 censored values become 0
Low values on the cycle-time axis map to high values on the expression axis, with censored values collapsing to zero.
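A minimal sketch of the transform described above, assuming a hypothetical censoring threshold of 40 cycles: expression is the threshold minus the (median) cycle time, floored at zero, so censored reactions map to 0 and earlier amplification means higher expression.

```python
import numpy as np

CENSOR_CT = 40.0  # assumed threshold; instrument- and protocol-specific

def expression_score(median_ct: np.ndarray) -> np.ndarray:
    """Map median cycle times to an expression scale.

    Censored wells (NaN, or Ct at/above the threshold) become 0;
    lower Ct (earlier amplification) yields higher expression.
    """
    ct = np.where(np.isnan(median_ct), CENSOR_CT, median_ct)
    return np.clip(CENSOR_CT - ct, 0.0, None)

print(expression_score(np.array([22.5, 35.0, np.nan, 41.0])))
# -> [17.5  5.   0.   0. ]
```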
3. FILTER UNSUITABLE TARGETS
If they don't assay well on the technology, we can't use them as biomarkers anyway
• Targets should be reasonably attested
 75% censoring with a 1:1 case:control ratio means specificity can never exceed 50%
• Cycle time accuracy should be mostly high
 Otherwise rankings become unreliable because the cycle times are unreliable
• Censoring should be unrelated to accuracy
 A correlation here would mean that measurement error is blurring the distinction between 'expressed' and 'not expressed'
In our UH2 study, out of 411 well-measured targets we were able to filter 260 (63%) as unlikely to be viable biomarkers in the technology.
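A sketch of the per-target screen this step implies. The column names and cutoffs are hypothetical (e.g. dropping targets censored in more than 75% of samples, or whose censoring is correlated with the amplification-quality score):

```python
import pandas as pd

def filter_targets(df: pd.DataFrame,
                   max_censored: float = 0.75,
                   max_corr: float = 0.5) -> list:
    """Screen targets on attestation and censoring/accuracy coupling.

    Expects one row per (target, sample) with columns:
      target    - probe identifier
      censored  - bool, no amplification within the cycle threshold
      amp_score - amplification-quality score (higher = more trustworthy)
    """
    keep = []
    for target, g in df.groupby("target"):
        frac_censored = g["censored"].mean()
        # Censoring should be unrelated to measurement accuracy
        corr = g["censored"].astype(float).corr(g["amp_score"])
        corr = 0.0 if pd.isna(corr) else abs(corr)
        if frac_censored <= max_censored and corr <= max_corr:
            keep.append(target)
    return keep
```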
4. USE MULTIPLE STATISTICAL METHODS
Different looks give a robust assessment of biomarker validity
• Different tests offer different views of classification
 Questions for individual markers (see the sketch below):
 o Are cycle time counts equal? (LOG-RANK TEST): strong expression-disease association; log-rank tests properly account for censoring in the cycle times
 o Do cycle times cluster by group? (ROC ANALYSIS): large group separation in expression; ROC analysis is designed to compare entire distributions of values
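Both individual-marker looks can be scripted per target. Here is a sketch using the lifelines and scikit-learn packages, treating Ct as a censored time-to-amplification for the log-rank test and the expression score as the ROC input; the array names are hypothetical.

```python
import numpy as np
from lifelines.statistics import logrank_test
from sklearn.metrics import roc_auc_score

def marker_looks(ct, censored, expression, is_case):
    """Two views of one marker: a censoring-aware group test plus
    whole-distribution separation."""
    case = is_case.astype(bool)

    # Log-rank: are the cycle-time-to-amplification curves equal?
    # An observed "event" is a reaction that amplified (not censored).
    lr = logrank_test(
        ct[case], ct[~case],
        event_observed_A=~censored[case],
        event_observed_B=~censored[~case],
    )

    # ROC: how well does expression alone separate cases from controls?
    auc = roc_auc_score(is_case, expression)
    return lr.p_value, auc
```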
 Questions for groups of markers:
 o Do target signals overlap? (RANDOM FOREST): robust classification across many random trees; random forests can capture complex cross-marker interactions
 o Do signals transcend models? (ALTERNATE CLASSIFIERS)
More than one way to grow a random forest: a random forest is a resampling-based aggregate of decision trees that attempts to average over all the many possible trees that could be formed from a set of predictor variables. But the component trees could use different rules, e.g.:
• CART (Classification And Regression Trees)
• CFOREST (Conditional inference tree FORESTs)
• CHAID (CHi-squared Automatic Interaction Detection)
• BOOST (BOOSTed classification trees)
All are kinds of random forests, but their component trees decide differently.
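As a sketch of the "alternate classifiers" idea using scikit-learn stand-ins (the CART/CFOREST/CHAID/BOOST variants named above live in other packages): grow several ensembles whose trees decide differently and check whether the same markers top each importance ranking.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)

def top_markers_per_model(X, y, names, k=10):
    """Top-k markers by importance under differently-grown tree ensembles."""
    models = {
        "rf":    RandomForestClassifier(n_estimators=500, random_state=0),
        "extra": ExtraTreesClassifier(n_estimators=500, random_state=0),
        "boost": GradientBoostingClassifier(random_state=0),
    }
    tops = {}
    for label, model in models.items():
        model.fit(X, y)
        order = np.argsort(model.feature_importances_)[::-1][:k]
        tops[label] = {names[i] for i in order}
    # Markers selected by every ensemble are the consensus candidates
    return tops, set.intersection(*tops.values())
```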
• Consensus in selection suggests signal validity: selected combinations of markers classify well
In our UH2 study, we were able to reduce a set of hundreds of candidate miRNAs down to just a few dozen demonstrating good classification performance.
5. CROSS-VALIDATE AND RANK
Get expectations for independent validation and prioritize markers accordingly
• Use multiple imputation to fill in missing data and simulate population-plausible datasets
• Leave-one-out resampling gives estimates of prediction ability in a new independent cohort
 Take the average of a bunch of informed guesses… and make an informed guess about future performance (see the sketch below)
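A sketch of the leave-one-out step with scikit-learn: every subject is predicted by a model trained on all the others, and the pooled out-of-fold predictions give an honest estimate of performance on new data. The logistic classifier is an assumption; with multiple imputation, this would be run once per imputed dataset and the AUCs averaged.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def loo_auc(X, y):
    """Expected out-of-sample AUC via leave-one-out resampling."""
    clf = LogisticRegression(max_iter=1000)
    proba = cross_val_predict(clf, X, y, cv=LeaveOneOut(),
                              method="predict_proba")
    return roc_auc_score(y, proba[:, 1])
```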
• Assess multimarker classification performance in all possible groupings of top candidates:
 Bayesian model averaging accounts for uncertainty about "the right model"
 targets ranked by frequency of inclusion in models, weighted by goodness of fit
 compare to any existing biomarkers and look for independent signaling
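A minimal stand-in for the model-averaging step, assuming a small shortlist of markers: fit every subset model up to a size limit, weight each by its BIC (a standard rough proxy for posterior model probability), and score markers by their weighted frequency of inclusion. This is a simplified sketch, not the authors' exact procedure.

```python
from itertools import combinations

import numpy as np
import statsmodels.api as sm

def inclusion_scores(X, y, names, max_size=3):
    """Rank markers by BIC-weighted inclusion frequency across all
    logistic models containing up to max_size markers."""
    fits = []
    for size in range(1, max_size + 1):
        for subset in combinations(range(X.shape[1]), size):
            design = sm.add_constant(X[:, list(subset)])
            res = sm.Logit(y, design).fit(disp=0)
            fits.append((subset, res.bic))

    bics = np.array([bic for _, bic in fits])
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()

    scores = {name: 0.0 for name in names}
    for (subset, _), w in zip(fits, weights):
        for j in subset:
            scores[names[j]] += w
    return sorted(scores.items(), key=lambda kv: -kv[1])
```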
6. VALIDATE!
Verify that the markers behave as expected in an independent sample set
• Before starting the validation, compare the shape of response distributions in the old and new cohorts
 Differences in skew (whether the distribution leans one way or the other) and kurtosis (how peaked versus diffuse the distribution is) could indicate poorly matched cohorts
 The marker means approximately line up in both cohorts
 But some markers in the old cohort fell on the extreme edges of the distribution in the new cohort — evidence of potentially large skew in the discovery cohort
 Central thinness in some of the distributions is an indication of low kurtosis — pointing to a possible admixture of dissimilar subjects in the population sampled by the new cohort
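A sketch of this pre-validation shape check with scipy: compute skew and excess kurtosis per marker in each cohort and flag markers whose shapes disagree. The flagging thresholds are hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def shape_check(old, new, names, d_skew=1.0, d_kurt=2.0):
    """Flag markers whose distribution shape differs between cohorts.

    old, new: (subjects x markers) expression matrices, one per cohort.
    """
    flags = []
    for j, name in enumerate(names):
        ds = abs(skew(old[:, j]) - skew(new[:, j]))
        dk = abs(kurtosis(old[:, j]) - kurtosis(new[:, j]))
        if ds > d_skew or dk > d_kurt:
            flags.append((name, round(ds, 2), round(dk, 2)))
    return flags  # candidates for "poorly matched cohorts"
```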
• Verify that marker relevance assumptions hold
 Some miRNAs were chosen as nondiscriminating… are they still nondiscriminating in the new cohort?
 Others were chosen as discriminating… are they still discriminating?
• Look for discrepancies in classification performance patterns between the cohorts
 Based on UH2 patterns, we expect the miRNA-only curve to pass through a particular region of ROC space. Happily, the miRNA-only curve behaves exactly as expected.
 Similarly, we expect the bump in performance from adding a genetic marker to only kick in at relatively low specificities. Equally happily, the curve with the genetic marker doesn't behave as expected — it's even better.
Why would the genetic marker behave differently in the two cohorts? One explanation is that the discovery cohort was less healthy — the symptoms were already so strong that the genetic factor no longer added much new information. When the disease is less severe and cases are more similar to controls, the genetic information boosts sensitivity for the borderline cases. These kinds of nuances can be very valuable for deciding not only who the biomarker screening should be applied to, but also when.
• Reprioritize markers based on how well they held up as predictive in the new cohort
 Apply several different methods of ranking (i.e. "judges") to the set of markers — these are the same statistical tests used for biomarker discovery, but now the goal is to prioritize rather than exclude
 Note that one of the judges is the ranking created in the discovery phase, prior to seeing the validation data
 Each judge independently ranks the candidate markers in order (1 = best, 26 = worst): low rank numbers for stronger markers, middling rank numbers for mediocre markers, high rank numbers for weaker markers
 Then ranks for each marker are summed across judges, and we colorcode the table to visually assess the consistency of rankings
 The rank sums define an ordering of markers — this is our consensus opinion across evaluation methods
 Rank sums within 56 of each other could be randomly assigned with high probability, but gaps larger than that are likely to be qualitative
 We could think of the rank sum distribution as a roughly even mixture of "hot" markers and "cool" markers, with "lukewarm" markers in the middle where it's hard to tell which is which
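The judge-and-rank-sum consensus is a few lines of pandas. The judge columns and marker names below are hypothetical; in the deck, the judges are the discovery-phase ranking plus the validation-phase statistical tests. (The 56-rank spacing presumably comes from a null distribution of rank-sum gaps; permuting the judge columns would give a comparable threshold.)

```python
import pandas as pd

# Each judge independently ranks all markers (1 = best)
ranks = pd.DataFrame({
    "discovery_rank": [1, 3, 2, 5, 4],
    "logrank":        [2, 1, 4, 3, 5],
    "roc_auc":        [1, 2, 3, 5, 4],
    "forest":         [3, 1, 2, 4, 5],
}, index=["miR-a", "miR-b", "miR-c", "miR-d", "miR-e"])

# Sum ranks across judges: the consensus ordering
consensus = ranks.sum(axis=1).sort_values()
print(consensus)  # low rank sums = "hot" markers, high = "cool"
```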
• Seek internal validation of the marker prioritization
 If the markers we think are important really are, then they should contribute the most to multimarker models
 Plot nearly all possible parsimonious models, colorcoded by number of markers in the model; the best models have high AUC (strong) and low AIC (parsimonious)
 Which markers contribute most to the best models? The highest ranked ones!
• Assess the overall quality of group separation
 Some cases may never be screenable with our markers, but many of the controls that score like cases could be latent cases…!
• Evaluate performance ranges for model classes
 Some regions of performance space fare no better than existing clinical markers
 High-performance regions can only be reached with a sufficient number of markers to allow clear discrimination
• Look for differential performance of the classifiers within clinically relevant covariate subgroups
 A clever way to do this is to cluster the subjects and then examine covariates in the tightest clusters
 The markers are more sensitive in Cluster 2 than in Cluster 1, and Cluster 2 also has more males
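A sketch of the cluster-then-inspect trick with scikit-learn and pandas; the cluster count and covariates are hypothetical, and categorical covariates such as sex are assumed to be coded numerically (e.g. male = 1).

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def covariates_by_cluster(X_markers, covariates: pd.DataFrame, k=2):
    """Cluster subjects on their marker profiles, then summarize
    covariates within each cluster to spot subgroup effects."""
    Xz = StandardScaler().fit_transform(X_markers)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xz)
    return covariates.groupby(labels).agg(["mean", "count"])
```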
• Look for trending of discrimination performance across covariate spectra
 The relationships may be marker-specific! Or nonlinear!
 Higher marker ranks seem to correlate with marker associations to the covariate… but only for the strongest markers. Weaker markers don't show such a clear pattern
NEXT STEPS
• An abbreviated discussion of this pipeline appeared in our recent UH2 paper in the Journal of Alzheimer's Disease:
 "MicroRNAs in Human Cerebrospinal Fluid as Biomarkers for Alzheimer's Disease," J. Alzheimer's Dis., vol. 55, no. 3, pp. 1223–1233, 2017. DOI: 10.3233/JAD-160835
 Co-authors: Theresa Lusardi, Jay Phillips, Jack Wiedrick, Chris Harrington, Babette Lind, Jodi Lapidus, Joe Quinn, Julie Saugstad
• Publication of a follow-up UH3 paper on validation results is in progress
• Also currently working on a standalone methods paper: Robust Statistical Analysis Pipeline for Selecting Promising Biomarkers from RT-qPCR Experiments (Wiedrick, Lusardi, Saugstad, Lapidus)
THANKS FOR LISTENING!

More Related Content

Similar to Robust biomarker selection from RT-qPCR data using statistical consensus criteria

GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GenomeInABottle
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GenomeInABottle
 
sience 2.0 : an illustration of good research practices in a real study
sience 2.0 : an illustration of good research practices in a real studysience 2.0 : an illustration of good research practices in a real study
sience 2.0 : an illustration of good research practices in a real studywolf vanpaemel
 
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...VHIR Vall d’Hebron Institut de Recerca
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slidesGenomeInABottle
 
Illustrating uncertainty in extrapolating evidence for cost-effectiveness mod...
Illustrating uncertainty in extrapolating evidence for cost-effectiveness mod...Illustrating uncertainty in extrapolating evidence for cost-effectiveness mod...
Illustrating uncertainty in extrapolating evidence for cost-effectiveness mod...cheweb1
 
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...Data Con LA
 
neuropredict: a proposal and a tool towards standardized and easy assessment ...
neuropredict: a proposal and a tool towards standardized and easy assessment ...neuropredict: a proposal and a tool towards standardized and easy assessment ...
neuropredict: a proposal and a tool towards standardized and easy assessment ...Pradeep Redddy Raamana
 
THE CONCEPT OF TRACEABILITY IN LABORATORY MEDICINE - A TOOL FOR STANDARDISATION
THE CONCEPT OF TRACEABILITY IN LABORATORY MEDICINE - A TOOL FOR STANDARDISATIONTHE CONCEPT OF TRACEABILITY IN LABORATORY MEDICINE - A TOOL FOR STANDARDISATION
THE CONCEPT OF TRACEABILITY IN LABORATORY MEDICINE - A TOOL FOR STANDARDISATIONMoustafa Rezk
 
randomized clinical trials II
randomized clinical trials IIrandomized clinical trials II
randomized clinical trials IIIAU Dent
 
Jan Hrabal: Evaluation of medical information quality #bcs2015
Jan Hrabal: Evaluation of medical information quality #bcs2015Jan Hrabal: Evaluation of medical information quality #bcs2015
Jan Hrabal: Evaluation of medical information quality #bcs2015KISK FF MU
 
Acmg secondary findings open forum 3 28-12 final
Acmg secondary findings open forum 3 28-12 finalAcmg secondary findings open forum 3 28-12 final
Acmg secondary findings open forum 3 28-12 finalerikanature
 
How to conduct a systematic review
How to conduct a systematic reviewHow to conduct a systematic review
How to conduct a systematic reviewDrNidhiPruthiShukla
 

Similar to Robust biomarker selection from RT-qPCR data using statistical consensus criteria (20)

GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
sience 2.0 : an illustration of good research practices in a real study
sience 2.0 : an illustration of good research practices in a real studysience 2.0 : an illustration of good research practices in a real study
sience 2.0 : an illustration of good research practices in a real study
 
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
SYRCLE_Hooijmans mini symposium sr animal studies 30082012
SYRCLE_Hooijmans mini symposium sr animal studies 30082012SYRCLE_Hooijmans mini symposium sr animal studies 30082012
SYRCLE_Hooijmans mini symposium sr animal studies 30082012
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slides
 
Illustrating uncertainty in extrapolating evidence for cost-effectiveness mod...
Illustrating uncertainty in extrapolating evidence for cost-effectiveness mod...Illustrating uncertainty in extrapolating evidence for cost-effectiveness mod...
Illustrating uncertainty in extrapolating evidence for cost-effectiveness mod...
 
Evidence-based Dentistry
Evidence-based DentistryEvidence-based Dentistry
Evidence-based Dentistry
 
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...
 
Survey design
Survey designSurvey design
Survey design
 
neuropredict: a proposal and a tool towards standardized and easy assessment ...
neuropredict: a proposal and a tool towards standardized and easy assessment ...neuropredict: a proposal and a tool towards standardized and easy assessment ...
neuropredict: a proposal and a tool towards standardized and easy assessment ...
 
170326 giab abrf
170326 giab abrf170326 giab abrf
170326 giab abrf
 
THE CONCEPT OF TRACEABILITY IN LABORATORY MEDICINE - A TOOL FOR STANDARDISATION
THE CONCEPT OF TRACEABILITY IN LABORATORY MEDICINE - A TOOL FOR STANDARDISATIONTHE CONCEPT OF TRACEABILITY IN LABORATORY MEDICINE - A TOOL FOR STANDARDISATION
THE CONCEPT OF TRACEABILITY IN LABORATORY MEDICINE - A TOOL FOR STANDARDISATION
 
Lecture jr
Lecture jrLecture jr
Lecture jr
 
randomized clinical trials II
randomized clinical trials IIrandomized clinical trials II
randomized clinical trials II
 
Jan Hrabal: Evaluation of medical information quality #bcs2015
Jan Hrabal: Evaluation of medical information quality #bcs2015Jan Hrabal: Evaluation of medical information quality #bcs2015
Jan Hrabal: Evaluation of medical information quality #bcs2015
 
Acmg secondary findings open forum 3 28-12 final
Acmg secondary findings open forum 3 28-12 finalAcmg secondary findings open forum 3 28-12 final
Acmg secondary findings open forum 3 28-12 final
 
How to conduct a systematic review
How to conduct a systematic reviewHow to conduct a systematic review
How to conduct a systematic review
 

Recently uploaded

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 

Recently uploaded (20)

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 

Robust biomarker selection from RT-qPCR data using statistical consensus criteria

  • 1. Robust Biomarker Selection from RT-qPCR Data using Statistical Consensus Criteria Jodi Lapidus OHSU Biostatistics & Design Program, Director Jack Wiedrick OHSU Biostatistics & Design Program, Staff Biostatistician NIH/NCATS ExRNA Seminar Series, 07-Dec-2017
  • 2. BACKGROUND • Are micro-RNAs (miRNAs) expressed in human body fluids, and if so, can they be used to distinguish healthy from diseased patients? • Quantification of miRNA expression can be done using off-the-shelf quantitative reverse transcription polymerase chain reaction (RT-qPCR) arrays • NIH funding divides biomarker experiments into UH2 (discovery) and UH3 (validation) phases
  • 3. BACKGROUND • UH2 Discover miRNA-based biomarkers in human body fluids Experimentation to identify plausible and robust candidates from a large panel of potential markers Need to select a small set of markers while retaining and prioritizing candidates with promising clinical utility • UH3 Validation using independent sample sets  Targeted testing on the reduced list of candidates  Elucidate associations with clinical characteristics
  • 4. BACKGROUND • UH2 Discover miRNA-based biomarkers in human body fluids Experimentation to identify plausible and robust candidates from a large panel of potential markers Need to select a small set of markers while retaining and prioritizing candidates with promising clinical utility • UH3 Validation using independent sample sets  Targeted testing on the reduced list of candidates  Elucidate associations with clinical characteristics What do we mean by this? Markers are robust if they’re not sensitive to irrelevant features in the samples
  • 5. BACKGROUND • UH2 Discover miRNA-based biomarkers in human body fluids Experimentation to identify plausible and robust candidates from a large panel of potential markers Need to select a small set of markers while retaining and prioritizing candidates with promising clinical utility • UH3 Validation using independent sample sets  Targeted testing on the reduced list of candidates  Elucidate associations with clinical characteristics This is the real problem. We need to make selection decisions based on small numbers of samples, where irrelevant features will often be very prominent
  • 6. A MOTIVATING SCENARIO • Two clinical populations: Alzheimer's Disease (AD) patients versus age/sex-matched non-AD controls  Are there biofluid markers that distinguish them?  If so, how do we discover which ones?
  • 7. A MOTIVATING SCENARIO • Two clinical populations: Alzheimer's Disease (AD) patients versus age/sex-matched non-AD controls  Are there biofluid markers that distinguish them?  If so, how do we discover which ones? Measure all samples using a standard RT-qPCR panel assay and look for miRNAs showing group differences
  • 8. A MOTIVATING SCENARIO PHASE 1: DISCOVERY PHASE 2: VERIFICATION/VALIDATION X Large set of candidate markers (Which ones are promising? How do they seem to perform?) Small set of promising markers (Do they predict as well as hoped? Are they feasible for screening?)
  • 9. A MOTIVATING SCENARIO 1 sample x 377 probes/card x 2 cards TaqMan® TLDA Cards for miRNA 3 internal standards per card are used to align the cards for a subject x #{subjects} CONSIDERATIONS 1.Only one measured value per probe per sample 2.Some yield no value… why? (No expression? Really low expression? Assay failure?) 3.Some are untrustworthy (weak amplification, etc) 4.Many probes but most are likely to be unimportant
  • 10. CONSIDERATIONS 1.Only one measured value per probe per sample 2.Some yield no value… why? (No expression? Really low expression? Assay failure?) 3.Some are untrustworthy (weak amplification, etc) 4.Many probes but most are likely to be unimportant A MOTIVATING SCENARIO A given experiment may vary on some or all of these considerations, but… the statistical pipeline we describe can be applied to any RT-qPCR experiment designed to select biomarkers.
THE PROBLEM
• We want a robust selection methodology that:
 emphasizes the predictors that matter and discounts the ones that don't
 characterizes the associations within the larger context of uncertainty about prediction model validity
 generates realistic and testable expectations for how well the predictors can actually predict
Our selection pipeline uses statistical consensus methodology to mitigate the risk of false discoveries by focusing attention on reliable and robust signaling.
WHY DO WE NEED A "ROBUST" PIPELINE?
• Standard methods (e.g. p-values from t-tests) give good answers, but to the wrong questions:
 "statistically significant" doesn't mean "predictive": with p < 0.01, are the means significantly different? Sure. But if you take a single measurement, can you guess whether it's a case? No.
 a t-test assumes the values are correct, but some aren't (CENSORING): at some point we stop counting cycles because noise dominates the signal at high resolutions. If amplification hasn't occurred yet, there may still be expression, but all we know about the cycle time is that it has to be longer than the maximum number of cycles we attempted.
 one-at-a-time tests can miss important parts of the story (INTERACTION): suppose miRNA#2 is strongly correlated with the outcome, but also correlated with miRNA#1. Ignoring miRNA#2 leads to the conclusion that miRNA#1 is uninformative about the outcome (or maybe just a hint of negative association), yet at any given level of miRNA#2, increased miRNA#1 is linked to a significant increase in the outcome. (A small simulation below illustrates the point.)
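To make the interaction pitfall concrete, here is a minimal simulation (hypothetical variable names; statsmodels used for ordinary least squares) in which a marker's marginal association is negative even though its adjusted effect is strongly positive:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    mirna2 = rng.normal(size=n)                        # second marker, tied to outcome
    mirna1 = -mirna2 + rng.normal(scale=0.5, size=n)   # negatively coupled to miRNA#2
    outcome = mirna1 + 2 * mirna2 + rng.normal(scale=0.5, size=n)

    # One-at-a-time view: outcome on miRNA#1 alone shows a negative slope
    marginal = sm.OLS(outcome, sm.add_constant(mirna1)).fit()

    # Adjusted view: at a fixed level of miRNA#2, miRNA#1 has a clear positive effect
    joint = sm.OLS(outcome, sm.add_constant(np.column_stack([mirna1, mirna2]))).fit()

    print(marginal.params[1], joint.params[1])         # roughly -0.6 vs +1.0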
STEPS FOR ROBUST SELECTION
1. VISUALIZE RAW DATA — be on the lookout for batch artifacts and process noise and filter appropriately
2. NORMALIZE & TRANSFORM — encode sources of technical noise and model their effects before beginning selection
3. FILTER UNSUITABLE TARGETS — if they don't assay well on the technology, we can't use them as biomarkers anyway
4. SELECT USING MULTIPLE STATISTICAL METHODS — different looks give a robust assessment of biomarker validity
5. CROSS-VALIDATE AND RANK — get expectations for independent validation and prioritize markers accordingly
6. VALIDATE! — verify that the markers behave as expected in an independent sample set and look for covariate influences
(Steps 1–5 make up the UH2 discovery phase; step 6 is the UH3 validation phase.)
1. VISUALIZE RAW DATA — be on the lookout for batch artifacts and process noise
• Find process heterogeneities and failures
• Remove candidates with poor assay performance: our UH2 study considered 754 candidate miRNAs, and 343 (45%) of those targets could be excluded on assay quality grounds alone
• Determine assay quality/detection limits (e.g. cycle time censoring threshold)
2. NORMALIZE & TRANSFORM — encode sources of technical noise and model their effects
• Negative controls (e.g. the U6 control) should be uniform within:
 processing batch (e.g. reagent lot)
 measurement batch (e.g. assay plate)
 fixed instrument settings
• Model and remove these effects if they differ (a sketch of the idea follows)
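A minimal sketch of batch-offset removal, assuming a hypothetical long-format table with columns batch, target, and ct, and using U6 as the control target (an illustration, not the study's exact normalization procedure):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    # Hypothetical well-level data: one row per well
    df = pd.DataFrame({
        "batch":  np.repeat(["lot1", "lot2"], 50),
        "target": np.tile(["U6"] * 10 + ["miR-x"] * 40, 2),
        "ct":     rng.normal(28, 2, size=100),
    })
    df.loc[df["batch"] == "lot2", "ct"] += 1.5   # simulate a batch shift

    # Offset of each batch's negative-control median from the overall control median
    ctrl = df[df["target"] == "U6"]
    offsets = ctrl.groupby("batch")["ct"].median() - ctrl["ct"].median()

    # Subtract the batch offset from every well measured in that batch
    df["ct_adj"] = df["ct"] - df["batch"].map(offsets)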
2. NORMALIZE & TRANSFORM (continued)
• Summarize replicates by median (robust center)
• Transform cycle times to an expression scale:
 higher numbers mean more expression
 censored values become 0
Low cycle time values map to high expression values, and censored cycle times land at the zero end of the expression axis (a sketch of the transform follows).
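One plausible form of the transform, sketched under the assumption of a 35-cycle censoring threshold (the data and names here are synthetic stand-ins): expression is the censoring threshold minus the median Ct, floored at zero.

    import numpy as np
    import pandas as pd

    CT_MAX = 35.0    # hypothetical censoring threshold (max cycles attempted)

    rng = np.random.default_rng(0)
    # Hypothetical replicate-level data: NaN = well never amplified (censored)
    df = pd.DataFrame({
        "subject": np.repeat(range(10), 6),
        "target":  np.tile(["miR-a", "miR-b", "miR-c"], 20),
        "ct_adj":  rng.normal(30, 3, size=60),
    })
    df.loc[df["ct_adj"] > CT_MAX, "ct_adj"] = np.nan

    # Median across replicate wells is robust to one aberrant replicate
    med = df.groupby(["subject", "target"])["ct_adj"].median()

    # Low Ct -> high expression; censored values land exactly at 0
    expression = (CT_MAX - med.fillna(CT_MAX)).clip(lower=0)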
3. FILTER UNSUITABLE TARGETS — if they don't assay well on the technology, we can't use them
• Targets should be reasonably attested: e.g. 75% censoring with a 1:1 case:control ratio means specificity can never exceed 50% (see the sketch after this list)
• Cycle time accuracy should be mostly high: otherwise rankings become unreliable because the cycle times are unreliable
• Censoring should be unrelated to accuracy: a correlation here would mean that measurement error is blurring the distinction between 'expressed' and 'not expressed'
In our UH2 study, out of 411 well-measured targets we were able to filter 260 (63%) as unlikely to be viable biomarkers in the technology.
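A minimal attestation filter might drop targets whose censoring fraction exceeds some cutoff; the 75% figure below is the illustrative number from the slide, not a universal rule, and the data are synthetic:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    idx = pd.MultiIndex.from_product([range(20), ["miR-a", "miR-b", "miR-c"]],
                                     names=["subject", "target"])
    expression = pd.Series(rng.gamma(2.0, size=60), index=idx)
    expression[expression < 1.0] = 0.0          # pretend low values were censored

    # Fraction of samples in which each target went undetected
    censor_frac = (expression == 0).groupby(level="target").mean()

    # Keep only targets detected in at least 25% of samples (illustrative cutoff)
    attested = censor_frac[censor_frac <= 0.75].index
    expression = expression[
        expression.index.get_level_values("target").isin(attested)]
    print(censor_frac.round(2))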
4. USE MULTIPLE STATISTICAL METHODS — different looks give a robust assessment of validity
• Different tests offer different views of classification
 Questions for individual markers:
o Are cycle time counts equal? (LOG-RANK TEST): a significant difference signals a strong expression-disease association, and log-rank tests properly account for censoring in the cycle times
o Do cycle times cluster by group? (ROC ANALYSIS): a high AUC signals large group separation in expression, and ROC analysis is designed to compare entire distributions of values
(A sketch of both per-marker tests follows.)
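A minimal sketch of both looks on simulated data, treating Ct as a censored "time to amplification" (lifelines and scikit-learn are stand-ins for whatever survival and ROC routines one prefers):

    import numpy as np
    from lifelines.statistics import logrank_test
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    group = rng.integers(0, 2, size=60)                  # 1 = case, 0 = control
    ct = np.where(group == 1, 28.0, 30.0) + rng.normal(scale=2.0, size=60)
    observed = (ct < 35.0).astype(int)                   # 0 = censored at 35 cycles
    ct = np.minimum(ct, 35.0)

    # Log-rank compares the censored cycle-time distributions of the two groups
    res = logrank_test(ct[group == 1], ct[group == 0],
                       event_observed_A=observed[group == 1],
                       event_observed_B=observed[group == 0])

    # ROC/AUC compares the whole expression distributions of cases vs controls
    auc = roc_auc_score(group, 35.0 - ct)                # expression scale
    print(res.p_value, auc)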
4. USE MULTIPLE STATISTICAL METHODS (continued)
 Questions for groups of markers:
o Do target signals overlap? (RANDOM FOREST): robust classification across many random trees; random forests can capture complex cross-marker interactions
o Do signals transcend models? (ALTERNATE CLASSIFIERS): there is more than one way to grow a random forest. A random forest is a resampling-based aggregate of decision trees that attempts to average over all the many possible trees that could be formed from a set of predictor variables, but the component trees could use different rules, e.g.:
• CART (Classification And Regression Trees)
• CFOREST (Conditional inference tree FORESTs)
• CHAID (CHi-squared Automatic Interaction Detection)
• BOOST (BOOSTed classification trees)
All are kinds of random forests, but their component trees decide differently. (A sketch comparing two tree-ensemble rules follows.)
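A sketch of the "different tree rules" idea using two ensembles available in scikit-learn (a CART-based random forest and boosted trees; CFOREST and CHAID live in other packages, so these stand in for the broader menu):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=80, n_features=20, n_informative=4,
                               random_state=0)

    # Two different tree-growing rules over the same candidate markers
    for clf in (RandomForestClassifier(n_estimators=500, random_state=0),
                GradientBoostingClassifier(random_state=0)):
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        print(type(clf).__name__, round(auc, 2))

    # The fitted forest's importances flag which targets carry overlapping signal
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    print(rf.feature_importances_.argsort()[::-1][:5])   # top-5 markers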
4. USE MULTIPLE STATISTICAL METHODS (continued)
• Consensus in selection suggests signal validity: selected combinations of markers classify well. In our UH2 study, we were able to reduce a set of hundreds of candidate miRNAs down to just a few dozen demonstrating good classification performance.
5. CROSS-VALIDATE AND RANK — get expectations for independent validation and prioritize accordingly
• Use multiple imputation to fill in missing data and simulate population-plausible datasets
• Leave-one-out resampling gives estimates of prediction ability in a new independent cohort: take the average of a bunch of informed guesses… and make an informed guess about future performance (a sketch follows)
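A compact sketch of imputation inside leave-one-out cross-validation (scikit-learn's IterativeImputer stands in for a full multiple-imputation procedure, which would repeat this over several imputed datasets and pool the results; data are simulated):

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 5))
    y = (X[:, 0] + rng.normal(size=40) > 0).astype(int)
    X[rng.random(X.shape) < 0.1] = np.nan        # sprinkle in missing values

    # Imputation sits inside the pipeline, so each held-out subject is imputed
    # and scored without ever influencing the model fit
    model = make_pipeline(IterativeImputer(random_state=0), LogisticRegression())
    print(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())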
5. CROSS-VALIDATE AND RANK (continued)
• Assess multimarker classification performance in all possible groupings of top candidates:
 Bayesian model averaging accounts for uncertainty about "the right model"
 targets ranked by frequency of inclusion in models, weighted by goodness of fit (see the sketch below)
 compare to any existing biomarkers and look for independent signaling
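A rough stand-in for the inclusion-frequency ranking: an all-subsets scan with Akaike weights via statsmodels (simulated data; formal Bayesian model averaging would use posterior model probabilities instead of AIC weights):

    from itertools import combinations
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))
    y = (X[:, 0] - X[:, 1] + rng.normal(size=60) > 0).astype(int)

    inclusion = np.zeros(X.shape[1])
    total = 0.0
    for k in range(1, 4):                              # all 1- to 3-marker models
        for subset in combinations(range(X.shape[1]), k):
            fit = sm.Logit(y, sm.add_constant(X[:, subset])).fit(disp=0)
            w = np.exp(-0.5 * fit.aic)                 # Akaike weight (unnormalized)
            inclusion[list(subset)] += w
            total += w

    print((inclusion / total).round(3))                # fit-weighted inclusion rate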
6. VALIDATE! — verify that the markers behave as expected in an independent sample set
• Before starting the validation, compare the shape of response distributions in the old and new cohorts
 Differences in skew (whether the distribution leans one way or the other) and kurtosis (how centered vs diffuse the distribution is) could indicate poorly matched cohorts
In our data the marker means approximately line up in both cohorts, but some markers in the old cohort fell on the extreme edges of the distribution in the new cohort — evidence of potentially large skew in the discovery cohort. Central thinness in some of the distributions is an indication of low kurtosis — pointing to a possible admixture of dissimilar subjects in the population sampled by the new cohort. (A quick shape check is sketched below.)
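The shape check is easy to script with scipy (simulated cohorts: a right-skewed "old" cohort, and a bimodal admixture giving low kurtosis in the "new" one):

    import numpy as np
    from scipy.stats import kurtosis, skew

    rng = np.random.default_rng(0)
    old_cohort = rng.gamma(shape=2.0, size=200)           # right-skewed example
    new_cohort = np.concatenate([rng.normal(-2, 1, 100),  # admixture -> low kurtosis
                                 rng.normal(2, 1, 100)])

    for name, x in [("old", old_cohort), ("new", new_cohort)]:
        print(name, f"skew={skew(x):+.2f}", f"excess kurtosis={kurtosis(x):+.2f}")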
6. VALIDATE!
• Verify that marker relevance assumptions hold: the miRNAs chosen as nondiscriminating… are they still? And the miRNAs chosen as discriminating… are they still?
6. VALIDATE!
• Look for discrepancies in classification performance patterns between the cohorts
Based on UH2 patterns, we expect the miRNA-only curve to pass through a particular region of the performance plot; happily, the miRNA-only curve behaves exactly as expected. Similarly, we expect the bump in performance from adding a genetic marker to kick in only at relatively low specificities; equally happily, the curve with the genetic marker doesn't behave as expected — it's even better.
Why would the genetic marker behave differently in the two cohorts? One explanation is that the discovery cohort was less healthy — the symptoms were already so strong that the genetic factor no longer added much new information. When the disease is less severe and cases are more similar to controls, the genetic information boosts sensitivity for the borderline cases. These kinds of nuances can be very valuable for deciding not only who the biomarker screening should be applied to, but also when.
6. VALIDATE!
• Reprioritize markers based on how well they held up as predictive in the new cohort
Apply several different methods of ranking (i.e. "judges") to the set of markers; these are the same statistical tests used for biomarker discovery, but now the goal is to prioritize rather than exclude. Note that one of the judges is the ranking created in the discovery phase, prior to seeing the validation data. Each judge independently ranks the candidate markers in order (1=best, 26=worst), so stronger markers get low rank numbers, mediocre markers get middling rank numbers, and weaker markers get high rank numbers. Then ranks for each marker are summed across judges, and we colorcode the table to visually assess the consistency of rankings.
The rank sums define an ordering of markers: this is our consensus opinion across evaluation methods. Rank sums within 56 of each other could be randomly assigned with high probability, but gaps larger than that are likely to be qualitative, so we can think of the rank sum distribution as a roughly even mixture of "hot" markers and "cool" markers, with "lukewarm" markers in the middle where it's hard to tell which is which. (A minimal rank-sum consensus is sketched below.)
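A minimal rank-sum consensus in pandas (judge scores are simulated here; in practice each column would come from one of the evaluation methods above, including the discovery-phase ranking):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    markers = [f"miR-{i}" for i in range(1, 27)]

    # Hypothetical per-judge scores (higher = better); real judges would be
    # log-rank evidence, AUC, forest importance, discovery-phase rank, ...
    scores = pd.DataFrame({f"judge{j}": rng.normal(size=26) for j in range(1, 6)},
                          index=markers)

    ranks = scores.rank(ascending=False)          # 1 = best, per judge
    consensus = ranks.sum(axis=1).sort_values()   # small rank sum = strong marker
    print(consensus.head())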
6. VALIDATE!
• Seek internal validation of the marker prioritization
 If the markers we think are important really are, then they should contribute the most to multimarker models
Plotting nearly all possible parsimonious models, colorcoded by number of markers in the model, the best models combine high AUC (strong) with low AIC (parsimonious). Which markers contribute most to the best models? The highest ranked ones!
6. VALIDATE!
• Assess the overall quality of group separation: some cases may never be screenable with our markers, but many of the controls scoring in the case range could be latent cases…!
6. VALIDATE!
• Evaluate performance ranges for model classes: some regions fare no better than existing clinical markers, and high-performance regions can only be reached with a sufficient number of markers to allow clear discrimination.
6. VALIDATE!
• Look for differential performance of the classifiers within clinically relevant covariate subgroups
 A clever way to do this is to cluster the subjects and then examine covariates in the tightest clusters: in our data, the markers are more sensitive in Cluster 2 than in Cluster 1, and Cluster 2 also has more males (a sketch follows)
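As a sketch of the clustering idea (k-means on standardized marker profiles, then a covariate crosstab per cluster; all data and names here are simulated):

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 10))            # subjects x marker expressions
    sex = rng.choice(["M", "F"], size=60)    # hypothetical covariate

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
        StandardScaler().fit_transform(X))

    # Does the covariate distribution differ between the clusters?
    print(pd.crosstab(labels, sex))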
6. VALIDATE!
• Look for trending of discrimination performance across covariate spectra
 The relationships may be marker-specific! Or nonlinear!
Higher marker ranks seem to correlate with marker associations to the covariate… but only for the strongest markers; weaker markers don't show such a clear pattern.
NEXT STEPS
• An abbreviated discussion of this pipeline appeared in our recent UH2 paper in the Journal of Alzheimer's Disease: "MicroRNAs in Human Cerebrospinal Fluid as Biomarkers for Alzheimer's Disease," vol. 55, no. 3, pp. 1223–1233, 2017, DOI: 10.3233/JAD-160835 (co-authors: Theresa Lusardi, Jay Phillips, Jack Wiedrick, Chris Harrington, Babette Lind, Jodi Lapidus, Joe Quinn, Julie Saugstad)
• Publication of a follow-up UH3 paper on validation results is in progress
• Also currently working on a standalone methods paper: "Robust Statistical Analysis Pipeline for Selecting Promising Biomarkers from RT-qPCR Experiments" (Wiedrick, Lusardi, Saugstad, Lapidus)