CAD v2

Computer Aided Detection of
Abnormalities in Medical Images
Balaji Krishnapuram
Siemens Medical Solutions USA

Outline of the talk
 Computer aided detection/diagnosis (CAD)
 Key challenges / Algorithms
 Clinical impact
 Lessons learnt
Several thousand units of the products described in this paper have been
commercially deployed in hospitals around the world since 2004

ML as part of a full system
• In this talk I only focus on some ML Research
• In practice, statistical modeling / ML algorithmic innovation
is < 20% of the effort to get to the full product.
• This was work undertaken by a large and very
talented team

Medical Imaging
• Increased resolution has resulted in Data Overload
– Increased total study time
– Increase in data does not always translate to improved diagnosis
• CAD: extract the actionable information from the imaging data
– in order to improve patient care
– while reducing total study time
[Figures: Digital Mammogram; CT Scan]

Computer-aided diagnosis/detection (CAD)
• Used as a second reader
• Improves the detection
performance of a
radiologist
• Reduces mistakes related
to misinterpretation
• The principal benefit of
CAD is determined by
carefully measuring the
incremental value of CAD
in normal clinical practice
CAD technologies support the physician by drawing attention to structures in
the image that may require further review.

Lung CAD
Identify suspicious regions called nodules (which may be
precursors of cancer) in CT scans of the lung.

Colon PEV Polyp Enhanced Viewer
Identify suspicious regions called polyps in CT scans of the
colon.

Mammo CAD
Identify abnormal masses/ clusters of micro-calcifications in
digital mammograms.
PECAD and MammoCAD are only sold outside the US.

PE CAD
Pulmonary Embolism (PE) is a sudden blockage in a pulmonary artery
caused by an embolus that is formed in one part of the body and travels to
the lungs in the bloodstream through the heart.
PECAD and MammoCAD are only sold outside the US.

Typical CAD architecture
Input: Image [ X-ray | CT scan | MRI ]
1. Candidate Generation: outputs potential candidates (> 90% sensitivity, 60-300 FP/image)
2. Feature Computation
3. Classification: outputs lesions (> 80% sensitivity, 2-5 FP/image)
Output: Location of lesions
Focus of the current talk: Classification

Key Machine Learning Challenges
Challenge                                 Solutions
1. Training/testing data is correlated    Multiple instance learning, batch classification
2. Evaluation metric is CAD specific      Multiple instance learning
3. Run-time constraints                   Cascaded classifiers
4. No objective ground truth              EM crowd-sourcing algorithm
5. Data shortage                          Multi-task learning
6. Sensitivity for specific FP range      Maximize (partial) AUC

The breakdown of assumptions
[Figure: a region on a mammogram is classified as lesion / not a lesion]
Traditional classification algorithms (neural networks, support vector machines, logistic regression, ...) make two key assumptions:
(1) Training samples are independent
(2) Maximize classification accuracy over all candidates
Both are often violated in CAD.

Violation 1: Training examples are correlated
Candidate generation produces a lot of spatially adjacent candidates.
Hence there is a high level of correlation among candidates.
Correlations are also common across different images, detector types, and hospitals.

Violation 2: Candidate level accuracy not important
Most algorithms maximize classification accuracy: they try to classify every candidate correctly.
Several candidates from candidate generation (CG) point to the same lesion in the breast.
A lesion is detected if at least one of them is detected.
It is fine if we miss adjacent overlapping candidates.
Hence CAD system accuracy is measured in terms of per lesion/image/patient sensitivity.
So why not optimize the performance metric we use to evaluate our system?
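The gap between candidate-level accuracy and the per-lesion metric can be sketched in a few lines. This is only an illustration: the function name, the encoding of negatives as lesion ID -1, and the toy numbers are assumptions, not taken from the product.

```python
import numpy as np

def lesion_sensitivity_and_fp(scores, lesion_ids, threshold, n_images):
    """Per-lesion sensitivity and false positives per image.

    lesion_ids[i] is the ID of the lesion that candidate i overlaps,
    or -1 if the candidate is a negative (hypothetical encoding).
    """
    scores = np.asarray(scores, dtype=float)
    lesion_ids = np.asarray(lesion_ids)
    detected = scores >= threshold

    lesions = np.unique(lesion_ids[lesion_ids >= 0])
    # A lesion counts as found if at least one of its candidates fires.
    hits = [detected[lesion_ids == l].any() for l in lesions]
    sensitivity = float(np.mean(hits))

    fp_per_image = float(detected[lesion_ids < 0].sum()) / n_images
    return sensitivity, fp_per_image

# Toy data: lesion 0 has three overlapping candidates, lesion 1 has one,
# and two candidates are negatives.
scores = [0.9, 0.2, 0.1, 0.8, 0.7, 0.3]
lesion_ids = [0, 0, 0, 1, -1, -1]
sens, fp = lesion_sensitivity_and_fp(scores, lesion_ids, threshold=0.5, n_images=2)

# Candidate-level accuracy on the same predictions, for comparison.
labels = np.array(lesion_ids) >= 0
candidate_accuracy = float(np.mean((np.array(scores) >= 0.5) == labels))
```

In this toy case both lesions are found even though only half of the individual candidates are classified correctly, which is the mismatch the slide points at.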

Solution 1: Multiple Instance Learning
Fung, et al. 2006, Bi, et al. 2007, Raykar et al. 2008, Krishnapuram, et al. 2008.
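How do we acquire labels?
A candidate that overlaps with a radiologist's mark is a positive.
The rest are negative.
Single Instance Learning: classify every candidate correctly.
Multiple Instance Learning: the candidates that overlap the same mark form a positive bag; classify at least one candidate in the bag correctly.
[Figure: candidate labels (1/0) under single instance learning and under multiple instance learning with a positive bag]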

Simple Illustration
Single instance learning:
• Reject as many negative candidates as possible.
• Detect as many positives as possible.
Multiple instance learning:
• Reject as many negative candidates as possible.
• Detect at least one candidate in a positive bag.
Accounts for correlation during training.
[Figure: Multiple Instance Learning vs. Single Instance Learning]

Multiple Instance Learning Algorithm Details
Logistic regression model with a feature vector for each candidate and a weight vector w.
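The formula on this slide did not survive extraction. As a sketch under assumptions, a common way to combine logistic regression with multiple instance learning is a noisy-OR over the bag: a bag is positive if at least one of its candidates is positive. The function names and toy numbers below are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bag_positive_probability(X_bag, w):
    """Probability that at least one candidate in the bag is positive."""
    p_instance = sigmoid(X_bag @ w)          # per-candidate logistic output
    return 1.0 - np.prod(1.0 - p_instance)   # noisy-OR over the bag

def mil_log_likelihood(bags, bag_labels, w):
    """bags: list of (n_i, d) arrays; bag_labels: 1 for a lesion bag, else 0."""
    eps = 1e-12  # guards the logarithm against exact 0 or 1
    total = 0.0
    for X_bag, y in zip(bags, bag_labels):
        p = bag_positive_probability(X_bag, w)
        total += y * np.log(p + eps) + (1 - y) * np.log(1.0 - p + eps)
    return total

# Toy positive bag: one candidate scores high, two score low.
w = np.array([1.0, -1.0])
positive_bag = np.array([[3.0, 0.0], [-2.0, 1.0], [-1.0, 2.0]])
p_bag = bag_positive_probability(positive_bag, w)
ll = mil_log_likelihood([positive_bag], [1], w)
```

With this likelihood a positive bag is already well explained by a single confident candidate, so the learner is not penalized for missing the overlapping neighbors.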

Solution part 2: Batch Classification
Vural et al., 2009
Accounts for correlation during testing.
Change the decision boundary during test time.

Batch Classification Model
Traditional, one-location at a time classification:
Modeling correlations using location (spatial adjacency) as side information:
Gaussian prior for latent variable that determines classification
Noise model for one-location-at-a-time classification primitive
Posterior: combining location side
information and classification features
Combined Gaussian CRF classification using location as side information:
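The equations for this slide were lost in extraction. One way to read the captions, sketched under assumptions: the latent scores z of all candidates in an image get a zero-mean Gaussian prior whose covariance decays with spatial distance, the one-location-at-a-time classifier output u is treated as a noisy observation of z, and the posterior mean of z drives the final decision. The exponential kernel, the length scale, and the noise variance are illustrative choices, not the parameterization of Vural et al.

```python
import numpy as np

def batch_posterior_mean(u, locations, length_scale=5.0, noise_var=1.0):
    """Smooth per-candidate scores u using spatial adjacency.

    Prior:  z ~ N(0, K), K[i, j] = exp(-||loc_i - loc_j|| / length_scale)
    Noise:  u = z + e,  e ~ N(0, noise_var * I)
    Posterior mean: E[z | u] = K (K + noise_var * I)^{-1} u
    """
    u = np.asarray(u, dtype=float)
    loc = np.asarray(locations, dtype=float)
    dist = np.linalg.norm(loc[:, None, :] - loc[None, :, :], axis=-1)
    K = np.exp(-dist / length_scale)
    return K @ np.linalg.solve(K + noise_var * np.eye(len(u)), u)

# Three adjacent candidates and one isolated candidate.
# Candidates 1 and 3 have the same individual score (-0.5).
locations = [[0, 0], [1, 0], [0, 1], [50, 50]]
u = [2.0, -0.5, 1.5, -0.5]
z = batch_posterior_mean(u, locations)
```

Candidate 1 sits next to two high-scoring candidates and is pulled upward, while the isolated candidate 3 is only shrunk toward zero. This is the sense in which the decision boundary changes at test time.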

Example results
[Figures: example results for Pulmonary Embolism and Colon Cancer (polyps)]

Run-time vs Accuracy Tradeoff: Soft Cascaded Classifiers
Raykar et al., 2010
[Figure: three-stage cascade (Stage 1, Stage 2, Stage 3). Each stage can reject a candidate (−); candidates that pass Stage 3 are positive (+). Predictive power and acquisition cost both increase from stage to stage.]

Modeling the expected cost
[Table: cost incurred for a given instance at Stage 1, Stage 2, and Stage 3]
[Figure: three-stage cascade with reject (−) and accept (+) outputs]
We optimize using cyclic coordinate descent.
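The cost table on this slide was not recoverable. A minimal sketch of the idea, assuming each stage is a linear classifier with a sigmoid output, that an instance is positive only if every stage passes it, and that a stage's features are paid for only if the instance reaches that stage. All names and numbers are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_cascade(stage_features, stage_weights, stage_costs):
    """Return (probability of positive, expected feature-acquisition cost).

    stage_features[k]: feature vector available at stage k
    stage_weights[k]:  linear classifier weights for stage k
    stage_costs[k]:    cost of computing stage k's features
    """
    pass_prob = 1.0       # probability of having survived all earlier stages
    expected_cost = 0.0
    for x, w, t in zip(stage_features, stage_weights, stage_costs):
        expected_cost += pass_prob * t        # stage is paid for only if reached
        pass_prob *= sigmoid(np.dot(w, x))    # soft "pass" decision at this stage
    return pass_prob, expected_cost

# Toy instance: every stage produces a logit of 2.0; costs grow per stage.
features = [np.array([1.0]), np.array([2.0]), np.array([0.5])]
weights = [np.array([2.0]), np.array([1.0]), np.array([4.0])]
costs = [1.0, 5.0, 20.0]
p_positive, cost = soft_cascade(features, weights, costs)
```

Because the output is a product of stage probabilities, the product itself does not depend on stage order; the expected cost does, which is why ordering matters at test time.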

Some properties of soft cascades
• During training, the sequential ordering of the stages is not important.
• Order definitely matters during testing.
• The soft cascade is a device to ease the training process.
• We use a maximum a-posteriori (MAP) estimate with Bayesian priors on the weights.

Subjective Ground truth
Raykar et al. 2009
Lesion ID   Radiologist 1   Radiologist 2   Radiologist 3   Radiologist 4   Truth (unknown)
12          0               0               0               0               x
32          0               1               0               0               x
10          1               1               1               1               x
11          0               0               1               1               x
24          0               1               1               1               x
23          0               0               1               0               x
40          0               1               1               0               x
Each radiologist is asked to annotate whether a lesion is malignant (1) or not (0).
In practice there is a substantial
amount of disagreement.
We have no knowledge of the
actual golden ground truth.
Getting absolute ground truth (e.g.
biopsy) can be expensive.
We proposed an EM algorithm to simultaneously
learn the ground truth and the classifier.

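How to judge an expert/annotator?
A radiologist with two coins: given the true label, expert j assigns a label by flipping one of two biased coins. One coin is used when the true label is 1 (its bias is the expert's sensitivity) and the other when the true label is 0 (its bias is the expert's specificity).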

EM algorithm for jointly estimating radiologist
accuracy and classifier
Initialize using majority voting.
If I knew the true labels, I could estimate the sensitivity/specificity of each expert, and also estimate the classifier w.
If I knew how good each expert is, I could estimate the true labels.
Iterate until convergence.
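The iteration above can be sketched on the seven-lesion table from the "Subjective Ground truth" slide. This is a simplified version under assumptions: it estimates only each radiologist's sensitivity and specificity and the posterior label probabilities, and leaves out the classifier w that the full algorithm learns jointly. The function name and the clipping constant are illustrative.

```python
import numpy as np

def em_annotators(Y, n_iter=50, eps=1e-6):
    """Y: (n_items, n_annotators) binary labels.

    Returns (mu, alpha, beta, prev): posterior probability that each item is
    positive, per-annotator sensitivity and specificity, and prevalence.
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=1)                      # soft majority-vote initialization
    for _ in range(n_iter):
        # M-step: sensitivity, specificity, prevalence given soft labels mu.
        alpha = np.clip((mu @ Y) / mu.sum(), eps, 1 - eps)
        beta = np.clip(((1 - mu) @ (1 - Y)) / (1 - mu).sum(), eps, 1 - eps)
        prev = np.clip(mu.mean(), eps, 1 - eps)
        # E-step: posterior probability that each item is truly positive.
        a = np.prod(alpha ** Y * (1 - alpha) ** (1 - Y), axis=1)
        b = np.prod(beta ** (1 - Y) * (1 - beta) ** Y, axis=1)
        mu = prev * a / (prev * a + (1 - prev) * b)
    return mu, alpha, beta, prev

# Rows: lesions 12, 32, 10, 11, 24, 23, 40; columns: radiologists 1 to 4.
Y = [[0, 0, 0, 0],
     [0, 1, 0, 0],
     [1, 1, 1, 1],
     [0, 0, 1, 1],
     [0, 1, 1, 1],
     [0, 0, 1, 0],
     [0, 1, 1, 0]]
mu, alpha, beta, prev = em_annotators(Y)
```

Unlike plain majority voting, the estimated labels weight each radiologist by how reliable the data suggest that radiologist is.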

Data Shortage: Multi-task Learning
Raykar et al. 2008.
[Figures: Lung Nodule; Ground Glass Object]

Example Multi-Task Learning Result

Maximizing AUC
Raykar et al. 2008
[Figure: positive (+) and negative (-) examples]
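For reference, the AUC equals the Wilcoxon-Mann-Whitney (WMW) statistic: the fraction of (positive, negative) pairs in which the positive example receives the higher score. A minimal sketch, with an illustrative function name and toy scores:

```python
import numpy as np

def wmw_auc(pos_scores, neg_scores):
    """AUC as the fraction of correctly ordered (positive, negative) pairs."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    # Count correctly ordered pairs; ties count as half.
    correct = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return correct / (pos.size * neg.size)

# Three positives and two negatives give six pairs; five are ordered correctly.
auc = wmw_auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

The statistic depends only on how the examples are ranked, not on the score values themselves, which is why maximizing it leads to the ranking formulation on the next slide.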

Generalization of AUC maximization: Learning
Preference Relationships / Ranking
From these two we can get a set of
pairwise preference relations

MAP Estimator is expensive to compute
Discrete optimization problem
Original task: Choose w to maximize
Log-likelihood:
Prior:
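The formulas on this slide were not recoverable. A common construction, assumed here, replaces the 0/1 indicator in the WMW statistic with a sigmoid of the score difference for each (positive, negative) pair and adds a zero-mean Gaussian prior on w. The function name, the prior variance, and the toy data are illustrative.

```python
import numpy as np

def log_posterior(w, X_pos, X_neg, prior_var=10.0):
    """Smoothed ranking objective: pairwise log-likelihood + Gaussian log-prior."""
    s_pos = X_pos @ w
    s_neg = X_neg @ w
    diff = s_pos[:, None] - s_neg[None, :]          # all (positive, negative) pairs
    log_lik = -np.logaddexp(0.0, -diff).sum()       # sum of log sigmoid(diff)
    log_prior = -0.5 * np.dot(w, w) / prior_var     # up to an additive constant
    return log_lik + log_prior

X_pos = np.array([[2.0, 0.0], [1.0, 1.0]])
X_neg = np.array([[0.0, 1.0], [-1.0, 0.0]])
good = log_posterior(np.array([1.0, 0.0]), X_pos, X_neg)    # ranks positives first
bad = log_posterior(np.array([-1.0, 0.0]), X_pos, X_neg)    # ranks negatives first
```

Evaluating the likelihood this way touches every pair, so the cost grows with the product of the number of positives and negatives. That pairwise cost is what makes the estimator expensive on large data sets.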

Accelerating the core computational primitive
Weighted summation of erfc() functions:
Truncated Beaulieu’s series admits decomposition & regrouping:

Direct vs Fast: time taken
Dataset   Direct       Fast
1         1736 secs.   2 secs.
2         6731 secs.   19 secs.
3         2557 secs.   4 secs.
4         *            47 secs.

Sample result (Dataset 8)
Method               Time taken (secs)   WMW
RankNCG direct       333                 0.984
RankNCG fast         3                   0.984
RankNet linear       1264                0.951
RankNet two layer    2464                0.765
RankSVM linear       34                  0.984
RankSVM quadratic    1332                0.996
RankBoost            6                   0.958

Key Machine Learning Challenges
Challenge                                 Solutions
1. Training/testing data is correlated    Multiple instance learning, batch classification
2. Evaluation metric is CAD specific      Multiple instance learning
3. Run-time constraints                   Cascaded classifiers
4. No objective ground truth              EM crowd-sourcing algorithm
5. Data shortage                          Multi-task learning
6. Sensitivity for specific FP range      Maximize (partial) AUC

Clinical Impact
• Measure the improvement in performance of a radiologist with
the Siemens CAD software.
• Several independent clinical studies/trials have been conducted
by our collaborators worldwide.
• NOTE: CAD is deployed in second reader mode in these
studies.

Lung CAD
1. FDA clinical validation study with 17 radiologists and 196 cases from 4 hospitals. Average reader AUC increased by 0.048 (p<0.001) because of CAD.
2. Study at NYU by Godoy et al. 2008
3. New version also helps detect different kinds of nodules.
Nodule type               Mean sensitivity without CAD   Mean sensitivity with CAD   Increase in sensitivity
Solid Nodules             60%                            85%                         15%
Part-solid Nodules        80%                            95%                         15%
Ground Glass Opacities    75%                            86%                         11%
Reader      Sensitivity without CAD   Sensitivity with CAD   Increase in sensitivity
Reader 1    56.2%                     66.0%                  9.8%
Reader 2    79.2%                     89.8%                  10.6%

Colon PEV
Colon PEV (Polyp Enhanced Viewer) was evaluated by Baker,
et al. 2007
– Study with seven less-experienced readers
– Without PEV average sensitivity was 0.810
– With PEV average sensitivity was 0.908
– A 9.8 percentage-point increase in average sensitivity (p=0.0152).

PE CAD
Das et al. 2008 conducted a study with 43 patients to assess the
sensitivity of detection of pulmonary embolism.
Reader      Sensitivity without CAD   Sensitivity with CAD   Increase in sensitivity
Reader 1    87%                       98%                    11%
Reader 2    82%                       93%                    11%
Reader 3    77%                       92%                    15%

Themes relevant for ML practitioners
Long-term career growth = Increased Impact (Customers, Shareholders, Society)

We increase our impact by growing along 3 axes:
1. Product
2. Technology
3. Team

1. Product: Domain knowledge is very important. We need to
design or utilize algorithms to optimize the metrics relevant to
our customers.
– CAD example: Collaboration with radiologists is crucial in eliciting the domain knowledge
about cancer, and also to understand their usage habits, what they care about, etc.
– For example, the accuracy metric was different in our product.
2. Technology: Need careful analysis of the assumptions behind
off-the-shelf data-mining algorithms.
– CAD example: most of this talk covered these technical / mathematical
assumptions

3. Team: By truly integrating with the entire product team we can
optimize the entire system and achieve much bigger impact.
It is important for us to design or contribute to the infrastructure.
• End-to-end automated system optimization: e.g. automated optimization of
parameter settings for image processing algorithms
• Re-usable tools e.g. features, deployable large-scale learning algorithms.
• Analysis/modeling to support deployment goals: e.g. reduce memory &
computational footprint
• Version control for data/ground truth, automated tests (probabilistic!), etc.
• Visualization tools for inputs or failure modes for other team members: e.g.
cluster failures in feature space, visualize prototypical failures as images to
discover clinical or image processing insights about failures
• Analysis of technical debt associated with ML

Technical Debt associated with ML
• Entanglement: Changing Anything Changes Everything (CACE)
• Hidden causal-feedback loops: e.g. changing CTR with ML alters user
clicks and thus the data-generating distributions
• Undeclared consumers of intermediate stages, features, etc.
• Unstable data dependencies: need versioned copies of signals!
• Legacy features, epsilon features, etc.
• Correction cascades are a terrible idea!
• System level glue code / pipeline jungles
• Dead experimental code paths, e.g. A/B tests
• Configuration debt
• Etc.

Acknowledgements
Dr. D. Naidich, MD, of New York University
Dr. M. E. Baker, MD, of the Cleveland Clinic Foundation
Dr. M. Das, MD, of the University of Aachen
Dr. U. J. Schoepf, MD, of the Medical University of South Carolina
Dr. Peter Herzog, MD, of Klinikum Grosshadern, Munich.
Siemens:
Ingo Schmuecking, MD, Alok Gupta, Bharat Rao, Murat Dundar, Jinbo Bi,
Harald Steck, Stefan Niculescu, Romer Rosales, Shipeng Yu, Glenn Fung,
Vikas Raykar, Sangmin Park, Gerardo Valadez, Jonathan Stoeckel, Anna
Jerebko, Matthias Wolf, and the entire SISL team.

Maximum Likelihood Estimator

How to judge an annotator?
[Figure: annotators placed by sensitivity and specificity: Gold Standard, Luminary, Novice, Dart-throwing monkey, Dumb expert, Evil]
Good experts have high sensitivity and high specificity.

1. Beaulieu’s series expansion
Retain only the first few terms contributing to the desired accuracy.

3. Regrouping
A and B do not depend on y and can be computed in O(pN).
Once A and B are precomputed, the sum can be evaluated at all y in O(pM).
Cost is reduced from O(MN) to O(p(M+N)).
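The series and the regrouped sums on these slides did not survive extraction. The sketch below reconstructs the idea under assumptions: it uses the truncated series erfc(z) ≈ 1 − (4/π) Σ over odd n < 2p of exp(−n²h²)/n · sin(2nhz), then splits sin(2nh(y − x)) with the angle-difference identity so that the source-dependent sums A_n and B_n are computed once. The parameter values h = 0.25 and p = 12 and the assumption that all inputs lie in [0, 1] are illustrative choices, not the settings from the technical report.

```python
import math
import numpy as np

def direct_erfc_sum(y, x, q):
    """E(y_j) = sum_i q_i * erfc(y_j - x_i), computed pairwise in O(MN)."""
    z = np.asarray(y, dtype=float)[:, None] - np.asarray(x, dtype=float)[None, :]
    return np.vectorize(math.erfc)(z) @ np.asarray(q, dtype=float)

def fast_erfc_sum(y, x, q, h=0.25, p=12):
    """Approximate the same sum in O(p(M+N)).

    Accurate only while |y_j - x_i| stays well inside pi / (2h).
    """
    y, x, q = (np.asarray(a, dtype=float) for a in (y, x, q))
    n = np.arange(1, 2 * p, 2, dtype=float)          # odd n: 1, 3, ..., 2p-1
    coef = (4.0 / np.pi) * np.exp(-(n * h) ** 2) / n
    # A_n and B_n depend only on the N sources: O(pN), independent of y.
    A = np.cos(2 * h * np.outer(n, x)) @ q
    B = np.sin(2 * h * np.outer(n, x)) @ q
    # Evaluation at the M targets: O(pM).
    sin_y = np.sin(2 * h * np.outer(y, n))
    cos_y = np.cos(2 * h * np.outer(y, n))
    return q.sum() - (sin_y * (coef * A) - cos_y * (coef * B)).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)    # N source points
y = rng.uniform(0.0, 1.0, size=50)     # M target points
q = rng.normal(size=200)               # weights
max_abs_error = float(np.max(np.abs(fast_erfc_sum(y, x, q) - direct_erfc_sum(y, x, q))))
```

Larger h needs fewer terms but shrinks the range of arguments over which the series is valid, which is presumably part of what the "choosing the parameters" step on the next slide addresses.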

4. Other tricks
• Rapid saturation of the erfc function.
• Space subdivision
• Choosing the parameters to achieve
the error bound
• See the technical report


Application to collaborative filtering
• Predict movie ratings for a user based on the
ratings provided by other users.
• MovieLens dataset (www.grouplens.org)
• 1 million ratings (1-5)
• 3592 movies
• 6040 users
• Feature vector for each movie – rating provided
by d other users

Collaborative filtering results
