Computer Aided Detection of
Abnormalities in Medical Images
Balaji Krishnapuram
Siemens Medical Solutions USA
1
Outline of the talk
• Computer aided detection/diagnosis (CAD)
• Key challenges / Algorithms
• Clinical impact
• Lessons learnt
Several thousand units of the products described in this paper have been
commercially deployed in hospitals around the world since 2004
2
ML as part of a full system
• In this talk I only focus on some ML Research
• In practice, statistical modeling / ML algorithmic innovation
is < 20% of the effort to get to the full product.
• This was work undertaken by a large and very
talented team
3
Medical Imaging
• Increased resolution has resulted in Data Overload
– Increased total study time
– Increase in data does not always translate to improved diagnosis
• CAD: extract the actionable information from the imaging data
– in order to improve patient care
– while reducing total study time
[Figures: Digital Mammogram; CT Scan]
4
Computer-aided diagnosis/detection (CAD)
• Used as a second reader
• Improves the detection
performance of a
radiologist
• Reduces mistakes related
to misinterpretation
• The principal benefit of
CAD is determined by
carefully measuring the
incremental value of CAD
in normal clinical practice
CAD technologies support the physician by drawing attention to structures in
the image that may require further review.
5
Lung CAD
Identify suspicious regions called nodules (which may be
precursors of cancer) in CT scans of the lung.
6
Colon PEV (Polyp Enhanced Viewer)
Identify suspicious regions called polyps in CT scans of the
colon.
7
Mammo CAD
Identify abnormal masses/ clusters of micro-calcifications in
digital mammograms.
PECAD and MammoCAD are only sold outside the US.
8
PE CAD
Pulmonary Embolism (PE) is a sudden blockage in a pulmonary artery
caused by an embolus that is formed in one part of the body and travels to
the lungs in the bloodstream through the heart.
PECAD and MammoCAD are only sold outside the US.
9
Typical CAD architecture
Image [ X-ray | CT scan | MRI ] → Candidate Generation → Feature Computation → Classification → Location of lesions
• Candidate Generation outputs potential candidates: > 90% sensitivity, 60-300 FP/image
• Classification outputs lesions: > 80% sensitivity, 2-5 FP/image
• Focus of the current talk: Classification
10
Key Machine Learning Challenges
Challenge | Solutions
1. Training/testing data is correlated | Multiple instance learning, batch classification
2. Evaluation metric is CAD specific | Multiple instance learning
3. Run-time constraints | Cascaded classifiers
4. No objective ground truth | EM crowd-sourcing algorithm
5. Data shortage | Multi-task learning
6. Sensitivity for specific FP range | Maximize (partial) AUC
11
The breakdown of assumptions
[Figure: a region on a mammogram, classified as lesion or not a lesion]
Traditional classification algorithms (neural networks, support vector machines, logistic regression, ...) make two key assumptions:
(1) Training samples are independent
(2) Maximize classification accuracy over all candidates
Both are often violated in CAD.
12
Violation 1: Training examples are correlated
Candidate generation produces a lot of spatially adjacent candidates.
Hence there is a high level of correlation among candidates.
Correlations are also common across different images, detector types, and hospitals.
13
Violation 2: Candidate-level accuracy is not important
Most algorithms maximize classification accuracy: they try to classify every candidate correctly.
Several candidates from candidate generation (CG) point to the same lesion in the breast.
The lesion is detected if at least one of them is detected.
It is fine if we miss adjacent overlapping candidates.
Hence CAD system accuracy is measured in terms of per-lesion/image/patient sensitivity.
So why not optimize the performance metric we use to evaluate our system?
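The gap between the two metrics can be illustrated with a small sketch. The function names, the lesion identifiers, and the toy predictions below are hypothetical and only serve to show how per-lesion sensitivity can be perfect while candidate-level accuracy is not.

```python
def candidate_accuracy(labels, predictions):
    """Fraction of candidates whose predicted label matches the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def lesion_sensitivity(lesion_ids, predictions):
    """Fraction of lesions with at least one candidate predicted positive.

    lesion_ids[i] names the lesion that candidate i overlaps, or is None
    when the candidate does not overlap any lesion.
    """
    lesions = {lid for lid in lesion_ids if lid is not None}
    detected = {
        lid for lid, p in zip(lesion_ids, predictions)
        if lid is not None and p == 1
    }
    return len(detected) / len(lesions)


# Hypothetical image: lesion "A" yields three candidates, lesion "B" one,
# and two candidates are false alarms from candidate generation.
lesion_ids = ["A", "A", "A", "B", None, None]
labels = [1, 1, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 0, 0]

accuracy = candidate_accuracy(labels, predictions)        # 4 of 6 correct
sensitivity = lesion_sensitivity(lesion_ids, predictions)  # both lesions hit
```

Two of the three candidates on lesion "A" are missed, which lowers candidate accuracy, yet both lesions are still marked for the radiologist.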
14
Solution 1: Multiple Instance Learning
Fung, et al. 2006, Bi, et al. 2007, Raykar et al. 2008, Krishnapuram, et al. 2008.
How do we acquire labels?
A candidate that overlaps a radiologist mark is positive; the rest are negative.
[Figure: candidate labels 1 1 0 0 0 0 (single instance learning) versus bag labels 1 0 0 0 0 (multiple instance learning, with a positive bag)]
Single Instance Learning: classify every candidate correctly.
Multiple Instance Learning: classify at least one candidate in each positive bag correctly.
15
Simple Illustration
Single instance learning:
• Reject as many negative candidates as possible.
• Detect as many positives as possible.
Multiple instance learning:
• Reject as many negative candidates as possible.
• Detect at least one candidate in a positive bag.
[Figure panels: Multiple Instance Learning; Single Instance Learning]
Accounts for correlation during training
16
Multiple Instance Learning Algorithm Details
Logistic regression model: $p(y = 1 \mid x, w) = \sigma(w^\top x) = 1 / (1 + e^{-w^\top x})$
x: feature vector
w: weight vector
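The slide's equations did not survive extraction, so the following is only a sketch of one common way to turn a logistic regression model into a multiple instance learner: a bag is positive if at least one of its candidates is positive (a noisy-OR combination). The noisy-OR form and all function names are assumptions for illustration, not a transcription of the slide.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def instance_probability(weights, features):
    """Logistic regression probability that one candidate is positive."""
    score = sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def bag_probability(weights, bag):
    """Assumed noisy-OR: the bag is positive unless every candidate is negative."""
    prob_all_negative = 1.0
    for features in bag:
        prob_all_negative *= 1.0 - instance_probability(weights, features)
    return 1.0 - prob_all_negative


def mil_log_likelihood(weights, bags, bag_labels, eps=1e-12):
    """Log-likelihood over bags; a negative candidate is a bag of size one."""
    total = 0.0
    for bag, label in zip(bags, bag_labels):
        p = min(max(bag_probability(weights, bag), eps), 1.0 - eps)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total


# With zero weights every candidate has probability 0.5, so a bag of two
# candidates is positive with probability 1 - 0.5 * 0.5.
example_weights = [0.0, 0.0]
example_bag = [[1.0, 2.0], [3.0, 4.0]]
example_bag_probability = bag_probability(example_weights, example_bag)
```

Under this formulation the likelihood rewards detecting any one candidate in a positive bag, which matches the per-lesion evaluation metric rather than per-candidate accuracy.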
17
Example MIL Result
18
Solution part 2: Batch Classification
Vural et al., 2009
Accounts for correlation during testing
Change the decision boundary during test time.
19
Batch Classification Model
20
Traditional, one-location at a time classification:
Modeling correlations using location (spatial adjacency) as side information:
Gaussian prior for latent variable that determines classification
Noise model for one-location-at-a-time classification primitive
Posterior: combining location side
information and classification features
Combined Gaussian CRF classification using location as side information:
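The equations for this slide were lost, so the sketch below only illustrates the general idea the labels describe: a Gaussian prior over latent scores whose covariance encodes spatial adjacency, a Gaussian noise model around the one-location-at-a-time scores, and the resulting posterior mean. The zero-mean prior, the squared-exponential covariance, the parameter names, and the toy data are all assumptions for illustration and are not taken from Vural et al.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def spatial_covariance(locations, length_scale):
    """Assumed prior covariance: nearby candidates get correlated latent scores."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return [
        [math.exp(-sq_dist(p, q) / (2.0 * length_scale ** 2)) for q in locations]
        for p in locations
    ]


def batch_scores(individual_scores, locations, length_scale=1.0, noise_var=1.0):
    """Posterior mean of latent scores: K (K + noise_var * I)^{-1} s."""
    n = len(individual_scores)
    cov = spatial_covariance(locations, length_scale)
    system = [
        [cov[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    alpha = solve_linear_system(system, individual_scores)
    return [sum(cov[i][j] * alpha[j] for j in range(n)) for i in range(n)]


# Two adjacent candidates and one far away. The weak candidate next to a
# strong one is pulled up; the isolated weak candidate is not.
locations = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0)]
individual = [2.0, -0.5, -0.5]
smoothed = batch_scores(individual, locations)
```

In this toy case the decision for the second candidate changes once its neighbour is taken into account, which is the sense in which the decision boundary changes at test time.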
Example results
21
[Figure panels: Pulmonary Embolism; Colon Cancer (polyps)]
Run-time vs Accuracy Tradeoff: Soft Cascaded Classifiers
Raykar et al., 2010
[Figure: cascade of Stage 1, Stage 2, Stage 3, with increasing predictive power and increasing acquisition cost from one stage to the next]
22
Modeling the expected cost
[Table: cost for a given instance at Stage 1, Stage 2, Stage 3]
[Figure: cascade of Stage 1, Stage 2, Stage 3]
We optimize using cyclic coordinate descent.
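The cost table on this slide was lost in extraction. As a rough sketch of one plausible reading, assume each stage outputs a probability of passing an instance on, the soft cascade labels an instance positive with the product of those probabilities, and the cost of a stage is paid only if the instance reaches it. These modelling choices and the names below are assumptions for illustration.

```python
def soft_cascade_positive_probability(pass_probs):
    """Assumed soft cascade: positive only if every stage passes the instance."""
    prob = 1.0
    for p in pass_probs:
        prob *= p
    return prob


def expected_cost(pass_probs, stage_costs):
    """Expected feature acquisition cost for one instance.

    Stage k's cost is paid with the probability that all earlier stages
    passed the instance on.
    """
    reach_prob = 1.0
    total = 0.0
    for p, cost in zip(pass_probs, stage_costs):
        total += reach_prob * cost
        reach_prob *= p
    return total


# Hypothetical instance: each stage passes it with probability 0.5, and
# later stages use more expensive features.
pass_probs = [0.5, 0.5, 0.5]
stage_costs = [1.0, 2.0, 4.0]

cost = expected_cost(pass_probs, stage_costs)            # 1 + 0.5*2 + 0.25*4
positive_prob = soft_cascade_positive_probability(pass_probs)
```

Because both quantities are smooth in the stage probabilities, a trade-off between accuracy and expected cost can be optimized jointly over all stages instead of training each stage in isolation.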
24
Some properties of soft cascades
• Sequential ordering of the cascade is not important during training.
• Order definitely matters during testing.
• A device to ease the training process.
• We use a maximum a-posteriori (MAP) estimate with
Bayesian priors on weights.
25
Test set FROC Curves
26
Subjective Ground truth
Raykar et al. 2009
Lesion ID | Radiologist 1 | Radiologist 2 | Radiologist 3 | Radiologist 4 | Truth (unknown)
12 | 0 | 0 | 0 | 0 | x
32 | 0 | 1 | 0 | 0 | x
10 | 1 | 1 | 1 | 1 | x
11 | 0 | 0 | 1 | 1 | x
24 | 0 | 1 | 1 | 1 | x
23 | 0 | 0 | 1 | 0 | x
40 | 0 | 1 | 1 | 0 | x
Each radiologist is asked to annotate whether a lesion is malignant (1) or not (0).
In practice there is a substantial
amount of disagreement.
We have no knowledge of the
actual golden ground truth.
Getting absolute ground truth (e.g.
biopsy) can be expensive.
We proposed an EM algorithm to simultaneously
learn the ground truth and the classifier.
27
How to judge an expert/annotator ?
A radiologist with two coins
True Label
Label assigned by
expert j
28
EM algorithm for jointly estimating radiologist
accuracy and classifier
If I knew the true label, I could estimate the sensitivity/specificity of each expert, and also estimate the classifier w.
If I knew how good each expert is, I could estimate the true label.
Initialize using majority voting, then iterate until convergence.
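The alternation described above can be sketched on the radiologist table from the "Subjective Ground truth" slide. This is a simplified illustration, not the authors' algorithm: it replaces the classifier w with a single prevalence parameter (the table has no features), and it adds a small smoothing constant to avoid division by zero. Both choices, and all names, are assumptions.

```python
def em_two_coin(labels, n_iter=50, smoothing=1.0):
    """Estimate soft true labels plus each annotator's sensitivity/specificity.

    labels[i][j] is the 0/1 label that annotator j gave to item i.
    """
    n_items = len(labels)
    n_annot = len(labels[0])
    # Initialize the soft labels with the fraction of positive votes.
    mu = [sum(row) / n_annot for row in labels]
    sens, spec = [], []
    for _ in range(n_iter):
        # M-step: given soft labels, re-estimate prevalence and annotators.
        pos = sum(mu)
        neg = n_items - pos
        prevalence = (pos + smoothing) / (n_items + 2.0 * smoothing)
        sens = [
            (sum(mu[i] * labels[i][j] for i in range(n_items)) + smoothing)
            / (pos + 2.0 * smoothing)
            for j in range(n_annot)
        ]
        spec = [
            (sum((1.0 - mu[i]) * (1 - labels[i][j]) for i in range(n_items))
             + smoothing) / (neg + 2.0 * smoothing)
            for j in range(n_annot)
        ]
        # E-step: given annotator quality, re-estimate the soft labels.
        new_mu = []
        for row in labels:
            a = prevalence
            b = 1.0 - prevalence
            for j, y in enumerate(row):
                a *= sens[j] if y == 1 else 1.0 - sens[j]
                b *= 1.0 - spec[j] if y == 1 else spec[j]
            new_mu.append(a / (a + b))
        mu = new_mu
    return mu, sens, spec


# Rows follow the slide's table: lesions 12, 32, 10, 11, 24, 23, 40.
radiologist_labels = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
posterior, sensitivity, specificity = em_two_coin(radiologist_labels)
```

Here `posterior[i]` is the estimated probability that lesion i is malignant, while `sensitivity[j]` and `specificity[j]` describe radiologist j's two coins.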
29
Breast MRI results
30
Data Shortage: Multi-task Learning
Raykar et al. 2008.
Lung Nodule Ground Glass Object
31
Example Multi-Task Learning Result
32
Maximizing AUC
Raykar et al. 2008
[Figure: positive (+) and negative (-) examples]
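As a reference point for what is being maximized, the AUC can be computed as the Wilcoxon-Mann-Whitney (WMW) statistic: the fraction of positive-negative pairs that the classifier scores in the right order. The sketch below uses hypothetical scores and counts ties as half a correct pair, which is a common convention rather than something stated on the slide.

```python
def wmw_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5  # ties count as half a correct pair
    return correct / (len(positives) * len(negatives))


# Hypothetical classifier scores for three positives and two negatives.
scores = [0.9, 0.8, 0.4, 0.7, 0.2]
labels = [1, 1, 1, 0, 0]
auc = wmw_auc(scores, labels)  # 5 of the 6 pairs are ordered correctly
```

The pairwise form is what makes the objective a ranking problem, and also why a direct evaluation costs on the order of the number of positives times the number of negatives.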
33
Generalization of AUC maximization: Learning
Preference Relationships / Ranking
From these two we can get a set of
pairwise preference relations
34
MAP Estimator is expensive to compute
Discrete optimization problem
Original task: Choose w to maximize
35
Log-likelihood:
Prior:
Accelerating the core computational primitive
Weighted summation of erfc() functions:
36
Truncated Beaulieu's series admits decomposition & regrouping:
37
Direct vs Fast: time taken
Dataset | Direct | Fast
1 | 1736 secs. | 2 secs.
2 | 6731 secs. | 19 secs.
3 | 2557 secs. | 4 secs.
4 | * | 47 secs.
38
Sample result (Dataset 8)
Method | Time taken (secs) | WMW
RankNCG direct | 333 | 0.984
RankNCG fast | 3 | 0.984
RankNet linear | 1264 | 0.951
RankNet two layer | 2464 | 0.765
RankSVM linear | 34 | 0.984
RankSVM quadratic | 1332 | 0.996
RankBoost | 6 | 0.958
Key Machine Learning Challenges
Challenge | Solutions
1. Training/testing data is correlated | Multiple instance learning, batch classification
2. Evaluation metric is CAD specific | Multiple instance learning
3. Run-time constraints | Cascaded classifiers
4. No objective ground truth | EM crowd-sourcing algorithm
5. Data shortage | Multi-task learning
6. Sensitivity for specific FP range | Maximize (partial) AUC
39
Clinical Impact
• Measure the improvement in performance of a radiologist with
the Siemens CAD software.
• Several independent clinical studies/trials have been conducted
by our collaborators worldwide.
• NOTE: CAD is deployed in second reader mode in these
studies.
40
Lung CAD
1. FDA clinical validation study with 17 radiologists, 196 cases from
4 hospitals. Average reader AUC increased by 0.048 (p<0.001)
because of CAD.
2. Study at NYU by Godoy et al. 2008
3. New version also helps detect different kinds of nodules.
Nodule type | Mean sensitivity without CAD | Mean sensitivity with CAD | Increase in sensitivity
Solid Nodules | 60% | 85% | 15%
Part-solid Nodules | 80% | 95% | 15%
Ground Glass Opacities | 75% | 86% | 11%

Reader | Sensitivity without CAD | Sensitivity with CAD | Increase in sensitivity
Reader 1 | 56.2% | 66.0% | 9.8%
Reader 2 | 79.2% | 89.8% | 10.6%
41
Colon PEV
Colon PEV (Polyp Enhanced Viewer) was evaluated by Baker,
et al. 2007
– Study with seven less-experienced readers
– Without PEV average sensitivity was 0.810
– With PEV average sensitivity was 0.908
– A 9.8 percentage-point increase in average sensitivity (p=0.0152).
42
PE CAD
Das et al. 2008 conducted a study with 43 patients to assess the
sensitivity of detection of pulmonary embolism.
Reader | Sensitivity without CAD | Sensitivity with CAD | Increase in sensitivity
Reader 1 | 87% | 98% | 11%
Reader 2 | 82% | 93% | 11%
Reader 3 | 77% | 92% | 15%
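The "increase in sensitivity" column is an absolute difference in percentage points, which a few lines of arithmetic make explicit. The dictionary layout and the mean over readers are illustrative additions; only the per-reader values come from the table.

```python
# Sensitivity in percent (without CAD, with CAD), copied from the table.
pe_cad_results = {
    "Reader 1": (87, 98),
    "Reader 2": (82, 93),
    "Reader 3": (77, 92),
}

# Gain in percentage points for each reader.
gains = {
    reader: with_cad - without_cad
    for reader, (without_cad, with_cad) in pe_cad_results.items()
}

# Mean gain across readers, a derived figure not reported on the slide.
mean_gain = sum(gains.values()) / len(gains)
```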
43
Long-term career growth
=
Increased Impact
(Customers, Shareholders, Society)
44
Themes relevant for ML practitioners
We increase our impact by growing along 3 axes:
1.Product
2.Technology
3.Team
45
Themes relevant for ML practitioners
1. Product: Domain knowledge is very important. We need to
design or utilize algorithms to optimize the metrics relevant to
our customers.
– CAD example: Collaboration with radiologists is crucial in eliciting the domain knowledge
about cancer, and also to understand their usage habits, what they care about, etc.
– For example, the accuracy metric was different in our product.
2. Technology: Need careful analysis of the assumptions behind
off-the-shelf data-mining algorithms.
– CAD example: most of this talk covered these technical / mathematical
assumptions
46
Themes relevant for ML practitioners
3. Team: By truly integrating with the entire product team we can
optimize the entire system and achieve much bigger impact.
It is important for us to design or contribute to the infrastructure.
• End-to-end automated system optimization: e.g. automated optimization of
parameter settings for image processing algorithms
• Re-usable tools e.g. features, deployable large-scale learning algorithms.
• Analysis/modeling to support deployment goals: e.g. reduce memory &
computational footprint
• Version control for Data/Ground-truth, Automated tests (probabilistic!) etc
• Visualization tools for inputs or failure modes for other team members: e.g.
cluster failures in feature space, visualize prototypical failures as images to
discover clinical or image processing insights about failures
• Analysis of technical debt associated with ML
47
Technical Debt associated with ML
• Entanglement: Changing Anything Changes Everything (CACE)
• Hidden causal-feedback loops: e.g. changing CTR with ML alters user
clicks & thus the data generating distributions
• Undeclared consumers of intermediate stages/features etc
• Unstable data dependencies: need versioned copies of signals!
• Legacy features, epsilon features etc
• Correction cascades are a terrible idea!
• System level glue code / pipeline jungles
• Dead experimental code paths, e.g. A/B tests
• Configuration debt
• Etc…
48
Acknowledgements
Dr. D. Naidich, MD, of New York University
Dr. M. E. Baker, MD, of the Cleveland Clinic Foundation
Dr. M. Das, MD, of the University of Aachen
Dr. U. J. Schoepf, MD, of the Medical University of South Carolina
Dr. Peter Herzog, MD, of Klinikum Grosshadern, Munich.
Siemens:
Ingo Schmuecking, MD, Alok Gupta, Bharat Rao, Murat Dundar, Jinbo Bi,
Harald Steck, Stefan Niculescu, Romer Rosales, Shipeng Yu, Glenn Fung,
Vikas Raykar, Sangmin Park, Gerardo Valadez, Jonathan Stoeckel, Anna
Jerebko, Matthias Wolf, and the entire SISL team.
49
Thank You ! | Questions ?
50
MIL
51
Maximum Likelihood Estimator
52
Sparse Bayesian Prior
53
Feature Selection
54
How to judge an annotator?
[Figure: annotator types labeled Gold Standard, Novice, Luminary, Dart throwing monkey, Evil, Dumb expert]
Good experts have high sensitivity and high specificity.
55
1. Beaulieu's series expansion
57
2. Use truncated series
Retain only the first few terms contributing to the desired accuracy.
58
3. Regrouping
• A and B do not depend on y and can be computed in O(pN).
• Once A and B are precomputed, the sums at all M target points can be computed in O(pM).
• Cost is reduced from O(MN) to O(p(M+N)).
59
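The three steps can be sketched end to end. The sketch assumes the truncated series erfc(z) ≈ 1 − (4/π) Σ exp(−n²h²)/n · sin(2nhz) over odd n = 1, 3, ..., 2p−1, and the values h = 0.2 and p = 13 are illustrative choices, not the authors' settings. The approximation is only accurate while |y − x| stays well below π/(2h), so the toy data is kept inside [−1, 1]; the space subdivision mentioned under "Other tricks" is not implemented here.

```python
import math
import random


def direct_erfc_sums(sources, weights, targets):
    """E(y) = sum_i q_i * erfc(y - x_i), evaluated directly in O(MN)."""
    return [
        sum(q * math.erfc(y - x) for x, q in zip(sources, weights))
        for y in targets
    ]


def fast_erfc_sums(sources, weights, targets, h=0.2, p=13):
    """Same sums via the truncated series and regrouping, in O(p(M + N))."""
    total_weight = sum(weights)
    orders = [2 * k + 1 for k in range(p)]  # odd n = 1, 3, ..., 2p - 1
    coeffs = [4.0 / math.pi * math.exp(-(n * h) ** 2) / n for n in orders]
    # A_n and B_n depend only on the sources, not on the target y: O(pN).
    a_terms = [
        sum(q * math.cos(2 * n * h * x) for x, q in zip(sources, weights))
        for n in orders
    ]
    b_terms = [
        sum(q * math.sin(2 * n * h * x) for x, q in zip(sources, weights))
        for n in orders
    ]
    results = []
    for y in targets:  # O(p) work per target, O(pM) in total
        series = 0.0
        for c, n, a, b in zip(coeffs, orders, a_terms, b_terms):
            # sin(2nh(y - x)) = sin(2nhy)cos(2nhx) - cos(2nhy)sin(2nhx)
            series += c * (math.sin(2 * n * h * y) * a
                           - math.cos(2 * n * h * y) * b)
        results.append(total_weight - series)
    return results


random.seed(0)
sources = [random.uniform(-1.0, 1.0) for _ in range(200)]
weights = [random.uniform(-1.0, 1.0) for _ in range(200)]
targets = [random.uniform(-1.0, 1.0) for _ in range(50)]

direct = direct_erfc_sums(sources, weights, targets)
fast = fast_erfc_sums(sources, weights, targets)
max_error = max(abs(d - f) for d, f in zip(direct, fast))
```

The regrouping works because the angle-difference identity separates each series term into a factor that depends only on the sources and a factor that depends only on the target.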
4. Other tricks
• Rapid saturation of the erfc function.
• Space subdivision
• Choosing the parameters to achieve
the error bound
• See the technical report
60
61
Sample result (Dataset 8)
Method | Time taken (secs) | WMW
RankNCG direct | 333 | 0.984
RankNCG fast | 3 | 0.984
RankNet linear | 1264 | 0.951
RankNet two layer | 2464 | 0.765
RankSVM linear | 34 | 0.984
RankSVM quadratic | 1332 | 0.996
RankBoost | 6 | 0.958
62
Application to collaborative filtering
• Predict movie ratings for a user based on the
ratings provided by other users.
• MovieLens dataset (www.grouplens.org)
• 1 million ratings (1-5)
• 3592 movies
• 6040 users
• Feature vector for each movie – rating provided
by d other users
63
Collaborative filtering results
64
More Related Content

What's hot

Parametric Tolerance Interval Testing
Parametric Tolerance Interval TestingParametric Tolerance Interval Testing
Parametric Tolerance Interval Testing
Siva Chaitanya Addala
 
Deep Generative model-based quality control for cardiac MRI segmentation
Deep Generative model-based quality control for cardiac MRI segmentation Deep Generative model-based quality control for cardiac MRI segmentation
Deep Generative model-based quality control for cardiac MRI segmentation
Seunghyun Hwang
 
Aditya Bhattacharya Chest XRay Image Analysis Using Deep Learning
Aditya Bhattacharya Chest XRay Image Analysis Using Deep LearningAditya Bhattacharya Chest XRay Image Analysis Using Deep Learning
Aditya Bhattacharya Chest XRay Image Analysis Using Deep Learning
Aditya Bhattacharya
 
IRJET- Detection and Classification of Skin Diseases using Different Colo...
IRJET-  	  Detection and Classification of Skin Diseases using Different Colo...IRJET-  	  Detection and Classification of Skin Diseases using Different Colo...
IRJET- Detection and Classification of Skin Diseases using Different Colo...
IRJET Journal
 
Building a model for expected cost function to obtain double
Building a model for expected cost function to obtain doubleBuilding a model for expected cost function to obtain double
Building a model for expected cost function to obtain double
IAEME Publication
 
Air conditioner market case study
Air conditioner market case studyAir conditioner market case study
Air conditioner market case study
Shashwat Shankar
 

What's hot (6)

Parametric Tolerance Interval Testing
Parametric Tolerance Interval TestingParametric Tolerance Interval Testing
Parametric Tolerance Interval Testing
 
Deep Generative model-based quality control for cardiac MRI segmentation
Deep Generative model-based quality control for cardiac MRI segmentation Deep Generative model-based quality control for cardiac MRI segmentation
Deep Generative model-based quality control for cardiac MRI segmentation
 
Aditya Bhattacharya Chest XRay Image Analysis Using Deep Learning
Aditya Bhattacharya Chest XRay Image Analysis Using Deep LearningAditya Bhattacharya Chest XRay Image Analysis Using Deep Learning
Aditya Bhattacharya Chest XRay Image Analysis Using Deep Learning
 
IRJET- Detection and Classification of Skin Diseases using Different Colo...
IRJET-  	  Detection and Classification of Skin Diseases using Different Colo...IRJET-  	  Detection and Classification of Skin Diseases using Different Colo...
IRJET- Detection and Classification of Skin Diseases using Different Colo...
 
Building a model for expected cost function to obtain double
Building a model for expected cost function to obtain doubleBuilding a model for expected cost function to obtain double
Building a model for expected cost function to obtain double
 
Air conditioner market case study
Air conditioner market case studyAir conditioner market case study
Air conditioner market case study
 

Similar to CAD v2

Medical Segmentation Decathalon
Medical Segmentation DecathalonMedical Segmentation Decathalon
Medical Segmentation Decathalon
imgcommcall
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpoints
Valery Tkachenko
 
Flexible Clinical Trial Design - Survival, Stepped-Wedge & MAMS Designs
Flexible Clinical Trial Design - Survival, Stepped-Wedge & MAMS DesignsFlexible Clinical Trial Design - Survival, Stepped-Wedge & MAMS Designs
Flexible Clinical Trial Design - Survival, Stepped-Wedge & MAMS Designs
nQuery
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET Journal
 
Development and sharing of ADME/Tox and Drug Discovery Machine learning models
Development and sharing of ADME/Tox and Drug Discovery Machine learning modelsDevelopment and sharing of ADME/Tox and Drug Discovery Machine learning models
Development and sharing of ADME/Tox and Drug Discovery Machine learning models
Sean Ekins
 
Extending A Trial’s Design Case Studies Of Dealing With Study Design Issues
Extending A Trial’s Design Case Studies Of Dealing With Study Design IssuesExtending A Trial’s Design Case Studies Of Dealing With Study Design Issues
Extending A Trial’s Design Case Studies Of Dealing With Study Design Issues
nQuery
 
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET Journal
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Peea Bal Chakraborty
 
1115 wyatt wheres the science in hi for christchurch nz oct 2015
1115 wyatt wheres the science in hi   for christchurch nz oct 20151115 wyatt wheres the science in hi   for christchurch nz oct 2015
1115 wyatt wheres the science in hi for christchurch nz oct 2015
Health Informatics New Zealand
 
Parkinson disease classification v2.0
Parkinson disease classification v2.0Parkinson disease classification v2.0
Parkinson disease classification v2.0
Nikhil Shrivastava, MS, SAFe PMPO
 
Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze...
Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze...Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze...
Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze...
Muhammad Zbeedat
 
X02513181323
X02513181323X02513181323
X02513181323
ijceronline
 
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
IRJET Journal
 
Cluster randomised trials with excessive cluster sizes: ethical and design im...
Cluster randomised trials with excessive cluster sizes: ethical and design im...Cluster randomised trials with excessive cluster sizes: ethical and design im...
Cluster randomised trials with excessive cluster sizes: ethical and design im...
Karla hemming
 
Nico Karssemeijer
Nico KarssemeijerNico Karssemeijer
Nico Karssemeijer
NFBI
 
2020 trends in biostatistics what you should know about study design - slid...
2020 trends in biostatistics   what you should know about study design - slid...2020 trends in biostatistics   what you should know about study design - slid...
2020 trends in biostatistics what you should know about study design - slid...
nQuery
 
Parkinson disease classification recorded v2.0
Parkinson disease classification recorded   v2.0Parkinson disease classification recorded   v2.0
Parkinson disease classification recorded v2.0
Nikhil Shrivastava, MS, SAFe PMPO
 
Comparison of breast cancer classification models on Wisconsin dataset
Comparison of breast cancer classification models on Wisconsin  datasetComparison of breast cancer classification models on Wisconsin  dataset
Comparison of breast cancer classification models on Wisconsin dataset
International Journal of Reconfigurable and Embedded Systems
 
Surface features with nonparametric machine learning
Surface features with nonparametric machine learningSurface features with nonparametric machine learning
Surface features with nonparametric machine learning
Sylvain Ferrandiz
 
Skin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptxSkin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptx
VishalLabde
 

Similar to CAD v2 (20)

Medical Segmentation Decathalon
Medical Segmentation DecathalonMedical Segmentation Decathalon
Medical Segmentation Decathalon
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpoints
 
Flexible Clinical Trial Design - Survival, Stepped-Wedge & MAMS Designs
Flexible Clinical Trial Design - Survival, Stepped-Wedge & MAMS DesignsFlexible Clinical Trial Design - Survival, Stepped-Wedge & MAMS Designs
Flexible Clinical Trial Design - Survival, Stepped-Wedge & MAMS Designs
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer Prediction
 
Development and sharing of ADME/Tox and Drug Discovery Machine learning models
Development and sharing of ADME/Tox and Drug Discovery Machine learning modelsDevelopment and sharing of ADME/Tox and Drug Discovery Machine learning models
Development and sharing of ADME/Tox and Drug Discovery Machine learning models
 
Extending A Trial’s Design Case Studies Of Dealing With Study Design Issues
Extending A Trial’s Design Case Studies Of Dealing With Study Design IssuesExtending A Trial’s Design Case Studies Of Dealing With Study Design Issues
Extending A Trial’s Design Case Studies Of Dealing With Study Design Issues
 
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
 
1115 wyatt wheres the science in hi for christchurch nz oct 2015
1115 wyatt wheres the science in hi   for christchurch nz oct 20151115 wyatt wheres the science in hi   for christchurch nz oct 2015
1115 wyatt wheres the science in hi for christchurch nz oct 2015
 
Parkinson disease classification v2.0
Parkinson disease classification v2.0Parkinson disease classification v2.0
Parkinson disease classification v2.0
 
Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze...
Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze...Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze...
Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze...
 
X02513181323
X02513181323X02513181323
X02513181323
 
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
 
Cluster randomised trials with excessive cluster sizes: ethical and design im...
Cluster randomised trials with excessive cluster sizes: ethical and design im...Cluster randomised trials with excessive cluster sizes: ethical and design im...
Cluster randomised trials with excessive cluster sizes: ethical and design im...
 
Nico Karssemeijer
Nico KarssemeijerNico Karssemeijer
Nico Karssemeijer
 
2020 trends in biostatistics what you should know about study design - slid...
2020 trends in biostatistics   what you should know about study design - slid...2020 trends in biostatistics   what you should know about study design - slid...
2020 trends in biostatistics what you should know about study design - slid...
 
Parkinson disease classification recorded v2.0
Parkinson disease classification recorded   v2.0Parkinson disease classification recorded   v2.0
Parkinson disease classification recorded v2.0
 
Comparison of breast cancer classification models on Wisconsin dataset
Comparison of breast cancer classification models on Wisconsin  datasetComparison of breast cancer classification models on Wisconsin  dataset
Comparison of breast cancer classification models on Wisconsin dataset
 
Surface features with nonparametric machine learning
Surface features with nonparametric machine learningSurface features with nonparametric machine learning
Surface features with nonparametric machine learning
 
Skin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptxSkin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptx
 

CAD v2

  • 1. Computer Aided Detection of Abnormalities in Medical Images Balaji Krishnapuram Siemens Medical Solutions USA 1
  • 2. Outline of the talk  Computer aided detection/diagnosis (CAD)  Key challenges / Algorithms  Clinical impact  Lessons learnt Several thousand units of the products described in this paper have been commercially deployed in hospitals around the world since 2004 2
  • 3. ML as part of a full system • In this talk I only focus on some ML Research • In practice, statistical modeling / ML algorithmic innovation is < 20% of the effort to get to the full product. • This was work undertaken by a large and very talented team 3
  • 4. Medical Imaging • Increased resolution has resulted in Data Overload – Increased total study time – Increase in data does not always translate to improved diagnosis • CAD: extract the actionable information from the imaging data – in order to improve patient care – while reducing total study time Digital MammogramDigital Mammogram CT ScanCT Scan 4
  • 5. Computer-aided diagnosis/detection (CAD) • Used as a second reader • Improves the detection performance of a radiologist • Reduces mistakes related to misinterpretation • The principal benefit of CAD is determined by carefully measuring the incremental value of CAD in normal clinical practice CAD technologies support the physician by drawing attention to structures in the image that may require further review. 5
  • 6. Lung CAD Identify suspicious regions called nodules (which may be precursors of cancer) in CT scans of the lung. 6
  • 7. Colon PEV Polyp Enhanced Viewer Identify suspicious regions called polyps in CT scans of the colon. 7
  • 8. Mammo CAD Identify abnormal masses/ clusters of micro-calcifications in digital mammograms. PECAD and MammoCAD are only sold outside the US.8
  • 9. PE CAD Pulmonary Embolism (PE) is a sudden blockage in a pulmonary artery caused by an embolus that is formed in one part of the body and travels to the lungs in the bloodstream through the heart. PECAD and MammoCAD are only sold outside the US.9
  • 10. Typical CAD architecture Candidate Generation Feature Computation Classification Image [ X-ray | CT scan | MRI ] Location of lesions Focus of the current talk Potential candidates Lesion > 90% sensitivity 60-300 FP/image > 80% sensitivity 2-5 FP/image 10
  • 11. Key Machine Learning Challenges Challenge Solutions 1. Training/testing data is correlated Multiple instance learning batch classification 2. Evaluation metric is CAD specific Multiple instance learning 3. Run-time Constraints Cascaded classifiers 4. No objective ground truth EM crowd-sourcing algorithm 5. Data shortage Multi-task learning 6. Sensitivity for specific FP range Maximize (partial) AUC 11
  • 12. The breakdown of assumptions region on a mammogram lesion not a lesion Traditional classification algorithms Neural networks Support Vector Machines Logistic Regression …. Often violated in CAD Make two key assumptions (1) Training samples are independent (2) Maximize classification accuracy over all candidates 12
  • 13. Violation 1: Training examples are correlated Candidate generation produces a lot of spatially adjacent candidates. Hence there are high level of correlations among candidates. Correlations also common across different images/detector type/hospitals. 13
  • 14. Violation 2: Candidate level accuracy not important Several candidates from the CG point to the same lesion in the breast. Lesion is detected if at least one of them is detected. It is fine if we miss adjacent overlapping candidates. Hence CAD system accuracy is measured in terms of per lesion/image/patient sensitivity. So why not optimize the performance metric we use to evaluate our system? Most algorithms maximize classification accuracy. Try to classify every candidate correctly. 14
  • 15. Solution 1: Multiple Instance Learning Fung, et al. 2006, Bi, et al. 2007, Raykar et al. 2008, Krishnapuram, et al. 2008. How do we acquire labels ? Candidates which overlap with the radiologist mark is a positive. Rest are negative. 1 1 0 0 0 0 Single Instance Learning 1 0 0 0 0 Multiple Instance Learning Classify every candidate correctly Positive Bag Classify at-least one candidate correctly 15
  • 16. Simple Illustration Single instance learning: •Reject as many negative candidates as possible. •Detect as many positives as possible. Multiple Instance Learning Single Instance Learning Multiple instance learning: Reject as many negative candidates as possible. Detect at-least one candidate in a positive bag. Accounts for correlation during trainingAccounts for correlation during training 16
  • 17. Multiple Instance Learning Algorithm Details Logistic Regression model feature vector weight vector 17
  • 19. Solution part 2: Batch Classification Vural et al., 2009 Accounts for correlation during testingAccounts for correlation during testing Change the decision boundary during test time.Change the decision boundary during test time. 19
  • 20. Batch Classification Model. Traditional classification handles one location at a time. Correlations are modeled using location (spatial adjacency) as side information: a Gaussian prior for the latent variable that determines classification; a noise model for the one-location-at-a-time classification primitive; and a posterior combining location side information and classification features. The result is a combined Gaussian CRF classification using location as side information.
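The formulas on this slide are not preserved, so the sketch below only illustrates the general idea under explicit assumptions: a zero-mean Gaussian prior on the latent scores z whose covariance is a spatial similarity matrix, and per-candidate classifier scores f = z + Gaussian noise. Under those assumptions the posterior mean is Sigma (Sigma + noise_var I)^-1 f. The names `batch_scores` and `solve`, and all numbers, are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def batch_scores(scores, similarity, noise_var=1.0):
    """Posterior mean of z given scores f = z + noise, with prior z ~ N(0, similarity)."""
    n = len(scores)
    A = [[similarity[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    a = solve(A, scores)
    return [sum(similarity[i][j] * a[j] for j in range(n)) for i in range(n)]

# Candidates 0 and 1 are spatially adjacent; candidate 2 is isolated.
similarity = [[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]]
scores = [2.0, -0.5, -0.5]  # one-at-a-time classifier outputs
smoothed = batch_scores(scores, similarity)
print(smoothed)
```

Candidates 1 and 2 start with the same raw score, but only candidate 1 is pulled across the decision boundary by its strongly positive neighbor. This is what "changing the decision boundary at test time" amounts to in this simplified setting.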
  • 21. Example results: Pulmonary Embolism; Colon Cancer (polyps).
  • 22. Run-time vs Accuracy Tradeoff: Soft Cascaded Classifiers (Raykar et al., 2010). [Figure: a cascade of Stage 1, Stage 2, and Stage 3, with increasing predictive power and increasing acquisition cost from stage to stage.]
  • 23. Modeling the expected cost for a given instance across Stage 1, Stage 2, and Stage 3. We optimize using cyclic coordinate descent.
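One way to picture the expected-cost term is sketched below. It assumes, as an illustration rather than the slide's exact model, that each stage outputs a sigmoid probability of passing the candidate on, that the cascade's positive probability is the product of the stage probabilities, and that a stage's feature cost is paid only if the candidate reaches it. All names and numbers are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def stage_probs(weights, features):
    """Per-stage probability of passing; each stage has its own features."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for w, x in zip(weights, features)]

def cascade_prob(weights, features):
    """Soft cascade (assumed form): positive only if every stage says positive."""
    p = 1.0
    for s in stage_probs(weights, features):
        p *= s
    return p

def expected_cost(weights, features, stage_costs):
    """Expected feature-acquisition cost for one instance."""
    cost = 0.0
    reach = 1.0  # probability that the instance reaches the current stage
    for s, c in zip(stage_probs(weights, features), stage_costs):
        cost += reach * c
        reach *= s
    return cost

weights = [[0.0], [0.0], [0.0]]   # every stage passes with probability 0.5
features = [[1.0], [1.0], [1.0]]
stage_costs = [1.0, 10.0, 100.0]  # later stages use costlier features
print(cascade_prob(weights, features), expected_cost(weights, features, stage_costs))
```

With these toy numbers the expected cost is 1 + 0.5 * 10 + 0.25 * 100 = 31, well below the 111 paid when every stage is always evaluated.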
  • 24. Some properties of soft cascades • Sequential ordering of the cascade is not important during training. • Order definitely matters during testing. • A device to ease the training process. • We use a maximum a-posteriori (MAP) estimate with Bayesian priors on weights.
  • 25. Test set FROC Curves
  • 26. Subjective Ground Truth (Raykar et al. 2009). Each radiologist is asked to annotate whether a lesion is malignant (1) or not (0). In practice there is a substantial amount of disagreement. We have no knowledge of the actual golden ground truth, and getting absolute ground truth (e.g. biopsy) can be expensive. We proposed an EM algorithm to simultaneously learn the ground truth and the classifier.
      Lesion ID   Radiologist 1   Radiologist 2   Radiologist 3   Radiologist 4   Truth (unknown)
      12          0               0               0               0               x
      32          0               1               0               0               x
      10          1               1               1               1               x
      11          0               0               1               1               x
      24          0               1               1               1               x
      23          0               0               1               0               x
      40          0               1               1               0               x
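  • 27. How to judge an expert/annotator? A radiologist with two coins: the label assigned by expert j depends on the true label, with one coin governing the expert's sensitivity and the other the expert's specificity.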
  • 28. EM algorithm for jointly estimating radiologist accuracy and classifier. Initialize using majority voting. If I knew the true label, I could estimate the sensitivity/specificity of each expert and also estimate the classifier w. If I knew how good each expert is, I could estimate the true label. Iterate until convergence.
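The alternation described above can be sketched as follows. This simplified version is an assumption-laden illustration: it omits the classifier w entirely and replaces it with a single prevalence parameter, and it uses add-one smoothing to keep the estimates away from 0 and 1. The function name `em_two_coin` is hypothetical; the toy labels are the seven rows of the annotation table from slide 26.

```python
def em_two_coin(labels, n_iter=50):
    """labels[i][j] is the 0/1 label that annotator j gave to item i."""
    n_items, n_annot = len(labels), len(labels[0])
    # Initialize the soft truth with majority voting (fraction of positive votes).
    mu = [sum(row) / n_annot for row in labels]
    sens = spec = None
    for _ in range(n_iter):
        # M-step: given the soft truth, estimate each annotator's quality.
        pos = sum(mu)
        neg = n_items - pos
        sens = [(sum(m * row[j] for m, row in zip(mu, labels)) + 1.0) / (pos + 2.0)
                for j in range(n_annot)]
        spec = [(sum((1 - m) * (1 - row[j]) for m, row in zip(mu, labels)) + 1.0) / (neg + 2.0)
                for j in range(n_annot)]
        prev = (pos + 1.0) / (n_items + 2.0)
        # E-step: given annotator quality, re-estimate the soft truth.
        new_mu = []
        for row in labels:
            a, b = prev, 1.0 - prev
            for j, y in enumerate(row):
                a *= sens[j] if y == 1 else 1.0 - sens[j]
                b *= spec[j] if y == 0 else 1.0 - spec[j]
            new_mu.append(a / (a + b))
        mu = new_mu
    return mu, sens, spec

# Rows: lesions 12, 32, 10, 11, 24, 23, 40; columns: radiologists 1 to 4.
labels = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
mu, sens, spec = em_two_coin(labels)
print([round(m, 3) for m in mu])
```

The returned `mu` is a soft estimate of each lesion's true label, while `sens` and `spec` summarize how much weight each radiologist's vote effectively receives.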
  • 30. Data Shortage: Multi-task Learning (Raykar et al. 2008). [Figure: Lung Nodule; Ground Glass Object.]
  • 32. Maximizing AUC (Raykar et al. 2008). [Figure: positive (+) and negative (-) examples.]
  • 33. Generalization of AUC maximization: Learning Preference Relationships / Ranking. From these two we can get a set of pairwise preference relations.
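The link between AUC and pairwise preferences can be made concrete with the Wilcoxon-Mann-Whitney (WMW) statistic, the fraction of positive/negative pairs that a scoring function orders correctly. The sketch below is a plain illustration; both function names are hypothetical.

```python
def wmw_auc(scores_pos, scores_neg):
    """WMW statistic: fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))

def preference_pairs(labels):
    """Pairs (i, j) meaning item i should be ranked above item j."""
    n = len(labels)
    return [(i, j) for i in range(n) for j in range(n) if labels[i] > labels[j]]

print(wmw_auc([0.9, 0.8, 0.4], [0.5, 0.3]))  # 5 of the 6 pairs are ordered correctly
print(preference_pairs([1, 0, 1]))
```

With binary labels the preference pairs are exactly the positive/negative pairs counted by the WMW statistic; with graded labels the same construction yields a general ranking problem.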
  • 34. MAP estimator is expensive to compute. Original task: choose w to maximize the objective, which is a discrete optimization problem. The MAP estimator combines a log-likelihood and a prior.
  • 35. Accelerating the core computational primitive: a weighted summation of erfc() functions. The truncated Beaulieu's series admits decomposition and regrouping.
  • 36. Direct vs Fast: time taken
      Dataset   Direct       Fast
      1         1736 secs.   2 secs.
      2         6731 secs.   19 secs.
      3         2557 secs.   4 secs.
      4         *            47 secs.
  • 37. Sample result (Dataset 8)
      Method              Time taken (secs)   WMW
      RankNCG direct      333                 0.984
      RankNCG fast        3                   0.984
      RankNet linear      1264                0.951
      RankNet two layer   2464                0.765
      RankSVM linear      34                  0.984
      RankSVM quadratic   1332                0.996
      RankBoost           6                   0.958
  • 38. Key Machine Learning Challenges
      Challenge                                Solutions
      1. Training/testing data is correlated   Multiple instance learning, batch classification
      2. Evaluation metric is CAD specific     Multiple instance learning
      3. Run-time constraints                  Cascaded classifiers
      4. No objective ground truth             EM crowd-sourcing algorithm
      5. Data shortage                         Multi-task learning
      6. Sensitivity for specific FP range     Maximize (partial) AUC
  • 39. Clinical Impact • Measure the improvement in performance of a radiologist with the Siemens CAD software. • Several independent clinical studies/trials have been conducted by our collaborators worldwide. • NOTE: CAD is deployed in second reader mode in these studies.
  • 40. Lung CAD 1. FDA clinical validation study with 17 radiologists and 196 cases from 4 hospitals. Average reader AUC increased by 0.048 (p<0.001) because of CAD. 2. Study at NYU by Godoy et al. 2008. 3. New version also helps detect different kinds of nodules.
                               Mean sensitivity without CAD   Mean sensitivity with CAD   Increase in sensitivity
      Solid Nodules            60%                            85%                         15%
      Part-solid Nodules       80%                            95%                         15%
      Ground Glass Opacities   75%                            86%                         11%

                 Sensitivity without CAD   Sensitivity with CAD   Increase in sensitivity
      Reader 1   56.2%                     66.0%                  9.8%
      Reader 2   79.2%                     89.8%                  10.6%
  • 41. Colon PEV. Colon PEV (Polyp Enhanced Viewer) was evaluated by Baker, et al. 2007 – Study with seven less-experienced readers – Without PEV average sensitivity was 0.810 – With PEV average sensitivity was 0.908 – A 9.8% increase in average sensitivity (p=0.0152).
  • 42. PE CAD. Das et al. 2008 conducted a study with 43 patients to assess the sensitivity of detection of pulmonary embolism.
                 Sensitivity without CAD   Sensitivity with CAD   Increase in sensitivity
      Reader 1   87%                       98%                    11%
      Reader 2   82%                       93%                    11%
      Reader 3   77%                       92%                    15%
  • 43. Themes relevant for ML practitioners. Long-term career growth = Increased Impact (Customers, Shareholders, Society).
  • 44. Themes relevant for ML practitioners. We increase our impact by growing along 3 axes: 1. Product 2. Technology 3. Team
  • 45. Themes relevant for ML practitioners 1. Product: Domain knowledge is very important. We need to design or utilize algorithms to optimize the metrics relevant to our customers. – CAD example: Collaboration with radiologists is crucial in eliciting the domain knowledge about cancer, and also to understand their usage habits, what they care about, etc. – For example, the accuracy metric was different in our product. 2. Technology: Need careful analysis of the assumptions behind off-the-shelf data-mining algorithms. – CAD example: most of this talk covered these technical / mathematical assumptions.
  • 46. Themes relevant for ML practitioners 3. Team: By truly integrating with the entire product team we can optimize the entire system and achieve much bigger impact. It is important for us to design or contribute to the infrastructure. • End-to-end automated system optimization: e.g. automated optimization of parameter settings for image processing algorithms • Re-usable tools: e.g. features, deployable large-scale learning algorithms • Analysis/modeling to support deployment goals: e.g. reduce memory and computational footprint • Version control for data/ground truth, automated tests (probabilistic!), etc. • Visualization tools for inputs or failure modes for other team members: e.g. cluster failures in feature space, visualize prototypical failures as images to discover clinical or image processing insights about failures • Analysis of technical debt associated with ML
  • 47. Technical Debt associated with ML • Entanglement: Changing Anything Changes Everything (CACE) • Hidden causal-feedback loops: e.g. changing CTR with ML alters user clicks and thus the data generating distributions • Undeclared consumers of intermediate stages/features, etc. • Unstable data dependencies: need versioned copies of signals! • Legacy features, epsilon features, etc. • Correction cascades are a terrible idea! • System level glue code / pipeline jungles • Dead experimental code paths, e.g. A/B tests • Configuration debt • Etc…
  • 48. Acknowledgements. Dr. D. Naidich, MD, of New York University; Dr. M. E. Baker, MD, of the Cleveland Clinic Foundation; Dr. M. Das, MD, of the University of Aachen; Dr. U. J. Schoepf, MD, of the Medical University of South Carolina; Dr. Peter Herzog, MD, of Klinikum Grosshadern, Munich. Siemens: Ingo Schmuecking, MD, Alok Gupta, Bharat Rao, Murat Dundar, Jinbo Bi, Harald Steck, Stefan Niculescu, Romer Rosales, Shipeng Yu, Glenn Fung, Vikas Raykar, Sangmin Park, Gerardo Valadez, Jonathan Stoeckel, Anna Jerebko, Matthias Wolf, and the entire SISL team.
  • 49. Thank You! Questions?
  • 54. How to judge an annotator? Good experts have high sensitivity and high specificity. [Figure: annotator types labeled Gold Standard, Novice, Luminary, Dart-throwing monkey, Evil, Dumb expert.]
  • 55. 1. Beaulieu's series expansion. Retain only the first few terms contributing to the desired accuracy.
  • 56. 2. Use truncated series
  • 57. 3. Regrouping. After regrouping, the coefficients A and B do not depend on y and can be computed in O(pN). Once A and B are precomputed, the sums at all evaluation points can be computed in O(pM). The cost is reduced from O(MN) to O(p(M+N)).
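The three steps can be sketched end to end. The slide's formulas are not preserved, so the series used here is an assumption: erfc(z) ≈ 1 - (4/π) Σ over odd n up to 2p-1 of exp(-n²h²)/n · sin(2nhz), which is accurate when |z| stays well below π/(2h). Expanding sin(2nh(y - x)) with the angle-difference identity separates the x-dependent sums (called A and B below, following the slide) from the y-dependent factors. The function names and the values of h and p are illustrative choices, and the space subdivision trick from the next slide is omitted.

```python
import math

def direct_sum(xs, qs, ys):
    """O(MN): E(y) = sum_i q_i * erfc(y - x_i) for every y."""
    return [sum(q * math.erfc(y - x) for x, q in zip(xs, qs)) for y in ys]

def fast_sum(xs, qs, ys, h=0.25, p=12):
    """O(p(M+N)) approximation using the truncated series and regrouping."""
    odd = list(range(1, 2 * p, 2))  # n = 1, 3, ..., 2p-1
    # Source-side coefficients do not depend on y: O(pN).
    A = [sum(q * math.cos(2 * n * h * x) for x, q in zip(xs, qs)) for n in odd]
    B = [sum(q * math.sin(2 * n * h * x) for x, q in zip(xs, qs)) for n in odd]
    Q = sum(qs)
    out = []
    for y in ys:  # O(pM) once A and B are available
        s = 0.0
        for k, n in enumerate(odd):
            c = math.exp(-(n * h) ** 2) / n
            s += c * (math.sin(2 * n * h * y) * A[k] - math.cos(2 * n * h * y) * B[k])
        out.append(Q - (4.0 / math.pi) * s)
    return out

xs = [0.1, 0.4, 0.7]   # source points
qs = [1.0, -2.0, 0.5]  # weights
ys = [0.0, 0.5, 1.0]   # evaluation points
print(direct_sum(xs, qs, ys))
print(fast_sum(xs, qs, ys))
```

For points spread over a wide range the single expansion stops being accurate, which is presumably where the space subdivision and the saturation of erfc mentioned on the next slide come in.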
  • 58. 4. Other tricks • Rapid saturation of the erfc function. • Space subdivision • Choosing the parameters to achieve the error bound • See the technical report
  • 60. Application to collaborative filtering • Predict movie ratings for a user based on the ratings provided by other users. • MovieLens dataset (www.grouplens.org) • 1 million ratings (1-5) • 3592 movies • 6040 users • Feature vector for each movie: ratings provided by d other users

Editor's Notes

  1. The proposed workflow is to use CAD as a second reader (i.e., in conjunction with the radiologist): the radiologist first performs an interpretation of the image as usual, and then runs the CAD algorithm (typically a set of image processing algorithms followed by a classifier), which highlights structures that may be of interest to the radiologist. The radiologist examines these marks and concludes the interpretation.
  2. Lung cancer is the most commonly diagnosed cancer worldwide, accounting for 1.2 million new cases annually.
  3. A major clinical challenge is to quickly and correctly diagnose patients with PE and then send them on to treatment. A prompt and accurate diagnosis of PE is the key to survival. We developed a fast yet effective approach for computer aided detection of pulmonary embolism (PE) in CT pulmonary angiography (CTPA). Our research has been motivated by the lethal, emergent nature of PE and the limited accuracy and efficiency of manual interpretation of CTPA studies.
  4. In the MIL framework the training set consists of bags. A bag contains many instances. All the instances in a bag share the same bag-level label. A bag is labeled positive if it contains at least one positive instance. A negative bag means that all instances in the bag are negative. The goal is to learn a classification function that can predict the labels of unseen instances and/or bags. Figure 9 illustrates that MIL can yield very different classifiers from conventional single instance learning. The single instance classifier on the left tries to reject as many negative candidates as possible and detect as many positives as possible. The MIL classifier on the right tries to detect at least one candidate in a positive bag and reject as many negative candidates as possible.
  5. Most classification systems assume that the data used to train and test the classifier are independently drawn from an identical underlying distribution. For example, samples are classified one at a time in a support vector machine (SVM), thus the classification of a particular test sample does not depend on the features from any other test samples. Nevertheless, this assumption is commonly violated in many real-life problems where sub-groups of samples have a high degree of correlation amongst both their features and their labels. Due to spatial adjacency of the regions identified by a candidate generator, both the features and the class labels of several adjacent candidates can be highly correlated during training and testing. We proposed batch-wise classification algorithms to explicitly account for correlations (Vural, et al. 2009).