Computer Aided Detection of
Abnormalities in Medical Images
Balaji Krishnapuram
Siemens Medical Solutions USA
1
Outline of the talk
 Computer aided detection/diagnosis (CAD)
 Key challenges / Algorithms
 Clinical impact
 Lessons learnt
Several thousand units of the products described in this paper have been
commercially deployed in hospitals around the world since 2004
2
ML as part of a full system
• In this talk I only focus on some ML Research
• In practice, statistical modeling / ML algorithmic innovation
is < 20% of the effort to get to the full product.
• This was work undertaken by a large and very
talented team
3
Medical Imaging
• Increased resolution has resulted in Data Overload
– Increased total study time
– Increase in data does not always translate to improved diagnosis
• CAD: extract the actionable information from the imaging data
– in order to improve patient care
– while reducing total study time
Digital Mammogram
CT Scan
4
Computer-aided diagnosis/detection (CAD)
• Used as a second reader
• Improves the detection
performance of a
radiologist
• Reduces mistakes related
to misinterpretation
• The principal benefit of
CAD is determined by
carefully measuring the
incremental value of CAD
in normal clinical practice
CAD technologies support the physician by drawing attention to structures in
the image that may require further review.
5
Lung CAD
Identify suspicious regions called nodules (which may be
precursors of cancer) in CT scans of the lung.
6
Colon PEV (Polyp Enhanced Viewer)
Identify suspicious regions called polyps in CT scans of the
colon.
7
Mammo CAD
Identify abnormal masses/ clusters of micro-calcifications in
digital mammograms.
PE CAD and Mammo CAD are only sold outside the US.
8
PE CAD
Pulmonary Embolism (PE) is a sudden blockage in a pulmonary artery
caused by an embolus that is formed in one part of the body and travels to
the lungs in the bloodstream through the heart.
9
Typical CAD architecture
Image [ X-ray | CT scan | MRI ]
→ Candidate Generation: potential candidates (> 90% sensitivity, 60-300 FP/image)
→ Feature Computation
→ Classification (focus of the current talk)
→ Location of lesions (> 80% sensitivity, 2-5 FP/image)
10
Key Machine Learning Challenges
Challenge Solutions
1. Training/testing data is correlated Multiple instance learning
batch classification
2. Evaluation metric is CAD specific Multiple instance learning
3. Run-time Constraints Cascaded classifiers
4. No objective ground truth EM crowd-sourcing algorithm
5. Data shortage Multi-task learning
6. Sensitivity for specific FP range Maximize (partial) AUC
11
The breakdown of assumptions
[Figure: regions on a mammogram, labeled lesion or not a lesion.]
Traditional classification algorithms
(Neural networks, Support Vector Machines, Logistic Regression, ….)
make two key assumptions, often violated in CAD:
(1) Training samples are independent
(2) Maximize classification accuracy over all candidates
12
Violation 1: Training examples are correlated
Candidate generation produces many spatially adjacent candidates,
so there are high levels of correlation among candidates.
Correlations are also common across different images, detector types, and hospitals.
13
Violation 2: Candidate-level accuracy is not important
Several candidates from the candidate generator (CG) point to the same lesion
in the breast.
Lesion is detected if at least one of them is detected.
It is fine if we miss adjacent overlapping candidates.
Hence CAD system accuracy is measured in terms of
per lesion/image/patient sensitivity.
So why not optimize the performance metric we use to
evaluate our system?
Most algorithms instead maximize classification accuracy:
they try to classify every candidate correctly.
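To make the lesion-level metric concrete, here is a minimal sketch (the candidate-to-lesion mapping and function name are hypothetical): a lesion counts as detected if at least one of its candidates is flagged, so missing adjacent overlapping candidates costs nothing.

```python
# Per-lesion sensitivity: a lesion is detected if at least one of its
# candidates is flagged positive by the classifier.
def lesion_sensitivity(candidate_lesion_ids, predictions):
    """candidate_lesion_ids[i] is the lesion a candidate points to
    (None for false-positive regions); predictions[i] is the 0/1 decision."""
    detected, lesions = set(), set()
    for lesion, pred in zip(candidate_lesion_ids, predictions):
        if lesion is None:
            continue
        lesions.add(lesion)
        if pred == 1:
            detected.add(lesion)
    return len(detected) / len(lesions)

# Three candidates hit lesion "A"; detecting any one of them suffices.
sens = lesion_sensitivity(["A", "A", "A", "B", None], [0, 1, 0, 0, 1])
print(sens)  # 0.5: lesion A detected, lesion B missed
```

Note that the classifier got only one of A's three candidates right, yet A still counts as fully detected, which is exactly why candidate-level accuracy is the wrong objective.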
14
Solution 1: Multiple Instance Learning
Fung et al. 2006; Bi et al. 2007; Raykar et al. 2008; Krishnapuram et al. 2008.
How do we acquire labels ?
Candidates that overlap with the radiologist's mark are positive.
The rest are negative.
Single Instance Learning: each candidate is labeled individually (1 1 0 0 0 0);
classify every candidate correctly.
Multiple Instance Learning: overlapping candidates form a positive bag (bag label 1,
plus negatives 0 0 0 0); classify at least one candidate in the positive bag correctly.
15
Simple Illustration
Single instance learning:
• Reject as many negative candidates as possible.
• Detect as many positives as possible.
Multiple instance learning:
• Reject as many negative candidates as possible.
• Detect at least one candidate in each positive bag.
Accounts for correlation during training.
16
Multiple Instance Learning Algorithm Details
Logistic Regression model
feature vector
weight vector
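A minimal sketch of a bag-level objective in this spirit (a noisy-OR combination of per-candidate logistic probabilities; the cited papers use their own formulations, so treat this as illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mil_log_likelihood(w, bags, labels):
    """Bag-level likelihood for MIL logistic regression (noisy-OR sketch):
    a bag is positive if at least one of its instances is positive."""
    ll = 0.0
    for X, y in zip(bags, labels):  # X: instances-in-bag matrix
        p_bag = 1.0 - np.prod(1.0 - sigmoid(X @ w))  # P(at least one positive)
        p_bag = np.clip(p_bag, 1e-12, 1 - 1e-12)
        ll += y * np.log(p_bag) + (1 - y) * np.log(1 - p_bag)
    return ll

# Toy 1-D problem: one positive bag near +2, one negative bag near -2.
rng = np.random.default_rng(0)
bags = [rng.normal(+2, 0.5, (3, 1)), rng.normal(-2, 0.5, (4, 1))]
labels = [1, 0]
# Crude grid search over the single weight, just to show the objective.
ws = np.linspace(-3, 3, 61)
best = max(ws, key=lambda w: mil_log_likelihood(np.array([w]), bags, labels))
print(best > 0)  # a positive weight separates the bags
```

In practice the objective is maximized by gradient methods rather than a grid search; the point of the sketch is that the likelihood rewards getting at least one instance per positive bag right instead of every instance.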
17
Example MIL Result
18
Solution part 2: Batch Classification
Vural et al., 2009
Accounts for correlation during testing.
Change the decision boundary at test time.
19
Batch Classification Model
20
Traditional, one-location at a time classification:
Modeling correlations using location (spatial adjacency) as side information:
Gaussian prior for latent variable that determines classification
Noise model for one-location-at-a-time classification primitive
Posterior: combining location side
information and classification features
Combined Gaussian CRF classification using location as side information:
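One way to sketch this idea (not necessarily the exact model of Vural et al.): treat per-candidate classifier scores as noisy observations of latent scores with a Gaussian smoothness prior over spatially adjacent candidates; the posterior mean then solves a small linear system built from the adjacency graph's Laplacian.

```python
import numpy as np

def smooth_scores(scores, adjacency, lam=1.0):
    """Gaussian-smoothing sketch: minimize ||z - s||^2 + lam * z' L z,
    which couples the scores of spatially adjacent candidates.
    The minimizer solves (I + lam * L) z = s."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A  # graph Laplacian over adjacent candidates
    z = np.linalg.solve(np.eye(len(scores)) + lam * L, np.asarray(scores, float))
    return z

# Three mutually adjacent candidates on one lesion, plus one isolated candidate.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0]])
print(smooth_scores([2.0, 1.5, -0.5, 1.0], adj))
```

Smoothing pulls the adjacent candidates' scores toward each other while leaving the isolated candidate unchanged, which is the test-time analogue of the correlation modeling described above.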
Example results
21
Pulmonary Embolism Colon Cancer (polyps)
Run-time vs. Accuracy Tradeoff: Soft Cascaded Classifiers
Raykar et al. 2010
[Diagram: Stage 1 → Stage 2 → Stage 3, with increasing predictive power and increasing acquisition cost.]
22
Modeling the expected cost
For a given instance, the cost accumulates over Stage 1, Stage 2, and Stage 3.
[Diagram: Stage 1 → Stage 2 → Stage 3.]
We optimize using cyclic coordinate descent
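The cost side of the tradeoff can be sketched as a simple expected feature-acquisition cost (stage costs and pass rates below are illustrative): stage k's features are computed only for instances that survive stages 1..k-1.

```python
def expected_cascade_cost(stage_costs, pass_rates):
    """Expected per-instance feature-acquisition cost of a cascade:
    stage k is evaluated only if the instance passed stages 1..k-1."""
    cost, p_reach = 0.0, 1.0
    for c, p in zip(stage_costs, pass_rates):
        cost += p_reach * c   # pay stage cost for instances that reach it
        p_reach *= p          # fraction surviving to the next stage
    return cost

# Cheap features first, expensive ones only for the few survivors.
print(expected_cascade_cost([1.0, 10.0, 100.0], [0.1, 0.5, 1.0]))
# 1.0 + 0.1*10 + 0.05*100 = 7.0, versus 111.0 without a cascade
```

The full method jointly optimizes accuracy plus this kind of cost term over the stage classifiers, which is where the cyclic coordinate descent comes in.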
24
Some properties of soft cascades
• Sequential ordering of the cascade is not important during training.
• Order definitely matters during testing.
• The soft cascade is a device to ease the training process.
• We use a maximum a-posteriori (MAP) estimate with
Bayesian priors on weights.
25
Test set FROC Curves
26
Subjective Ground truth
Raykar et al. 2009
Lesion ID  Radiologist 1  Radiologist 2  Radiologist 3  Radiologist 4  Truth
12         0              0              0              0              unknown
32         0              1              0              0              unknown
10         1              1              1              1              unknown
11         0              0              1              1              unknown
24         0              1              1              1              unknown
23         0              0              1              0              unknown
40         0              1              1              0              unknown
Each radiologist is asked to annotate whether a lesion is malignant (1) or not (0).
In practice there is a substantial
amount of disagreement.
We have no knowledge of the
actual golden ground truth.
Getting absolute ground truth (e.g.
biopsy) can be expensive.
We proposed an EM algorithm to simultaneously
learn the ground truth and the classifier.
27
How to judge an expert/annotator ?
A radiologist with two coins
True Label
Label assigned by
expert j
28
EM algorithm for jointly estimating radiologist
accuracy and classifier
If we knew the true labels, we could estimate the sensitivity/specificity of
each expert, and also estimate the classifier w.
If we knew how good each expert is, we could estimate the true labels.
Initialize using majority voting.
Iterate till convergence.
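A minimal sketch of the two-coin EM (shown here without the classifier term, which the full algorithm also updates; variable names are illustrative):

```python
import numpy as np

def em_two_coin(labels, n_iter=50, prior=0.5):
    """EM sketch of the two-coin annotator model:
    labels[i, j] = 0/1 label from annotator j for instance i.
    Estimates each annotator's sensitivity (alpha) and specificity (beta)
    and the posterior probability mu that each hidden true label is 1."""
    L = np.asarray(labels, float)
    mu = L.mean(axis=1)  # init: (soft) majority vote
    for _ in range(n_iter):
        # M-step: alpha_j = P(label 1 | true 1), beta_j = P(label 0 | true 0)
        alpha = (mu @ L) / mu.sum()
        beta = ((1 - mu) @ (1 - L)) / (1 - mu).sum()
        # E-step: posterior probability that the true label is 1
        a = prior * np.prod(alpha**L * (1 - alpha)**(1 - L), axis=1)
        b = (1 - prior) * np.prod(beta**(1 - L) * (1 - beta)**L, axis=1)
        mu = a / (a + b)
    return mu, alpha, beta

# Two reliable annotators and one unreliable one, on six instances.
labels = np.array([[1,1,0],[1,1,1],[0,0,0],[0,0,1],[1,1,0],[0,0,1]])
mu, alpha, beta = em_two_coin(labels)
print((mu > 0.5).astype(int))  # recovered truth follows the reliable pair
```

The estimated alpha/beta automatically down-weight the unreliable annotator, so the recovered truth is better than plain majority voting when annotator quality varies.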
29
Breast MRI results
30
Data Shortage: Multi-task Learning
Raykar et al. 2008.
Lung Nodule Ground Glass Object
31
Example Multi-Task Learning Result
32
Maximizing AUC
Raykar et al. 2008
[Scatter plot of positive (+) and negative (−) examples.]
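Concretely, the Wilcoxon-Mann-Whitney (WMW) statistic equals the AUC, and replacing its 0/1 step with a sigmoid yields a differentiable surrogate that gradient methods can maximize. A minimal sketch (illustrative, not the exact objective of Raykar et al.):

```python
import numpy as np

def wmw_auc(pos_scores, neg_scores):
    """Wilcoxon-Mann-Whitney statistic: fraction of (positive, negative)
    pairs ranked correctly -- this equals the AUC."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return float(np.mean(pos > neg))

def sigmoid_auc_surrogate(pos_scores, neg_scores):
    """Differentiable relaxation: replace the 0/1 step with a sigmoid,
    so the (approximate) AUC can be maximized by gradient methods."""
    diff = np.asarray(pos_scores)[:, None] - np.asarray(neg_scores)[None, :]
    return float(np.mean(1.0 / (1.0 + np.exp(-diff))))

pos, neg = [2.0, 1.0, 0.5], [0.0, -1.0]
print(wmw_auc(pos, neg))  # 1.0: every positive outranks every negative
```

Restricting the pairs entering the sum (e.g., to a specific false-positive range) gives the partial-AUC variant mentioned in the challenge table.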
33
Generalization of AUC maximization: Learning
Preference Relationships / Ranking
From these two we can get a set of
pairwise preference relations
34
MAP Estimator is expensive to compute
Discrete optimization problem
Original task: Choose w to maximize
35
Log-likelihood:
Prior:
Accelerating the core computational primitive
Weighted summation of erfc() functions:
36
Truncated Beaulieu's series admits decomposition & regrouping:
37
Direct vs Fast – Time taken
Dataset  Direct      Fast
1        1736 secs.  2 secs.
2        6731 secs.  19 secs.
3        2557 secs.  4 secs.
4        *           47 secs.
38
Sample result: Dataset 8
Method             Time taken (secs)  WMW
RankNCG direct     333                0.984
RankNCG fast       3                  0.984
RankNet linear     1264               0.951
RankNet two layer  2464               0.765
RankSVM linear     34                 0.984
RankSVM quadratic  1332               0.996
RankBoost          6                  0.958
Key Machine Learning Challenges
Challenge Solutions
1. Training/testing data is correlated Multiple instance learning
batch classification
2. Evaluation metric is CAD specific Multiple instance learning
3. Run-time Constraints Cascaded classifiers
4. No objective ground truth EM crowd-sourcing algorithm
5. Data shortage Multi-task learning
6. Sensitivity for specific FP range Maximize (partial) AUC
39
Clinical Impact
• Measure the improvement in performance of a radiologist with
the Siemens CAD software.
• Several independent clinical studies/trials have been conducted
by our collaborators worldwide.
• NOTE: CAD is deployed in second reader mode in these
studies.
40
Lung CAD
1. FDA clinical validation study with 17 radiologists, 196 cases from
4 hospitals. Average reader AUC increased by 0.048 (p<0.001)
because of CAD.
2. Study at NYU by Godoy et al. 2008
3. New version also helps detect different kinds of nodules.
Nodule type             Mean sensitivity without CAD  Mean sensitivity with CAD  Increase in sensitivity
Solid Nodules           60%                           85%                        25%
Part-solid Nodules      80%                           95%                        15%
Ground Glass Opacities  75%                           86%                        11%

Reader    Sensitivity without CAD  Sensitivity with CAD  Increase in sensitivity
Reader 1  56.2%                    66.0%                 9.8%
Reader 2  79.2%                    89.8%                 10.6%
41
Colon PEV
Colon PEV (Polyp Enhanced Viewer) was evaluated by Baker
et al. 2007
– Study with seven less-experienced readers
– Without PEV, average sensitivity was 0.810
– With PEV, average sensitivity was 0.908
– A 9.8 percentage-point increase in average sensitivity (p=0.0152).
42
PE CAD
Das et al. 2008 conducted a study with 43 patients to assess the
sensitivity of detection of pulmonary embolism.

Reader    Sensitivity without CAD  Sensitivity with CAD  Increase in sensitivity
Reader 1  87%                      98%                   11%
Reader 2  82%                      93%                   11%
Reader 3  77%                      92%                   15%
43
Long-term career growth
=
Increased Impact
(Customers, Shareholders, Society)
44
Themes relevant for ML practitioners
We increase our impact by growing along 3 axes:
1. Product
2. Technology
3. Team
45
Themes relevant for ML practitioners
1. Product: Domain knowledge is very important. We need to
design or utilize algorithms that optimize the metrics relevant to
our customers.
– CAD example: Collaboration with radiologists is crucial for eliciting the domain knowledge
about cancer, and also to understand their usage habits, what they care about, etc.
– For example, the accuracy metric was different in our product.
2. Technology: Need careful analysis of the assumptions behind
off-the-shelf data-mining algorithms.
– CAD example: most of this talk covered these technical / mathematical
assumptions
46
Themes relevant for ML practitioners
3. Team: By truly integrating with the entire product team we can
optimize the entire system and achieve much bigger impact.
It is important for us to design or contribute to the infrastructure.
• End-to-end automated system optimization: e.g. automated optimization of
parameter settings for image processing algorithms
• Re-usable tools e.g. features, deployable large-scale learning algorithms.
• Analysis/modeling to support deployment goals: e.g. reduce memory &
computational footprint
• Version control for Data/Ground-truth, Automated tests (probabilistic!) etc
• Visualization tools for inputs or failure modes for other team members: e.g.,
cluster failures in feature space, visualize prototypical failures as images to
discover clinical or image-processing insights about failures
• Analysis of technical debt associated with ML
47
Technical Debt associated with ML
• Entanglement: Changing Anything Changes Everything (CACE)
• Hidden causal-feedback loops: e.g., changing CTR with ML alters user
clicks & thus the data-generating distributions
• Undeclared consumers of intermediate stages/features etc
• Unstable data dependencies: need versioned copies of signals!
• Legacy features, epsilon features etc
• Correction cascades are a terrible idea!
• System level glue code / pipeline jungles
• Dead experimental code paths, e.g., A/B tests
• Configuration debt
• Etc…
48
Acknowledgements
Dr. D. Naidich, MD, of New York University
Dr. M. E. Baker, MD, of the Cleveland Clinic Foundation
Dr. M. Das, MD, of the University of Aachen
Dr. U. J. Schoepf, MD, of the Medical University of South Carolina
Dr. Peter Herzog, MD, of Klinikum Grosshadern, Munich.
Siemens:
Ingo Schmuecking, MD, Alok Gupta, Bharat Rao, Murat Dundar, Jinbo Bi,
Harald Steck, Stefan Niculescu, Romer Rosales, Shipeng Yu, Glenn Fung,
Vikas Raykar, Sangmin Park, Gerardo Valadez, Jonathan Stoeckel, Anna
Jerebko, Matthias Wolf, and the entire SISL team.
49
Thank You ! | Questions ?
50
MIL
51
Maximum Likelihood Estimator
52
Sparse Bayesian Prior
53
Feature Selection
54
How to judge an annotator ?
Gold Standard
Novice
Luminary
Dart throwing
monkey
Evil
Dumb expert
Good experts have high sensitivity and high specificity.
55
1. Beaulieu's series expansion
57
2. Use truncated series
Retain only the first few terms
contributing to the desired
accuracy.
58
3. Regrouping
Does not depend on y.
Can be computed in O(pN)
Once A and B are precomputed
Can be computed in O(pM)
Reduced from O(MN) to O(p(M+N)) 59
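For reference, the primitive being accelerated can be written down directly; a naive O(MN) evaluation (notation x_i, y_j, q_i assumed from the slides) looks like this, and the truncated-series regrouping above reduces it to O(p(M+N)):

```python
import math

def erfc_sum_direct(ys, xs, qs):
    """Direct O(MN) evaluation of E(y_j) = sum_i q_i * erfc(y_j - x_i),
    the core computational primitive of the ranking likelihood
    (notation assumed from the slides)."""
    return [sum(q * math.erfc(y - x) for x, q in zip(xs, qs)) for y in ys]

# Tiny example: M=2 evaluation points, N=3 centers with unit weights.
out = erfc_sum_direct([0.0, 1.0], [-1.0, 0.0, 1.0], [1.0, 1.0, 1.0])
print(out)
```

The fast method gives the same sums to a user-chosen accuracy; the direct form above is useful as a correctness check against the accelerated implementation.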
4. Other tricks
• Rapid saturation of the erfc function.
• Space subdivision
• Choosing the parameters to achieve
the error bound
• See the technical report
60
61
62
Application to collaborative filtering
• Predict movie ratings for a user based on the
ratings provided by other users.
• MovieLens dataset (www.grouplens.org)
• 1 million ratings (1-5)
• 3592 movies
• 6040 users
• Feature vector for each movie – rating provided
by d other users
63
Collaborative filtering results
64
Collaborative filtering results


Editor's Notes

  • #6 The proposed workflow is to use CAD as a second reader (i.e., in conjunction with the radiologist) – the radiologist first performs an interpretation of the image as usual, and then runs the CAD algorithm (typically a set of image processing algorithms followed by a classifier), and highlights structures identified by the CAD algorithm as being of interest to the radiologist. The radiologist examines these marks and concludes the interpretation.
  • #7 Lung cancer is the most commonly diagnosed cancer worldwide, accounting for 1.2 million new cases annually.
  • #10 A major clinical challenge is to quickly and correctly diagnose patients with PE and then send them on to treatment. A prompt and accurate diagnosis of PE is the key to survival. We developed a fast yet effective approach for computer aided detection of pulmonary embolism (PE) in CT pulmonary angiography (CTPA). Our research has been motivated by the lethal, emergent nature of PE and the limited accuracy and efficiency of manual interpretation of CTPA studies.
• #16 In the MIL framework the training set consists of bags. A bag contains many instances. All the instances in a bag share the same bag-level label. A bag is labeled positive if it contains at least one positive instance. A negative bag means that all instances in the bag are negative. The goal is to learn a classification function that can predict the labels of unseen instances and/or bags. Figure 9 illustrates that MIL can yield very different classifiers compared to conventional single instance learning. The single instance classifier on the left is trying to reject as many negative candidates as possible and detect as many positives as possible. The MIL classifier on the right tries to detect at least one candidate in a positive bag and reject as many negative candidates as possible.
  • #20 Most classification systems assume that the data used to train and test the classifier are independently drawn from an identical underlying distribution. For example, samples are classified one at a time in a support vector machine (SVM), thus the classification of a particular test sample does not depend on the features from any other test samples. Nevertheless, this assumption is commonly violated in many real-life problems where sub-groups of samples have a high degree of correlation amongst both their features and their labels. Due to spatial adjacency of the regions identified by a candidate generator, both the features and the class labels of several adjacent candidates can be highly correlated during training and testing. We proposed batch-wise classification algorithms to explicitly account for correlations (Vural, et al. 2009).