Ground truth generation in medical imaging: a crowdsourcing-based iterative approach

As in many other scientific domains where computer-based tools need to be evaluated, medical imaging often requires the expensive generation of manual ground truth. For some specific tasks medical doctors are needed to guarantee high-quality, valid results, whereas other tasks, such as the image modality classification described in this text, can be performed with sufficiently high quality by domain experts without a medical degree.

1. Ground truth generation in medical imaging: a crowdsourcing-based iterative approach
   Antonio Foncubierta-Rodríguez, Henning Müller
2. Introduction
   • Medical image production grows rapidly in scientific and clinical environments
   • If images are easily accessible, they can be reused for:
     • Clinical decision support
     • Training of young physicians
     • Relevant document retrieval for researchers
   • Modality classification improves the retrieval and accessibility of images
3. Motivation and dataset
   • ImageCLEF dataset:
     • Over 300,000 images from the open access biomedical literature
     • Over 30 modalities, hierarchically defined
   • Manual classification is expensive and time-consuming
   • How can this be done more efficiently?
4. Classification hierarchy
   [Hierarchy diagram. Broad categories: compound, diagnostic and generic/conventional images. Diagnostic subcategories include radiology (CT, MRI, 2D X-ray, angiography, ultrasound, PET, SPECT, infrared, combined modalities), microscopy (light, transmission electron, fluorescence, phase/interference contrast, dark field), visible light photography (skin, gross organs, endoscopy), signals and waves (EEG, ECG/EKG, EMG) and 2D/3D reconstructions. Generic subcategories include tables and forms, program listings, statistical figures, graphs and charts, flowcharts, system overviews, gene sequences, chromatography/gel, chemical structures, mathematical formulae, symbols, non-clinical photos and hand-drawn sketches.]
5. Image examples
   [Sample images: a compound figure; generic examples (table, figures/charts); diagnostic examples from radiology (ultrasound, CT) and microscopy (fluorescence).]
6. Iterative workflow
   • Avoid manual classification as much as possible
   • Iterative approach (a code sketch follows this slide):
     1. Create a small training set
        • Manual classification into 34 categories
     2. Use an automatic tool that learns from the training set
     3. Evaluate the results
        • Manual classification into right/wrong categories
     4. Improve the training set
     5. Repeat from step 2
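The loop above can be summarized in a few lines of Python. This is a minimal sketch under assumed interfaces: the callables classify_manually, train and verify stand in for the manual annotation step, the automatic classifier and the crowdsourced right/wrong verification; none of these names come from the paper.

```python
def build_ground_truth(images, classify_manually, train, verify,
                       seed_size=1000, max_rounds=5):
    """Iterative ground-truth generation (sketch with placeholder callables)."""
    # 1. Small seed training set, manually classified into the 34 categories.
    labels = classify_manually(images[:seed_size])

    for _ in range(max_rounds):
        # 2. Train the automatic tool on the current training set.
        model = train(labels)
        # 3. Automatically classify the images that are still unlabeled.
        predicted = {img: model(img) for img in images if img not in labels}
        # 4. Evaluate: annotators approve or refuse each predicted label.
        approved = verify(predicted)
        if not approved:
            break
        # 5. Improve the training set with the approved labels, then repeat.
        labels.update(approved)

    return labels
```

In each round only the refused predictions still need manual attention, which is what keeps the fully manual effort limited to the small seed set.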
7. Crowdsourcing in medical imaging
   • Crowdsourcing reduces the time and cost of annotation
   • Medical image annotation is often done by:
     • Medical doctors
     • Domain experts
   • Can unknown users provide valid annotations?
     • Quality?
     • Speed?
8. User groups
   • Experiments were performed with three different user groups:
     1. One medical doctor (MD)
     2. 18 known experts
     3. 2,470 contributors from open crowdsourcing
9. Crowdsourcing platform
   • The Crowdflower platform was chosen for the experiments
     • Integrated interface for job design
     • Complete set of management tools: gold creation, internal interface, statistics, raw data
     • Hub feature: jobs can be announced in several crowdsourcing pools:
       • Amazon MTurk
       • Get Paid
       • Zoombucks
10. Experiment: initial training set generation
    • 1,000 images
    • Limited to the 18 known experts
    • Aim: test the crowdsourcing interface
11. Experiment: automated classification verification
    • 300,000 images
    • Binary task: approve or refuse the automatic classification
    • Aim: evaluate the speed and difficulty of the verification task
12. Experiment: trustability
    • Aim: compare the expected accuracy of the user groups (a sketch follows this slide)
    • 3,415 images were classified by the medical doctor
    • The two other user groups were required to reclassify these images
    • A random subset of 1,661 images was used as gold standard
    • Feedback on wrong classifications was given to the known experts to detect ambiguities
    • Feedback on 847 of the gold images was muted for the crowd
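A minimal sketch of how the gold-standard subset could be used to estimate how trustworthy each contributor's judgements are. The data structures and the 70 % cut-off are illustrative assumptions, not values or code from the paper.

```python
def trust_scores(gold, judgements, min_accuracy=0.70):
    """Estimate per-contributor accuracy on the gold subset.

    gold:       {image_id: md_label}            -- the MD's classifications
    judgements: {user_id: {image_id: label}}    -- one dict per contributor
    """
    scores = {}
    for user, answers in judgements.items():
        on_gold = [img for img in answers if img in gold]
        if not on_gold:
            continue  # this contributor never saw a gold image
        correct = sum(answers[img] == gold[img] for img in on_gold)
        scores[user] = correct / len(on_gold)
    # Keep only contributors whose accuracy on the gold subset is acceptable.
    return {user: acc for user, acc in scores.items() if acc >= min_accuracy}
```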
13. Results: user self-assessment
    • Users were required to state how sure they were of their choice
    • This allows discarding untrusted data from trusted sources (a sketch follows this slide)
    • Confidence rate:
      • Medical doctor: 100 %
      • Known experts group: 95.04 %
      • Crowd group: 85.56 %
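A small sketch of how the self-assessment answers can be turned into the confidence rates above and used to discard unsure judgements; the field names are assumptions about how the answers might be stored, not the authors' format.

```python
def confidence_rate(judgements):
    """judgements: list of dicts like {"image": ..., "label": ..., "sure": True}."""
    if not judgements:
        return 0.0, []
    sure = [j for j in judgements if j["sure"]]
    rate = 100.0 * len(sure) / len(judgements)
    return rate, sure

# Only the judgements the annotator was sure about are kept, e.g.:
#   rate, trusted = confidence_rate(crowd_judgements)
# The slide reports rates of 100 %, 95.04 % and 85.56 % for the three groups.
```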
14. Results: tasks completed per user
    [Charts of the number of tasks completed per user, shown separately for the open crowdsourcing group and for the internal interface.]
15. Results: MD and known experts
    • Agreement
      • Broad category: 88.76 %
      • Diagnostic subcategory: 97.40 %
      • Microscopy: 89.06 %
      • Radiology: 90.91 %
      • Reconstructions: 100 %
      • Visible light photography: 79.41 %
      • Conventional subcategory: 76 %
    • Speed
      • MD: 85 judgements per hour
      • Experts: 66 judgements per hour per user
16. Results: MD and crowd
    • Agreement (see the sketch after this slide for how such per-category figures can be computed)
      • Broad category: 85.53 %
      • Diagnostic subcategory: 85.15 %
      • Microscopy: 70.89 %
      • Radiology: 64.01 %
      • Reconstructions: 0 %
      • Visible light photography: 58.89 %
      • Conventional subcategory: 75.91 %
    • Speed
      • MD: 85 judgements per hour
      • Crowd: 25 judgements per hour per user
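A sketch of how per-category agreement with the MD and the throughput figures on the previous two slides could be computed. The dictionaries and the category_of mapping are assumed data structures, not the authors' actual evaluation code.

```python
from collections import defaultdict

def agreement_by_category(md_labels, group_labels, category_of):
    """Percentage of images on which a user group agrees with the MD, per category.

    md_labels, group_labels: {image_id: label}
    category_of: function mapping a label to its (sub)category
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for img, md_label in md_labels.items():
        if img not in group_labels:
            continue
        cat = category_of(md_label)
        totals[cat] += 1
        hits[cat] += int(group_labels[img] == md_label)
    return {cat: 100.0 * hits[cat] / totals[cat] for cat in totals}

def judgements_per_hour(n_judgements, total_seconds, n_users=1):
    # Per-user throughput; the slides report roughly 85 (MD), 66 (known expert)
    # and 25 (crowd contributor) judgements per hour.
    return n_judgements / (total_seconds / 3600.0) / n_users
```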
17. Results: automatic classification verification
    • Verification by the known experts
    • 1,000 images were verified
    • Agreement among annotators: 100 %
    • Speed: users answered roughly twice as fast as in the full classification task
18. Conclusions
    • The iterative approach reduces the amount of manual work
      • Only a small subset is fully manually annotated
      • Verifying automatic classifications is faster than classifying from scratch
    • Significant differences among user groups
      • Faster crowd annotation overall, due to the large number of contributors
      • Poorer crowd annotations in the most specific classes
    • Comparable performance among user groups for the broad categories
19. Future work
    • Experiments can be redesigned to fit crowd behaviour:
      • A smaller number of (good) contributors has previously led to CAD-comparable performance
      • Selection of contributors:
        • Based on historical performance on the platform?
        • Via a selection/training phase within the job?
20. Thanks for your attention!
    Antonio Foncubierta-Rodríguez and Henning Müller. “Ground truth generation in medical imaging: a crowdsourcing-based iterative approach”, in Workshop on Crowdsourcing for Multimedia, ACM Multimedia, Nara, Japan, 2012.
    Contact: antonio.foncubierta@hevs.ch
