Ground truth generation in medical imaging: a crowdsourcing-based iterative approach

As in many other scientific domains where computer-based tools need to be evaluated, medical imaging often requires the expensive manual generation of ground truth. For some tasks, medical doctors are needed to guarantee high-quality, valid results, whereas other tasks, such as the image modality classification described in this text, can be performed with sufficiently high quality by non-clinical domain experts.

Presentation Transcript

  • Ground truth generation in medical imaging: a crowdsourcing-based iterative approach
    Antonio Foncubierta-Rodríguez, Henning Müller
  • Introduction
    • Medical image production grows rapidly in scientific and clinical environments
    • If images are easily accessible, they can be reused for:
      • Clinical decision support
      • Training of young physicians
      • Relevant document retrieval for researchers
    • Modality classification improves retrieval and accessibility of images
  • Motivation and dataset
    • ImageCLEF dataset:
      • Over 300,000 images from the open access biomedical literature
      • Over 30 modalities, hierarchically defined
    • Manual classification is expensive and time consuming
    • How can this be done in a more efficient way?
  • Classification hierarchy (diagram): the class tree spans diagnostic images (radiology: ultrasound, MRI, CT, 2D X-ray, angiography, PET, SPECT, infrared, combined; visible light photography: skin, gross organs, endoscopy; microscopy: light, electron, transmission, fluorescence, phase/interference contrast, dark field; signals: EEG, ECG/EKG, EMG, waves; 3D reconstructions) and conventional/generic illustrations (tables and forms, program listings, statistical figures, graphs and charts, system overviews, flowcharts, gene sequences, chromatography/gel, chemical structures, mathematical formulae, symbols, hand-drawn sketches, non-clinical 2D photos), plus compound figures
  • Image examples (figure): sample images for compound figures, the generic classes (tables, figures/charts) and the diagnostic classes (radiology: ultrasound and CT; microscopy: fluorescence)
  • Iterative workflow
    • Avoid manual classification as much as possible
    • Iterative approach (a sketch of the loop follows below):
      1. Create a small training set (manual classification into 34 categories)
      2. Use an automatic tool that learns from the training set
      3. Evaluate the results (manual classification into right/wrong categories)
      4. Improve the training set
      5. Repeat from step 2
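    A minimal Python sketch of the iterative loop above. The helper names (manual_annotation, train_modality_classifier, crowd_verify) and the stopping rule are hypothetical placeholders for illustration, not the authors' actual implementation.

        # Sketch of the iterative ground-truth workflow (hypothetical helpers).
        def iterative_ground_truth(images, n_seed=1000, max_rounds=5):
            # 1. Small seed training set, manually labelled into the 34 categories
            training_set = manual_annotation(images[:n_seed])
            unlabelled = images[n_seed:]
            for _ in range(max_rounds):
                # 2. Train an automatic classifier on the current training set
                classifier = train_modality_classifier(training_set)
                predictions = classifier.predict(unlabelled)
                # 3. Crowdsourced verification: annotators only approve or refuse
                #    the predicted label, which is cheaper than full classification
                verified, rejected = crowd_verify(unlabelled, predictions)
                # 4. Verified predictions extend the training set; rejected images
                #    are reconsidered in the next round
                training_set += verified
                unlabelled = rejected
                # 5. Stop early once nothing is left to correct
                if not rejected:
                    break
            return training_set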
  • Crowdsourcing in medical imaging
    • Crowdsourcing reduces time and cost for annotation
    • Medical image annotation is often done by:
      • Medical doctors
      • Domain experts
    • Can unknown users provide valid annotations?
      • Quality?
      • Speed?
  • User groups
    • Experiments were performed with three different user groups:
      • 1 medical doctor (MD)
      • 18 known experts
      • 2,470 contributors from open crowdsourcing
  • Crowdsourcing platform
    • The Crowdflower platform was chosen for the experiments:
      • Integrated interface for job design
      • Complete set of management tools: gold creation, internal interface, statistics, raw data
      • Hub feature: jobs can be announced in several crowdsourcing pools:
        • Amazon Mechanical Turk
        • Get Paid
        • Zoombucks
  • Experiment: initial training set generation
    • 1,000 images
    • Limited to the 18 known experts
    • Aim: test the crowdsourcing interface
  • Experiment: automated classification verification
    • 300,000 images
    • Binary task: approve or refuse the automatic classification (see the aggregation sketch below)
    • Aim: evaluate speed and difficulty of the verification task
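    If several contributors judge the same image, their approve/refuse votes need to be combined. The slides do not specify the rule, so the sketch below assumes a simple majority vote.

        from collections import Counter

        def verify_label(votes):
            """Combine binary approve/refuse judgements for one image.

            votes: list of booleans, True meaning the annotator approved the
            predicted label. A plain majority rule is assumed here.
            """
            counts = Counter(votes)
            return counts[True] > counts[False]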
  • Experiment: trustability
    • Aim: compare the expected accuracy of the user groups
    • 3,415 images were classified by the medical doctor
    • The two other user groups were required to reclassify the images
    • A random subset of 1,661 images was used as gold standard
    • Feedback on wrong classifications was given to the known experts to detect ambiguities
    • Feedback on 847 of the gold images was muted for the crowd
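    A minimal sketch of how agreement with the MD's labels could be computed per (sub)category; the data layout below (dicts keyed by image id) is an assumption for illustration, not the paper's code.

        from collections import defaultdict

        def agreement_by_category(gold, annotations):
            """Fraction of judgements matching the MD's gold label, per category.

            gold:        dict image_id -> (category, gold_label), from the MD
            annotations: list of (image_id, label) pairs from one user group
            """
            matches = defaultdict(int)
            totals = defaultdict(int)
            for image_id, label in annotations:
                category, gold_label = gold[image_id]
                totals[category] += 1
                matches[category] += int(label == gold_label)
            return {c: matches[c] / totals[c] for c in totals}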
  • Results: user self-assessment
    • Users were required to state how sure they were of their choice
    • This allows discarding untrusted data from trusted sources
    • Confidence rate:
      • Medical doctor: 100 %
      • Known experts group: 95.04 %
      • Crowd group: 85.56 %
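    A minimal sketch of filtering by self-assessment: judgements the annotator was not sure about are discarded, and the retained fraction corresponds to the confidence rate above. The (image_id, label, is_sure) tuple layout is an assumption.

        def filter_by_self_assessment(judgements):
            """Keep only judgements whose annotator marked themselves as sure.

            judgements: list of (image_id, label, is_sure) tuples.
            Returns the kept (image_id, label) pairs and the confidence rate.
            """
            kept = [(img, label) for img, label, is_sure in judgements if is_sure]
            rate = len(kept) / len(judgements) if judgements else 0.0
            return kept, rate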
  • Results: tasks completed per user (figure): distribution of tasks per contributor for the open crowdsourcing pool and for the internal interface
  • Results: MD and known experts
    • Agreement:
      • Broad category: 88.76 %
      • Diagnostic subcategory: 97.40 %
      • Microscopy: 89.06 %
      • Radiology: 90.91 %
      • Reconstructions: 100 %
      • Visible light photography: 79.41 %
      • Conventional subcategory: 76 %
    • Speed:
      • MD: 85 judgements per hour
      • Experts: 66 judgements per hour per user
  • Results: MD and crowd
    • Agreement:
      • Broad category: 85.53 %
      • Diagnostic subcategory: 85.15 %
      • Microscopy: 70.89 %
      • Radiology: 64.01 %
      • Reconstructions: 0 %
      • Visible light photography: 58.89 %
      • Conventional subcategory: 75.91 %
    • Speed:
      • MD: 85 judgements per hour
      • Crowd: 25 judgements per hour per user
  • Results: automatic classification verification
    • Verification by experts
    • 1,000 images were verified
    • Agreement among annotators: 100 %
    • Speed: users answered about twice as fast as in the full classification task
  • Conclusions
    • The iterative approach reduces the amount of manual work:
      • Only a small subset is fully manually annotated
      • Automatic classification verification is faster
    • Significant differences among user groups:
      • Faster crowd annotations due to the number of contributors
      • Poorer crowd annotations in the most specific classes
    • Comparable performance among user groups in the broad categories
  • Future work
    • Experiments can be redesigned to fit the crowd behaviour:
      • A smaller number of (good) contributors has previously led to CAD-comparable performance
      • Selection of contributors:
        • Historical performance on the platform?
        • A selection/training phase within the job
  • Thanks for your attention!
    Antonio Foncubierta-Rodríguez and Henning Müller, "Ground truth generation in medical imaging: A crowdsourcing-based iterative approach", in Workshop on Crowdsourcing for Multimedia, ACM Multimedia, Nara, Japan, 2012.
    Contact: antonio.foncubierta@hevs.ch