Relevant Content Detection
in Cataract Surgery Videos
Assoc.Prof. Dr. Klaus Schöffmann
Institute of Information Technology
Alpen-Adria-Universität Klagenfurt, Austria
IPTA 2022,
Salzburg, Austria
19.04.2022
Images/Videos in Medicine
Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 2
Modern operation rooms
Many ways of delivering
visuals to the surgeon’s eye
Capsules
Endoscopes
Robot-assisted surgery
Microscopes
Ophthalmology
Cataract Surgery: Most Common Eye Surgery and Most Common Surgery Worldwide
Cataract Surgery
microscopic setup limits teaching and training
Image: https://www.ranelle.com/cataract-surgery/
typically 5-7 mins
• Many ophthalmic surgeons record and archive entire surgeries
• Teaching and Training
• limited teaching capabilities during surgery (microscope)
• video as a training aid
• demonstration of operation techniques
• Documentation
• an image tells a thousand words
• legal source of evidence
• explanations to patients
• Research
• retrospective analyses
• case re-visitations
• analyses and forensics
Cataract Surgery Videos
However, simple recording is not enough –
we need automatic content analysis to
enable content-based search and filtering.
Content Analysis for Information Retrieval
Instruments:
• A/I handpiece
• Cap. forceps
• Hydro. cannula
• Lens injector
• Micromanipulator
• Phaco. handpiece
• Primary knife
• Secondary knife
• Visco. cannula
Phases:
• Implantation
• Phaco
• Rhexis
• Irrigation
• …
Cataract Video Analysis Framework
Negin Ghamsarian. 2020. Enabling Relevance-Based Exploration of Cataract Videos. In Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20). Association for Computing Machinery, New York, NY, USA, 378–382.
Phase Segmentation
Phase Segmentation in Cataract Surgery Videos
Manfred J. Primus, Doris Putzgruber-Adamitsch, Mario Taschwer, Bernd Münzer, Yosuf El-Shabrawi, Laszlo Böszörmenyi, and Klaus Schoeffmann. 2018. Frame-Based Classification of Operation Phases in Cataract Surgery Videos. In Proceedings of the 24th
International Conference on Multimedia Modeling 2018 (MMM2018). Lecture Notes in Computer Science, vol 10704, Springer, Cham, 241-253.
Frame-Based Classification:
• GoogLeNet CNN, pre-trained on ImageNet-1000
• 21 videos (17/4 randomly chosen for training/test)
• 212,487 frames (175,488/36,999)
• practical problem: heavily unbalanced data
• incision: 6,642 training images
• phacoemulsification: 54,679 training images
• → random sampling and augmentation (copy, rotate, scale) to get 12,000 frames per phase
• Three types of models/data
• Unbalanced (“basic”)
• balanced
• time-based CNN
• frame number as 4th channel
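The balancing step and the time-based input can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the normalization of the frame number to [0, 1], and the use of plain resampling in place of the rotate/scale augmentation are all assumptions.

```python
import numpy as np


def balanced_indices(labels, per_class=12_000, seed=0):
    """Return frame indices with exactly `per_class` samples per phase.

    Over-represented phases are randomly subsampled; under-represented
    phases are sampled with replacement (in a real pipeline the copies
    would additionally be rotated or scaled).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for phase in np.unique(labels):
        idx = np.flatnonzero(labels == phase)
        replace = idx.size < per_class  # copy frames only when needed
        picked.append(rng.choice(idx, size=per_class, replace=replace))
    return np.concatenate(picked)


def add_time_channel(frame_rgb, frame_no, n_frames):
    """Append the relative frame position as a 4th channel: (H, W, 3) -> (H, W, 4)."""
    h, w, _ = frame_rgb.shape
    position = frame_no / max(n_frames - 1, 1)  # assumed normalization to [0, 1]
    t = np.full((h, w, 1), position, dtype=np.float32)
    return np.concatenate([frame_rgb.astype(np.float32), t], axis=-1)
```

With the numbers from the slide, incision (6,642 frames) would be oversampled and phacoemulsification (54,679 frames) subsampled to the same 12,000 frames.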
Cataract Surgery Phases
Relevant Phase Segmentation
• Only a few phases are medically relevant
• and they are separated by idle phases
Relevant Phase Segmentation in Cataract Videos
N. Ghamsarian, M. Taschwer, D. Putzgruber, S. Sarny, K. Schoeffmann, “Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization ”, 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2020
[Diagram: (CNN) combined with LSTM, GRU, Bi-LSTM, or Bi-GRU; feature-based (FB) vs. end-to-end (EE)]
• best backbone model with spatial action localization and Bi-RNN (EE): LocalPhase GRU/LSTM
• Dataset with videos from 22 cataract surgeries, annotated for medically relevant phases (publicly released)
• 18 videos randomly used for training, 4 for testing
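A random 18/4 split is done per video rather than per frame, so that frames of one surgery never end up in both sets. A minimal sketch, assuming hypothetical video identifiers and an arbitrary seed:

```python
import random


def split_by_video(video_ids, n_test=4, seed=42):
    """Randomly split video IDs into (train, test) without overlap."""
    ids = sorted(set(video_ids))   # deterministic starting order
    rng = random.Random(seed)      # seed chosen for illustration only
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]


# Hypothetical IDs for the 22 annotated surgeries
videos = [f"case_{i:02d}" for i in range(22)]
train_videos, test_videos = split_by_video(videos)
```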
[Figures: Training hyper/parameters; Results]
Instrument Segmentation
• Evaluate a region-based CNN for
1. Binary semantic segmentation
• distinguish between instrument instances and background
(without recognizing the actual instrument)
2. Multi-class instance segmentation
• Labeling different instrument classes and their segments
• Mask R-CNN (extension of Faster R-CNN)
• Region Proposal Network (RPN)
• with a common CNN as a backbone
• End-to-end training with multi-task loss
• Classification
• Localization
• Mask Segmentation
Instrument Segmentation
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961-2969).
• Manual annotations of content the neural network needs to learn
• e.g., instrument segmentation in cataract surgery
• Two different datasets
• Cataract-101 (DS 1)
• CaDIS (DS 2)
• Segment annotations
• manually created for DS 1
• converted from existing mask images of DS 2
• Data augmentation techniques
• geometric (hor./vert. flip, 90-degree rotation)
• color (brightness, saturation, grayscale, contrast, distortion)
• bbox (random crop and random jitter on bounding-boxes)
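For segmentation data, geometric augmentation has to be applied identically to the image and its mask, while color augmentation touches only the image. The sketch below illustrates this with NumPy; the function names and the 50% flip probabilities are assumptions, not the settings used in the study.

```python
import numpy as np


def geometric_augment(image, mask, rng):
    """Apply the same random flip / 90-degree rotation to image and mask."""
    if rng.random() < 0.5:                 # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                 # vertical flip
        image, mask = image[::-1], mask[::-1]
    k = int(rng.integers(0, 4))            # number of 90-degree rotations
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


def adjust_brightness(image, factor):
    """Scale the intensities of a uint8 image; the mask stays unchanged."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)
```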
Instrument Segmentation: Datasets and Augmentation
Dataset Images Train Val Test Classes
DS 1 393 237 61 95 9
DS 2 4738 3582 542 614 21
• Schoeffmann, K., Taschwer, M., Sarny, S., Münzer, B., Primus, M. J., & Putzgruber, D. (2018, June). Cataract-101: video dataset of 101 cataract surgeries. In Proceedings of the 9th ACM Multimedia Systems Conference (pp. 421-425).
• Flouty, E., Kadkhodamohammadi, A., Luengo, I., Fuentes-Hurtado, F., Taleb, H., Barbarisi, S., ... & Stoyanov, D. (2019). Cadis: Cataract dataset for image segmentation. arXiv preprint arXiv:1906.11586.
Conversion of mask images in CaDIS dataset:
[Figure panels: original image, horizontal flip, vertical flip, grayscale, adjust brightness, color distortion, random crop]
60% 15% 25%
• Several different backbone networks
• Inception-v2, Inception-ResNet-v2, ResNet-50, ResNet-101
• Settings
• Transfer learning from COCO dataset
• SGD as optimizer, different LR={0.06, 0.01, 0.006, 0.001, 0.0006, 0.0001}
• Evaluation metrics
• Based on IoU (Jaccard index) for every instance
• with ground truth G and the predicted region P
• Mean average precision (mAP) and average recall (AR)
• with a minimum of 50% IoU
• For bounding-box and mask segmentation
Instrument Segmentation: Experimental Setup
IoU = |G ∩ P| / |G ∪ P|
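For binary masks, the IoU of ground truth G and prediction P reduces to counting pixels. A minimal sketch; returning 1.0 when both masks are empty is a convention assumed here, not something stated in the talk:

```python
import numpy as np


def mask_iou(gt, pred):
    """Intersection over union (Jaccard index) of two boolean masks."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(np.logical_and(gt, pred).sum() / union)
```

Under the 50% criterion, a predicted instance counts as a match when `mask_iou(gt, pred) >= 0.5`.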
M. Fox, M. Taschwer and K. Schoeffmann, "Pixel-Based Tool Segmentation in Cataract Surgery Videos with Mask R-CNN," 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 2020, pp. 565-568
Instrument Segmentation: Experimental Results
Dataset mAP@.50IoU AR@1
DS 1 0.770 0.284
DS 2 0.710 0.443
Binary semantic segmentation (ResNet-101)
Dataset mAP@.50IoU AR@1
DS 1 0.559 0.299
DS 2 0.607 0.461
Multi-class instance segmentation (ResNet-101)
Dataset mAP@.50IoU AR@1
DS 1 0.928 0.612
DS 2 0.839 0.573
Binary semantic segmentation (ResNet-101)
Dataset mAP@.50IoU AR@1
DS 1 0.656 0.598
DS 2 0.685 0.576
Multi-class instance segmentation (ResNet-101)
Bounding-Box prediction
Mask prediction
Relevance-Based Compression
• Clinicians want to record and archive as many videos as possible
• However, hospitals typically have quite limited storage space
• In order to optimize video storage, we need domain-specific video
compression
• automatic relevance detection (segments, ROIs)
• relevant for documentation, teaching, research
• relevance-based encoding with HEVC/H.265
• content with high relevance -> high quality
• content with low relevance -> low quality
• [remove irrelevant content]
Relevance-Based Compression: Motivation
Relevance as defined by clinicians:
• Instruments in general
• Everything that happens inside the eye (Cornea)
Relevance-Based Compression: Framework
Negin Ghamsarian, Hadi Amirpourazarian, Christian Timmerer, Mario Taschwer, and Klaus Schöffmann. 2020. Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Neural Networks. In Proceedings of the 28th ACM International Conference on Multimedia (MM '20).
Association for Computing Machinery, New York, NY, USA, 3577–3585.
• ResNet-50 and ResNet-101 (pre-trained on ImageNet)
• trained/tested with random split of videos from Cataract-101 (18/4)
• 500+500 idle and non-idle frames are uniformly sampled per video
• data augmentation
• brightness, rotation, width/height shift, zoom, shear
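Uniform sampling of a fixed number of frames per class can be sketched as picking evenly spaced positions from the list of frame indices of that class. The helper below is an illustration under that assumption; its name and the handling of short videos are hypothetical.

```python
import numpy as np


def uniform_sample(frame_indices, n=500):
    """Pick `n` evenly spaced entries from an ordered list of frame indices."""
    frame_indices = np.asarray(frame_indices)
    if frame_indices.size <= n:
        return frame_indices  # fewer frames than requested: keep them all
    pos = np.linspace(0, frame_indices.size - 1, num=n).round().astype(int)
    return frame_indices[pos]


# Per video: sample idle and non-idle frames separately (500 + 500)
idle_frames = np.arange(0, 3000)          # hypothetical idle frame numbers
action_frames = np.arange(3000, 12000)    # hypothetical action frame numbers
sampled = np.concatenate([uniform_sample(idle_frames),
                          uniform_sample(action_frames)])
```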
Relevance-Based Compression: Idle Detection
Results:
Network     Class   Precision  Recall  F1-score
ResNet-50   Action  1.00       0.85    0.92
ResNet-50   Idle    0.87       1.00    0.93
ResNet-101  Action  0.99       0.88    0.93
ResNet-101  Idle    0.89       0.99    0.94
[Figure: predictions for four example videos (about 20% idle)]
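The F1-scores in the table follow from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them from the reported values; the helper name is ours.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported on the slide
reported = {
    ("ResNet-50", "Action"): (1.00, 0.85),
    ("ResNet-50", "Idle"): (0.87, 1.00),
    ("ResNet-101", "Action"): (0.99, 0.88),
    ("ResNet-101", "Idle"): (0.89, 0.99),
}
for key, (p, r) in reported.items():
    print(key, round(f1_score(p, r), 2))
```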
• Mask R-CNN
• with ResNet-50 and ResNet-101 as backbones (pre-trained on MS COCO)
• cornea annotations in 262 frames, instruments in 216 frames
• 90/10 split for training/test set
Relevance-Based Compression: ROI Segmentation
Target      Backbone    Mask Segmentation      Bounding-Box Segmentation
Cornea                  mAP80  mAP85  mAP      mAP80  mAP85  mAP
            ResNet-101  1.00   0.92   0.89     1.00   1.00   0.95
            ResNet-50   1.00   1.00   0.88     1.00   1.00   0.94
Instrument              mAP60  mAP65  mAP      mAP80  mAP85  mAP
            ResNet-101  0.77   0.65   0.41     1.00   1.00   0.89
            ResNet-50   0.58   0.49   0.29     0.64   0.26   0.65
Relevance-Based Compression: Considered Scenarios
Scenarios for the relevant content:
1. Action
2. Action ∩ (cornea ∪ instrument)
3. Action ∩ cornea (simple)
4. Action ∩ cornea (Luma preference)
5. Action ∩ cornea (removed background)
[Figure: timeline of alternating Action and Idle segments, labeled Relevant and Irrelevant]
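The set expressions of the scenarios map directly onto boolean mask operations per frame. The sketch below assumes that an idle detector supplies `is_action` and that a segmentation network supplies the cornea and instrument masks; as we read the slide, scenarios 3 to 5 share the same relevant region and differ only in how the remaining area is encoded.

```python
import numpy as np


def relevant_region(is_action, cornea_mask, instrument_mask, scenario=2):
    """Return a boolean mask of the relevant pixels of one frame."""
    h, w = cornea_mask.shape
    if not is_action:                     # idle frames contain nothing relevant
        return np.zeros((h, w), dtype=bool)
    if scenario == 1:                     # Action: the whole frame is relevant
        return np.ones((h, w), dtype=bool)
    if scenario == 2:                     # Action ∩ (cornea ∪ instrument)
        return cornea_mask | instrument_mask
    if scenario in (3, 4, 5):             # Action ∩ cornea
        return cornea_mask.copy()
    raise ValueError(f"unknown scenario: {scenario}")
```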
• PSNR measured against the original content for an exemplary segment (Scenario 2)
• 4s+4s action and idle content
• QP = QP_base for relevant content
• QP_base fixed to 22 in our study
• QP = QP_base + ∆Q for irrelevant content
• ∆Q varied
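The quantization rule above can be illustrated as a block-wise QP map derived from a relevance mask. This is a sketch only: the 64x64 block size, the rule that a block is relevant if any of its pixels is, and the function name are assumptions; how such a map is passed to an actual HEVC encoder is not shown. The clipping range 0 to 51 is the QP range of 8-bit HEVC.

```python
import numpy as np


def qp_map(relevance_mask, qp_base=22, delta_q=10, block=64):
    """Assign qp_base to relevant blocks and qp_base + delta_q to the rest."""
    h, w = relevance_mask.shape
    rows = -(-h // block)   # ceiling division
    cols = -(-w // block)
    qp = np.empty((rows, cols), dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            tile = relevance_mask[r * block:(r + 1) * block,
                                  c * block:(c + 1) * block]
            qp[r, c] = qp_base if tile.any() else qp_base + delta_q
    return np.clip(qp, 0, 51)
```

A larger `delta_q` lowers the quality (and bitrate) of irrelevant blocks only, while relevant blocks keep the base QP.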
Relevance-Based Compression: Results
[Figure panels: ∆Q = 5, ∆Q = 10, ∆Q = 13, ∆Q = 15]
→ PSNR of relevant content is equivalent in all four situations; relevant content keeps high quality.
(fluctuations are due to the low-delay encoding of HEVC, which adapts quantization parameters to the perceived visual input quality)
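PSNR restricted to the relevant region can be computed by evaluating the mean squared error only over masked pixels. A minimal sketch for 8-bit frames; the optional `mask` argument is our addition, and an empty mask is not handled (the mean over zero pixels is undefined).

```python
import numpy as np


def psnr(original, decoded, mask=None, peak=255.0):
    """Peak signal-to-noise ratio in dB, optionally over masked pixels only."""
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    if mask is not None:
        diff = diff[mask]               # keep only the relevant pixels
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")             # identical content
    return float(10 * np.log10(peak ** 2 / mse))
```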
• Achievable bitrate reduction for nine representative cataract surgery videos
[Figure panels: original, scenario II, scenario IV, scenario V]
Irregularity Detection:
Pupil Reactions
• During cataract surgery some patients show pupil reactions
• dilations or contractions of the eye chamber
• weak, medium, or strong
• contractions are esp. dangerous (instrument inside)
• yet it is unknown why this happens
• → Pupil reaction detection framework and several evaluations
• collected many videos, annotated pupil and cornea (iris)
• trained a segmentation network (Mask R-CNN w/ ResNet-101)
• applied the network and tracked the size of pupil vs. cornea
• evaluated with three datasets
• DS 0 (segmentation): 83 images from cataract surgery videos, annotated pupil/cornea
• DS A (evaluation): 10 cataract surgery videos, annotated pupil reactions
• DS B (generalization test): 20 cataract surgery videos, results checked by clinicians
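Tracking the size of the pupil relative to the cornea can be sketched as a per-frame area ratio, with a reaction flagged when that ratio changes noticeably within a short time window. The code below is an illustrative simplification, not the published method: the window length, the threshold, and both function names are hypothetical.

```python
def relative_pupil_size(pupil_areas, cornea_areas):
    """Per-frame ratio of segmented pupil area to segmented cornea area."""
    return [p / c if c > 0 else float("nan")
            for p, c in zip(pupil_areas, cornea_areas)]


def detect_reactions(ratios, window=25, threshold=0.05):
    """Flag frames whose ratio differs from `window` frames earlier by more than `threshold`."""
    events = []
    for i in range(window, len(ratios)):
        change = ratios[i] - ratios[i - window]
        if abs(change) > threshold:     # comparisons with NaN are False, so gaps are skipped
            kind = "dilation" if change > 0 else "contraction"
            events.append((i, kind))
    return events
```

Normalizing by the cornea area makes the measure insensitive to zoom changes of the microscope, since both segments scale together.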
Pupil Reaction Detection
Sokolova, N., Schoeffmann, K., Taschwer, M., Sarny, S., Putzgruber-Adamitsch, D., & El-Shabrawi, Y. (2021). Automatic detection of pupil reactions in cataract surgery videos. PloS one, 16(10), e0258390.
Pupil Reaction Detection – Methodology
• Manual annotation: 83 frames from 25 videos, various conditions, published online
• Mask R-CNN segmentation with backbone CNNs ResNet-50 and ResNet-101
Pupil Reaction Detection – Performance
IoU: pupil 91.47%, iris 80.1%
[Figures: Iris segmentation and Pupil segmentation; x-axis: Overlap, IoU (80% to 95%); y-axis: Frames (0% to 100%); series: ResNet-101, ResNet-50]
Pupil Reaction Detection – Dataset
• Dataset A: 10 videos, 135 reactions (weak, medium, strong), used for parameter optimization
• Dataset B: 20 videos, 229 reactions (weak, medium, strong), used for generalization evaluation
Pupil Reaction Detection – Results
[Figure: Intensity-Based Performance Analysis; recall, precision, and prediction accuracy (0% to 100%) for all, medium and strong, and strong reactions]
                     Optimization  Generalization
Recall               64.31%        47.00%
Precision            45.53%        38.25%
Prediction accuracy  85.35%        72.96%
Prediction length    18.93s        18.15s
Pupil Reaction Detection – Subjective Evaluation
Video  Detected reactions      Intensity
       Approach  Doctors       Approach*  Doctors
1      6         Rare          12.26%     Weak and medium
2      10        Rare          10.55%     No dangerous reactions
3      14        Often         17.10%     Medium and strong
4      23        Very often    29.09%     Very strong and dangerous reactions
5      16        Very often    13.98%     Unstable pupil, medium reactions
Conclusions
• Cataract surgery can benefit a lot from image and video analysis
• teaching and training need specific content
• however, also retrospective analysis and forensics benefit from content-based search
• Strong progress has been made over the last years in terms of DL performance
• both recall and precision significantly improved with deep learning, for many problems
• high-accuracy segmentation possible
• Video analysis is beneficial for real-time but also post-operative use
• the former is strongly addressed by the medical imaging community, the latter much less so
• Still, there are many open issues…
• focus on specific content (events, irregularities) and diverse operations (not only cataracts)
• larger datasets with rich annotations
• wide-spread application of video analysis to collect large-scale content statistics
• generalizable and explainable DL models
Thank You!
Assoc.Prof. Dr. Klaus Schoeffmann,
Associate Professor at Universität Klagenfurt, Austria
ks@itec.aau.at | EndoscopicVideo.com

Relevant Content Detection in Cataract Surgery Videos (Invited Talk 1 at IPTA 2022)

  • 1.
    Relevant Content Detection inCataract Surgery Videos Assoc.Prof. Dr. Klaus Schöffmann Institute of Information Technology Alpen-Adria-Universität Klagenfurt, Austria IPTA 2022, Salzburg, Austria 19.04.2022
  • 2.
    Images/Videos in Medicine KlausSchöffmann Relevant Content Detection in Cataract Surgery Videos 2 Modern operation rooms Many ways of delivering visuals to the surgeon’s eye Capsules Endoscopes Robot-assisted surgery Microscopes
  • 3.
    Ophthalmology Klaus Schöffmann RelevantContent Detection in Cataract Surgery Videos 3 Cataract Surgery Most Common Eye Surgery Most Common Surgery Worldwide
  • 4.
    Cataract Surgery Klaus SchöffmannRelevant Content Detection in Cataract Surgery Videos 4 microscopic setup limits teaching and training Image: https://www.ranelle.com/cataract-surgery/ typically 5-7 mins
  • 5.
    • Many ophthalmicsurgeons record and archive entire surgery • Teaching and Training • limited teaching capabilities during surgery (microscope) • video as a training aid • demonstration of operation techniques • Documentation • an image tells a thousand words • legal source of evidence • explanations to patients • Research • retrospective analyses • case re-visitations • analyses and forensics Cataract Surgery Videos Relevant Content Detection in Cataract Surgery Videos However, simple recording is not enough – we need automatic content analysis to enable content-based search and filtering. Klaus Schöffmann 5
  • 6.
    Content Analysis forInformation Retrieval Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 6 A/I handpiece Cap. forceps Hydro. cannula Lens injector Micromanipulator Phaco. handpiece Primary knife Secondary knife Visco. cannula Instruments: Implantation Phaco Rhexis Irrigation … Phases:
  • 7.
    Cataract Video AnalysisFramework Relevant Content Detection in Cataract Surgery Videos Negin Ghamsarian. 2020. Enabling Relevance-Based Exploration of Cataract Videos. In Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20). Association for Computing Machinery, New York, NY, USA, 378–382. Klaus Schöffmann 7
  • 8.
  • 9.
    Phase Segmentation inCataract Surgery Videos Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 9 Manfred J. Primus, Doris Putzgruber-Adamitsch, Mario Taschwer, Bernd Münzer, Yosuf El-Shabrawi, Laszlo Böszörmenyi, and Klaus Schoeffmann. 2018. Frame-Based Classification of Operation Phases in Cataract Surgery Videos. In Proceedings of the 24th International Conference on Multimedia Modeling 2018 (MMM2018). Lecture Notes in Computer Science, vol 10704, Springer, Cham, 241-253. Frame-Based Classification: • GoogLeNet CNN, pre-trained on ImageNet-1000 • 21 videos (17/4 randomly chosen for training/test) • 212,487 frames (175,488/36,999) • practical problem: heavily unbalanced data • incision: 6,642 training images • phacoemulsification: 54,679 training images • à random sampling and augmentation (copy, rotate, scale) to get 12,000 frames per phase • Three types of models/data • Unbalanced (“basic”) • balanced • time-based CNN • frame number as 4th channel Cataract Surgery Phases
  • 10.
    Phase Segmentation inCataract Surgery Videos Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 10 Manfred J. Primus, Doris Putzgruber-Adamitsch, Mario Taschwer, Bernd Münzer, Yosuf El-Shabrawi, Laszlo Böszörmenyi, and Klaus Schoeffmann. 2018. Frame-Based Classification of Operation Phases in Cataract Surgery Videos. In Proceedings of the 24th International Conference on Multimedia Modeling 2018 (MMM2018). Lecture Notes in Computer Science, vol 10704, Springer, Cham, 241-253. Cataract Surgery Phases Frame-Based Classification:
  • 11.
  • 12.
    • Only afew phases are medically relevant • and they are separated by idle phases Relevant Phase Segmentation in Cataract Videos Relevant Content Detection in Cataract Surgery Videos Klaus Schöffmann 12
  • 13.
    Relevant Phase Segmentationin Cataract Videos Relevant Content Detection in Cataract Surgery Videos N. Ghamsarian, M. Taschwer, D. Putzgruber, S. Sarny, K. Schoeffmann, “Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization ”, 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2020 Klaus Schöffmann 13
  • 14.
    Relevant Phase Segmentationin Cataract Videos Relevant Content Detection in Cataract Surgery Videos N. Ghamsarian, M. Taschwer, D. Putzgruber, S. Sarny, K. Schoeffmann, “Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization ”, 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2020 (CNN) LSTM GRU Bi-LSTM Bi-GRU feature-based (FB) end-to-end (EE) best backbone model with spatial action localization and Bi-RNN (EE): LocalPhase Gru/LSTM Dataset with videos from 22 cataract surgeries, annotated for medically relevant phases (publicly released) 18 videos randomly used for training, 4 for testing Klaus Schöffmann 14
  • 15.
    Relevant Phase Segmentationin Cataract Videos Relevant Content Detection in Cataract Surgery Videos Training hyper/parameters N. Ghamsarian, M. Taschwer, D. Putzgruber, S. Sarny, K. Schoeffmann, “Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization ”, 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2020 Klaus Schöffmann 15
  • 16.
    Relevant Phase Segmentationin Cataract Videos Relevant Content Detection in Cataract Surgery Videos Results N. Ghamsarian, M. Taschwer, D. Putzgruber, S. Sarny, K. Schoeffmann, “Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization ”, 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2020 Klaus Schöffmann 16
  • 17.
  • 18.
    • Evaluate aregion-based CNN for 1. Binary semantic segmentation • distinguish between instrument instances and background (without recognizing the actual instrument) 2. Multi-class instance segmentation • Labeling different instrument classes and their segments • Mask R-CNN (extension of Faster R-CNN) • Region Proposal Network (RPN) • with a common CNN as a backbone • End-to-end training with multi-task loss • Classification • Localization • Mask Segmentation Instrument Segmentation Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 18 He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
  • 19.
    • Manual annotationsof content the neural network needs to learn • e.g., instrument segmentation in cataract surgery Instrument Segmentation Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 19
  • 20.
    • Two differentdatasets • Cataract-101 (DS 1) • CaDIS (DS 2) • Segment annotations • manually created for DS 1 • converted from existing mask images of DS 2 • Data augmentation techniques • geometric (hor./vert. flip, 90-degree rotation) • color (brightness, saturation, grayscale, contrast, distortion) • bbox (random crop and random jitter on bounding-boxes) Instrument Segmentation: Datasets and Augmentation Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 20 Dataset Images Train Val Test Classes DS 1 393 237 61 95 9 DS 2 4738 3582 542 614 21 • Schoeffmann, K., Taschwer, M., Sarny, S., Münzer, B., Primus, M. J., & Putzgruber, D. (2018, June). Cataract-101: video dataset of 101 cataract surgeries. In Proceedings of the 9th ACM Multimedia Systems Conference (pp. 421-425). • Flouty, E., Kadkhodamohammadi, A., Luengo, I., Fuentes-Hurtado, F., Taleb, H., Barbarisi, S., ... & Stoyanov, D. (2019). Cadis: Cataract dataset for image segmentation. arXiv preprint arXiv:1906.11586. Conversion of mask images in CaDIS dataset: original image horizontal flip vertical flip grayscale adjust brightness color distortion random crop 60% 15% 25%
  • 21.
    • Several differentbackbone networks • Inceptionv2, Inception-ResNet-v2, ResNet-50, ResNet-101 • Settings • Transfer learning from COCO dataset • SGD as optimizer, different LR={0.06, 0.01, 0.006, 0.001, 0.0006, 0.0001} • Evaluation metrics • Based on IoU (Jaccard index) for every instance • with ground truth G and the predicted region P • Average precision (mAP) and recall (AR) • with a minimum of 50% IoU • For bounding-box and mask segmentation Instrument Segmentation: Experimental Setup Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 21 𝐼𝑜𝑈 = 𝑮 ∩ 𝑷 𝑮 ∪ 𝑷 M. Fox, M. Taschwer and K. Schoeffmann, "Pixel-Based Tool Segmentation in Cataract Surgery Videos with Mask R-CNN," 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 2020, pp. 565-568
  • 22.
    Instrument Segmentation: ExperimentalResults Klaus Schöffmann Relevant Content Detection in Cataract Surgery Videos 22 Dataset mAP@.50IoU AR@1 DS 1 0.770 0.284 DS 2 0.710 0.443 Binary semantic segmentation (ResNet-101) Dataset mAP@.50IoU AR@1 DS 1 0.559 0.299 DS 2 0.607 0.461 Multi-class instance segmentation (ResNet-101) Dataset mAP@.50IoU AR@1 DS 1 0.928 0.612 DS 2 0.839 0.573 Binary semantic segmentation (ResNet-101) Dataset mAP@.50IoU AR@1 DS 1 0.656 0.598 DS 2 0.685 0.576 Multi-class instance segmentation (ResNet-101) Bounding-Box prediction M. Fox, M. Taschwer and K. Schoeffmann, "Pixel-Based Tool Segmentation in Cataract Surgery Videos with Mask R-CNN," 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 2020, pp. 565-568 Mask prediction
  • 23.
    Instrument Segmentation Results KlausSchöffmann Relevant Content Detection in Cataract Surgery Videos 23 M. Fox, M. Taschwer and K. Schoeffmann, "Pixel-Based Tool Segmentation in Cataract Surgery Videos with Mask R-CNN," 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 2020, pp. 565-568
  • 24.
  • 25.
    • Clinicians wantto record and archive as many videos as possible • However, hospitals typically have quite limited storage space • In order to optimize video storage, we need domain-specific video compression • automatic relevance detection (segments, ROIs) • relevant for documentation teaching, research • relevance-based encoding with HEVC/H.265 • content with high relevance -> high quality • content with low relevance -> low quality • [remove irrelevant content] Relevance-Based Compression: Motivation Relevant Content Detection in Cataract Surgery Videos Relevance as defined by clincians: • Instruments in general • Everything that happens inside they eye (Cornea) Klaus Schöffmann 25
  • 26.
    Relevance-Based Compression: Framework RelevantContent Detection in Cataract Surgery Videos Negin Ghamsarian, Hadi Amirpourazarian, Christian Timmerer, Mario Taschwer, and Klaus Schöffmann. 2020. Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Neural Networks. In Proceedings of the 28th ACM International Conference on Multimedia (MM '20). Association for Computing Machinery, New York, NY, USA, 3577–3585. Klaus Schöffmann 26
  • 27.
    • ResNet-50 andResNet-101 (pre-trained on ImageNet) • trained/tested with random split of videos from Cataract-101 (18/4) • 500+500 idle and non-idle frames are uniformly sampled per video • data augmentation • brightness, rotation, width/height shift, zoom, shear Relevance-Based Compression: Idle Detection Relevant Content Detection in Cataract Surgery Videos Negin Ghamsarian, Hadi Amirpourazarian, Christian Timmerer, Mario Taschwer, and Klaus Schöffmann. 2020. Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Neural Networks. In Proceedings of the 28th ACM International Conference on Multimedia (MM '20). Association for Computing Machinery, New York, NY, USA, 3577–3585. Network Class Precision Recall F1-score ResNet-50 Action 1.00 0.85 0.92 Idle 0.87 1.00 0.93 ResNet-101 Action 0.99 0.88 0.93 Idle 0.89 0.99 0.94 Results: Predictions for four example videos (about 20% idle): Klaus Schöffmann 27
  • 28.
    Relevance-Based Compression: ROI Segmentation (Ghamsarian et al., MM '20)
    • Mask R-CNN
      • with ResNet-50 and ResNet-101 as backbones (pre-trained on MS COCO)
    • cornea annotations in 262 frames, instruments in 216 frames
    • 90/10 split for training/test set
    Target      Backbone    Mask segmentation      Bounding-box segmentation
                            mAP80  mAP85  mAP      mAP80  mAP85  mAP
    Cornea      ResNet-101  1.00   0.92   0.89     1.00   1.00   0.95
    Cornea      ResNet-50   1.00   1.00   0.88     1.00   1.00   0.94
                            mAP60  mAP65  mAP      mAP80  mAP85  mAP
    Instrument  ResNet-101  0.77   0.65   0.41     1.00   1.00   0.89
    Instrument  ResNet-50   0.58   0.49   0.29     0.64   0.26   0.65
  • 29.
    Relevance-Based Compression: Considered Scenarios (Ghamsarian et al., MM '20)
    Scenarios for the relevant content:
    1. Action
    2. Action ∩ (cornea ∪ instrument)
    3. Action ∩ cornea (simple)
    4. Action ∩ cornea (luma preference)
    5. Action ∩ cornea (removed background)
    [Figure labels: Action, Idle; Relevant, Irrelevant]
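The set expressions for the five scenarios can be read as per-pixel mask operations. The following is a minimal sketch under that reading, not the authors' implementation: the function `relevance_mask`, the nested-list mask type, and the toy masks are all hypothetical. Scenarios 3 to 5 are assumed to share the same relevant region and to differ only in how the remaining content is encoded, which this sketch does not model.

```python
from typing import List

Mask = List[List[bool]]  # hypothetical per-pixel mask, row-major


def relevance_mask(is_action: bool, cornea: Mask, instrument: Mask,
                   scenario: int) -> Mask:
    """Return the relevant region of one frame for scenarios 1 to 5."""
    height, width = len(cornea), len(cornea[0])
    if not is_action:
        # Idle frames contain no relevant content in any scenario
        return [[False] * width for _ in range(height)]
    if scenario == 1:
        # Scenario 1: the whole action frame is relevant
        return [[True] * width for _ in range(height)]
    if scenario == 2:
        # Scenario 2: Action ∩ (cornea ∪ instrument)
        return [[c or i for c, i in zip(c_row, i_row)]
                for c_row, i_row in zip(cornea, instrument)]
    if scenario in (3, 4, 5):
        # Scenarios 3 to 5: Action ∩ cornea (encoding treatment differs)
        return [row[:] for row in cornea]
    raise ValueError(f"unknown scenario: {scenario}")


# Toy 2x3 frame: cornea on the right, instrument tip at the top left
cornea = [[False, True, True], [False, True, False]]
instrument = [[True, False, False], [False, False, False]]

print(relevance_mask(True, cornea, instrument, scenario=2))
```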
  • 30.
    Relevance-Based Compression: Results
    • PSNR measurements (Scenario 2) relative to the original content for an exemplary segment
      • 4 s + 4 s of action and idle content
      • QP = base QP for relevant content
        • base QP fixed to 22 in our study
      • QP = base QP + ΔQ for irrelevant content
        • ΔQ varied
    [Figure: PSNR plots for ΔQ = 5, ΔQ = 10, ΔQ = 13, and ΔQ = 15]
    → PSNR of relevant content is equivalent in all four situations; relevant content keeps high quality. (Fluctuations are due to the low-delay encoding of HEVC, which adapts quantization parameters to perceived visual input quality.)
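The quantization rule on this slide (a fixed QP of 22 for relevant content, plus an offset ΔQ for irrelevant content) can be sketched as a small helper. The names `BASE_QP`, `HEVC_QP_MAX`, and `block_qp` are hypothetical, and the clamp to 51 (the upper QP limit for 8-bit HEVC) is an added safeguard that the slide does not mention.

```python
BASE_QP = 22      # QP for relevant content, as fixed in the study
HEVC_QP_MAX = 51  # upper QP limit for 8-bit HEVC (assumed safeguard)


def block_qp(relevant: bool, delta_q: int, base_qp: int = BASE_QP) -> int:
    """QP for a block: base QP if relevant, base QP + delta_q otherwise."""
    qp = base_qp if relevant else base_qp + delta_q
    return min(qp, HEVC_QP_MAX)


# QP of irrelevant content for the four offsets shown on the slide
irrelevant_qps = {dq: block_qp(False, dq) for dq in (5, 10, 13, 15)}
print(irrelevant_qps)

# Relevant content is unaffected by the offset
print(block_qp(True, 15))
```

Because the relevant blocks always use the same QP regardless of ΔQ, their PSNR is expected to stay roughly constant across the four settings, which matches the observation on the slide.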
  • 31.
    Relevance-Based Compression: Results (Ghamsarian et al., MM '20)
    • Achievable bitrate reduction for nine representative cataract surgery videos
    [Figure panels: original, scenario II, scenario IV, scenario V]
  • 32.
  • 33.
    Pupil Reaction Detection
    • During cataract surgery, some patients show pupil reactions
      • dilations or contractions of the eye chamber
      • weak, medium, or strong
      • contractions are especially dangerous (instrument inside)
      • it is not yet known why this happens
    → Pupil reaction detection framework and several evaluations
      • collected many videos, annotated pupil and cornea (iris)
      • trained a segmentation network (Mask R-CNN w/ ResNet-101)
      • applied the network and tracked the size of the pupil vs. the cornea
      • evaluated with three datasets
        • DS 0 (segmentation): 83 images from cataract surgery videos, annotated pupil/cornea
        • DS A (evaluation): 10 cataract surgery videos, annotated pupil reactions
        • DS B (generalization test): 20 cataract surgery videos, results checked by clinicians
    Sokolova, N., Schoeffmann, K., Taschwer, M., Sarny, S., Putzgruber-Adamitsch, D., & El-Shabrawi, Y. (2021). Automatic detection of pupil reactions in cataract surgery videos. PloS one, 16(10), e0258390.
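Tracking the pupil size relative to the cornea could look roughly like the sketch below. It assumes that per-frame pupil and cornea areas (in pixels) are already available from the segmentation network. The functions `size_ratios` and `detect_reactions`, the window of 3 frames, and the threshold of 0.05 are hypothetical illustration values, not the parameters or the detection rule of the published method.

```python
from typing import List, Tuple


def size_ratios(pupil_areas: List[float],
                cornea_areas: List[float]) -> List[float]:
    """Pupil area relative to cornea area, per frame."""
    if any(c <= 0 for c in cornea_areas):
        raise ValueError("cornea area must be positive in every frame")
    return [p / c for p, c in zip(pupil_areas, cornea_areas)]


def detect_reactions(ratios: List[float], window: int = 3,
                     threshold: float = 0.05) -> List[Tuple[int, str]]:
    """Flag frames whose ratio changed by at least `threshold`
    compared with the frame `window` steps earlier."""
    events = []
    for t in range(window, len(ratios)):
        change = ratios[t] - ratios[t - window]
        if change >= threshold:
            events.append((t, "dilation"))
        elif change <= -threshold:
            events.append((t, "contraction"))
    return events


# Toy sequence: the pupil shrinks from 40% to 30% of the cornea area
pupil_areas = [400, 400, 400, 400, 340, 300, 300, 300]
cornea_areas = [1000] * 8

ratios = size_ratios(pupil_areas, cornea_areas)
events = detect_reactions(ratios)
print(events)
```

Normalizing by the cornea area makes the measure less sensitive to zoom and camera distance, which is one plausible reason for tracking the pupil against the cornea rather than in absolute pixels.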
  • 34.
    Pupil Reaction Detection: Methodology (Sokolova et al., 2021)
    [Figure not reproduced]
  • 35.
    Pupil Reaction Detection: Methodology (Sokolova et al., 2021)
    • Manual annotation: 83 frames, 25 videos, various conditions, published online
    • Mask R-CNN segmentation
    • Backbone CNNs
      • ResNet-50
      • ResNet-101
  • 36.
    Pupil Reaction Detection: Performance (Sokolova et al., 2021)
    • IoU: pupil 91.47%, iris 80.1%
    [Figures: "Pupil segmentation" and "Iris segmentation"; frames (0% to 100%) vs. overlap IoU (80% to 95%), for ResNet-101 and ResNet-50]
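For readers unfamiliar with the overlap measure, intersection over union (IoU) between a predicted and an annotated mask can be computed as sketched below, together with the share of frames that reach a given IoU threshold (the quantity plotted on this slide). The helper names `mask_iou` and `share_above` and the toy data are assumptions for illustration only.

```python
from typing import List

Mask = List[List[int]]  # hypothetical binary mask, 1 = object pixel


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    pairs = [(bool(p), bool(t))
             for p_row, t_row in zip(pred, truth)
             for p, t in zip(p_row, t_row)]
    intersection = sum(p and t for p, t in pairs)
    union = sum(p or t for p, t in pairs)
    # Two empty masks are treated as a perfect match
    return intersection / union if union else 1.0


def share_above(ious: List[float], threshold: float) -> float:
    """Fraction of frames whose IoU is at least `threshold`."""
    return sum(iou >= threshold for iou in ious) / len(ious)


pred = [[0, 1, 1], [0, 1, 1]]
truth = [[0, 1, 1], [0, 0, 1]]
print(mask_iou(pred, truth))                        # 3 shared / 4 total pixels

print(share_above([0.95, 0.90, 0.82, 0.78], 0.80))  # 3 of 4 frames
```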
  • 37.
    Pupil Reaction Detection: Dataset (Sokolova et al., 2021)
    • Dataset A: 10 videos, 135 reactions (parameter optimization)
    • Dataset B: 20 videos, 229 reactions (generalization evaluation)
    [Figure: reactions in each dataset by intensity (weak, medium, strong)]
  • 38.
    Pupil Reaction Detection: Results (Sokolova et al., 2021)
                         Optimization  Generalization
    Recall               64.31%        47.00%
    Precision            45.53%        38.25%
    Prediction accuracy  85.35%        72.96%
    Prediction length    18.93 s       18.15 s
    [Figure: "Intensity-Based Performance Analysis"; recall, precision, and prediction accuracy (0% to 100%) for all, medium and strong, and strong reactions]
  • 39.
    Pupil Reaction Detection: Subjective Evaluation (Sokolova et al., 2021)
    Video  Detected reactions     Intensity
           Approach  Doctors      Approach*  Doctors
    1      6         Rare         12.26%     Weak and medium
    2      10        Rare         10.55%     No dangerous reactions
    3      14        Often        17.10%     Medium and strong
    4      23        Very often   29.09%     Very strong and dangerous reactions
    5      16        Very often   13.98%     Unstable pupil, medium reactions
  • 40.
  • 41.
    Conclusions
    • Cataract surgery can benefit a lot from image and video analysis
      • teaching and training need specific content
      • retrospective analysis and forensics also benefit from content-based search
    • Strong progress has been made in recent years in terms of DL performance
      • both recall and precision have improved significantly with deep learning, for many problems
      • high-accuracy segmentation is possible
    • Video analysis is beneficial for real-time but also post-operative use
      • the former field is strongly addressed by the medical imaging community, the latter not so much
    • Still, there are many open issues…
      • focus on specific content (events, irregularities) and diverse operations (not only cataracts)
      • larger datasets with rich annotations
      • widespread application of video analysis to collect large-scale content statistics
      • generalizable and explainable DL models
  • 42.
    Thank You!
    Assoc. Prof. Dr. Klaus Schoeffmann, Associate Professor at Universität Klagenfurt, Austria
    ks@itec.aau.at | EndoscopicVideo.com