Medical Multimedia Systems and Applications

These are the slides of our tutorial presented in Nice on 29th October 2019 at ACM Multimedia 2019.

Published in: Science

Medical Multimedia Systems and Applications

  1. 1. Medical Multimedia Systems and Applications. Steven Hicks (1) / Michael Riegler (1), Pål Halvorsen (1), Klaus Schoeffmann (2). (1) Simula Research Laboratory, Norway; (2) Institute of Information Technology, Klagenfurt University, Austria
  2. 2. Agenda • Introduction & Overview • Multimedia Data in Medicine • Characteristics of Endoscopic Video • Different Fields and Communities • Application 1: Post-Procedural Usage of Surgery Videos • Domain-Specific Storage for long-term Archiving • Medical Video Content Analysis and Datasets • Medical Video Interaction • Application 2: Diagnostic Decision Support and Case Studies • Knowledge Transfer • Analysis • Feedback • Explainability and Trust • Conclusions & Outlook ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 2
  3. 3. Notice: This presentation contains images and videos from medical surgeries, which you may find disturbing!
  4. 4. Introduction
  5. 5. Multimedia Data in Medicine: Medical inspections/interventions produce many kinds of data • Medical text • OR reports, patient records… • Sensor signals • ECG, EEG, vital signs • Medical images (radiology) • Ultrasound, X-ray • CT, MRI, PET, … • Medical video • Screenings • Surgery • Fields and communities: Signal Processing, Medical Imaging, Robotics, Multimedia, Data Mining
  6. 6. Video Data Sources in Medicine • Traditional open surgery? • Minimally-invasive surgery • Interventions with endoscopes • Reduced trauma for patient • Less invasive and faster • Less rehabilitation time • Microscopic surgery
  7. 7. Therapeutic Endoscopy • Rigid endoscope • Small incisions • Therapy / Surgery • Laparoscopy • Cholecystectomy • Gynecological Surgery • Urological Surgery • … • Arthroscopy • …
  8. 8. Diagnostic Endoscopy • Flexible endoscope • Natural orifices • Diagnosis / Inspections • Gastroenterology (colonoscopy, gastroscopy) • Bronchoscopy • Hysteroscopy • … • WCE (Wireless capsule endoscopy)
  9. 9. Endoscopic Video Examples
  10. 10. Domain-specific Characteristics & Challenges • Full HD or 4K (even stereo 3D) • One-shot recordings • Up to multiple hours • Homogeneous color distribution • Visually very similar content • Circular content area • Fast motion • Geometric distortion • Specular reflections • Occlusions • Smoke, motion blur, blood, flying particles • Size!
  11. 11. Literature Overview: Münzer, Bernd, Klaus Schoeffmann, and Laszlo Böszörmenyi. "Content-based processing and analysis of endoscopic images and videos: A survey." Multimedia Tools and Applications (2017): 1-40.
  12. 12. Pre-Processing • Image Enhancement • Contrast enhancement, color misalignment correction… • Camera calibration and distortion correction • Specular reflection removal • Comb structure removal & super resolution • … • Information Filtering • Frame filtering • Image segmentation
  13. 13. Real-time Support at Intervention Time • Applications: diagnosis support; robot-assisted surgery; context awareness; augmented reality
  14. 14. Post-Procedural Applications • Management and Retrieval: compression and storage; content-based retrieval; temporal video segmentation; video summarization; visualization & interaction • Quality Assessment: skills assessment; education & training; error rating; assessment of intervention quality
  15. 15. Post-Procedural Use of Surgery Videos
  16. 16. Motivation for Video Documentation • Video documentation of endoscopic procedures is on the rise • “A picture paints a thousand words”; a moving picture paints millions! • Already mandatory in some countries • Current documentation practice poses many problems • Relevant information is hard to retrieve • Huge amounts of storage space • High ratio of irrelevant data (“rubbish”) • Very inefficient encoding (especially for HD content)
  17. 17. Post-Procedural Use of Surgical Videos • Later inspection of specific moments • Discussion of critical moments (e.g., with the OR team) • Information to patients • Preparation of future interventions • Forensics & investigations (e.g., comparisons) • Training & teaching • Surgical quality assessment (technical errors)
  18. 18. Full Storage of Endoscopic Videos • Exemplary hospital • 5 departments (Lap, Gyn, Arthro, GI, ENT) • 2 operating rooms per department, each with 4 operations/day, each operation ca. 1-2 h • i.e., 40 interventions per day, each ~90 min • 60 hours of video per day! • Assumption: HD 1920x1080, H.264/AVC • 270 GB / day (1 h = 4.5 GB) • 1.9 TB / week • 100 TB / year (200 TB with MPEG-2) • 4K: even more • Great challenge for a hospital’s IT department!
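
The storage figures on this slide follow from simple arithmetic, sketched below. The sketch assumes that the two operating rooms are counted per department (which is what yields 40 interventions per day), that each operation lasts 1.5 hours (the midpoint of "1-2h"), and that recording runs 7 days a week, 365 days a year. All variable names are illustrative.

```python
# Back-of-the-envelope storage estimate using the figures quoted on the slide.
DEPARTMENTS = 5
ROOMS_PER_DEPARTMENT = 2      # assumption: "2 operation rooms" is per department
OPS_PER_ROOM_PER_DAY = 4
HOURS_PER_OP = 1.5            # assumption: midpoint of "ca. 1-2h"
GB_PER_HOUR = 4.5             # HD 1920x1080, H.264/AVC (from the slide)

interventions_per_day = DEPARTMENTS * ROOMS_PER_DEPARTMENT * OPS_PER_ROOM_PER_DAY
video_hours_per_day = interventions_per_day * HOURS_PER_OP
gb_per_day = video_hours_per_day * GB_PER_HOUR
tb_per_week = gb_per_day * 7 / 1000      # decimal terabytes
tb_per_year = gb_per_day * 365 / 1000
# Average bitrate implied by 4.5 GB per hour of video.
mbit_per_s = GB_PER_HOUR * 8 * 1000 / 3600

print(f"{interventions_per_day} interventions/day, {video_hours_per_day:.0f} h video/day")
print(f"{gb_per_day:.0f} GB/day, {tb_per_week:.2f} TB/week, {tb_per_year:.2f} TB/year")
print(f"implied average bitrate: {mbit_per_s:.0f} Mbit/s")
```

Under these assumptions the yearly total comes out slightly below the rounded 100 TB quoted on the slide.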
  19. 19. How to Reduce Storage Requirements? Exploit domain-specific characteristics: 1. Spatial compression optimization 2. Temporal compression optimization 3. Perceptual quality based optimization 4. Long-term archiving strategy • Transcoding • Savings figures shown on the slide: up to 30%, up to 40%, up to 93%
  20. 20. Study on Video Quality • Subjective quality assessment • Catharina Hospital Eindhoven, NL • 37 participants • 19 experienced surgeons and 18 trainees • 7 women, 30 men, average age: 40 years • Subjective tests regarding maximum compression • Session 1: Perceivable quality loss • Double-Stimulus (ITU-R BT.500-11) • Switch between reference and test video • Session 2: Perceivable semantic information loss • Single Stimulus (ITU-T P.910) • Assessing random videos (incl. reference) Münzer, B., Schoeffmann, K., Böszörmenyi, L., Smulders, J. F., & Jakimowicz, J. J. (2014, May). Investigation of the impact of compression on the perceptional quality of laparoscopic videos. In 2014 IEEE 27th International Symposium on Computer-Based Medical Systems (pp. 153-158). IEEE.
  21. 21. Assessment of Video Quality (Session 1) [Chart: average bitrate (Kb/s) and rating difference (Difference Mean Opinion Score, DMOS) per test condition, i.e., crf (constant rate factor) at resolutions 1920x1080, 1280x720, 960x540, and 640x360; reference video: MPEG-2, HD, 20 (35) Mbit/s; chart annotations: "subjectively better than reference", "lossless"]
  22. 22. Assessment of Video Quality (Session 2) 1. (Q1) Visually lossless with 8 Mbit/s (in comparison to 20 Mbit/s). Reduction: 60% data vs. 0% MOS 2. (Q2) Good quality with 2.5 Mbit/s and reduced resolution (1280x720). Reduction: 88% data vs. 7% MOS 3. (Q3) Acceptable quality with 1.4 Mbit/s and lower resolution (640x360). Reduction: 93% data vs. 31% MOS
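
The data reductions quoted for Q1 to Q3 can be reproduced from the bitrates, assuming each is measured against the 20 Mbit/s reference. This is a minimal sketch; the function and variable names are illustrative and not taken from the study.

```python
REFERENCE_MBIT_S = 20.0  # bitrate of the reference video, from the slide

# Bitrates of the three quality settings, from the slide.
settings = {"Q1": 8.0, "Q2": 2.5, "Q3": 1.4}


def data_reduction(bitrate, reference=REFERENCE_MBIT_S):
    """Fraction of data saved relative to the reference bitrate."""
    return 1.0 - bitrate / reference


for name, bitrate in settings.items():
    print(f"{name}: {bitrate} Mbit/s -> {data_reduction(bitrate):.1%} less data")
```

The Q2 value comes out as 87.5%, which the slide rounds to 88%.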
  23. 23. Example Videos • 1280x720, weak compression (crf 18): 16 MB • 640x360, strong compression (crf 26): 0.8 MB • Size ratio: 20x
  24. 24. Medical Video Analysis
  25. 25. Medical Videos • With several hours of videos each day • Manual search in archive becomes impractical! • Automatic content analysis • Filter for relevant scenes in the videos • Anatomical structures • Surgical actions • Instruments • Operation phases • Irregular/Adverse events • … • Content classification (e.g., with neural networks) • Video Retrieval/Interaction systems • Example actions shown: Suture, Cutting, Injection, Coagulation
  26. 26. 1000 frames (sampled from 17 min at 1 fps)
  27. 27. Content Relevance Filtering / Instrument Recognition • Examples shown: out-of-patient scenes, blurry scenes, border area • Instrument detection/segmentation for better content understanding (e.g., op phase segmentation, following instruments in robot-assisted surgery) Münzer, B., Schoeffmann, K., & Böszörmenyi, L. (2013, December). Relevance segmentation of laparoscopic videos. In Multimedia (ISM), 2013 IEEE International Symposium on (pp. 84-91). IEEE. Primus, M. J., Schoeffmann, K., & Böszörmenyi, L. (2015, June). Instrument classification in laparoscopic videos. In Content-Based Multimedia Indexing (CBMI), 2015 13th International Workshop on (pp. 1-6). IEEE.
  28. 28. Smoke Detection
  29. 29. Smoke Detection • Cauterization in 90% of surgeries • Instruments: laser or HF (100 °C - 1200 °C) • Filtration system (manual) • Automatic smoke detection & removal? (real-time)
  30. 30. Automatic Smoke Detection: Achievable Performance with Saturation Peak Analysis (SPA)
  31. 31. Automatic Smoke Detection - Performance • Datasets: 20K images (DS A), 10K images (DS A), 4.5K images (DS B) • SPA: Saturation Peak Analysis • Deep learning: GLN RGB (GoogLeNet using RGB images), GLN SAT (GoogLeNet using saturation channel only)
  32. 32. Real-Time Smoke Detection Prototype Andreas Leibetseder, Manfred J. Primus, Stefan Petscharnig, and Klaus Schoeffmann. “Image-based Smoke Detection in Laparoscopic Videos”. Proceedings of Computer Assisted and Robotic Endoscopy and Clinical Image-Based Procedures: 4th International Workshop, CARE 2017, and 6th International Workshop, CLIP 2017, held in Conjunction with MICCAI 2017, Quebec City, QC, Canada, September 14, 2017, pp. 70-87
  33. 33. Surgical Action Classification
  34. 34. Gynecologic Laparoscopy: Relevant Surgical Actions • Dissection: 58 segments / 35,517 images • Coagulation: 212 segments / 84,786 images • Cutting cold: 271 segments / 26,388 images • Cutting: 106 segments / 92,653 images • Hysterectomy: 25 segments / 68,466 images • Injection: 52 segments / 52,355 images • Suturing: 92 segments / 321,851 images • Suction & Irrigation: 173 segments / 73,977 images • 1,105 segments (823,000 frames) • 9 h annotated video of 111 interventions • 10-fold cross-validation Stefan Petscharnig and Klaus Schoeffmann. 2018. Learning Laparoscopic Video Shot Classification for Gynecological Surgery. Multimedia Tools and Applications (MTAP), 77, 7, Springer US, 8061-8079.
  35. 35. Gynecologic Laparoscopy: Surgical Actions Classification (R: Recall, P: Precision)
  36. 36. Fusing Temporal Information with CNNs • Early fusion • Integrate motion information from consecutive frames • Feed into CNN as additional input channel(s) • Compare two approaches • Block-Based Motion Estimation (BBME): using block matching • Residual Motion (ResM): local motion • Late fusion • Assume we already know scene boundaries and classify all frames of segments • Temporal aggregation of single-frame classifications • Majority vote (maximum occurrence of class in frames of scene) • Average confidence S. Petscharnig, K. Schöffmann, J. Benois-Pineau, S. Chaabouni and J. Keckstein, "Early and Late Fusion of Temporal Information for Classification of Surgical Actions in Laparoscopic Gynecology," 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), Karlstad, 2018, pp. 369-374.
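
The two late-fusion rules named on this slide can be sketched as follows. This is only an illustration, not the authors' implementation: the class names, the toy probabilities, and the function names are hypothetical, and per-frame CNN outputs are assumed to be probability vectors.

```python
from collections import Counter


def majority_vote(frame_labels):
    """Return the class that is predicted for the most frames of a segment."""
    return Counter(frame_labels).most_common(1)[0][0]


def average_confidence(frame_probs):
    """Average per-frame class probabilities; return (best class index, means)."""
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    means = [sum(p[c] for p in frame_probs) / n_frames for c in range(n_classes)]
    best = max(range(n_classes), key=means.__getitem__)
    return best, means


# Hypothetical classes and per-frame CNN outputs for one three-frame segment.
CLASSES = ["cutting", "coagulation", "suturing"]
probs = [
    [0.40, 0.35, 0.25],  # weakly "cutting"
    [0.40, 0.35, 0.25],  # weakly "cutting"
    [0.05, 0.90, 0.05],  # strongly "coagulation"
]
per_frame = [CLASSES[max(range(len(CLASSES)), key=p.__getitem__)] for p in probs]

print("majority vote:     ", majority_vote(per_frame))
print("average confidence:", CLASSES[average_confidence(probs)[0]])
```

The toy segment is chosen so that the two rules disagree: two low-confidence frames win the vote, while one high-confidence frame dominates the average.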
  37. 37. Gynecologic Laparoscopy: Surgical Actions Classification Petscharnig, S., & Schöffmann, K. (2017). Learning laparoscopic video shot classification for gynecological surgery. Multimedia Tools and Applications, 1-19.
  38. 38. Instrument Segmentation/Recognition
  39. 39. Instrument Segmentation/Recognition • Input: video recordings of laparoscopic procedures in gynecology • Output: position and category of each instrument in the video
  40. 40. Surgical Instrument Segmentation/Recognition • Use a region-based CNN for 1. Binary instrument segmentation • Distinguish between instrument instances and background (without recognizing the actual instrument) 2. Multi-class instrument recognition • Labeling different instrument segments • We approach this task using Mask R-CNN • Very small dataset (only about 50 examples/instrument; 12 classes) • Several data augmentation techniques Sabrina Kletz, Klaus Schoeffmann, Jenny Benois-Pineau, and Heinrich Husslein. 2019. Identifying Surgical Instruments in Laparoscopy Using Deep Learning Instance Segmentation. Proceedings of the International Conference on Content-Based Multimedia Indexing (CBMI 2019). IEEE, Los Alamitos, CA, USA, 6 pages
  41. 41. Instrument Segmentation/Recognition: Dataset • 11 different instrument types and one class covering unspecified instruments.
  42. 42. Instrument Segmentation/Recognition: Experimental Setup • Settings • Training from scratch and transfer learning from the COCO dataset • 60/20/20 split for training, validation, and test • SGD as optimizer, different learning rates LR = {0.01, 0.001, 0.0001} • Evaluation • Average precision with IoU (Jaccard index) for every instance, with ground truth G and detected region D: $\mathrm{IoU} = \frac{|G \cap D|}{|G \cup D|}$ • COCO metrics • Average precision with different thresholds • AP50 and AP50:95
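
The IoU (Jaccard index) between a ground-truth region and a detected region can be computed directly on binary masks. The sketch below uses plain nested lists as masks and toy 4x4 examples, all of which are illustrative. It also lists the IoU thresholds used by the COCO metrics (0.50 for AP50; 0.50 to 0.95 in steps of 0.05 for AP50:95). Computing average precision itself additionally requires ranking detections by confidence, which is omitted here.

```python
def iou(gt_mask, det_mask):
    """Intersection over union of two equally sized binary masks."""
    intersection = union = 0
    for gt_row, det_row in zip(gt_mask, det_mask):
        for g, d in zip(gt_row, det_row):
            intersection += bool(g and d)
            union += bool(g or d)
    return intersection / union if union else 0.0


# IoU thresholds of the COCO metrics: AP50 uses 0.50 only,
# AP50:95 averages over 0.50, 0.55, ..., 0.95.
AP50_THRESHOLD = 0.5
AP50_95_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Toy 4x4 masks: ground truth covers the two top rows (8 pixels).
gt = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
# Detection shifted down by one row: 4 shared pixels, 12 in the union.
shifted = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
# Detection missing the last column: 6 shared pixels, 8 in the union.
tight = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

for name, det in [("shifted", shifted), ("tight", tight)]:
    score = iou(gt, det)
    passed = [t for t in AP50_95_THRESHOLDS if score >= t]
    print(f"{name}: IoU = {score:.3f}, counts as a match at {len(passed)} of 10 thresholds")
```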
  43. 43. Instrument Segmentation/Recognition: Quantitative Results
  44. 44. Quantitative Results of Multi-Class Segmentation: Classification performance after 50th epoch
  45. 45. Instrument Segmentation/Recognition: Qualitative Results Sabrina Kletz, Klaus Schoeffmann, Jenny Benois-Pineau, and Heinrich Husslein. 2019. Identifying Surgical Instruments in Laparoscopy Using Deep Learning Instance Segmentation. Proceedings of the International Conference on Content-Based Multimedia Indexing (CBMI 2019). IEEE, Los Alamitos, CA, USA, 6 pages
  46. 46. Medical Video Datasets
  47. 47. LapGyn4: Laparoscopic Gynecology Dataset • Surgical Actions (~31K images) • Anatomical Structures (~3K images) • Instrument Count (~22K images) • Suturing on Anatomy (~1K images) • Over 57,000 images • 500+ surgeries • Baseline evaluations: GoogLeNet • 5-fold cross validation over 100 epochs Andreas Leibetseder, Stefan Petscharnig, Manfred Jürgen Primus, Sabrina Kletz, Bernd Münzer, Klaus Schoeffmann, and Jörg Keckstein. 2018. Lapgyn4: a dataset for 4 automatic content analysis problems in the domain of laparoscopic gynecology. In Proceedings of the 9th ACM Multimedia Systems Conference (MMSys '18). ACM, New York, NY, USA, 357-362.
  48. 48. GLENDA: Gynecologic Laparoscopy Endometriosis Dataset • Dataset with annotations of endometriosis • Benign but potentially painful anomaly affecting women of child-bearing age • Dislocation of uterine-like tissue; cicatrization and enclosed bleedings • Serious and painful disease • Often hard to diagnose
  49. 49. GLENDA Dataset – 25682 Frames from 400+ Surgeries • Many of which show endometriosis cases of varying severity • Pathology: peritoneum, ovary, uterus, deep infiltrated endometriosis (DIE) • No pathology • Region-based and temporal expert annotations • Hand-drawn sketches Andreas Leibetseder, Sabrina Kletz, Klaus Schoeffmann, Simon Keckstein, and Jörg Keckstein. 2020. GLENDA: Gynecologic Laparoscopy Endometriosis Dataset. Proceedings of the 26th International Conference on Multimedia Modeling 2020 (MMM2020). Lecture Notes in Computer Science, Springer International Publishing, Cham, 12 pages (to appear). http://www.itec.aau.at/ftp/datasets/GLENDA/
  50. 50. GLENDA Dataset – Endometriosis Examples
  51. 51. Cataract-101 Video Dataset • Cataract-101 • Videos recorded from 101 cataract surgeries in 2017 and 2018 • Only surgeries without any serious complications • Comes with phase segmentation ground truth (11 phases) Klaus Schoeffmann, Mario Taschwer, Stephanie Sarny, Bernd Münzer, Manfred Jürgen Primus, and Doris Putzgruber. 2018. Cataract-101: video dataset of 101 cataract surgeries. In Proceedings of the 9th ACM Multimedia Systems Conference (MMSys '18). ACM, New York, NY, USA, 421-425. http://www-itec.aau.at/ftp/datasets/ovid/cat-101/
  52. 52. Classification of Cataract OP Phases Manfred J. Primus, Doris Putzgruber-Adamitsch, Mario Taschwer, Bernd Münzer, Yosuf El-Shabrawi, Laszlo Böszörmenyi, and Klaus Schoeffmann. 2018. Frame-Based Classification of Operation Phases in Cataract Surgery Videos. In Proceedings of the 24th International Conference on Multimedia Modeling 2018 (MMM2018). Lecture Notes in Computer Science, vol 10704, Springer, Cham, 241-253.
  53. 53. Cataract Instrument Recognition (Cat-101 Dataset) • Typical instruments used in cataract surgery: • Primary incision knife (pik) • Secondary incision knife (sik) • Katena forceps (kf) • Capsulorhexis forceps (cf) • Cannula (c) • 27 gauge cannula (27gc) • Phacoemulsifier handpiece (ph) • Spatula (s) • Irrigation/aspiration handpiece (iah) • Implant injector (ii)
  54. 54. Cataract Instrument Recognition (Cat-101 Dataset) • Classification study • 26 randomly selected videos • Manually annotated 8000 frames for instrument usage (see next slide) • 800 frames for each of the 10 instruments (balanced) • Instrument classification (full frame) and generalization performance • ResNet-50, Inception v3, NASNet Mobile • Multi-label classification, loss = binary cross-entropy, batch size 32, 50 epochs, training from scratch • Tested with different settings (Adam optimizer, SGD, initial learning rate 0.1/0.01/0.001) Natalia Sokolova, Klaus Schoeffmann, Mario Taschwer, Doris Putzgruber-Adamitsch, and Yosuf El-Shabrawi. 2020. Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos. Proceedings of the 26th International Conference on Multimedia Modeling 2020 (MMM2020). Lecture Notes in Computer Science, Springer International Publishing, Cham, 11 pages (to appear).
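
In a multi-label setup, each instrument gets its own independent output and the loss is the binary cross-entropy averaged over all labels. The sketch below illustrates the loss and the decoding step only. It assumes sigmoid-style probabilities and a 0.5 decision threshold, neither of which is stated on the slide, and the example predictions are made up.

```python
import math

# Instrument abbreviations as listed for the Cat-101 instrument study.
INSTRUMENTS = ["pik", "sik", "kf", "cf", "c", "27gc", "ph", "s", "iah", "ii"]


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over independent labels."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)


# Hypothetical frame in which two instruments (sik and kf) are visible at once.
target = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical per-instrument probabilities produced by a network.
probs = [0.10, 0.80, 0.70, 0.10, 0.05, 0.05, 0.20, 0.10, 0.05, 0.05]

# Assumption: a label counts as present when its probability is at least 0.5.
predicted = [name for name, p in zip(INSTRUMENTS, probs) if p >= 0.5]

print("predicted instruments:", predicted)
print(f"loss: {binary_cross_entropy(target, probs):.4f}")
```

Unlike a softmax classifier, this formulation allows zero, one, or several instruments per frame.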
  55. 55. Medical Video Interaction Tools
  56. 56. Past/Current Status (screenshot labels: Patient names; File Explorers & Segments to Download; 2014; 2009)
  57. 57. Desired Status Bernd Münzer, Klaus Schoeffmann and Laszlo Boeszoermenyi. “EndoXplore: A Web-based Video Explorer for Endoscopic Videos”. Proceedings of the IEEE International Symposium on Multimedia 2017 (ISM 2017), Taipei, Taiwan, 2017, pp. 1-2
  58. 58. Special Content Visualization
  59. 59. Surgical Quality Assessment (SQA) • Clinicians check full video recordings for occurrence of technical errors: • Errors are rated according to standardized schemes (e.g., OSATS, GERT) and surgeons are made aware of them • Studies have shown that this significantly improves surgical quality
  60. 60. Surgical Quality Assessment (SQA)
  61. 61. Surgical Quality Assessment (SQA) Software • Integrating rating features • More efficient video navigation/browsing Marco A. Hudelist, Heinrich Husslein, Bernd Muenzer, Sabrina Kletz and Klaus Schoeffmann. “A Tool to Support Surgical Quality Assessment”, in Proceedings of the Third IEEE International Conference on Multimedia Big Data (BigMM), Laguna Hills, CA, USA, 2017, pp. 238-239.
  62. 62. (Diagnostic) Decision Support
  63. 63. Challenges and Requirements
  64. 64. There is a Need for Complete Systems! • Anomalies are missed • Detection depends on experience • There is a lack of medical personnel for large-scale screening programs
  65. 65. There is a Need for Complete Systems! • Medical knowledge transfer • Automated analysis / detection / classification • Feedback / visualization & administrative
  66. 66. Key Challenges & Requirements • Medical knowledge transfer: needs data with ground truth • High detection accuracy • Fast and efficient: real-time feedback and large scale • Fit the normal examination procedures • Assist administrative and report writing work • Adhere to ethical, legal, privacy challenges & regulations
  67. 67. Gastrointestinal (GI) Case Study (challenges, system support, datasets, diagnostic decision support, ...)
  68. 68. GI Tract Challenges and Potential • Many types of diseases can potentially affect the human gastrointestinal (GI) tract • About 2.8 million new luminal GI cancers (esophagus, stomach, colorectal) are detected yearly • The mortality is about 65% • Screening of the GI tract using different types of endoscopy… • is costly (colonoscopy according to NY Times: $1100/patient, $10 billion) • consumes valuable medical personnel time (1-2 hours) • does not scale to large populations • is intrusive to the patient • … • Current technology may potentially enable automatic algorithmic screening and assisted examinations: a truly interdisciplinary activity with high chances of societal impact
  69. 69. Colorectal Cancer (charts: women, men) • Colorectal cancer is the third most common cause of cancer mortality for both women and men, and early detection is important for survival: the 5-year survival probability goes from a low 10-30% if detected in later stages to a high 90% if detected in early stages. • Colonoscopy is not the ideal screening test. On average, 20% of polyps (possible predecessors of cancer) are missed or incompletely removed. • The risk of getting cancer largely depends on the endoscopist's ability to detect and remove polyps, with large inter- and intra-clinician variations. • A 1% increase in detection can decrease the risk of cancer by 3%.
  70. 70. Automatic Detection of Anomalies: Colonoscopy & Gastroscopy
  71. 71. Standard Endoscopy: Live Polyp Detection • A polyp is an abnormal growth of tissue attached to the underlying mucosa • Detection accuracy depends on experience and skills • Average miss rates of approx. 20% • Large inter- and intra-clinician variations (e.g., a Norwegian study shows variations between 36-65% for polyps) • Should reach a high (>85%) accuracy threshold to be acceptable • Current technology may potentially enable automated, algorithm-assisted examinations • Introduce a digital “third eye” (with high accuracy and real-time processing)
  72. 72. A Complete System
  73. 73. System Overview
  74. 74. Medical Knowledge Transfer (Data Collection)
  75. 75. Available GI Datasets
      Name | Contain | Annotation | Size | Type | Usage
      CVC-ClinicDB | Polyps | GT masks | 12000 images (several versions) | Trad. | ©, by permission
      ETIS-Larib Polyp DB | Polyps, Normal | GT masks | 1500 images | Trad. | ©, by permission
      ASU-Mayo Clinic DB | Polyps, Normal | GT masks | 20 videos | Trad. | ©, by permission
      Colonoscopy Videos DB | Various Lesions | Sorted | 76 videos | Trad. | Academic
      Capsule Endoscopy DB | Various Lesions and Findings | Sorted | 3170 images, 47 videos | VCE | Academic, by request
      GastroAtlas | Various Lesions and Findings | Sorted, Text annotations | 4449 videos | Trad. | Academic
      WEO Atlas | Various Lesions and Findings | Sorted, Text annotations | ? | Trad. | Academic
      GASTROLAB | Various Lesions and Findings | Sorted, Text annotations | ? | Trad. | Academic
      Atlas of GE | Various Lesions | Sorted, Text annotations | 669 images | Trad. | ©, by permission
      KID | Various Lesions | Sorted | 2500 + 47 videos | Trad. | ©, by permission
      ASU-Mayo dataset (polyps): • 20 videos • 10 with polyps, 10 without • 8-64 seconds long • varying resolution • ~18,000 frames/images • image mask of polyp (ground truth) • (currently) restricted use
      CVC (polyps): • CVC-356 – 356 polyp images, 1350 normal frames • CVC-612 – 612 polyp images, 1350 normal frames • CVC-968 – 968 polyp images, 1350 normal frames • CVC-12K – 10025 polyp images, 1929 normal frames • image mask of polyp (ground truth) • (currently) restricted use
      Need more data to transfer the medical knowledge, and thus tools …
  76. 76. Why Can’t CS People Do the Annotation!? • Which image is not from the same class? … and it gets worse … (image labels: Pylorus, Z-line, Z-line, Z-line, Z-line, Z-line) • Making a mistake between cats and dogs may not matter, but a misclassification here may have lethal consequences
  77. 77. Available time of the clinicians?
  78. 78. Video Annotation Subsystem • Simple and efficient • Web-based • Assisted object tracking "Expert Driven Semi-Supervised Elucidation Tool for Medical Endoscopic Videos", Zeno Albisser et al. Proceedings of MMSys, Portland, OR, USA, March 2015
  79. 79. ClusterTag: Image Clustering and Tagging Tool • For large collection of images • VV / Kvasir dataset • Fully cleaned • Feature extraction mechanisms • Different unsupervised clustering algorithms • Hierarchical image collection visualization • Open source: ClusterTag https://bitbucket.org/mpg_projects/clustertag "ClusterTag: Interactive Visualization, Clustering and Tagging Tool for Big Image Collections", Konstantin Pogorelov et al. Proceedings of ICMR, Bucharest, Romania, June 2017
  80. 80. Next Version of the Annotation Tool • Still need even more efficient tools and data of entire procedures 1. “Annotation” during examination 2. Video with bookmarks 3. Annotate bookmarks 4. Automatically annotate neighboring frames using object tracking – and verify
  81. 81. The Kvasir Dataset • Multi-Class Image Dataset for Computer Aided GI Disease Detection • GI endoscopy images • Some images contain the position and configuration of the endoscope (scope guide) • 8 different anomalies and anatomical landmarks • v1: 500 images per class, 6 pre-extracted global features • v2: 1000 images per class • v3: 16 classes, multi-label – to be released soon • Open source: http://datasets.simula.no/kvasir/ "Kvasir: A Multi-Class Image-Dataset for Computer Aided Gastrointestinal Disease Detection", Konstantin Pogorelov et al. Proceedings of MMSys, Taiwan, June 2017
  82. 82. The Nerthus Dataset • Bowel Preparation Quality Video • 21 GI endoscopy videos of the colon • Some frames contain the position and configuration of the endoscope (scope guide) • 4 classes showing the four-score Boston Bowel Preparation Scale (BBPS)-defined bowel-preparation quality • 0 - very dirty • … • 3 - very clean • Open source: http://datasets.simula.no/nerthus/ "Nerthus: A Bowel Preparation Quality Video Dataset", Konstantin Pogorelov et al. Proceedings of MMSys, Taiwan, June 2017
  83. 83. The Kvasir-SEG Dataset • Kvasir does not contain segmentation masks • 1000 pixel-accurate masks of the polyps in Kvasir • Some similar datasets exist (e.g., CVC-356, CVC-612, ETIS-Larib Polyp DB), but they are small, restricted, etc. • http://datasets.simula.no/kvasir-seg/ "Kvasir-SEG: A Segmented Polyp Dataset", Debesh Jha et al. Proceedings of MMM, Korea, January 2020
  84. 84. GI Anomaly Detection System
  85. 85. Requirements: Detection and Automatic Analysis Subsystem • Common approaches • Handcrafted features • Convolutional neural network • Generative Adversarial Networks • Easy to extend with new diseases • Easy to extend with new algorithms • Easy to train • Results are explainable? • Disease Localization? • Real-time?
  86. 86. State-of-The-Art: Example Detection Systems – 5 years ago Polyp-Alert • detects polyps using edges and texture • near real-time feedback during colonoscopy (10 fps) • detected 97.7% (42 of 43) of polyp shots in 53 randomly selected videos (shot-level, not per-frame, detection) • one of the few end-to-end systems • Wallapak Tavanapong – from the MM community Hundreds of new approaches in recent years, many with good detection results… ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 86
  87. 87. Performance (accuracy and speed) ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 87
  88. 88. • Mayo dataset (18781 images/frames) → masks for all polyps • GF: • JCD and Tamura • recall 98.50%, precision 93.88%, fps ~300 • CNN: • Modified Inception v3: recall 95.86%, precision 80.78%, fps: ~30 • Inception v3 + WEKA: recall: 88.87%, precision: 89.16%, fps: ~30 ASU Mayo Dataset: Polyp Detection "EIR - Efficient Computer Aided Diagnosis Framework for Gastrointestinal Endoscopies" Michael Riegler, et al. Proceedings of CBMI, Bucharest, Romania, June 2016 ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 88
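The recall and precision figures on this slide can be related to raw detection counts with a few lines of Python. This is only a minimal sketch: the function names are illustrative, the counts in the usage note are made up, and the F1 value derived from the reported GF numbers is our own arithmetic, not a result from the slides.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# F1 implied by the GF result reported above (precision 93.88%, recall 98.50%).
gf_f1 = f1_score(0.9388, 0.9850)
```

For example, `precision_recall(8, 2, 2)` gives a precision and recall of 0.8 each, for hypothetical counts of 8 correct detections, 2 false alarms and 2 misses.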
  89. 89. • Resource consumption and processing performance of GF: • CNNs (also including GPU support)? • tests so far: ~30 fps (same GPU as above) • but adding layers, more networks, … !?? (newer GPU) • Inception v3: 66 fps, plain CNN: ~40-45 fps • GAN: ~12 fps (for 160x160) ASU Mayo Dataset: Polyp Detection ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 89
  90. 90. • Vestre Viken (VV) multi-disease dataset (250 images per class) • GF: • recall 90.60% • precision 91.40% • fps ~30 • CNN: • recall: 87.20% • precision: 87.90% • fps: ~30 VV Dataset: Multi-Disease Detection "Efficient disease detection in gastrointestinal videos - global features versus neural networks" Konstantin Pogorelov, et al. Multimedia Tools and Applications, 2017 ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 90
  91. 91. • GF • CNN VV Dataset: Multi-Disease Detection "Efficient disease detection in gastrointestinal videos - global features versus neural networks" Konstantin Pogorelov, et al. Multimedia Tools and Applications, 2017 ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 91
  92. 92. • 7 different algorithms • Convolutional neural networks (CNN) (2) – trained from scratch • 3-layers • 6-layers • Transfer learning (1) – retrained Inception v3 • Global features (4) • 2 global features (JCD, Tamura) • 6 global features (JCD, Tamura, Color Layout, Edge Histogram, Auto Color Correlogram and PHOG) • 2 different algorithms (Random forest and logistic model tree) • 2 baselines • Random Forest with one global feature • Majority class • 2-fold cross-validation Kvasir Dataset v1: Multi-Disease Detection ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 92
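The evaluation protocol listed above (2-fold cross-validation with a majority-class baseline) can be sketched in plain Python. The slides do not say how the folds were drawn, so the shuffled half/half split, the seed and the function names below are assumptions made for illustration only.

```python
import random
from collections import Counter


def two_fold_indices(n, seed=0):
    """Split indices 0..n-1 into two halves; each half serves once as the test fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    half = n // 2
    first, second = indices[:half], indices[half:]
    return [(first, second), (second, first)]  # (train, test) pairs


def majority_class_accuracy(labels, seed=0):
    """Mean accuracy of always predicting the most frequent training label."""
    scores = []
    for train, test in two_fold_indices(len(labels), seed):
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct = sum(labels[i] == majority for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

On a dataset with balanced classes, such as the 8 equally sized classes of Kvasir v1, this baseline lands near chance level, which is what makes it a useful lower bound for the other algorithms.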
  93. 93. Kvasir Dataset v1: Multi-Disease Detection ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 93
  94. 94. Kvasir Dataset v1: Multi-Disease Detection DyedandLiftedPolypDyedResectionMargin ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 94
  95. 95. Kvasir Dataset v1: Multi-Disease Detection CecumPylorus ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 95
  96. 96. • Using the same GF and some new deep features, i.e., • Inception v3 pre-trained on the ImageNet dataset • ResNet50 models • Used different ML classifiers: • random tree (RT) • random forest (RF) • logistic model tree (LMR) – performed best • Uses weights of 1000 pre-defined concepts as features • Top-layer input as feature vector (16384 for Inception v3 and 2048 for ResNet50) Kvasir Dataset v1 → v2: Multi-Disease Detection Pipeline: pretrained model → output or top-layer input weights → WEKA for classification ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 96
  97. 97. • Multiclass: 16 classes of anomalies and landmarks • Highly varying dataset sizes for the different classes • Combination of retrained networks Kvasir Dataset v2 → v3: Multi-Disease Detection ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 97
  98. 98. • MICCAI • … • Medico @ MediaEval • BioMedia @ ACM MM Competitions
Team | Approaches | F1 | FPS
SCL-UMD | Global-features and deep-features extraction, Inception-V3 and VGGNet CNN models, followed by machine-learning-based classification using RT, RF, SVM and LMR classifiers | 0.848 | 1.3
FAST-NU-DS | Global and local features combined, followed by data size reduction by applying K-means clustering and then using a logistic regression model for the classification | 0.767 | 2.3
ITEC-AAU | Two different custom Inception-like CNN models | 0.755 | 1.4
HKBU | A manifold learning method (bidirectional marginal Fisher analysis) learning a compact representation of the data, then a machine-learning-based multi-class support vector machine is used for the classification | 0.703 | 2.2
SIMULA | GF-features extraction, ResNet50 and Inception-V3 CNN models, followed by machine-learning-based classification using RT, RF and LMR classifiers | 0.826 | 46.0
Team and Run Name | F1 | MCC | Average Processing Speed
HCMUS | 0.934236452 | 0.931232439 | Fastenough
S@M (Simula) | 0.929733339 | 0.928383755 |
LesCats (Simula) | 0.923640116 | 0.922827982 |
RUNE (Simula) | 0.855590739 | 0.855590694 |
UMM-SIM_detection_InResV2-Van_3712 | 0.836795839 | 0.836636058 |
ParaNoMundo_detection_kt12dense201_3808 | 0.811417906 | 0.814635359 |
AAUITEC_detection_LSVM-comb2_5293 | 0.866259873 | 0.864100277 |
SIMULA_detection_run1_5293 | 0.814535427 | 0.811510687 |
FASTNUCES_detection_ver1_300 | 0.586802677 | 0.602579617 |
NOAT_detection_1_5293 | 0.391347034 | 0.390125827 |
HKBU_detection_1_5293 | 0.482962822 | 0.460894862 |
ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 98
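The competition tables report the Matthews correlation coefficient (MCC) next to F1. As a reminder of what that score measures, here is a minimal sketch of the binary form. The challenge tasks are multi-class, so the organizers presumably used a multi-class generalization; the two-class version below is a simplification for illustration, and the function name is ours.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC: +1 is perfect, 0 is chance level, -1 is total disagreement."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal count is zero
    return (tp * tn - fp * fn) / denominator
```

Unlike F1, MCC also takes the true negatives into account, which is one reason it is reported alongside F1 for datasets with very uneven class sizes.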
  99. 99. Compared: • Handcrafted global features (GF-D) using LIRE • Retrained and fine tuned existing DL architectures (RT-D) • Generative adversarial network (GAN) • Combined various datasets captured by different equipment in different hospitals. • With our best working GAN-based detection approach, • we reached detection specificity of ~94% and accuracy of ~90% with only 356 training and 6,000 test samples, slightly better if increasing training size • though a bit too many false positives (a bit low sensitivity) The Next Level: Comparing Handcrafted and Deep Learning Features – Cross Datasets ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 99
  100. 100. Detecting Bowel Cleanness Levels ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 100
  101. 101. • 7 different algorithms • Convolutional neural networks (CNN) (2) – trained from scratch • 3-layers • 6-layers • Transfer learning (1) – retrained Inception v3 • Global features (4) • 2 global features (JCD, Tamura) • 6 global features (JCD, Tamura, Color Layout, Edge Histogram, Auto Color Correlogram and PHOG) • 2 different algorithms (Random forest and logistic model tree) • 2 baselines • Random Forest with one global feature • Majority class • 2-fold cross-validation Nerthus Dataset: Bowel Cleanness Level ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 101
  102. 102. Nerthus Dataset: Bowel Cleanness Level ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 102
  103. 103. Nerthus Dataset: Bowel Cleanness Level ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 103
  104. 104. Localization / Segmentation ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 104
  105. 105. • Detection first, then process only frames containing polyps • Image enhancements • Detects curve-shaped objects and local maxima • Builds energy map and selects 4 possible locations • Localization performance: • recall 31.83% • precision 32.07% • ~30 fps • later, with a better GPU: ~75 fps (detection: 300 fps; localization: 100 fps) ASU Mayo Dataset: Polyp Localization ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 105
  106. 106. • Can we generate the full mask, not only pointing to one of the affected pixels? • Extended and improved the ResUNet architecture and compared to several other segmentation systems Kvasir-SEG: Generate the anomaly mask [Figure: architecture diagram of the extended ResUNet with an encoding and a decoding path, built from Conv2D (3×3), BN, ReLU, Addition, Squeeze & Excite, Atrous Spatial Pyramidal Pooling (ASPP), Upsampling, Attention and Concatenate blocks, ending in Conv2D (1×1) and Sigmoid] ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 106
  107. 107. • Can we generate the full mask, not only pointing to one of the affected pixels? • Extended and improved the ResUNet architecture and compared to several other segmentation systems: • U-Net • ResUNet • ResUNet-mod • ResUNet++ Kvasir-SEG: Generate the anomaly mask ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 107
  108. 108. • Can we generate the full mask, not only pointing to one of the affected pixels? Kvasir-SEG: Generate the anomaly mask Trained and tested on Kvasir-SEG: Original Ground truth UNet ResUNet ResUNet-mod ResUNet++ ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 108
  109. 109. • Can we generate the full mask, not only pointing to one of the affected pixels? Kvasir-SEG: Generate the anomaly mask Trained on CVC-612 and tested on Kvasir-SEG: Original Ground truth UNet ResUNet ResUNet-mod ResUNet++ ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 109
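To compare a generated mask with the ground truth numerically, overlap scores such as the Dice coefficient and intersection over union (IoU) are a common choice. The slides above do not state which metrics were used, so the sketch below is an assumption about a typical evaluation rather than the authors' procedure. Masks are represented as nested lists of 0/1 values to keep the example dependency-free.

```python
def dice_and_iou(predicted, truth):
    """Dice coefficient and IoU of two binary masks given as equally sized 2D lists."""
    intersection = predicted_area = truth_area = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            predicted_area += p
            truth_area += t
    total = predicted_area + truth_area
    if total == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    dice = 2 * intersection / total
    iou = intersection / (total - intersection)
    return dice, iou
```

A predicted mask covering two pixels, of which one overlaps a one-pixel ground truth, scores a Dice of 2/3 and an IoU of 1/2.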
  110. 110. Preprocessing & Augmentation ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 110
  111. 111. • Too little data!! • Blurry images due to camera motion • Objects too close to camera • Under- or over-lit scenes • Flares • Artificial objects and natural “contaminations” • Low resolution of capsule endoscopes • … Data Challenges: Preprocessing ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 111
  112. 112. Data Enhancements for CNN Training ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 112
  113. 113. Data Enhancements for CNN Training ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 113
  114. 114. • Artifacts in the images can influence the algorithm • Understanding of what the algorithm reacts to is crucial Borders and Overlays ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 114
  115. 115. • Results on Kvasir + CVC-986 • Accuracy improved for almost all models with some preprocessing (F1 from 0.7% to 4.4%) Borders and Overlays ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 115
  116. 116. • Replacing artifacts in the video/image • Different methods • Clipping • Autoencoders • Context encoder • Context Conditional (CC)-GAN • Some differences, but marginal GAN inpainting of Navigation Box ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 116
  117. 117. Automatic Detection of Angiectasia Video Capsule Endoscopy ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 117
  118. 118. Video Capsule (PillCam) • Standard colonoscopy: • expensive • does not scale • intrusive ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 118
  119. 119. Video Capsule (PillCam) • Standard colonoscopy: • expensive • does not scale • intrusive • Wireless Video Capsule endoscopy: • better scale • less intrusive • possible to combine examinations!? • watch hours of video • less expensive? (detection might lead to an endoscopy) ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 119
  120. 120. • Angiectasia is a vascular lesion that can cause GI bleeding • Medical specialists reach a detection accuracy of about 69% • Medical systems should reach an 85% threshold to be acceptable in clinical use Angiectasia Detection ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 120
  121. 121. Angiectasia Detection: Varying Difficulty ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 121
  122. 122. • By far, GANs give the best detection: • sensitivity: 98% • specificity: 100% • BUT, sloooooow… • Several approaches are better than the average doctor (69%) • Most of the approaches have a too low detection rate, but still better than the baseline • Compromise between accuracy and speed Detection Compared • VCE dataset from GIANA 2017 (300 with angiectasia and 300 without) • 10-fold cross validation ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 122
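Sensitivity and specificity, as quoted for the GAN approach, follow directly from the confusion counts. The sketch below back-computes illustrative counts from the reported percentages and the 300/300 dataset split, under the assumption that the percentages are pooled over all 600 images. The slides do not state this, so the counts are hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical counts consistent with 98% sensitivity and 100% specificity
# on 300 images with angiectasia and 300 without.
tp, fn, tn, fp = 294, 6, 300, 0
sensitivity, specificity = sensitivity_specificity(tp, fn, tn, fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
```

With these assumed counts, the overall accuracy works out to 594 of 600 images.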
  123. 123. Detection Feedback ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 123
  124. 124. Detection Subsystem Outputs • Visualize the output of the system to the medical doctors • Simple and easy to understand (most important) • Easy to integrate in hospitals • Live support • Useable for automatic reports, etc. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 124
  125. 125. Real-time Detection Feedback ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 125
  126. 126. Real-time Detection Feedback ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 126
  127. 127. Increasing Understanding & Assisting Administrative Work ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 127
  128. 128. • Understanding: A black box will not work – neither for patients nor clinicians • Reporting: Critical for communication and evidence, but a huge overhead • Inconsistent descriptions of abnormalities • Poor adoption of existing standards • Time consuming (up to 15 minutes or more) • Boring and reduced job satisfaction Understanding and Reporting ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 128
  129. 129. Mimir: Reporting of Endoscopies • A way of interpreting the output of a neural network • deeper analysis of why the model produces a given result • class discriminatory visualizations based on selected class and layer. • tools for uploading and managing various models. • Automatic generation of modifiable medical reports • Produced Visualizations • grad-CAM technique • saliency and class activation maps ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 129
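The core of the Grad-CAM technique named above is a weighted sum of feature maps: each channel is weighted by the spatial mean of the class-score gradient, and negative evidence is clipped by a ReLU. The sketch below shows only that combination step on plain nested lists. It assumes the activations and gradients of the chosen layer have already been extracted from a model, and it is not Mimir's actual code.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps (each H x W) into one class activation map.

    activations[k][i][j]: output of channel k at position (i, j)
    gradients[k][i][j]:   gradient of the class score w.r.t. that output
    """
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for feature_map, gradient_map in zip(activations, gradients):
        # Channel weight: global average of the gradients for this channel.
        weight = sum(sum(row) for row in gradient_map) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]
    # ReLU keeps only regions that speak for the selected class.
    cam = [[max(0.0, value) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]  # scale to [0, 1]
    return cam
```

The resulting map is usually upsampled to the input resolution and overlaid on the endoscopy frame as a heat map.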
  132. 132. Human Reproduction Case Study ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 132
  133. 133. Why semen analysis? • Every year, over 45 million couples experience involuntary childlessness, with 40% of cases due in some part to male fertility problems • Semen analysis is one of the first procedures done when determining infertility. • Current methods are either time-consuming and prone to human error, or require expensive laboratory equipment. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 133
  134. 134. What is semen quality • When analyzing semen quality, we often look at multiple visual features of the spermatozoa (sperm) together with information about the patient. • The problem is that we know that patient parameters impact semen quality, but we don’t know how. • This is a true multimodal problem where our expertise could have great impact. • But right now, let’s look at some visual features which are commonly used to determine quality. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 134
  135. 135. Sperm count ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 135
  136. 136. Morphology is used to assess the shape and size of a sperm, focusing on the tail, midpiece and head. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 136
  137. 137. Morphology Examples ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 137
  138. 138. Motility is used to assess the movements of each sperm, which can be grouped into progressive, non-progressive and immotile. [Figure panels: Non-progressive, Immotile, Progressive] ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 138
  139. 139. Non-Progressive Example ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 139
  140. 140. Progressive Example ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 140
  141. 141. Immotile Example ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 141
  142. 142. VISEM – A Multimodal Video Dataset of Human Spermatozoa • We have a dataset consisting of 85 microscopic videos of human semen, all from different participants. • Each video comes with a preliminary semen analysis done according to WHO standards. • The dataset also contains information about each participant (such as age and BMI), sex hormone levels of the participant, and some parameters extracted from existing sperm analysis machines. • Data is open-source and available at datasets.simula.no/visem. [Figure labels: Low, Mid-low, Mid-high, High] ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 142
  143. 143. How should we tackle this? • Determining the different visual features requires different approaches. • Morphology focuses more on the spatial features of a frame, while motility requires the temporal dimension. • Not all videos are created equal; some videos are not properly focused or include fluid drift. • We must find clever solutions which incorporate the temporal information of the frames together with the participant-related data. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 143
  144. 144. Baseline Approach • No other methods to directly compare. • To create a baseline, we calculate the ZeroR across our collected dataset. • Metrics used to measure performance are the mean absolute error (MAE) and the root mean squared error (RMSE). Morphology Baseline Motility Baseline ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 144
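For a regression target, ZeroR simply predicts the mean of the training values for every sample, and MAE and RMSE then measure how far that constant guess is from the truth. A minimal sketch follows; the numbers in the usage note are made up and are not VISEM values.

```python
from math import sqrt


def zero_r(train_targets):
    """ZeroR for regression: always predict the mean of the training targets."""
    return sum(train_targets) / len(train_targets)


def mean_absolute_error(targets, prediction):
    """Average absolute deviation of a constant prediction from the targets."""
    return sum(abs(t - prediction) for t in targets) / len(targets)


def root_mean_squared_error(targets, prediction):
    """Square root of the average squared deviation from the targets."""
    return sqrt(sum((t - prediction) ** 2 for t in targets) / len(targets))
```

With hypothetical training targets `[10, 20, 30]`, ZeroR predicts 20. On test targets `[20, 40]` that gives an MAE of 10 and an RMSE of about 14.1. Any learned model should beat these numbers to be worth using.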
  146. 146. 3D Convolutional Neural Networks • Using multiple frames to predict quality using 3D convolutional neural networks. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 146
  147. 147. Using an autoencoder to extract temporal features into images. • We use an autoencoder which takes multiple frames to extract temporal features into an RGB image. • Used to predict both morphology and motility. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 147
  148. 148. Generate optical flow from the video frames. • Compress the temporal information of sequential frames by using sparse or dense optical flow. • Clearly see the movement of the different sperm across time. • Using synthetic sperm videos to accurately estimate optical flow using GANs. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 148
  149. 149. MediaEval 2019 – Medico Multimedia Task • Three different tasks related to analyzing human semen. • Main task is predicting motility and morphology using the VISEM dataset. • 5 submissions using a variety of approaches. • Tune in next week and join the fun in 2020! ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 149
  150. 150. Open challenges and future directions. • How should we combine the patient data with the visual features to better predict semen quality? • What data to use and how to include it. • Tracking individual sperm cells to find the “best” spermatozoon. • Combining semen analysis with embryo data to better understand the relationship between sperm and successful egg fertilization. • Next step… ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 150
  151. 151. Embryo analysis and prediction • Analyzing time-lapse videos of embryo development. • Get a better understanding of early embryo development and the health of offspring. • Increase success rate of in vitro fertilization. • Ethical and legal challenges. ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 151
  152. 152. Predicting Performance of Soccer Players ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 152
  153. 153. Initial challenge: Logging and Monitoring ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 153
  154. 154. pmSys: Reporting using a mobile app ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 154
  155. 155. Coach Web Portal • See team overviews − all, averages − planned load • Send reminders • See individual views • Simple automatic “predictions” ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 155
  156. 156. Would like to perform proper predictions! ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 156
  157. 157. • 2008-09, Spanish 1st division: 24.360 player-days absent • 2017-19: • Premier league clubs paid £217m in wages to injured players • Manchester United has an average cost of £870.00 per injury (high salaries) • Champions Manchester City suffered the second fewest number of injuries • 2018-19 • Manchester City won PL with a minimum margin: 98 vs 97 points (2nd and 3rd highest ever) Important to find an optimal training regime, avoid injuries and pick the right players for the game Would like to perform proper predictions! ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 157
  158. 158. • Recurrent neural networks – Long Short-Term Memory (LSTM) • handles the complexity of sequences – well-suited to classifying, processing and making predictions based on time series data • Motivating example: Airline passengers LSTM ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 158
  159. 159. • Dataset • two professional Norwegian teams • data from 2017 and 2018 • 6000 days of reports • many parameters, but our initial experiments used “readiness to train” • LSTM • sequence numbers of 36 • 30 epochs • batch size of 4 • 4 layers – input, 2 hidden, output • rmsprop optimizer • Model training • training and predicting on the same player • training on all players but one • Aim: detecting positive and negative peaks Initial Experiments ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 159
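Before an LSTM can be trained on daily reports, the time series has to be cut into fixed-length input sequences with a value to predict. The sketch below reads the "36" on this slide as the window length, which is our assumption, and uses one-step-ahead prediction as the target, which the slides do not specify.

```python
def make_windows(series, window=36):
    """Turn one time series into (input sequence, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets
```

A player with 40 daily "readiness to train" values would yield 4 training pairs. The "training on all players but one" setup then amounts to concatenating the pairs of every player except the one held out for testing.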
  160. 160. Analyzing player data using LSTM (training on one player) Needs more data than training on just ONE player… Team 1 Team 2 ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 160
  161. 161. Analyzing player data using LSTM (training on all but one players) Predicting the positive and negative peaks with a precision and recall above 90% ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 161
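One way to score peak prediction is to locate the peaks in the predicted and the reported series and then count how many coincide. The slides do not describe how peaks were defined or matched, so the strict local-extremum rule and the day tolerance below are assumptions for illustration.

```python
def find_peaks(values):
    """Indices of strict local maxima (positive peaks) and minima (negative peaks)."""
    highs, lows = [], []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            highs.append(i)
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            lows.append(i)
    return highs, lows


def peak_precision_recall(predicted, actual, tolerance=0):
    """Precision and recall of predicted peak positions, matched within a tolerance in days."""
    hits_predicted = sum(any(abs(p - a) <= tolerance for a in actual) for p in predicted)
    hits_actual = sum(any(abs(p - a) <= tolerance for p in predicted) for a in actual)
    precision = hits_predicted / len(predicted) if predicted else 0.0
    recall = hits_actual / len(actual) if actual else 0.0
    return precision, recall
```

Positive and negative peaks would be scored separately, by calling `peak_precision_recall` once on the two lists of maxima and once on the two lists of minima.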
  162. 162. • If we manage to give good predictions… • better training • fewer injuries • better results • Challenges • enough data • detect all corner cases • making the users believe in the predictions Consequences and challenges ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 162
  163. 163. So, MEDICAL MULTIMEDIA - all problems solved!!?? ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 163
  164. 164. • Still improve accuracy and system performance 1. Full system integration 2. Exploiting domain expert knowledge – build datasets 3. Integration of various data, multi-modality – new sensors 4. Explainable AI 5. Patient context information 6. Visualization (AR/VR) 7. Decision support and administrative aids 8. … • The potential for real impact is HUGE!! • screening / diagnosis • personalized medicine • automatic treatment • improving exercise, rehabilitation and sport performance • autonomous and remote surgeries • … Open Challenges & Potential "Multimedia and Medicine: Teammates for Better Disease Detection and Survival" Michael Riegler, et al. Proceedings ACM MM, Amsterdam, The Netherlands, October 2016 ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 164
  165. 165. • We have given several case-specific examples, but in general, they are common • Doctors want to use all the data for general support: analysis, diagnostics, reporting, teaching, statistics, similarity search / comparisons, … • Currently, … • more and more high quality data is recorded / produced • data analysis methods are promising • multi-modal data analysis is not very common • good visualization tools exist, but are not used (e.g., AR, VR, …) • some tools are missing • many (other) areas produce separate (isolated) methods • … • and, we need a complete integrated system! → Our multimedia community is needed Summary ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 165
  166. 166. ImageCLEF 2020 CLEF, 22-25 September, Thessaloniki, Greece http://www.imageclef.org/2020
#ImageCLEFlifelog2020# (4th edition) An increasingly wide range of personal devices that allow capturing pictures, videos, and audio clips for every moment of our lives, are becoming available. In this context, the task addresses the problems of lifelogging data retrieval and summarization. Organizers: Duc-Tien Dang-Nguyen (University of Bergen), Luca Piras (Pluribus One & University of Cagliari), Michael Riegler & Pål Halvorsen (Simula Research Laboratory), Minh-Triet Tran (University of Science), Cathal Gurrin (Dublin City University), Mathias Lux (Klagenfurt University).
#ImageCLEFcoral2020# (2nd edition) The increasing use of structure-from-motion photogrammetry for modelling large-scale environments from action cameras has driven the next generation of visualization techniques. The task addresses the problem of automatically segmenting and labeling a collection of images that can be used in combination to create 3D models for the monitoring of coral reefs. Organizers: Jon Chamberlain, Adrian Clark, & Alba García Seco de Herrera (University of Essex), Antonio Campello (Wellcome Trust).
#ImageCLEFmedical2020# (2nd edition) Medical images can be used in a variety of scenarios and this task will combine the most popular medical tasks of ImageCLEF and continue the last year idea of mixing various applications, namely: automatic image captioning and scene understanding, medical visual question answering and decision support on tuberculosis. This allows to explore synergies between tasks. Organizers: Asma Ben Abacha & Dina Demner-Fushman (National Library of Medicine), Sadid A. Hasan, Vivek Datla & Joey Liu (Philips Research Cambridge), Obioma Pelka & Christoph M. Friedrich (University of Applied Sciences and Arts Dortmund), Alba García Seco de Herrera (University of Essex), Yashin Dicente Cid (University of Warwick), Serge Kozlovski, Vitali Liauchuk, & Vassili Kovalev (United Institute of Informatics Problems), Henning Müller (HES-SO).
#ImageCLEFdrawnUI2020# (new) Enabling people to create websites by drawing them on a piece of paper would make the webpage building process more accessible. The task addresses the problem of automatically recognizing hand drawn objects representing website UIs, which will be further translated into automatic website code. Organizers: Paul Brie & Fichou Dimitri (teleportHQ), Mihai Dogariu, Liviu Daniel Ștefan, Mihai Gabriel Constantin, & Bogdan Ionescu (University Politehnica of Bucharest).
Contact on social media: Facebook https://www.facebook.com/ImageClef Twitter https://twitter.com/imageclef
ImageCLEF 2020 is an evaluation campaign that is being organized as part of the CLEF (Conference and Labs of the Evaluation Forum) labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes, published by CEUR (CEUR-WS.org) and are presented in the CLEF conference. Selected contributions among the participants will be invited for publication in the following year in the Springer Lecture Notes in Computer Science (LNCS), together with the annual lab overviews. Target communities involve (but are not limited to): information retrieval (e.g., text, vision, audio, multimedia, social media, sensor data), machine learning, deep learning, data mining, natural language processing, image and video processing; with special emphasis on the challenges of multi-modality, multi-linguality, and interactive search.
Overall coordination: Bogdan Ionescu, University Politehnica of Bucharest, Romania; Henning Müller, HES-SO, Sierre, Switzerland; Renaud Péteri, University of La Rochelle, France
Important Dates (depending on tasks): end of April, 2020: registration closes; beginning of May, 2020: runs due; end of May, 2020: working notes due. #imageclef20 #clef2020
ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 166
  167. 167. The End… ACM Multimedia 2019 Tutorial Medical Multimedia Systems and Applications 167
