Histolab: an Open Source Python Library for Reproducible Digital Pathology

The histopathological analysis of tissue sections is the gold standard for assessing the presence of many complex diseases, such as tumors, and it is expected to be at the center of the AI revolution in medicine, a prediction supported by the increasing success of deep learning applications in digital pathology. The aim of histolab is to provide a tool for processing Whole Slide Images (WSIs) in a reproducible environment to support clinical and scientific research. histolab is designed to handle WSIs, automatically detect the tissue, and retrieve informative tiles.

  1. histolab: an Open Source Python Library for Reproducible Digital Pathology. 9-10-11 November 2021. Ernesto Arbitrio (ernesto.arbitrio@gmail.com, @__pamaron__) and Alessia Marcolini (alessia.marcolini@hk3lab.ai, @viperale).
  2. Speakers. Alessia: Data Science M.Sc. @ TU Eindhoven / TU Berlin; Junior Data Scientist @ HK3lab; PyCon Italia organizer. Ernesto: Senior Backend Engineer @ YouGov PLC; PyCon Italia organizer; open source contributor.
  3. Day 1, me: "I am so excited to work on this new digital pathology project!!"
  4. Histopathology: the primary diagnostic resource for identifying complex diseases, in particular tumors. Digital pathology: scanning histopathological glass slides to create Whole Slide Images. Image credits: https://www.poliambulanza.it/dipartimenti/dipartimento-di-oncologia/anatomia-patologica; Section of Pathology and Tumour Biology, University of Leeds.
  5. Day 1: "I am so excited to work on this new digital pathology project!! Let’s find some literature first!"
  6. Day 3: *reading papers without any mention of a repository containing the source code*
  7. Day 5: "Yay, finally some code!"
  8. Day 5: *opens code*
  10. Day 30: the “Tagliatelle” effect, 2017 (https://twitter.com/viperale/status/1202980338693226496)
  11. Whole Slide Images: multi-resolution images (e.g. 5× and 20×) stored in a pyramidal format; up to 90,000px × 30,000px and up to 10GB in size; scanner-vendor-specific file formats, e.g. .svs, .vms, .ndpi, .tif, .bif, .scn; ad-hoc software needed for viewing and processing; artifacts such as shadows, mold, and pen marks. Image from Y. Wang et al. 2012, SurfaceSlide: A multitouch digital pathology platform, doi:10.1371/journal.pone.0030783. See also https://twitter.com/tdmckee/status/1456585006982340611?s=21
  14. Why? Using WSIs directly as input to deep learning (DL) models is infeasible, so preprocessing them into smaller subwindows ("tiles") is required. Preprocessing steps are usually poorly detailed in research papers, which makes results hard to reproduce. Hence the need for a high-quality reference preprocessing tool, to enable faster prototyping and experimentation. Links: https://paperswithcode.com/, https://www.paperswithoutcode.com/, https://twitter.com/alexkyllo/status/1457072262520004632, https://twitter.com/petebankhead/status/1407630531290927105?s=21
  21. Day 60: the team (me, Nicole, and Ernesto).
  22. histolab: a new open source Python package for reproducible Whole Slide Image preprocessing, designed for easy integration with a deep learning pipeline.
  23. Approach: unifying community-validated procedures for slide preprocessing and tile extraction; introducing best practices from software engineering (automated testing, code versioning and code reviews, Continuous Integration); building on top of state-of-the-art, well-known libraries, e.g. OpenSlide, NumPy, and scikit-image.
  24. Histolab features: #1 interoperability between different formats, with up to 9 supported formats from the major scanner vendors; #2 automatic tissue detection and segmentation using a fixed sequence of image filters; #3 automatic retrieval of informative tiles, i.e. regions cropped from the tissue areas found in #2; #4 easy access to sample data from TCGA and OpenSlide, which are saved to the system cache and can then be imported.
  28. Histolab in action: WSI → image dataset (tiles) → DL pipeline. Prostate cancer sample, TCGA-PRAD, 16,000px × 15,316px; example tiles of 512px × 512px at magnification 5× and at magnification 20×.
  29. Tiles extraction (#3) in fewer than 10 lines of code:
      >>> from histolab.data import breast_tissue
      >>> _, path = breast_tissue()  # 1. download a breast tissue sample from TCGA
      >>> from histolab.slide import Slide
      >>> slide = Slide(path, "path/to/processed")  # 2. create a Slide object
      >>> from histolab.tiler import RandomTiler
      >>> random_tiles_extractor = RandomTiler(  # 3. create a Tiler
      ...     tile_size=(512, 512),
      ...     n_tiles=10,
      ...     level=2,
      ...     seed=42,
      ...     check_tissue=True,
      ... )
      >>> random_tiles_extractor.extract(slide)  # 4. extract!
  30. Tissue detection and tile extraction with RandomTiler. Panels: original WSI, tissue detection, random tiles. Breast cancer sample, TCGA-BRCA, 96,972px × 30,682px; tiles of 512px × 512px at magnification 20×.
  31. Tissue detection and tile extraction with ScoreTiler:
      >>> from histolab.scorer import NucleiScorer
      >>> scorer = NucleiScorer()
      >>> from histolab.tiler import ScoreTiler
      >>> scored_tiles_extractor = ScoreTiler(
      ...     scorer,
      ...     tile_size=(512, 512),
      ...     n_tiles=10,
      ...     level=2,
      ...     check_tissue=True,
      ... )
      >>> scored_tiles_extractor.extract(slide)
      Figure: representation of the score assigned to each extracted tile by the NucleiScorer, with the nuclei mask of a 512px × 512px tile. Ovarian cancer sample, TCGA-OV, 30,001px × 33,987px.
  32. Tissue detection (#2) using a fixed sequence of image filters applied to the original image: (1) grayscale filter, (2) Otsu threshold, (3) binary dilation, (4) removal of small holes, (5) removal of small objects, yielding (6) the final mask and (7) the biggest tissue area box. Reference: Nobuyuki Otsu, 1979. A threshold selection method from gray-level histograms, doi:10.1109/TSMC.1979.4310076.
  33. Tissue detection (#2) in code:
      >>> from histolab.filters.image_filters import Compose, OtsuThreshold, RgbToGrayscale
      >>> from histolab.filters.morphological_filters import (
      ...     BinaryDilation,
      ...     RemoveSmallHoles,
      ...     RemoveSmallObjects,
      ... )
      >>> filters = Compose(
      ...     [
      ...         RgbToGrayscale(),
      ...         OtsuThreshold(),
      ...         BinaryDilation(),
      ...         RemoveSmallHoles(),
      ...         RemoveSmallObjects(),
      ...     ]
      ... )
      >>> filters(image)  # image: an RGB PIL image of the tissue
      Panels: the same seven steps as in the previous slide, from the original image to the biggest tissue area box. Reference: Nobuyuki Otsu, 1979, doi:10.1109/TSMC.1979.4310076.
  34. Removing artifacts such as pen markers. Panels: original image, (1) grayscale filter, (2) Otsu threshold, (3) apply mask, (4) green pen filter. The pen filter implementation is inspired by https://github.com/CODAIT/deep-histopath.
  35. Present, day 500: "Finally some stable code I can use!" "Stable? Is it tested?"
  36. Present, day 500: "Look at this! 100% coverage with 600 unit and integration tests."
  38. Present, day 500: "Yes, but how do I get all of these goodies?"
  39. GitHub: github.com/histolab/histolab, with GitHub Actions for Continuous Integration.
  40. GitHub Actions: benchmarks.yml (benchmark results: https://histolab.github.io/histolab/dev/bench/).
  41. GitHub Actions: release.yml
  42. Docs: histolab.readthedocs.io
  43. Join us (https://github.com/histolab/histolab/blob/master/CONTRIBUTING.md) and give us a ⭐!
  44. Installation: $ pip install histolab
  45. Lessons learned and notes for future self: you will spend more time writing tests than code, but at least you will sleep tight at night; 100% coverage doesn’t imply 0% bugs, but stupid mistakes are easily caught; code formatting and linting are nice, but automate them to focus on the important stuff.
  46. Histolab is joint work with Nicole Bussola, PhD student @ FBK-MPBA / CIBIO. Thank you! Any questions? Alessia Marcolini (alessia.marcolini@hk3lab.ai, @viperale) and Ernesto Arbitrio (ernesto.arbitrio@gmail.com, @__pamaron__).
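
The Whole Slide Images slide and the "why?" slide argue that WSIs are too large to feed directly into a deep learning model. The arithmetic below is a minimal sketch of that argument, assuming each pyramid level is downsampled by a factor of 4 from the previous one; this is a common layout, but real factors are file-specific, and OpenSlide reports them through `level_dimensions` and `level_downsamples`. The helpers `pyramid_levels`, `uncompressed_rgb_bytes` and `n_grid_tiles` are hypothetical names, not part of histolab.

```python
from typing import List, Tuple


def pyramid_levels(width: int, height: int, n_levels: int,
                   factor: int = 4) -> List[Tuple[int, int]]:
    """(width, height) of each level, ASSUMING a constant downsample factor."""
    levels = []
    for level in range(n_levels):
        scale = factor ** level
        # Integer division: partial pixels at the border are dropped.
        levels.append((width // scale, height // scale))
    return levels


def uncompressed_rgb_bytes(width: int, height: int) -> int:
    """Bytes needed to hold one level as an 8-bit RGB array (3 bytes per pixel)."""
    return width * height * 3


def n_grid_tiles(width: int, height: int, tile: int = 512) -> int:
    """Number of non-overlapping tile x tile tiles that fit entirely inside."""
    return (width // tile) * (height // tile)


if __name__ == "__main__":
    # Size of the TCGA-BRCA sample shown on the RandomTiler slide.
    for level, (w, h) in enumerate(pyramid_levels(96_972, 30_682, n_levels=3)):
        gib = uncompressed_rgb_bytes(w, h) / 2**30
        print(f"level {level}: {w} x {h} px, {gib:.2f} GiB as RGB, "
              f"{n_grid_tiles(w, h)} tiles of 512 x 512 px")
```

For that 96,972px × 30,682px sample, level 0 alone would need about 8.3 GiB as an uncompressed RGB array, while it splits into 11,151 non-overlapping 512px × 512px tiles.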
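
The RandomTiler example relies on a fixed `seed` and on `check_tissue=True` to make tile extraction reproducible and to skip background. The sketch below illustrates that idea on a precomputed boolean tissue mask. It is not histolab's implementation; the function `random_tile_coords` and its `tissue_percent` and `max_iter` arguments are assumptions chosen for illustration.

```python
import numpy as np


def random_tile_coords(tissue_mask: np.ndarray, tile_size: int, n_tiles: int,
                       seed: int = 42, tissue_percent: float = 80.0,
                       max_iter: int = 1000) -> list:
    """Hypothetical sketch: sample up to n_tiles square tiles with enough tissue.

    tissue_mask: 2-D boolean array, True where tissue was detected.
    Returns (row, col) top-left corners in the order they were accepted.
    Duplicates are possible; this sketch does not deduplicate.
    """
    rng = np.random.default_rng(seed)  # same seed -> same tiles on every run
    height, width = tissue_mask.shape
    coords = []
    for _ in range(max_iter):
        if len(coords) == n_tiles:
            break
        # Draw a top-left corner so that the whole tile lies inside the mask.
        row = int(rng.integers(0, height - tile_size + 1))
        col = int(rng.integers(0, width - tile_size + 1))
        tile = tissue_mask[row:row + tile_size, col:col + tile_size]
        # Tissue check: reject tiles that are mostly background.
        if tile.mean() * 100 >= tissue_percent:
            coords.append((row, col))
    return coords


if __name__ == "__main__":
    mask = np.zeros((1000, 1000), dtype=bool)
    mask[:, :500] = True  # left half is "tissue"
    print(random_tile_coords(mask, tile_size=128, n_tiles=5))
```

Drawing from a dedicated `numpy.random.Generator` rather than the global random state keeps the result identical across runs for the same seed, which is the reproducibility property the talk emphasises.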
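
The ScoreTiler slide ranks tiles by a score (there, the NucleiScorer) and keeps the best `n_tiles`. Below is a hedged sketch of that ranking step on a grayscale image: `top_n_tiles` scans a non-overlapping grid and sorts tiles by score. The scorer `dark_pixel_score`, the share of dark pixels used as a crude nuclei proxy, is hypothetical and does not reproduce histolab's NucleiScorer.

```python
import numpy as np


def dark_pixel_score(tile_gray: np.ndarray, threshold: int = 100) -> float:
    """Hypothetical nuclei proxy: share of pixels darker than `threshold`."""
    return float((tile_gray < threshold).mean())


def top_n_tiles(image_gray: np.ndarray, tile_size: int, n_tiles: int,
                scorer=dark_pixel_score) -> list:
    """Score every non-overlapping grid tile and keep the n_tiles best.

    Returns (score, (row, col)) pairs sorted by decreasing score.
    """
    height, width = image_gray.shape
    scored = []
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tile = image_gray[row:row + tile_size, col:col + tile_size]
            scored.append((scorer(tile), (row, col)))
    # Highest score first; ties are broken by position for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:n_tiles]
```

Passing the scorer as an argument mirrors the slide's design, where the scoring strategy is a separate object handed to the tiler.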
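
Steps 1 and 2 of the tissue-detection pipeline (grayscale conversion and Otsu threshold) can be sketched in plain NumPy. The threshold follows Otsu's 1979 criterion of maximising the between-class variance of the gray-level histogram. The BT.601 luma weights and the rule that tissue is the darker class are assumptions of this sketch, and the morphological steps 3 to 5 are left out; scikit-image provides `binary_dilation`, `remove_small_holes` and `remove_small_objects` in `skimage.morphology` for those.

```python
import numpy as np


def rgb_to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """8-bit grayscale using ITU-R BT.601 luma weights (an assumption here)."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(float) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def otsu_threshold(gray: np.ndarray) -> int:
    """Gray level t maximising Otsu's between-class variance.

    Class 0 holds levels <= t, class 1 holds levels > t.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of class 0
    mu = np.cumsum(prob * np.arange(256))    # first moment of class 0
    mu_total = mu[-1]                        # global mean gray level
    denom = omega * (1.0 - omega)
    # sigma_b^2(t) = (mu_total * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t)))
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def tissue_mask(rgb: np.ndarray) -> np.ndarray:
    """Tissue is ASSUMED darker than the bright background: keep gray <= t."""
    gray = rgb_to_grayscale(rgb)
    return gray <= otsu_threshold(gray)
```

Levels where one class would be empty get a variance of zero, so they can never be selected as the threshold.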
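
The last panel of the tissue-detection pipeline keeps the bounding box of the biggest tissue area. One way to compute it, sketched here without claiming it matches histolab's code, is to label the 4-connected components of the final mask with a breadth-first search and return the box of the largest one. `biggest_component_bbox` is a hypothetical name; in practice `scipy.ndimage.label` or `skimage.measure.label` would replace the hand-written labelling.

```python
from collections import deque

import numpy as np


def biggest_component_bbox(mask: np.ndarray):
    """Inclusive (row_min, col_min, row_max, col_max) of the largest
    4-connected True region of a 2-D boolean mask, or None if it is empty.

    Pure-Python flood fill: fine for a thumbnail-sized mask, slow otherwise.
    """
    mask = np.asarray(mask, dtype=bool)
    height, width = mask.shape
    seen = np.zeros_like(mask)
    best_size, best_box = 0, None
    for start_r in range(height):
        for start_c in range(width):
            if not mask[start_r, start_c] or seen[start_r, start_c]:
                continue
            # Flood-fill one component, tracking its size and extent.
            queue = deque([(start_r, start_c)])
            seen[start_r, start_c] = True
            size = 0
            r0, c0, r1, c1 = start_r, start_c, start_r, start_c
            while queue:
                r, c = queue.popleft()
                size += 1
                r0, c0 = min(r0, r), min(c0, c)
                r1, c1 = max(r1, r), max(c1, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr, nc] and not seen[nr, nc]):
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            # Strictly greater: on ties the first component found is kept.
            if size > best_size:
                best_size, best_box = size, (r0, c0, r1, c1)
    return best_box
```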
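
The `Compose` object on the tissue-detection code slide chains filters so that each one receives the previous output. A minimal sketch of that pattern, assuming filters are plain callables; this is an illustrative re-implementation, not histolab's class.

```python
from typing import Any, Callable, Sequence


class Compose:
    """Apply a fixed sequence of filters, feeding each output into the next."""

    def __init__(self, filters: Sequence[Callable[[Any], Any]]) -> None:
        self.filters = list(filters)

    def __call__(self, image: Any) -> Any:
        for image_filter in self.filters:
            image = image_filter(image)
        return image


if __name__ == "__main__":
    # Toy "filters" on numbers, just to show that order matters.
    print(Compose([lambda x: x + 1, lambda x: x * 2])(3))  # 8
    print(Compose([lambda x: x * 2, lambda x: x + 1])(3))  # 7
```

Because every step is an ordinary callable, each filter can be unit-tested on its own and the pipeline order is fixed in one place, which helps keep the preprocessing reproducible.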
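
The artifact-removal slide adds a green pen filter. The sketch below flags pixels whose green channel clearly exceeds a low red channel, on the reasoning that H&E-stained tissue is pink to purple and therefore red-dominant, and paints them white so they are treated as background. The thresholds `red_max` and `min_margin` and both function names are hypothetical illustrations, not the values used by histolab or deep-histopath.

```python
import numpy as np


def green_pen_mask(rgb: np.ndarray, red_max: int = 150,
                   min_margin: int = 20) -> np.ndarray:
    """True where a pixel looks like green marker ink (illustrative thresholds)."""
    rgb = rgb.astype(np.int16)  # signed ints: avoid uint8 wrap-around in g - r
    r, g = rgb[..., 0], rgb[..., 1]
    return (r < red_max) & (g - r > min_margin)


def remove_green_pen(rgb: np.ndarray, fill: int = 255) -> np.ndarray:
    """Copy of `rgb` with flagged pixels painted white, i.e. treated as background."""
    cleaned = rgb.copy()
    cleaned[green_pen_mask(rgb)] = fill
    return cleaned
```

A real filter would need thresholds tuned on actual slides, since ink shades and stain intensities vary between labs and scanners.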
