Histolab: an Open Source Python Library
for Reproducible Digital Pathology
November 9-11, 2021
Ernesto Arbitrio

ernesto.arbitrio@gmail.com
@__pamaron__
Alessia Marcolini 

alessia.marcolini@hk3lab.ai
@viperale
2
Alessia
Data Science M.Sc. @ TU Eindhoven / TU Berlin
Junior Data Scientist @ HK3lab
PyCon Italia Organizer
Ernesto
Senior Backend Engineer @ YouGov PLC
PyCon Italia Organizer
Open Source Contributor
3
I am so excited to work
on this new digital
pathology project!!
day 1
me!
Histopathology
Primary diagnostic resource for
the identification of complex
diseases, in particular of tumors
https://www.poliambulanza.it/dipartimenti/dipartimento-di-oncologia/anatomia-patologica
Section of Pathology and Tumour Biology, University of Leeds
Digital Pathology
Scanning of histopathological glass
slides to create Whole Slide Images
4
5
Let’s find some
literature first!
day 1
6
day 3
*Reading papers
without any mention
of a repo containing the
source code*
7
day 5
Yay, finally some code!
8
day 5
*opens code*
10
“Tagliatelle” effect, 2017
https://twitter.com/viperale/status/1202980338693226496
day 30
Whole Slide Images
Multi-resolution image (e.g. 5× and 20×)
Pyramidal format
Up to 90,000px × 30,000px
Very large in size, up to 10 GB
Scanner vendor specific file format, e.g. .svs, .vms, .ndpi, .tif, .bif, .scn
Ad-hoc software for viewing and processing
Artifacts like shadows, mold, pen marks
Image from Y. Wang et al. 2012.
SurfaceSlide: A multitouch digital pathology platform
10.1371/journal.pone.0030783
11
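To make the pyramidal, multi-resolution idea concrete, here is a minimal sketch in plain Python that derives the pixel dimensions and the uncompressed RGB size of each pyramid level. The downsample factors (1, 4, 16) and the helper name `pyramid_levels` are assumptions chosen for illustration; real scanner formats define their own levels.

```python
def pyramid_levels(base_size, downsamples=(1, 4, 16), bytes_per_pixel=3):
    """Return (level, width, height, uncompressed_bytes) for each pyramid level.

    base_size is the (width, height) of level 0, the highest resolution.
    The downsample factors are illustrative, not taken from a specific format.
    """
    width, height = base_size
    levels = []
    for level, factor in enumerate(downsamples):
        w, h = width // factor, height // factor
        levels.append((level, w, h, w * h * bytes_per_pixel))
    return levels


# Largest size quoted on the slide: 90,000px x 30,000px
levels = pyramid_levels((90_000, 30_000))
for level, w, h, n_bytes in levels:
    print(f"level {level}: {w} x {h} px, {n_bytes / 1e9:.2f} GB uncompressed")
```

Under these assumptions level 0 alone would need about 8.1 GB as raw RGB, which is why regions are usually read on demand from a lower-resolution level instead of loading the whole slide.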
https://twitter.com/tdmckee/status/1456585006982340611?s=21
why?
Using WSIs directly as input to DL is unfeasible
Preprocessing to create smaller subwindows ("tiles") is required
Preprocessing steps usually poorly detailed in research papers
Leading to results that are hard to reproduce
Need for a reference high quality preprocessing software
To enable faster prototyping and faster experimentation
12
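As a rough illustration of what "creating tiles" means, the sketch below enumerates the upper-left corners of non-overlapping 512px × 512px tiles over an image of 90,000px × 30,000px. The function name `grid_tile_coords` is hypothetical and this is not histolab's implementation; it only shows the arithmetic.

```python
def grid_tile_coords(image_size, tile_size):
    """Yield (x, y) upper-left corners of non-overlapping tiles.

    Only tiles that fit entirely inside the image are produced.
    """
    width, height = image_size
    tile_w, tile_h = tile_size
    for y in range(0, height - tile_h + 1, tile_h):
        for x in range(0, width - tile_w + 1, tile_w):
            yield (x, y)


coords = list(grid_tile_coords((90_000, 30_000), (512, 512)))
print(f"{len(coords)} tiles of 512 x 512 px")
```

Every choice here (tile size, overlap, what to do at the borders, which tiles to discard as background) changes the resulting dataset, which is why undocumented preprocessing makes results hard to reproduce.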
https://paperswithcode.com/
https://www.paperswithoutcode.com/
https://twitter.com/alexkyllo/status/1457072262520004632
https://twitter.com/petebankhead/status/1407630531290927105?s=21
13
team
me!
Nicole
Ernesto
day 60
new open source Python package for
reproducible Whole Slide Image preprocessing,
aimed at easy integration
with a Deep Learning pipeline
14
approach
unifying community-validated procedures for slide preprocessing and tile extraction
introducing best practices from software engineering: automated testing, code versioning and code reviews, Continuous Integration
on top of state-of-the-art and well-known libraries, e.g. OpenSlide, NumPy and scikit-image
15
Histolab features
#1 Interoperability between different formats
up to 9 supported formats from the major scanner vendors
#2 Automatic tissue detection and segmentation
by using a fixed sequence of image filters
#3 Automatic informative tiles retrieval
cropped regions from tissue areas found in #2
#4 Easy access to sample data from TCGA and OpenSlide
save to the system cache and import them
16
Histolab in action
WSI → Image dataset (tiles) → DL pipeline
Prostate Cancer Sample, TCGA-PRAD
16,000px × 15,316px
Magnification 5×
512px × 512px
Magnification 20×
512px × 512px
17
Tiles extraction #3
in less than 10 lines of code
>>> from histolab.data import breast_tissue
>>> _, path = breast_tissue()
>>> from histolab.slide import Slide
>>> slide = Slide(path, "path/to/processed")
>>> from histolab.tiler import RandomTiler
>>> random_tiles_extractor = RandomTiler(
...     tile_size=(512, 512),
...     n_tiles=10,
...     level=2,
...     seed=42,
...     check_tissue=True,
... )
>>> random_tiles_extractor.extract(slide)
1. download breast tissue sample from TCGA
2. create a Slide object
3. create a Tiler
4. extract!
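The `seed` argument is what makes a random extraction repeatable. The sketch below shows the general idea with the standard library only: a seeded generator yields the same tile positions on every run. It is an assumption-laden illustration (the helper `random_tile_coords` is hypothetical and there is no tissue check), not a description of how RandomTiler works internally.

```python
import random


def random_tile_coords(image_size, tile_size, n_tiles, seed):
    """Draw n_tiles upper-left corners so that each tile fits inside the image."""
    rng = random.Random(seed)  # private generator, independent of global state
    width, height = image_size
    tile_w, tile_h = tile_size
    return [
        (rng.randint(0, width - tile_w), rng.randint(0, height - tile_h))
        for _ in range(n_tiles)
    ]


first_run = random_tile_coords((90_000, 30_000), (512, 512), n_tiles=10, seed=42)
second_run = random_tile_coords((90_000, 30_000), (512, 512), n_tiles=10, seed=42)
print(first_run == second_run)
```

Reporting the seed together with the tile size, level and number of tiles lets someone else regenerate the same set of positions.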
18
Tissue detection and tiles extraction
Original WSI
Tissue
Detection
Random Tiles
Breast Cancer Sample, TCGA-BRCA
96,972px × 30,682px
Magnification 20×
512px × 512px
19
RandomTiler
Tissue detection and tiles extraction
20
ScoreTiler
>>> from histolab.scorer import NucleiScorer
>>> scorer = NucleiScorer()
>>> from histolab.tiler import ScoreTiler
>>> scored_tiles_extractor = ScoreTiler(
...     scorer,
...     tile_size=(512, 512),
...     n_tiles=10,
...     level=2,
...     check_tissue=True,
... )
>>> scored_tiles_extractor.extract(slide)
Representation of the score assigned to each
extracted tile by the NucleiScorer.
Ovarian Cancer Sample, TCGA-OV
30,001px × 33,987px
Nuclei Mask
512px × 512px tile
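The idea of score-based extraction can be sketched as "score every candidate tile, keep the n best". The example below is a toy version under stated assumptions: tiles are flat tuples of gray values, and `dark_fraction` is a hypothetical stand-in scorer, unrelated to how NucleiScorer computes its score.

```python
import heapq


def top_scored_tiles(tiles, scorer, n_tiles):
    """Return the n_tiles (score, tile) pairs with the highest score, best first."""
    scored = [(scorer(tile), tile) for tile in tiles]
    return heapq.nlargest(n_tiles, scored, key=lambda pair: pair[0])


def dark_fraction(tile, threshold=100):
    """Toy scorer: fraction of pixels darker than the threshold."""
    return sum(1 for pixel in tile if pixel < threshold) / len(tile)


tiles = [
    (250, 240, 245, 251),  # almost pure background
    (40, 60, 230, 235),    # half dark pixels
    (30, 50, 45, 220),     # mostly dark pixels
]
best = top_scored_tiles(tiles, dark_fraction, n_tiles=2)
for score, tile in best:
    print(score, tile)
```

Because the ranking depends only on the scores, the selection is deterministic for a given slide and scorer.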
Tissue detection #2
by using this fixed sequence of image filters
Original image
1. Grayscale filter
2. Otsu threshold
3. Binary dilation
4. Remove small holes
5. Remove small objects
6. Final mask
7. Biggest Tissue Area Box
Nobuyuki Otsu 1979.
A threshold selection method from gray-level histograms
10.1109/TSMC.1979.4310076
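For readers unfamiliar with step 2, the following is a minimal pure-Python sketch of Otsu's method: it scans all gray levels and keeps the one that maximizes the between-class variance. It works on a flat list of 8-bit values for brevity; treating the darker class as tissue is an assumption of this example, and production code would normally rely on an existing library implementation.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class, the remaining pixels the other.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_variance = 0, -1.0
    count_low = 0  # number of pixels with value <= t
    sum_low = 0    # sum of their intensities
    for t in range(levels):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, nothing to evaluate
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        # Between-class variance, up to the constant factor 1 / total**2
        variance = count_low * count_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


# Toy grayscale "image": three dark pixels and three bright pixels
gray = [10, 12, 11, 200, 205, 199]
t = otsu_threshold(gray)
tissue_mask = [value <= t for value in gray]
print(t, tissue_mask)
```

The threshold is derived from the image histogram alone, so no hand-tuned cutoff has to be reported or guessed when reproducing the mask.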
21
>>> from histolab.filters.image_filters import Compose, OtsuThreshold, RgbToGrayscale
>>> from histolab.filters.morphological_filters import (
...     BinaryDilation,
...     RemoveSmallHoles,
...     RemoveSmallObjects,
... )
>>> filters = Compose(
...     [
...         RgbToGrayscale(),
...         OtsuThreshold(),
...         BinaryDilation(),
...         RemoveSmallHoles(),
...         RemoveSmallObjects(),
...     ]
... )
>>> filters(image)
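The composition pattern itself is simple: each filter is a callable, and the composed object feeds the output of one into the next. The sketch below re-creates that pattern in plain Python as an illustration only. The class is a simplified stand-in, not histolab's `Compose`; the two toy filters are hypothetical, and a fixed cutoff replaces Otsu thresholding to keep the example short.

```python
class Compose:
    """Chain single-argument callables, feeding each output to the next one."""

    def __init__(self, filters):
        self.filters = list(filters)

    def __call__(self, image):
        for filter_ in self.filters:
            image = filter_(image)
        return image


def to_grayscale(rgb_pixels):
    """Convert (r, g, b) tuples to luma values with the usual 0.299/0.587/0.114 weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


def fixed_threshold(gray_pixels, cutoff=200):
    """Mark pixels darker than the cutoff as tissue (True)."""
    return [value < cutoff for value in gray_pixels]


pipeline = Compose([to_grayscale, fixed_threshold])
mask = pipeline([(255, 255, 255), (180, 60, 150), (240, 240, 240), (90, 30, 110)])
print(mask)
```

Keeping the sequence in one composed object means the exact order of filters can be versioned and shared along with the rest of the code.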
22
Tissue detection #2
Remove artifacts
pen markers
Original image
1. Grayscale filter
2. Otsu threshold
3. Apply Mask
4. Green Pen Filter
Pen Filters implementation inspired by
https://github.com/CODAIT/deep-histopath
23
24
Finally some stable
code I can use!
present, day 500
Stable? Is it
tested?
25
present, day 500
Look at this!
100% coverage with
600 unit and
integration tests
26
Yes, but how do I get
all of these goodies?
present, day 500
GitHub
github.com/histolab/histolab
GitHub Actions
for Continuous Integration
GitHub Actions
benchmarks.yml
28
https://histolab.github.io/histolab/dev/bench/
GitHub Actions
29
release.yml
Docs
histolab.readthedocs.io
31
https://github.com/histolab/histolab/blob/master/CONTRIBUTING.md
Join
Give us a ⭐!
$ pip install histolab
Installation
32
33
Lessons learned
and notes for future self
You will spend more time writing tests than code
but at least you will sleep tight at night
100% coverage doesn’t imply 0% bugs
but stupid mistakes are easily caught
Code formatting and linting are nice
but automate them to focus on the important stuff
Histolab is a joint work with
Nicole Bussola, PhD student
@ FBK-MPBA / CIBIO
Thank you
Any questions?
Alessia Marcolini
alessia.marcolini@hk3lab.ai
@viperale
Ernesto Arbitrio
ernesto.arbitrio@gmail.com
@__pamaron__

Histolab: an Open Source Python Library for Reproducible Digital Pathology

  • 1. : an Open Source Python Library for Reproducible Digital Pathology 9-10-11 Novembre, 2021 Ernesto Arbitrio ernesto.arbitrio@gmail.com @__pamaron__ Alessia Marcolini alessia.marcolini@hk3lab.ai @viperale
  • 2. 2 Alessia Ernesto Data Science M.Sc. @ TU Eindhoven / TU Berlin Junior Data Scientist @ HK3lab PyCon Italia Organizer Senior Backend Engineer @ YouGov PLC PyCon Italia Organizer Open Source Contributor
  • 3. 3 I am so excited to work on this new digital pathology project!! day 1 me!
  • 4. Histopathology Primary diagnostic resource for the identification of complex diseases, in particular of tumors https://www.poliambulanza.it/dipartimenti/dipartimento-di-oncologia/anatomia-patologica Section of Pathology and Tumour Biology, University of Leeds Digital Pathology Scanning of histopathological glass slides to create Whole Slide Images 4
  • 5. 5 I am so excited to work on this new digital pathology project!! Let’s find some literature first! day 1
  • 6. 6 day 3 *Reading papers without any mention of repo containing the source code*
  • 7. 7 day 5 Yay, finally some code!
  • 11. Whole Slide Images Multi-resolution image (e.g. 5× and 20×) Pyramidal format Up to 90,000px × 30,000px Very large in size up to 10GB Scanner vendor specific file format e.g. .svs, .vms, .ndpi, .tif, .bif, .scn Ad-hoc software for viewing and processing Artifacts like shadows, mold, pen marks Image from Y. Wang et al. 2012. SurfaceSlide: A multitouch digital pathology platform 10.1371/journal.pone.0030783 11
  • 12. Whole Slide Images Multi-resolution image (e.g. 5× and 20×) Pyramidal format Up to 90,000px × 30,000px Very large in size up to 10GB Scanner vendor specific file format e.g. .svs, .vms, .ndpi, .tif, .bif, .scn Ad-hoc software for viewing and processing Artifacts like shadows, mold, pen marks Image from Y. Wang et al. 2012. SurfaceSlide: A multitouch digital pathology platform 10.1371/journal.pone.0030783 11 https://twitter.com/tdmckee/status/1456585006982340611?s=21
  • 13. Whole Slide Images Multi-resolution image (e.g. 5× and 20×) Pyramidal format Up to 90,000px × 30,000px Very large in size up to 10GB Scanner vendor specific file format e.g. .svs, .vms, .ndpi, .tif, .bif, .scn Ad-hoc software for viewing and processing Artifacts like shadows, mold, pen marks Image from Y. Wang et al. 2012. SurfaceSlide: A multitouch digital pathology platform 10.1371/journal.pone.0030783 11
  • 14. Why? Using WSIs directly as input to DL (Deep Learning) models is infeasible, so preprocessing to create smaller subwindows ("tiles") is required. Preprocessing steps are usually poorly detailed in research papers, leading to results that are hard to reproduce. There is a need for reference, high-quality preprocessing software to enable faster prototyping and faster experimentation. https://paperswithcode.com/ https://www.paperswithoutcode.com/ https://twitter.com/alexkyllo/status/1457072262520004632 https://twitter.com/petebankhead/status/1407630531290927105?s=21
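A quick back-of-the-envelope check of why whole slides do not fit a DL pipeline directly. The slide dimensions come from the "Whole Slide Images" slide and the 512px tile size from the later examples; the 3 bytes per pixel (uncompressed 8-bit RGB) and both function names are assumptions made for this sketch.

```python
import math

# Dimensions quoted on the "Whole Slide Images" slide.
WIDTH_PX, HEIGHT_PX = 90_000, 30_000
# Assumption for illustration: uncompressed 8-bit RGB, i.e. 3 bytes per pixel.
BYTES_PER_PIXEL = 3
TILE_SIZE = 512


def uncompressed_size_gb(width: int, height: int,
                         bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Size in decimal gigabytes of the raw pixel array at full resolution."""
    return width * height * bytes_per_pixel / 1e9


def grid_tile_count(width: int, height: int, tile_size: int = TILE_SIZE) -> int:
    """Number of non-overlapping square tiles needed to cover the whole image."""
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)


print(f"{uncompressed_size_gb(WIDTH_PX, HEIGHT_PX):.1f} GB as a raw RGB array")
print(f"{grid_tile_count(WIDTH_PX, HEIGHT_PX)} tiles of {TILE_SIZE}px per side")
```

Under these assumptions a single slide is about 8 GB in memory and splits into roughly ten thousand tiles, many of which contain only background. That is the case for a tissue-aware tiling step.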
  • 22. Histolab: a new open source Python package for reproducible Whole Slide Image preprocessing, aimed at easy integration with a Deep Learning pipeline
  • 23. Approach: unifying community-validated procedures for slide preprocessing and tile extraction; introducing best practices from software engineering (automated testing, code versioning and code reviews, Continuous Integration); building on top of state-of-the-art and well-known libraries, e.g. OpenSlide, NumPy and scikit-image
  • 27. Histolab features: #1 Interoperability between different formats (up to 9 supported formats from the major scanner vendors); #2 Automatic tissue detection and segmentation (by using a fixed sequence of image filters); #3 Automatic informative tiles retrieval (cropped regions from tissue areas found in #2); #4 Easy access to sample data from TCGA and OpenSlide (save to the system cache and import them)
  • 28. Histolab in action: WSI → image dataset (tiles) → DL pipeline. Prostate Cancer Sample, TCGA-PRAD, 16,000px × 15,316px; tiles at magnification 5× (512px × 512px) and at magnification 20× (512px × 512px)
  • 29. Tiles extraction (#3) in less than 10 lines of code: 1. download breast tissue sample from TCGA; 2. create a Slide object; 3. create a Tiler; 4. extract!
        >>> from histolab.data import breast_tissue
        >>> _, path = breast_tissue()
        >>> from histolab.slide import Slide
        >>> slide = Slide(path, "path/to/processed")
        >>> from histolab.tiler import RandomTiler
        >>> random_tiles_extractor = RandomTiler(
        ...     tile_size=(512, 512),
        ...     n_tiles=10,
        ...     level=2,
        ...     seed=42,
        ...     check_tissue=True,
        ... )
        >>> random_tiles_extractor.extract(slide)
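To make the idea behind a seeded, tissue-aware random tiler concrete, here is a minimal conceptual sketch in plain Python. It is not histolab's implementation: the function names, the binary-mask representation, the 0.8 tissue fraction and the attempt limit are all assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

Mask = List[List[int]]  # hypothetical binary tissue mask: 1 = tissue, 0 = background


def tissue_fraction(mask: Mask, x: int, y: int, size: int) -> float:
    """Fraction of tissue pixels in the size x size window with top-left corner (x, y)."""
    window = [row[x:x + size] for row in mask[y:y + size]]
    return sum(map(sum, window)) / (size * size)


def sample_random_tiles(mask: Mask, tile_size: int, n_tiles: int, seed: int,
                        min_tissue: float = 0.8,
                        max_attempts: int = 10_000) -> List[Tuple[int, int]]:
    """Draw random tile corners, keeping only tiles with enough tissue.

    A dedicated, seeded generator makes the selection reproducible:
    the same mask and seed always yield the same tiles.
    """
    rng = random.Random(seed)
    height, width = len(mask), len(mask[0])
    tiles: List[Tuple[int, int]] = []
    for _ in range(max_attempts):
        if len(tiles) == n_tiles:
            break
        x = rng.randint(0, width - tile_size)
        y = rng.randint(0, height - tile_size)
        if tissue_fraction(mask, x, y, tile_size) >= min_tissue:
            tiles.append((x, y))
    return tiles


# Toy 20 x 20 mask whose right half is tissue.
toy_mask = [[1 if col >= 10 else 0 for col in range(20)] for _ in range(20)]
print(sample_random_tiles(toy_mask, tile_size=4, n_tiles=5, seed=42))
```

The sketch allows the same corner to be drawn twice; a real tiler may want to deduplicate or control overlap.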
  • 30. Tissue detection and tiles extraction with RandomTiler: original WSI, tissue detection, random tiles. Breast Cancer Sample, TCGA-BRCA, 96,972px × 30,682px; tiles at magnification 20×, 512px × 512px
  • 31. Tissue detection and tiles extraction with ScoreTiler. Figure: representation of the score assigned to each extracted tile by the NucleiScorer. Ovarian Cancer Sample, TCGA-OV, 30,001px × 33,987px; nuclei mask; 512px × 512px tile
        >>> from histolab.scorer import NucleiScorer
        >>> scorer = NucleiScorer()
        >>> from histolab.tiler import ScoreTiler
        >>> scored_tiles_extractor = ScoreTiler(
        ...     scorer,
        ...     tile_size=(512, 512),
        ...     n_tiles=10,
        ...     level=2,
        ...     seed=42,
        ...     check_tissue=True,
        ... )
        >>> scored_tiles_extractor.extract(slide)
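The score-based strategy can be pictured as "score every tile on a grid, keep the best n". The sketch below shows that idea in plain Python. It is an illustration only, not histolab's ScoreTiler or NucleiScorer: the mean-intensity scorer is a toy stand-in for a nuclei-based score, and every name in it is hypothetical.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # hypothetical single-channel image
Coords = Tuple[int, int]


def grid_coordinates(width: int, height: int, tile_size: int) -> List[Coords]:
    """Top-left corners of the non-overlapping tiles that fit entirely in the image."""
    return [(x, y)
            for y in range(0, height - tile_size + 1, tile_size)
            for x in range(0, width - tile_size + 1, tile_size)]


def mean_intensity_scorer(image: Image, x: int, y: int, tile_size: int) -> float:
    """Toy scorer: mean pixel value of the tile (a stand-in for a nuclei score)."""
    window = [row[x:x + tile_size] for row in image[y:y + tile_size]]
    return sum(map(sum, window)) / (tile_size * tile_size)


def top_scored_tiles(image: Image, tile_size: int, n_tiles: int,
                     scorer: Callable[[Image, int, int, int], float] = mean_intensity_scorer,
                     ) -> List[Tuple[float, Coords]]:
    """Score every grid tile and return the n best as (score, (x, y)), best first."""
    height, width = len(image), len(image[0])
    scored = [(scorer(image, x, y, tile_size), (x, y))
              for x, y in grid_coordinates(width, height, tile_size)]
    return heapq.nlargest(n_tiles, scored)


toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [5, 5, 2, 2],
    [5, 5, 2, 2],
]
print(top_scored_tiles(toy_image, tile_size=2, n_tiles=2))
```

Passing the scorer as a callable mirrors the slide's design, where the tiler receives a scorer object and stays agnostic about how the score is computed.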
  • 32. Tissue detection (#2) by using this fixed sequence of image filters: original image; 1. Grayscale filter; 2. Otsu threshold; 3. Binary dilation; 4. Remove small holes; 5. Remove small objects; 6. Final mask; 7. Biggest Tissue Area Box. Nobuyuki Otsu 1979. A threshold selection method from gray-level histograms, 10.1109/TSMC.1979.4310076
  • 33. Tissue detection (#2): original image; 1. Grayscale filter; 2. Otsu threshold; 3. Binary dilation; 4. Remove small holes; 5. Remove small objects; 6. Final mask; 7. Biggest Tissue Area Box. Nobuyuki Otsu 1979. A threshold selection method from gray-level histograms, 10.1109/TSMC.1979.4310076
        >>> from histolab.filters.image_filters import Compose, OtsuThreshold, RgbToGrayscale
        >>> from histolab.filters.morphological_filters import (
        ...     BinaryDilation,
        ...     RemoveSmallHoles,
        ...     RemoveSmallObjects,
        ... )
        >>> filters = Compose(
        ...     [
        ...         RgbToGrayscale(),
        ...         OtsuThreshold(),
        ...         BinaryDilation(),
        ...         RemoveSmallHoles(),
        ...         RemoveSmallObjects(),
        ...     ]
        ... )
        >>> filters(image)
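Step 2 of the sequence relies on Otsu's method, which picks the gray level that best separates the histogram into two classes. The following is a minimal, dependency-free sketch of that method for 8-bit gray values. It is written for illustration and is not the code histolab or scikit-image run; treating dark pixels as tissue on a bright background is an assumption of the toy example.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the gray level t that maximises the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(levels):
        weight_low += histogram[t]
        sum_low += t * histogram[t]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, the variance is undefined
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        # Between-class variance, up to the constant factor 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def dark_pixel_mask(pixels: Sequence[int]) -> List[bool]:
    """Mark pixels at or below the Otsu threshold (assumed here to be tissue)."""
    threshold = otsu_threshold(pixels)
    return [value <= threshold for value in pixels]


# Toy "image": 50 dark tissue-like pixels and 50 bright background pixels.
toy_pixels = [10] * 50 + [200] * 50
print(otsu_threshold(toy_pixels), sum(dark_pixel_mask(toy_pixels)))
```

The later steps in the sequence (dilation, removing small holes and small objects) then clean up the binary mask this threshold produces.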
  • 34. Remove artifacts: pen markers. Original image; 1. Grayscale filter; 2. Otsu threshold; 3. Apply Mask; 4. Green Pen Filter. Pen filters implementation inspired by https://github.com/CODAIT/deep-histopath
  • 35. Present, day 500: "Finally some stable code I can use!" "Stable? Is it tested?"
  • 36. Present, day 500: "Look at this! 100% coverage with 600 unit and integration tests"
  • 38. Present, day 500: "Yes, but how do I get all of these goodies?"
  • 44. Installation: $ pip install histolab
  • 45. Lessons learned and notes for future self: you will spend more time writing tests than code, but at least you will sleep tight at night; 100% coverage doesn't imply 0% bugs, but stupid mistakes are easily caught; code formatting and linting are nice, but automate them to focus on the important stuff
  • 46. Thank you! Any questions? Histolab is a joint work with Nicole Bussola, PhD student @ FBK-MPBA / CIBIO. Alessia Marcolini, alessia.marcolini@hk3lab.ai, @viperale; Ernesto Arbitrio, ernesto.arbitrio@gmail.com, @__pamaron__