Predicting Breast Cancer Proliferation Scores with Apache SystemML
Mike Dusenberry & Madison J Myers
IBM Spark Technology Center, SF
Let us introduce ourselves.
We like health and we like data.
Found a Challenge
Breast Cancer Tumor Proliferation Challenge
● Images of Tumors: Can be analyzed and given a score for medical assessment.
● Tumor Score: Difficult to determine and takes a trained eye.
● Currently assessed by pathologists (M.D./D.O.).
● Dataset contains 500 images of breast cancer tissue, each larger than 15 GB.
Context
• Breast cancer is a leading cause of cancer death in women.
• Survival rates increase with earlier detection, which creates an incentive for quicker detection.
• Tumor cell proliferation is a strong indicator of a patient’s prognosis.
• Currently, pathologists classify tumors based on proliferation by counting the dividing cell nuclei in hematoxylin & eosin stained slides by hand with a microscope.
• This manual process suffers from underlying subjectivity.
Example Image:
Example Zoom-In of Image
Looking for Nuclei Characteristics
Grading System in Invasive Breast Cancer
Goal:
Predict tumor scores from slide images.
Okay easy, where do we start?
Blockers:
1. Using really, really large images
2. Limited number of images
3. Current state of the art model for this type of task is a deep CNN
Seriously, where do we start?
Reference Paper:
“Automated Grading of Gliomas using Deep Learning in Digital Pathology Images”
Daniel L. Rubin, MD, MS Lab
Department of Radiology & Department of Medicine (Biomedical Informatics Research), Stanford University
“Automated Grading of Gliomas using Deep Learning in Digital Pathology Images”
1. Cut a “whole-slide” image into square “tiles” at 20x magnification.
2. Filter the “tiles” to remove any without tissue.
3. Cut the remaining “tiles” into smaller “samples”.
4. Assign a tumor score label to each sample based on the tumor score of the “whole-slide” image.
5. Repeat 1-4 for all “whole-slide” images.
6. Train a convolutional neural network with the resulting dataset of labeled “samples”.
7. Good results!
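Step 4 above is the part that turns a handful of slide-level scores into a large labeled dataset: every sample simply inherits the score of the slide it was cut from. A minimal sketch of that label propagation, assuming hypothetical slide ids, scores, and sample ids (none of these names come from the paper or the slides):

```python
from typing import Dict, Iterable, Iterator, Tuple


def label_samples(
    slide_scores: Dict[str, int],
    slide_samples: Dict[str, Iterable[str]],
) -> Iterator[Tuple[str, int]]:
    """Yield (sample_id, label) pairs, copying each slide's score to its samples."""
    for slide_id, samples in slide_samples.items():
        score = slide_scores[slide_id]  # one score per whole-slide image
        for sample_id in samples:
            yield sample_id, score


# Hypothetical ids and scores, for illustration only.
scores = {"slide_001": 1, "slide_002": 3}
samples = {
    "slide_001": ["slide_001_s0", "slide_001_s1"],
    "slide_002": ["slide_002_s0"],
}
labeled = list(label_samples(scores, samples))
print(labeled)
```

Note that the labels are only as precise as the slide-level score: a sample that happens to contain little proliferating tissue still receives its slide's score.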
20 slides vs 500 slides
… we have lots of data
Our Approach:
● Utilize Apache Spark to cut and filter all 500 labeled, extremely high-resolution tumor slide images into 4.7 million smaller square samples.
● Utilize Apache SystemML on top of Spark to train a convolutional neural network on the labeled samples.
After preprocessing, over 7 terabytes of data...
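A rough back-of-the-envelope check of that size, using the sample count and the 256x256x3 sample shape quoted in the slides. The slides do not say how the samples are encoded; storing each value as an 8-byte double is an assumption made here because it lands close to the quoted figure.

```python
NUM_SLIDES = 500              # from the slides
NUM_SAMPLES = 4_700_000       # from the slides
SAMPLE_SHAPE = (256, 256, 3)  # from the slides
BYTES_PER_VALUE = 8           # assumption: each pixel value stored as a 64-bit double

values_per_sample = SAMPLE_SHAPE[0] * SAMPLE_SHAPE[1] * SAMPLE_SHAPE[2]
total_bytes = NUM_SAMPLES * values_per_sample * BYTES_PER_VALUE
total_tb = total_bytes / 1e12               # decimal terabytes
samples_per_slide = NUM_SAMPLES / NUM_SLIDES

print(f"{values_per_sample} values per sample")
print(f"{total_tb:.2f} TB in total")
print(f"{samples_per_slide:.0f} samples per slide on average")
```

Under the same counts, one byte per value would give a bit under 1 TB, so the encoding dominates the final size.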
What is Apache Spark?
● Apache Spark is a fast and general engine for large-scale data processing.
● Combines ML, SQL, streaming, and other complex analytics.
● Extends Scala idioms, as well as R/Python DataFrame idioms, to cluster computing.
● APIs for Scala, Java, Python, R.
● Simple to use!
● Much more information at https://spark.apache.org/.
What is Apache SystemML?
● Apache SystemML is a machine learning system for running distributed linear algebra on top of Apache Spark.
● Exposes high-level R-like & Python-like languages focused on linear algebra.
● APIs for Python, Scala, Java.
● Much more information at http://systemml.apache.org/.
Preprocessing
Preprocessing Approach at a High Level
[Figure: a “whole-slide” image cut into image tiles of 1024x1024x3 pixels]
Preprocessing Approach at a High Level (cont.)
Example “Tile” Image (1024x1024x3)
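Because a whole-slide image is far too large to load at once, tiling is usually driven by coordinates rather than by slicing one big array. A minimal sketch that enumerates the top-left corner of each 1024x1024 tile; the slide dimensions are hypothetical, and skipping partial tiles at the right and bottom edges is an assumption, not something the slides state:

```python
from typing import Iterator, Tuple

TILE_SIZE = 1024  # tile edge in pixels, from the slides


def tile_origins(
    width: int, height: int, tile_size: int = TILE_SIZE
) -> Iterator[Tuple[int, int]]:
    """Yield the (x, y) top-left corner of every full tile in a width x height image.

    Partial tiles at the right and bottom edges are skipped (an assumption).
    """
    for y in range(0, height - tile_size + 1, tile_size):
        for x in range(0, width - tile_size + 1, tile_size):
            yield x, y


# Hypothetical image size, far smaller than a real whole-slide image.
origins = list(tile_origins(width=4096, height=2500))
print(len(origins), origins[0], origins[-1])
```

Each origin can then be handed to whatever reader extracts that region from the slide file, which keeps every unit of work independent and easy to distribute.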
Now that we have tiles, we need to filter out non-tissue. We did this with thresholding.
Percentage of breast cancer tissue in the image, measured after thresholding each tile and closing small gaps and adipose tissue:
If >= 90%, we keep the tile.
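The 90% rule can be sketched with a plain brightness threshold. This is a simplified stand-in under stated assumptions: the brightness cutoff of 0.8 is a hypothetical value, the function names are invented for illustration, and the gap-closing step mentioned above is omitted.

```python
import numpy as np


def tissue_fraction(tile: np.ndarray, background_threshold: float = 0.8) -> float:
    """Return the fraction of pixels darker than the background threshold.

    tile: uint8 RGB array of shape (H, W, 3).
    background_threshold: hypothetical brightness cutoff in [0, 1]; brighter
    pixels are treated as empty glass rather than stained tissue.
    """
    gray = tile.astype(np.float64).mean(axis=2) / 255.0
    tissue_mask = gray < background_threshold
    return float(tissue_mask.mean())


def keep_tile(tile: np.ndarray, min_tissue: float = 0.9) -> bool:
    """Keep a tile only if at least min_tissue of its pixels look like tissue."""
    return tissue_fraction(tile) >= min_tissue


# Synthetic tiles: blank background, uniformly stained, and half of each.
white = np.full((1024, 1024, 3), 255, dtype=np.uint8)
stained = np.zeros((1024, 1024, 3), dtype=np.uint8)
stained[...] = (150, 80, 160)  # a purple-ish color standing in for stained tissue
half = white.copy()
half[:512] = stained[:512]

print(keep_tile(white), keep_tile(stained), keep_tile(half))
```

A real filter would likely need a more robust tissue mask than mean brightness, but the keep-or-discard decision has the same shape.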
Preprocessing Approach at a High Level (cont.)
[Figure: image tiles, an example filtered “tile” image, the resulting tile samples, and an example “sample” image (256x256x3)]
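Once a tile is in memory, cutting it into 256x256x3 samples needs no loops: a reshape plus an axis swap yields all 16 samples of a 1024x1024x3 tile. A minimal NumPy sketch; the function name is hypothetical:

```python
import numpy as np


def tile_to_samples(tile: np.ndarray, sample_size: int = 256) -> np.ndarray:
    """Split an (H, W, C) tile into non-overlapping (sample_size, sample_size, C) samples.

    Returns an array of shape (N, sample_size, sample_size, C), ordered row by row.
    """
    h, w, c = tile.shape
    if h % sample_size or w % sample_size:
        raise ValueError("tile dimensions must be multiples of sample_size")
    rows, cols = h // sample_size, w // sample_size
    return (
        tile.reshape(rows, sample_size, cols, sample_size, c)
        .swapaxes(1, 2)  # -> (rows, cols, sample_size, sample_size, c)
        .reshape(rows * cols, sample_size, sample_size, c)
    )


# A synthetic tile with unique values so each sample can be traced back.
tile = np.arange(1024 * 1024 * 3, dtype=np.uint32).reshape(1024, 1024, 3)
samples = tile_to_samples(tile)
print(samples.shape)
```

Sample index `r * 4 + c` corresponds to the block in row `r`, column `c` of the tile, which makes it easy to map a sample back to its position.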
Preprocessing Code Example:
Now for the good stuff...
Machine Learning
What are CNNs, or Convolutional Neural Networks?
● Deep learning model
● State of the art for computer vision tasks
● and audio...
● and...
Example Convolutional Neural Network
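The core operation inside such a network is a small filter slid across the image. A minimal single-channel sketch in NumPy, with no padding and a stride of 1; this only illustrates the operation and is not the network or the SystemML code used in the talk:

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a 2D kernel over a 2D image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.arange(16, dtype=np.float64).reshape(4, 4)
kernel = np.ones((2, 2))
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

A convolutional layer applies many such kernels across all input channels and learns their weights during training.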
Large data, remember?
Apache SystemML & Apache Spark
Breast Cancer ConvNet w/ SystemML
Training ConvNet w/ PySpark API
Entire Pipeline Diagram
[Figure: “whole-slide” image, image tiles, an example filtered “tile” image, tile samples, an example “sample” image, and the ConvNet that outputs the “Tumor Proliferation Score”]
Thank You
Backup
Where are we now?
● Preprocessing: Complete
● Machine Learning:
○ Small-scale tests complete
○ Large-scale tests in progress.

Predicting Breast Cancer Proliferation Scores with Apache SystemML - UC Berkeley - 09.07.16 MWD MJM