SlideShare a Scribd company logo
WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
Yuhao Yang, Intel
Bala Chandrasekaran, Dell EMC
Using Deep Learning on Apache
Spark to diagnose thoracic
pathology from chest X-rays
#UnifiedAnalytics #SparkAISummit
Agenda
• Problem statement
• Analytics Zoo
• Chest Xray Model architecture
• Results and Observations
3#UnifiedAnalytics #SparkAISummit
Predicting diseases in Chest X-rays
• Develop a ML pipeline in Apache Spark and train a deep
learning model to predict disease in Chest X-rays
▪ An integrated ML pipeline with Analytics Zoo on Apache Spark
▪ Demonstrate feature engineering and transfer learning APIs in
Analytics Zoo
▪ Use Spark worker nodes to train at scale
• CheXNet
– Developed at Stanford University, CheXNet is a model for
identifying thoracic pathologies from the NIH ChestXray14 dataset
– https://stanfordmlgroup.github.io/projects/chexnet/
4#UnifiedAnalytics #SparkAISummit
X-ray dataset from NIH
5#UnifiedAnalytics #SparkAISummit
• 112,120 images from over 30000 patients
• Multi label (14 diseases)
• Unbalanced datasets
• Close to 50% of the images have ‘No findings’
• Infiltration get the most positive samples
(19894) and Hernia get the least positive
samples (227)
0
10000
20000
30000
40000
50000
60000
70000
Atelectasis
Consolidation
Infiltration
Pneumothorax
Edema
Emphysema
Fibrosis
Effusion
Pneumonia
PleuralThickening
Cardiomegaly
Nodule
Mass
Hernia
NoFindings
Frequency
00000013_005.png,Emphysema | Infiltration | Pleural_Thickening
00000013_006.png,Effusion|Infiltration
00000013_007.png,Infiltration
00000013_008.png,No Finding
Analytics Zoo
Analytics Zoo
Reference Use Cases
Anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction,
etc.
Built-In Deep Learning Models
Image classification, object detection, text classification, recommendations, sequence-
to-sequence, anomaly detection, etc.
Feature Engineering
Feature transformations for
• Image, text, 3D imaging, time series, speech, etc.
High-Level Pipeline APIs
• Distributed TensorFlow and Keras on Spark
• Native deep learning support in Spark DataFrames and ML Pipelines
• Model serving API for model serving/inference pipelines
Backends Spark, TensorFlow, Keras, BigDL, OpenVINO, MKL-DNN, etc.
7#UnifiedAnalytics #SparkAISummit
Distributed TensorFlow, Keras and BigDL on Apache Spark
https://github.com/intel-analytics/analytics-zoo/
Bringing Deep Learning To Big Data Platform
• Distributed deep learning framework for Apache Spark*
• Make deep learning more accessible to big data users
and data scientists
– Write deep learning applications as standard Spark
programs
– Run on existing Spark/Hadoop clusters (no changes
needed)
• Feature parity with popular deep learning frameworks
– E.g., Caffe, Torch, Tensorflow, etc.
• High performance (on CPU)
– Built-in Intel MKL and multi-threaded programming
• Efficient scale-out
– Leveraging Spark for distributed training & inference
8#UnifiedAnalytics #SparkAISummit
https://github.com/intel-analytics/BigDL
https://bigdl-project.github.io/
9
Architecture
• Training using synchronous mini-batch SGD
• Distributed training iterations run as standard Spark jobs
Parameter Synchronization
10#UnifiedAnalytics #SparkAISummit
Task n
1 2 n
1 2 n
1 2 n 1 2 n
1 2 n
Task 1 Task 2
local gradient local gradient local gradient
∑ ∑ ∑
gradient 1 gradient n
weight 1 weight 2 weight n
“Parameter Synchronization” Job
update update update
gradient 2
Integrated Pipeline for Machine Learning
and Deep Learning
ImageReader Image Pre-
processing
Image
Classification fit
Images
in HDFS
Images
DataFrame
Images as
RDD
Image Classification
Model
Building the X-ray Model
Let’s build the model
13#UnifiedAnalytics #SparkAISummit
Read the X-ray images
Feature Engineering
Define the model
Define optimizer
Train the model
Read the X-ray images as Spark DataFrames
• Initialize NNContext and load X-ray images into
DataFrames using NNImageReader
• Process loaded X-ray images and add labels (another
DataFrame) using Spark transformations
14#UnifiedAnalytics #SparkAISummit
Read the X-ray
images
Feature Engineering
Define the model
Define optimizer
Training the model
from zoo.pipeline.nnframes import NNImageReader
imageDF = NNImageReader.readImages(image_path, sc,
resizeH=256, resizeW=256, image_codec=1)
trainingDF = imageDF.join(labelDF, on="Image_Index",
how="inner")
Feature Engineering – Image Pre-processing
In-built APIs for feature engineering using ChainedPreprocessing API
15#UnifiedAnalytics #SparkAISummit
Read the X-ray
images
Feature Engineering
Define the model
Define optimizer
Training the model
ImageCenterCrop()
ImageFlip()
ImageChannelNormalize()
ImageMatToTensor()
ImageCenterCrop(224, 224)
Random flip and brightness
ImageChannelNormalize()
To Tensor()
Transfer Learning using ResNet-50
trained with ImageNet
16#UnifiedAnalytics #SparkAISummit
Condition E
Condition D
Condition C
Condition B
Condition A
Patient A
Add a new
input layer
Remove the existing
classification layer and add
new classification layer
Defining the model with Transfer Learning APIs
17#UnifiedAnalytics #SparkAISummit
Read the X-ray
images
Feature Engineering
Define the model
Define optimizer
Training the model
• Load a pre-trained model using Net.load_bigdl. The model is
trained with ImageNet dataset
– Inception
– ResNet 50
– DenseNet
• Remove the final softmax layer of ResNet-50
• Add new input (for resized x-ray images) and output layer (to
predict the 14 diseases). Activation function is Sigmoid
• Avoid overfitting
– Regularization
– Dropout
Defining the model with Transfer Learning APIs
18#UnifiedAnalytics #SparkAISummit
Read the X-ray
images
Feature Engineering
Define the model
Define optimizer
Training the model
def get_resnet_model(model_path, label_length):
full_model = Net.load_bigdl(model_path)
model = full_model.new_graph(["pool5"])
inputNode = Input(name="input", shape=(3, 224, 224))
resnet = model.to_keras()(inputNode)
flatten =
GlobalAveragePooling2D(dim_ordering='th')(resnet)
dropout = Dropout(0.2)(flatten)
logits = Dense(label_length,
W_regularizer=L2Regularizer
(1e-1), b_regularizer=L2Regularizer(1e-1),
activation="sigmoid")(dropout)
lrModel = Model(inputNode, logits)
return lrModel
Define the Optimizer
19#UnifiedAnalytics #SparkAISummit
Read the X-ray
images
Feature Engineering
Define the model
Define optimizer
Training the model
• Evaluated two optimizers: SGD and Adam Optimizer
• Learning rate scheduler is implemented in two
phases:
– Warmup + Plateau schedule
– Warmup: Gradually increase the learning rate for 5
epochs
– Plateau: Plateau("Loss", factor=0.1, patience=1,
mode="min", epsilon=0.01, cooldown=0, min_lr=1e-
15)
Train the model using ML Pipelines
20#UnifiedAnalytics #SparkAISummit
Read the X-ray
images
Feature Engineering
Define the model
Define optimizer
Training the model
• Analytics Zoo API NNEstimator to build the model
• .fit() produces a neural network model which is a Transformer
• You can now run .predict() on the model for inference
• AUC-RoC is used to measure the accuracy of the model. Spark
ML pipeline API BinaryClassificationEvaluator to determine the
AUC-ROC for each disease.
estimator = NNEstimator(xray_model, BinaryCrossEntropy(), transformer)
.setBatchSize(batch_size).setMaxEpoch(num_epoch)
.setFeaturesCol("image").setCachingSample(False).
.setValidation(EveryEpoch(), validationDF, [AUC()], batch_size)
.setOptimMethod(optim_method)
Xray_nnmodel = estimator.fit(trainingDF)
Results and observations
21#UnifiedAnalytics #SparkAISummit
Spark parameter recommendations
• Number of executors Number of compute nodes
• Number of executor cores Number of physical cores
minus two
• Executor memory
Number of cores per executor * (Memory required for each core +
data partition)
22#UnifiedAnalytics #SparkAISummit
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0
0.00002
0.00004
0.00006
0.00008
0.0001
0.00012
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
AverageAUC(Higherisbetter)
LearningRate
Epochs
Learning rate
Average AUC w/ Learning Rate Scheduler
Average AUC w/o Learning Rate Scheduler
Impact on Learning Rate Scheduler
• Adam (with Learning
Rate Scheduler)
outperforms SGD
– Warmup
– Plateau
• Learning rate scheduler
helps is covering the
model much faster
23#UnifiedAnalytics #SparkAISummit
1E-7
Base model comparison
24#UnifiedAnalytics #SparkAISummit
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
AUC(Higherisbetter)
Inception-V3 DenseNet-121 ResNet-50
0.55
0.6
0.65
0.7
0.75
0.8
0.85
1 2 3 4 5 6 7 8
AverageAUC(Higherisbetter)
Batch per thread
Batch size and Scalability
• Two factors:
– Batches per thread (mini-batch): Number of
images in a iteration per worker
– Global batch size: Batches per thread *
Number of workers
• Batch size is a critical parameter for
model accuracy as well as for
distributed performance
– Increasing the batch size increases
parallelism
– Increasing the batch size may lead to a loss in
generalization performance especially
25#UnifiedAnalytics #SparkAISummit
Scalability
26#UnifiedAnalytics #SparkAISummit
0
0.5
1
1.5
2
2.5
3
3.5
4 8 12 16
Speedup(Higherisbetter)
Number of nodes
3x speed up from 4 nodes to 16 nodes.
~ 2.5 hours to train the model
bz=512
bz=1024
bz=1536
bz=2048
* 15 epochs and 32 cores per executor
Future Work
• Continue to improve the model performance and
scalability
– Implement LARS and other advanced learning rate scheduler for
large scale training
• Increase the feature set
– Incorporate patient profile information
• Develop performance characteristic and sizing
guidelines for image classification problems
27#UnifiedAnalytics #SparkAISummit
Links
Code and white paper available at
https://github.com/dell-ai-engineering/BigDL-
ImageProcessing-Examples
28#UnifiedAnalytics #SparkAISummit
Acknowledgements
Mehmood Abd, Michael Bennett, Sajan Govindan, Jenwei Hsieh, Andrew Kipp,
Dharmesh Patel, Leela Uppuluri, and Luke Wilson
29#UnifiedAnalytics #SparkAISummit
(Alphabetical)
Thank you!
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

More Related Content

What's hot

A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
JaeJun Yoo
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
Appsilon Data Science
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation Survey
Sangwoo Mo
 
Les applications du Deep Learning
Les applications du Deep LearningLes applications du Deep Learning
Les applications du Deep Learning
Jedha Bootcamp
 
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Simplilearn
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Sujit Pal
 
neural networks
 neural networks neural networks
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
PyData
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
CloudxLab
 
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Vitaly Bondar
 
Image Processing
Image ProcessingImage Processing
Image Processing
Rolando
 
Python for Image and Video processing applications
Python for Image and Video processing applicationsPython for Image and Video processing applications
Le Reseau De Neurones
Le Reseau De NeuronesLe Reseau De Neurones
Le Reseau De Neuronesguestf80d95
 
Cnn
CnnCnn
Graph coloring using backtracking
Graph coloring using backtrackingGraph coloring using backtracking
Graph coloring using backtracking
shashidharPapishetty
 
Lab manual of Digital image processing using python by khalid Shaikh
Lab manual of Digital image processing using python by khalid ShaikhLab manual of Digital image processing using python by khalid Shaikh
Lab manual of Digital image processing using python by khalid Shaikh
khalidsheikh24
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
Sangwoo Mo
 
Introduction to Autoencoders
Introduction to AutoencodersIntroduction to Autoencoders
Introduction to Autoencoders
Yan Xu
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders
Akash Goel
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
Yan Xu
 

What's hot (20)

A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation Survey
 
Les applications du Deep Learning
Les applications du Deep LearningLes applications du Deep Learning
Les applications du Deep Learning
 
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
 
neural networks
 neural networks neural networks
neural networks
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
 
Image Processing
Image ProcessingImage Processing
Image Processing
 
Python for Image and Video processing applications
Python for Image and Video processing applicationsPython for Image and Video processing applications
Python for Image and Video processing applications
 
Le Reseau De Neurones
Le Reseau De NeuronesLe Reseau De Neurones
Le Reseau De Neurones
 
Cnn
CnnCnn
Cnn
 
Graph coloring using backtracking
Graph coloring using backtrackingGraph coloring using backtracking
Graph coloring using backtracking
 
Lab manual of Digital image processing using python by khalid Shaikh
Lab manual of Digital image processing using python by khalid ShaikhLab manual of Digital image processing using python by khalid Shaikh
Lab manual of Digital image processing using python by khalid Shaikh
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
 
Introduction to Autoencoders
Introduction to AutoencodersIntroduction to Autoencoders
Introduction to Autoencoders
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 

Similar to Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest X-rays

Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Jason Dai
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Databricks
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an Introduction
Emanuele Bezzi
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
Ganesan Narayanasamy
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
Apache MXNet
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Alluxio, Inc.
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
Stavros Kontopoulos
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
DESMOND YUEN
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Databricks
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
Josh Patterson
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
Dr. Swaminathan Kathirvel
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
Sri Ambati
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshop
QuantUniversity
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Databricks
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
geetachauhan
 
Machinel Learning with spark
Machinel Learning with spark Machinel Learning with spark
Machinel Learning with spark
Ons Dridi
 
Scientific
Scientific Scientific
Scientific
marpierc
 
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Databricks
 
Biomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLABBiomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLAB
CodeOps Technologies LLP
 
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Databricks
 

Similar to Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest X-rays (20)

Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an Introduction
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshop
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
 
Machinel Learning with spark
Machinel Learning with spark Machinel Learning with spark
Machinel Learning with spark
 
Scientific
Scientific Scientific
Scientific
 
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
 
Biomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLABBiomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLAB
 
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 

Recently uploaded (20)

A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 

Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest X-rays

  • 1. WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
  • 2. Yuhao Yang, Intel Bala Chandrasekaran, Dell EMC Using Deep Learning on Apache Spark to diagnose thoracic pathology from chest X-rays #UnifiedAnalytics #SparkAISummit
  • 3. Agenda • Problem statement • Analytics Zoo • Chest Xray Model architecture • Results and Observations 3#UnifiedAnalytics #SparkAISummit
  • 4. Predicting diseases in Chest X-rays • Develop a ML pipeline in Apache Spark and train a deep learning model to predict disease in Chest X-rays ▪ An integrated ML pipeline with Analytics Zoo on Apache Spark ▪ Demonstrate feature engineering and transfer learning APIs in Analytics Zoo ▪ Use Spark worker nodes to train at scale • CheXNet – Developed at Stanford University, CheXNet is a model for identifying thoracic pathologies from the NIH ChestXray14 dataset – https://stanfordmlgroup.github.io/projects/chexnet/ 4#UnifiedAnalytics #SparkAISummit
  • 5. X-ray dataset from NIH 5#UnifiedAnalytics #SparkAISummit • 112,120 images from over 30000 patients • Multi label (14 diseases) • Unbalanced datasets • Close to 50% of the images have ‘No findings’ • Infiltration get the most positive samples (19894) and Hernia get the least positive samples (227) 0 10000 20000 30000 40000 50000 60000 70000 Atelectasis Consolidation Infiltration Pneumothorax Edema Emphysema Fibrosis Effusion Pneumonia PleuralThickening Cardiomegaly Nodule Mass Hernia NoFindings Frequency 00000013_005.png,Emphysema | Infiltration | Pleural_Thickening 00000013_006.png,Effusion|Infiltration 00000013_007.png,Infiltration 00000013_008.png,No Finding
  • 7. Analytics Zoo Reference Use Cases Anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, etc. Built-In Deep Learning Models Image classification, object detection, text classification, recommendations, sequence- to-sequence, anomaly detection, etc. Feature Engineering Feature transformations for • Image, text, 3D imaging, time series, speech, etc. High-Level Pipeline APIs • Distributed TensorFlow and Keras on Spark • Native deep learning support in Spark DataFrames and ML Pipelines • Model serving API for model serving/inference pipelines Backends Spark, TensorFlow, Keras, BigDL, OpenVINO, MKL-DNN, etc. 7#UnifiedAnalytics #SparkAISummit Distributed TensorFlow, Keras and BigDL on Apache Spark https://github.com/intel-analytics/analytics-zoo/
  • 8. Bringing Deep Learning To Big Data Platform • Distributed deep learning framework for Apache Spark* • Make deep learning more accessible to big data users and data scientists – Write deep learning applications as standard Spark programs – Run on existing Spark/Hadoop clusters (no changes needed) • Feature parity with popular deep learning frameworks – E.g., Caffe, Torch, Tensorflow, etc. • High performance (on CPU) – Built-in Intel MKL and multi-threaded programming • Efficient scale-out – Leveraging Spark for distributed training & inference 8#UnifiedAnalytics #SparkAISummit https://github.com/intel-analytics/BigDL https://bigdl-project.github.io/
  • 9. 9 Architecture • Training using synchronous mini-batch SGD • Distributed training iterations run as standard Spark jobs
  • 10. Parameter Synchronization 10#UnifiedAnalytics #SparkAISummit Task n 1 2 n 1 2 n 1 2 n 1 2 n 1 2 n Task 1 Task 2 local gradient local gradient local gradient ∑ ∑ ∑ gradient 1 gradient n weight 1 weight 2 weight n “Parameter Synchronization” Job update update update gradient 2
  • 11. Integrated Pipeline for Machine Learning and Deep Learning ImageReader Image Pre- processing Image Classification fit Images in HDFS Images DataFrame Images as RDD Image Classification Model
  • 13. Let’s build the model 13#UnifiedAnalytics #SparkAISummit Read the X-ray images Feature Engineering Define the model Define optimizer Train the model
  • 14. Read the X-ray images as Spark DataFrames • Initialize NNContext and load X-ray images into DataFrames using NNImageReader • Process loaded X-ray images and add labels (another DataFrame) using Spark transformations 14#UnifiedAnalytics #SparkAISummit Read the X-ray images Feature Engineering Define the model Define optimizer Training the model from zoo.pipeline.nnframes import NNImageReader imageDF = NNImageReader.readImages(image_path, sc, resizeH=256, resizeW=256, image_codec=1) trainingDF = imageDF.join(labelDF, on="Image_Index", how="inner")
  • 15. Feature Engineering – Image Pre-processing In-built APIs for feature engineering using ChainedPreprocessing API 15#UnifiedAnalytics #SparkAISummit Read the X-ray images Feature Engineering Define the model Define optimizer Training the model ImageCenterCrop() ImageFlip() ImageChannelNormalize() ImageMatToTensor() ImageCenterCrop(224, 224) Random flip and brightness ImageChannelNormalize() To Tensor()
  • 16. Transfer Learning using ResNet-50 trained with ImageNet 16#UnifiedAnalytics #SparkAISummit Condition E Condition D Condition C Condition B Condition A Patient A Add a new input layer Remove the existing classification layer and add new classification layer
  • 17. Defining the model with Transfer Learning APIs 17#UnifiedAnalytics #SparkAISummit Read the X-ray images Feature Engineering Define the model Define optimizer Training the model • Load a pre-trained model using Net.load_bigdl. The model is trained with ImageNet dataset – Inception – ResNet 50 – DenseNet • Remove the final softmax layer of ResNet-50 • Add new input (for resized x-ray images) and output layer (to predict the 14 diseases). Activation function is Sigmoid • Avoid overfitting – Regularization – Dropout
  • 18. Defining the model with Transfer Learning APIs 18#UnifiedAnalytics #SparkAISummit Read the X-ray images Feature Engineering Define the model Define optimizer Training the model def get_resnet_model(model_path, label_length): full_model = Net.load_bigdl(model_path) model = full_model.new_graph(["pool5"]) inputNode = Input(name="input", shape=(3, 224, 224)) resnet = model.to_keras()(inputNode) flatten = GlobalAveragePooling2D(dim_ordering='th')(resnet) dropout = Dropout(0.2)(flatten) logits = Dense(label_length, W_regularizer=L2Regularizer (1e-1), b_regularizer=L2Regularizer(1e-1), activation="sigmoid")(dropout) lrModel = Model(inputNode, logits) return lrModel
  • 19. Define the Optimizer 19#UnifiedAnalytics #SparkAISummit Read the X-ray images Feature Engineering Define the model Define optimizer Training the model • Evaluated two optimizers: SGD and Adam Optimizer • Learning rate scheduler is implemented in two phases: – Warmup + Plateau schedule – Warmup: Gradually increase the learning rate for 5 epochs – Plateau: Plateau("Loss", factor=0.1, patience=1, mode="min", epsilon=0.01, cooldown=0, min_lr=1e- 15)
  • 20. Train the model using ML Pipelines 20#UnifiedAnalytics #SparkAISummit Read the X-ray images Feature Engineering Define the model Define optimizer Training the model • Analytics Zoo API NNEstimator to build the model • .fit() produces a neural network model which is a Transformer • You can now run .predict() on the model for inference • AUC-RoC is used to measure the accuracy of the model. Spark ML pipeline API BinaryClassificationEvaluator to determine the AUC-ROC for each disease. estimator = NNEstimator(xray_model, BinaryCrossEntropy(), transformer) .setBatchSize(batch_size).setMaxEpoch(num_epoch) .setFeaturesCol("image").setCachingSample(False). .setValidation(EveryEpoch(), validationDF, [AUC()], batch_size) .setOptimMethod(optim_method) Xray_nnmodel = estimator.fit(trainingDF)
  • 22. Spark parameter recommendations • Number of executors Number of compute nodes • Number of executor cores Number of physical cores minus two • Executor memory Number of cores per executor * (Memory required for each core + data partition) 22#UnifiedAnalytics #SparkAISummit
  • 23. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.00002 0.00004 0.00006 0.00008 0.0001 0.00012 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 AverageAUC(Higherisbetter) LearningRate Epochs Learning rate Average AUC w/ Learning Rate Scheduler Average AUC w/o Learning Rate Scheduler Impact on Learning Rate Scheduler • Adam (with Learning Rate Scheduler) outperforms SGD – Warmup – Plateau • Learning rate scheduler helps is covering the model much faster 23#UnifiedAnalytics #SparkAISummit 1E-7
  • 24. Base model comparison 24#UnifiedAnalytics #SparkAISummit 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 AUC(Higherisbetter) Inception-V3 DenseNet-121 ResNet-50
  • 25. 0.55 0.6 0.65 0.7 0.75 0.8 0.85 1 2 3 4 5 6 7 8 AverageAUC(Higherisbetter) Batch per thread Batch size and Scalability • Two factors: – Batches per thread (mini-batch): Number of images in a iteration per worker – Global batch size: Batches per thread * Number of workers • Batch size is a critical parameter for model accuracy as well as for distributed performance – Increasing the batch size increases parallelism – Increasing the batch size may lead to a loss in generalization performance especially 25#UnifiedAnalytics #SparkAISummit
  • 26. Scalability 26#UnifiedAnalytics #SparkAISummit 0 0.5 1 1.5 2 2.5 3 3.5 4 8 12 16 Speedup(Higherisbetter) Number of nodes 3x speed up from 4 nodes to 16 nodes. ~ 2.5 hours to train the model bz=512 bz=1024 bz=1536 bz=2048 * 15 epochs and 32 cores per executor
  • 27. Future Work • Continue to improve the model performance and scalability – Implement LARS and other advanced learning rate scheduler for large scale training • Increase the feature set – Incorporate patient profile information • Develop performance characteristic and sizing guidelines for image classification problems 27#UnifiedAnalytics #SparkAISummit
  • 28. Links Code and white paper available at https://github.com/dell-ai-engineering/BigDL- ImageProcessing-Examples 28#UnifiedAnalytics #SparkAISummit
  • 29. Acknowledgements Mehmood Abd, Michael Bennett, Sajan Govindan, Jenwei Hsieh, Andrew Kipp, Dharmesh Patel, Leela Uppuluri, and Luke Wilson 29#UnifiedAnalytics #SparkAISummit (Alphabetical)
  • 30. Thank you! DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT