BY: MICHAEL BATAVIA
KEY INSIGHTS OF USING DEEP LEARNING TO ANALYZE HEALTHCARE DATA
WHO AM I?
My name is Michael Batavia and I’m a freshman at the NYU Tandon School of Engineering.
I’ve done several research projects in machine learning and deep learning and consider them my specialty.
Some projects:
• My Regeneron STS 2021 Winning Project
• My Winning App for the 2020 Congressional App Challenge in District NY-14
• A Full Kaggle Data Analysis on Avocado Prices
Now, I’m here to teach you the best data analysis tools for dealing with complicated healthcare data.
WHAT ARE WE LOOKING FOR?
• Before we describe the data analysis techniques that we might use to analyze healthcare data, let’s think about the types of data that we might have.
• What are possible data inputs that you might need to pre-process when you first obtain your healthcare data?
TYPES OF HEALTHCARE DATA
Different types of data that we might be given to analyze:
• ECG chart readings
• H&E-stained slide images of metastatic / benign lymph nodes
• A chart containing a variety of serum measurements from diabetes patients
• JPEG / PNG photos of cancerous tumors (already preprocessed)
• A chart containing various physical attributes of a patient (height, weight, BMI, muscle mass, body fat percentage)
• Pictures of construction equipment to detect in metropolitan cities (computer vision for the blind)
• Test measurements for prosthetic limbs for veterans
• Real-time measurements from computer-vision-enabled walking canes
WHAT DOES THE DATA LOOK LIKE?
• For this workshop, we’re going to talk about the most common type of data that you will be working with in healthcare:
• IMAGES!
When working with images, it is critical to know how many images you have and what file format they are in. Make sure that any data analysis you do can handle the file format of your images.
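A quick way to answer "how many images, and in what format?" is to walk the data folder and count files by extension. This is a minimal standard-library sketch; the function name `inventory_images` and the folder name `data/` are hypothetical.

```python
from collections import Counter
from pathlib import Path

def inventory_images(root):
    """Count files under `root` (recursively) by lower-cased file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # ".PNG" and ".png" are counted together
            counts[path.suffix.lower() or "<no extension>"] += 1
    return counts

# Hypothetical usage: print(inventory_images("data/"))
```

If the counts show a mix of formats (say `.tif` slides alongside `.png` photos), that is the cue to convert everything to one format before analysis.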
ORGANIZING YOUR DATA
Once you have converted your data into an appropriate file format (using online tools or programming APIs), you need to decide how to organize your data.
• Think! If you have a collection of breast cancer data containing pictures of metastatic and benign tissue and their corresponding labels, how would you organize the data?
This may seem like a simple step, but it makes your life much easier if you do it before any complex data manipulation.
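One common layout, assumed here rather than taken from the slides, is one folder per class (for example `data/benign/` and `data/metastatic/`), summarized in a CSV that pairs each image path with its label. The sketch below builds such a CSV; `write_label_csv`, the folder names and the extension list are all illustrative.

```python
import csv
from pathlib import Path

# Extensions treated as images (an assumption; adjust to your data).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def write_label_csv(root, out_csv):
    """Write one row per image: its file path and its label (the name of its folder)."""
    rows = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for img in sorted(class_dir.iterdir()):
            if img.suffix.lower() in IMAGE_EXTS:
                rows.append((str(img), class_dir.name))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filepath", "label"])
        writer.writerows(rows)
    return len(rows)  # number of images written
```

Non-image files (notes, metadata) in the class folders are skipped, so the CSV lists only the images you will actually load.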
It’s Important To VISUALIZE YOUR DATA!
LOADING IN YOUR DATA
If you have organized your data sufficiently, you can load it into your programming language of choice.
Here are some tips to help with loading your data:
• If you created a CSV file like I stated earlier, you can use imaging libraries like PIL to load images in common file formats into Python from their file locations. Then, you can replace each file name in your CSV with the actual image stored in memory.
• If loading a large set of images, be wary of memory restrictions. Try to load data in mini-batches when possible to prevent loading delays and out-of-memory crashes.
• Try shuffling the files when they are loaded into the program. This keeps the machine learning part of your program from picking up patterns in the order of your files. Randomness is good!
When you load your photos, use libraries like numpy to convert those images into multi-dimensional arrays for further pre-processing. Look at the image on the next slide for an explanation of multi-dimensional arrays.
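The mini-batch and shuffling tips can be combined in one generator that reshuffles the file order and loads only one batch of images into memory at a time. This is a sketch under assumptions: `iterate_minibatches` is a hypothetical helper, and `load_fn` is whatever turns a path into an array. With PIL it could be `lambda p: np.asarray(Image.open(p).convert("RGB").resize((224, 224)))`, where the resize (to an arbitrary example size) is needed because `np.stack` requires every image in a batch to have the same shape.

```python
import numpy as np

def iterate_minibatches(paths, labels, batch_size, load_fn, seed=None):
    """Yield (images, labels) batches in a shuffled order.

    Only the current batch is loaded into memory; `load_fn` maps a
    file path to a NumPy array, and all arrays must share one shape.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(paths))          # new random order per call
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        images = np.stack([load_fn(paths[i]) for i in idx])
        yield images, np.asarray([labels[i] for i in idx])
```

Calling the generator once per epoch gives the network a different file order each time, and the last batch is simply smaller when the dataset size is not a multiple of `batch_size`.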
A PICTURE AS AN INTENSITY ARRAY
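In code, a picture is just an array of pixel intensities: a grayscale image is a 2-D array (height x width), and a colour image adds a third axis for its channels. The tiny arrays below are made-up values chosen only to show the shapes.

```python
import numpy as np

# A 2x3 grayscale "image": one 8-bit intensity per pixel (0 = black, 255 = white).
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# A colour image has shape height x width x channels (here R, G, B).
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 2] = [255, 0, 0]          # make the top-right pixel pure red

print(gray.shape)                # (2, 3)
print(rgb.shape)                 # (2, 3, 3)
print(rgb[0, 2].tolist())        # [255, 0, 0]
```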
THE REAL WORLD OF HEALTHCARE DATA
If you have a collection of images containing magnified benign and malignant tumors for throat
cancer detection, what inherent problem would you find in the data?
CLASS IMBALANCE
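Class imbalance is easy to spot by counting labels. One common remedy, which the slides do not cover, is to weight the loss by class so that rare classes count more; Keras' `model.fit` accepts such a dictionary through its `class_weight` argument. The inverse-frequency formula below is one standard choice, and the 900/100 split is hypothetical.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: total / (n_classes * count_for_class).

    Rare classes get weights above 1 and common classes below 1, so each
    class contributes equally to the loss on average.
    """
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

labels = [0] * 900 + [1] * 100   # hypothetical: 900 benign, 100 malignant
print(Counter(labels))           # Counter({0: 900, 1: 100})
print(class_weights(labels))     # class 1 gets 5.0, class 0 about 0.56
```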
NORMALIZING DATA
Once you have loaded your image data, it is customary to normalize it so that the machine learning algorithms can cope with the wide range of pixel values across the three or more channels in the images.
Normalizing an image is quite simple once you have converted the images in your program into multi-dimensional arrays of pixel values. For 8-bit images, all you have to do is divide each pixel value in the array by 255 (the maximum pixel value), which maps every value into the range [0, 1]. Libraries like numpy can do this easily through broadcasting. An alternative technique is standardization (subtracting the mean and dividing by the standard deviation), which often gives similar results.
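Both options take a line or two with NumPy broadcasting. This sketch assumes 8-bit images stacked into a batch of shape (N, H, W, C); per-channel standardization is one common variant, not the only one.

```python
import numpy as np

def normalize(images):
    """Scale 8-bit pixel values from [0, 255] into [0, 1]."""
    return images.astype(np.float32) / 255.0      # scalar broadcast over every pixel

def standardize(images):
    """Zero mean and unit variance per channel, computed over the whole batch."""
    x = images.astype(np.float32)
    mean = x.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, C)
    std = x.std(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / (std + 1e-7)              # epsilon avoids division by zero
```

For standardization, compute `mean` and `std` on the training images and reuse those same values for validation and test images, so that no information leaks from the held-out data.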
AUGMENTING YOUR DATA
A common technique for images that are put through neural networks is to augment them with an image data generator.
An image data generator takes existing images and applies small transformations to them, customized to your liking. Which transformations to use depends on the kinds of variation you expect to see in your testing data. Common transformations are listed below. All you need to do is attach the generator to the training data of the neural network; during training it feeds the network randomly transformed versions of the training images, so the network sees many more variations than the original dataset contains!
Common Augmentations:
• Rotating 90 degrees either clockwise or counterclockwise
• Flipping an image vertically or horizontally
• Translating the image vertically or horizontally
• Increasing the brightness or contrast of an image
• Shearing or zooming in on an image
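Libraries such as Keras (for example its `ImageDataGenerator`, or the `RandomFlip` and `RandomRotation` layers) apply these transformations on the fly. To show what happens underneath, here is a minimal NumPy sketch of two of the augmentations listed above, random flips and 90-degree rotations; `random_augment` is a hypothetical helper, and brightness, shear and zoom are left out.

```python
import numpy as np

def random_augment(image, rng):
    """Return a randomly flipped and/or rotated copy of an (H, W, C) image.

    Only flips and 90-degree rotations are used, so no pixel values are
    interpolated. A 90-degree rotation swaps H and W, so rotation is only
    applied to square images here.
    """
    out = image
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)             # horizontal flip (left-right)
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)             # vertical flip (top-bottom)
    if out.shape[0] == out.shape[1]:
        k = int(rng.integers(0, 4))            # 0, 90, 180 or 270 degrees
        out = np.rot90(out, k=k, axes=(0, 1))
    return out.copy()

# Usage: rng = np.random.default_rng(); augmented = random_augment(image, rng)
```

For tissue slides, flips and 90-degree rotations are often reasonable because a patch has no natural "up"; whether a given augmentation suits your data is a judgment call that depends on your test images.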
APPLYING A NEURAL NETWORK
For those of you who have experience with neural networks, what type of neural
network would work best when dealing with images?
What type of inner components would you put in the neural network to lead to optimal
results?
CHOOSING YOUR NEURAL NETWORK
The most common type of neural network to use for analyzing patterns in images is a convolutional neural network (CNN).
A convolutional neural network is a specific type of neural network that finds both small and large patterns in classes of images through operations called convolution and pooling, and uses those patterns to tell one class apart from the others. Because it learns to pick up on minute differences between images, a CNN can classify images quickly and accurately compared to manual classification.
With healthcare data, this is especially important! Fast classifications lead to rapid pre-diagnosis!
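To make "convolution" and "pooling" concrete, here is a minimal NumPy sketch of one 2-D convolution (strictly a cross-correlation, which is what CNN libraries compute) followed by ReLU and 2x2 max pooling. A real CNN learns its kernel values and runs many kernels over many channels; the hand-picked edge-detecting kernel and toy image here are purely illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution as CNN layers compute it (no kernel flip)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the image patch under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size       # drop rows/cols that don't fill a block
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Dark left half, bright right half: a vertical edge.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
edge_kernel = np.array([[-1, 1],
                        [-1, 1]], dtype=float)   # responds to dark-to-bright steps

fmap = np.maximum(conv2d(image, edge_kernel), 0)  # ReLU
print(fmap.tolist())            # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
print(max_pool(fmap).tolist())  # [[2.0]]
```

The feature map lights up only where the edge is, and pooling keeps that strong response while shrinking the map; stacking such layers is how a CNN builds from small patterns to large ones.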
USING A PRE-TRAINED NEURAL NETWORK
Although custom-crafted CNNs may successfully classify between two or more classes, it may sometimes be easier to use a pre-trained CNN for your problem.
A pre-trained neural network is a neural network that has already been built and trained on a specific (usually large) training set. Common pre-trained networks include ResNet, VGG16 and EfficientNet for image classification, and YOLOv5 for object detection.
Since a pre-trained neural network was designed for its original dataset, it is up to you to adapt its inputs and outputs (for example, the input image size and the number of output classes) to work with your dataset. It is also your responsibility to choose whether to load the pre-trained weights into your own network. This often depends on how similar your images are to the images the pre-trained network was trained on.
OPTIMIZING YOUR NEURAL NETWORK
There are several optimizations you can make to your neural network to train it faster and possibly improve its results. It’s up to you whether to apply them to your custom neural network or to your pre-trained neural network.
1. Fine-tune layers of a pre-trained neural network
a. Retraining some of the pre-trained layers lets the network learn patterns that are more specific to your classification task.
2. Use modified versions of the ReLU activation
a. This helps avoid the “dying ReLU” problem, in which a neuron gets stuck outputting zero for every input, receives zero gradient and stops learning.
b. Instead of plain ReLU, use LeakyReLU or ELU activation, which keep a nonzero gradient for negative inputs.
3. Use early stopping and model checkpoints.
a. With early stopping, training halts once the monitored validation metric stops improving by more than a threshold ε that you set in the code, which saves CPU/GPU time on your computer.
b. With model checkpoints, you can stop training at any time and resume it later from the saved state. You can also save a checkpoint when your model is finished to make it portable. This way, you can test the model on any machine and any data rather than having it restricted to your computer.
4. Create a constant that reflects the class imbalance in the data (output bias).
a. Initializing the output layer’s bias with this constant can speed up convergence and shorten the early training period in which the network is just learning the class imbalance.
b. The formula to calculate this output bias is shown in the figure to the right.
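The figure with the formula is not reproduced in this text. Assuming the usual form for a binary classifier with a single sigmoid output unit (the one used in TensorFlow's imbalanced-data tutorial), the initial bias is b0 = ln(pos / neg), which makes the untrained network predict the positive-class base rate, since sigmoid(b0) = pos / (pos + neg). A minimal sketch with hypothetical counts:

```python
import math

def initial_output_bias(n_pos, n_neg):
    """Bias b0 for a single sigmoid output unit so that, before any training,
    the network predicts the positive base rate: sigmoid(b0) = n_pos / (n_pos + n_neg)."""
    return math.log(n_pos / n_neg)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b0 = initial_output_bias(n_pos=100, n_neg=900)   # hypothetical: 100 malignant, 900 benign
print(round(b0, 4))            # -2.1972
print(round(sigmoid(b0), 4))   # 0.1
```

In Keras, this value could be passed to the output layer with `tf.keras.layers.Dense(1, activation="sigmoid", bias_initializer=tf.keras.initializers.Constant(b0))`.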
NEURAL NETWORK METRICS FOR HEALTHCARE DATA
Once our neural network is trained, which metrics should we look at to see how well it can differentiate between benign and malignant cancer cells, or between a positive and a negative diabetes diagnosis?
• Validation Accuracy
• This metric measures the accuracy of the neural network’s classifications on a set of data that it never saw during training (validation data). It shows how well the network generalizes to new data outside its training data.
• Precision
• This metric measures what fraction of the model’s predictions for one class are correct. For example, a model with 50% precision for malignancy means that when the model predicts a cell is malignant, it is correct 50% of the time.
• Recall
• This metric measures what fraction of the actual members of a class the model identifies. For example, a model with 11% recall for malignancy correctly identifies only 11% of all malignant tumors in the data.
• AUROC
• The area under the receiver operating characteristic curve (AUROC) measures how well your neural network can distinguish between two classes across all decision thresholds. It is usually reported alongside validation accuracy when doing binary classification on healthcare data.
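The definitions above translate directly into code. This NumPy sketch computes precision and recall from predicted labels, and AUROC from predicted scores using its probabilistic meaning: the chance that a randomly chosen positive case is scored higher than a randomly chosen negative one (ties count as half). In practice you would likely use a library such as scikit-learn; the helpers and toy labels here are illustrative.

```python
import numpy as np

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)

def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # every positive vs every negative
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

print(precision_recall([1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 0, 0]))  # (0.5, 0.3333333333333333)
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))                # 0.75
```

The first example shows why accuracy alone can mislead: the model is right on 4 of 6 cases (67% accuracy) yet finds only one of the three positives.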
CREATING RESULT GRAPHS
Once you obtain your metrics, it is wise to create graphs of the results of your deep learning investigation so that other scientists can easily understand your conclusions.
You can do this with graphing libraries like matplotlib, seaborn or bokeh. All you need to do is decide which plots to present with your research. Common choices include a plot of your neural network’s training and validation accuracy over time, a plot of its training and validation loss over time, a diagram of its architecture, and visualizations of your data comparing the ground-truth label with the network’s predicted label.
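A training-versus-validation curve is the most common of these plots. This matplotlib sketch assumes the metrics are stored the way Keras stores them: `model.fit(...)` returns a `History` object whose `.history` dict holds one list per metric, such as `"accuracy"` and `"val_accuracy"` (when the model tracks accuracy and is given validation data). `plot_history` is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")            # off-screen rendering for scripts; omit in a notebook
import matplotlib.pyplot as plt

def plot_history(history, metric="accuracy"):
    """Plot training vs validation values of `metric` per epoch.

    `history` is a dict like Keras' History.history, e.g.
    {"accuracy": [...], "val_accuracy": [...]}.
    """
    epochs = range(1, len(history[metric]) + 1)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, history[metric], label=f"training {metric}")
    ax.plot(epochs, history[f"val_{metric}"], label=f"validation {metric}")
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.legend()
    fig.tight_layout()
    return fig

# Usage with Keras: fig = plot_history(model.fit(...).history, "loss"); fig.savefig("loss.png")
```

A widening gap between the two curves is the usual visual sign of overfitting, which is what makes this plot worth including in a paper.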
You can also create other graphs, but these depend on the specific experiments that you perform in your paper. Let’s take a look at a few examples.
The LeNet Convolutional Neural Network
A Result From My Research Paper
Research Quality Colab Notebook
If you are interested in seeing how I created a research paper using a custom-crafted CNN to detect breast cancer tumors in real-world images of lymph nodes, I highly suggest you check out my Colab notebook here:
https://github.com/AstroNoodles/Mini-Projects/blob/master/Parallel_Sync_CNN_Research_B.ipynb
It makes use of more advanced techniques such as hyperparameter tuning, TensorBoard and batch normalization. Through more research and experimentation, you will be able to learn these ideas as well, and then submit your own research project or create your own deep neural network to submit to competitions like this one!
Thanks for Listening To My Presentation!
I hope you enjoyed it as much as I enjoyed making it!
Are there any questions for me?
My website: https://astronoodles.github.io/

Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop From Girls Computing NY AI Summit 2021

  • 1. BY: MICHAEL BATAVIA KEY INSIGHTS OF USING DEEP LEARNING TO ANALYZE HEALTHCARE DATA
  • 2. WHO AM I? My name is Michael Batavia and I’m a freshman at the NYU Tandon School of Engineering. I’ve done several research projects with machine learning and deep learning and consider it my specialty. Some projects: • My Regeneron STS 2021 Winning Project • My Winning App for the 2020 Congressional App Challenge in District NY-14 • A Full Kaggle Data Analysis on Avocado Prices Now, I’m here to teach you the best data analysis tools to deal with complicated health care data.
  • 3. WHAT ARE WE LOOKING FOR? • Before we begin to describe the data analysis techniques that we might use to analyze healthcare data, • Let’s think about the types of data that we might have. • What are possible data inputs that you might need to pre-process when you first obtain your healthcare data?
  • 4. TYPES OF HEALTHCARE DATA Different types of data that we might be given to analyze: • ECG chart readings • Slide images from an H&E Machine of metastatic / benign lymph nodes • A chart containing a variety of serum measurements from diabetes patients • JPEG / PNG photos of cancerous tumors (already preprocessed) • A chart containing various physical attributes of a patient (height, weight, BMI, muscle mass, body fat percentage) • Pictures of construction equipment to detect in metropolitan cities (computer vision for the blind) • Test measurements for prosthetic limbs for veterans • Real-time measurements of computer vision enabled walking canes
  • 5. WHAT DOES THE DATA LOOK LIKE? • For this workshop, we’re going to talk about the most common type of data that you will be working with in healthcare: • IMAGES! When working with images, it is critical to know how many images you have on the topic and what file format the images are. Make sure that any data analysis you do can suit the file format specified in the images.
  • 6. ORGANIZIN G YOUR DATA Once you have converted your data into an appropriate file format (via the use of online tools or programming APIs), you need to know how to organize your data. • Think! If you have a bunch of breast cancer data containing various pictures of metastatic and benign tissue and their corresponding labels, how would you organize the data? This may seem like a simple step but it makes your life much easier if you do this before you do any complex data maneuvers.
  • 7. It’s Important To VISUALIZE YOUR DATA!
  • 8. LOADING IN YOUR DATA If you have organized your data sufficiently, you can load in the data to your designated programming language. Here are some tips to help with the loading of your data: • If you created a CSV file like I stated earlier, you can use imaging libraries like PIL to load in images with common file formats to Python using their file location. Then, you can replace the file name in your CSV to the actual stored image in memory. • If loading in a large set of images, be wary of memory restrictions. Try to load data in mini-batches when possible to prevent loading delays and out of memory crashes. • Try shuffling the files when they are loaded into the program. This will help the eventual machine learning portion of your program not remember sequential patterns with your files. Randomness is good! When you load in your photos, use libraires like numpy to convert those images into multi-dimensional arrays for future pre-processing. Look at the image on the next slide for an explanation of multi-dimensional arrays.
  • 9. A PICTURE AS AN INTENSITY ARRAY
  • 10. THE REAL WORLD OF HEALTHCARE DATA If you have a collection of images containing magnified benign and malignant tumors for throat cancer detection, what inherent problem would you find in the data?
  • 12. NORMALIZING DATA Once you have loaded in your image data, it is considered custom to normalize it in order for the machine learning algorithms to be able to deal with the wide range of pixel values across the three or more channels in the images. Normalizing an image is quite simple once you have converted the images in your program into multi- dimensional arrays of pixel values. All you have to do is divide each pixel value in the array by 255 (the maximum pixel value). Libraries like numpy can do this easily through the power of broadcasting. An alternative technique that can also be done is standardization with nearly identical results.
  • 13. AUGMENTING YOUR DATA A common technique that is done with most images that are put through neural networks is to augment them through an image data generator. An image data generator will take already existing images and apply small transformations of them customized to your liking. These transformation depend on the type of augmentations you would expect to have in your testing data. A full list of common translations are listed below. All you need to do is to apply the generator to the training data of the neural network before the training begins and both the original and augmented images will be fed into the neural network! Common Augmentations: • Rotating 90 Degrees either clockwise or counterclockwise • Flipping an image vertically or horizontally • Translating the image vertically or horizontally • Increasing the brightness or contrast of an image • Shearing or zooming in on an image
  • 14. APPLYING A NEURAL NETWORK For those of you who have experience with neural networks, what type of neural network would work best when dealing with images? What type of inner components would you put in the neural network to lead to optimal results?
  • 15. CHOOSING YOUR NEURAL NETWORK The most common type of neural network to use for analyzing patterns in images is a convolutional neural network (CNN). A convolutional neural network is a specific type of neural network that is able to find both small and large patterns in classes of images through processes called convolutions and pooling to differentiate them from other classes. The inherent minute differences between images is how a convolutional neural network can complete classification so fast and accurately compared to manual classification. With healthcare data, this is especially important! Fast classifications lead to rapid pre- diagnosis!
  • 16.
  • 17. USING A PRE-TRAINED NEURAL NETWORK Although custom-crafted CNNs may successfully classify between two or more classes, it may sometimes be easier to use a pre-trained CNN to deal with your problem. A pre-trained neural network is a neural network that has already been compiled and trained on a specific training set. Common types of pre-trained neural networks include ResNet, YOLOv5, VGG16 and EfficientNet. Since pre-trained neural networks are only supposed to work with a specific dataset, it is up to you to customize the inputs and outputs of the neural network to work with your dataset. It is also your responsibility to choose whether to apply the pre-trained weights from the neural network into your own network. This often depends on the types of images that the pre- trained network was trained on.
  • 19. OPTIMIZING YOUR NEURAL NETWORK There are some optimizations that can make your neural network train faster and possibly generalize better. It’s up to you whether to apply them to a custom neural network or to a pre-trained one. 1. Fine-tune layers of a pre-trained neural network. a. Retraining some of the pre-trained layers (usually with a small learning rate) lets the network learn patterns more specific to your classification task. 2. Use modified versions of the ReLU activation. a. Standard ReLU outputs zero, with a zero gradient, for every negative input, so a neuron whose inputs stay negative stops learning entirely (the “dying ReLU” problem). b. LeakyReLU and ELU keep a small, nonzero gradient for negative inputs, which avoids this problem. 3. Use early stopping and model checkpoints. a. With early stopping, training halts once the monitored metric (such as validation loss) stops improving by more than a threshold ε that you set in the code, which saves CPU/GPU time. b. Model checkpoints save the network’s weights during training, so you can stop training and resume it later. Saving a checkpoint when training is finished also makes the model portable: you can load and test it anywhere, on any data, instead of keeping it restricted to your computer. 4. Initialize the output layer’s bias with a constant that reflects the class imbalance in the data (output bias). a. This constant speeds up convergence and removes the early training period in which the network is only learning the class imbalance. b. The formula to calculate this output bias is shown in the figure to the right.
  • 20. NEURAL NETWORK METRICS FOR HEALTHCARE DATA Once our neural network finishes training, which metrics should we look at to see how well it can differentiate between benign and malignant cancer cells, or between a positive and negative diabetes diagnosis? • Validation Accuracy • This metric measures the accuracy of the neural network’s classifications on data that the network has never seen (validation data). It shows how well the network generalizes to new data outside its training data. • Precision • This metric measures what fraction of the model’s predictions for one class are correct. For example, a model with 50% precision for malignant cells is correct 50% of the time when it predicts that a cell is malignant. • Recall • This metric measures what fraction of the actual members of a class the model identifies. For example, a model with 11% recall for malignant tumors correctly identifies only 11% of all malignant tumors in the data. • AUROC • The area under the receiver operating characteristic curve (AUROC) measures how well your neural network distinguishes between two classes across all decision thresholds. It is usually reported alongside validation accuracy when doing binary classification on healthcare data.
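The four metrics can be computed by hand to see exactly what each one counts. This is a minimal sketch for binary labels (1 = malignant, 0 = benign); the labels and scores below are made-up illustration data, not results from any model. In practice, `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score` and `roc_auc_score`. The AUROC here uses its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as one half).

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    # True positives, false positives, false negatives, true negatives
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Of everything predicted malignant, how much really was malignant?
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Of everything really malignant, how much did the model find?
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Fraction of (positive, negative) pairs ranked correctly; threshold-free
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative validation labels and predicted malignancy probabilities
y_true: List[int] = [1, 1, 1, 0, 0, 0, 0, 0]
scores: List[float] = [0.9, 0.4, 0.35, 0.8, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
```

On this toy data, accuracy is 0.625, precision is 0.5 and recall is only 1/3, yet AUROC is 13/15 (about 0.87). The scores rank tumors well, but the 0.5 threshold misses most malignant cases, which is why healthcare papers usually report several of these metrics together.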
  • 21. CREATING RESULT GRAPHS Once you obtain your metrics, it is wise to create graphs of the results of your deep learning investigation so that other scientists can easily understand your conclusions. You can do this with graphing libraries such as matplotlib, seaborn or bokeh. All you need to do is choose which plots to present with your research. Common choices include a plot of your neural network’s training and validation accuracy over time (per epoch), a plot of its training and validation loss over time, a diagram of the network’s architecture, and visualizations of your data that compare the ground-truth labels with the network’s predicted labels. You can also create other graphs, depending on the specific experiments that you perform in your paper. Let’s take a look at a few examples.
  • 22. The LeNet Convolutional Neural Network
  • 23. A Result From My Research Paper
  • 24. Research Quality Colab Notebook If you are interested in seeing how I wrote a research paper using a custom-crafted CNN to detect breast cancer tumors in real-world images of lymph nodes, I highly suggest you check out my Colab notebook here: https://github.com/AstroNoodles/Mini-Projects/blob/master/Parallel_Sync_CNN_Research_B.ipynb. It uses more advanced techniques such as hyperparameter tuning, TensorBoard and batch normalization, but through more research and experimentation you will be able to learn these ideas as well and submit your own research project or create your own deep neural network to submit to competitions like this one!
  • 25. Thanks for Listening To My Presentation! I hope you enjoyed it as much as I enjoyed making it! Are there any questions for me? My website: https://astronoodles.github.io/