IDENTIFICATION OF
ALZHEIMER'S DISEASE USING A
DEEP LEARNING METHOD BASED
ON T1-W BRAIN MRI IMAGES
University of Milano-Bicocca
Master's Degree in Data Science
Big Data in Biotechnology & Biosciences Course
Academic Year 2022-2023
Authors:
Giorgio CARBONE matr. n° 811974
Emilio LINGENTHAL matr. n° 889111
GitHub Repository
Alzheimer’s Disease
❑ Alzheimer’s Disease (AD): irreversible progressive
neurodegenerative disease
❑ memory loss, cognitive impairment, behavioural
changes and eventually death
❑ no known prevention methods or cure
❑ Brain Changes:
❑ Accumulation of the beta-amyloid peptide
outside neurons and hyperphosphorylated tau
protein aggregates inside neurons
❑ Death of neurons
❑ Inflammation and atrophy of brain tissue
Regions of brain affected by Alzheimer’s disease. Source.
Alzheimer’s Disease diagnosis
❑ Review of symptoms, medical history, medication
history and interviews with friends and family
❑ Cognitive Assessment/Test: memory, problem
solving, attention, counting, and language
❑ Psychiatric evaluation for depression diagnosis
❑ Laboratory Tests: cerebrospinal fluid (CSF) and
plasma collection and examination to measure the
level of dementia-related biomarkers
❑ β-amyloid peptide and p-tau protein
❑ Brain scans: CT, PET and MRI, the latter used to verify:
❑ hippocampus volume reduction
❑ cortical thickness reduction in certain brain
areas
❑ ventricles enlargement
Healthy brain vs Alzheimer's disease brain in T1-W MRI images. Source.
Project Objectives
1. to build a dataset of 2D axial brain images by slicing 3D T1-weighted brain MRI images
❑ Enriched with clinical (demographics, clinical assessments, and cognitive assessments), genetic and
biospecimen subjects data
❑ Data exploration of the created dataset
2. to train a Deep Neural Network for the classification of 2D axial T1-W brain images of Cognitive Normal (CN)
and Alzheimer’s Disease (AD) subjects
3. to evaluate the effect of different techniques for handling the class imbalance problem on classifier
performance and bias
❑ Random Undersampling on largest class subjects
❑ Stratified Undersampling of subjects in the largest class based on metadata
❑ Training of a Generative Model (Wasserstein GAN) for the oversampling of the minority class
THE ADNI DATABASE,
DATA ACQUISITION AND ENRICHMENT
AND DATA PREPARATION
Alzheimer's Disease Neuroimaging Initiative: ADNI
❑ Longitudinal multicenter study and open-access data
repository
❑ 63 sites in the US and Canada, > 2000 participants
❑ Designed to develop biomarkers for the early detection
and tracking of Alzheimer’s disease (AD):
❑ Clinical and Neuropsychological (Demographics,
Cognitive assessments, clinical assessments)
❑ Imaging (MRI and PET)
❑ Genetic (APOE and TOMM40 Genotyping)
❑ Biochemical (CSF/plasma tau, β-amyloid)
❑ Eligible patients (age 55-90, speak English or Spanish)
are divided through a screening process into:
❑ Alzheimer’s Disease (AD), Mild Cognitive
Impairment (MCI), Cognitive Normal (CN)
Data Acquisition & Data Enrichment
❑ Objective: to combine heterogeneous data using
❑ Bridge Table: subject's unique ID / MRI image ID
❑ Volumetric Axial MRI image acquisition for AD, MCI and
CN patients
❑ T1-weighted
❑ MPR; GradWarp; B1 Correction; N3; Scaled
❑ volumetric .nii files
                 CN     MCI    AD
n. of subjects   179    317    118

Image ID    Subject ID
I63897      123_S_5678
❑ Metadata Acquisition
❑ Demographics → Age, Ethnicity, Gender, Weight, Dominant Hand, Marital status, Education Level, employment status
❑ Neuropsychological data → MMSE (Mini Mental State Exam), Clinical Dementia Rating (CDR), Geriatric Depression
Scale (GDS)
❑ Genetic data → APOE A1, APOE A2 genotyping
❑ Biochemical data → plasma tau and β-amyloid concentrations
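The bridge-table idea above can be sketched as a simple join. This is a minimal illustration, not the project's actual code: the image/subject pair is the example shown on the slide, while the metadata values, the second image ID and the function name `enrich` are hypothetical.

```python
# Bridge table: MRI image ID -> subject ID (pair taken from the slide example).
bridge = {"I63897": "123_S_5678"}

# Hypothetical subject-level metadata; the values are invented for illustration.
metadata = {
    "123_S_5678": {"diagnosis": "CN", "age": 74, "mmse": 29},
}


def enrich(image_ids, bridge, metadata):
    """Attach subject-level metadata to each image through the bridge table."""
    rows = []
    for image_id in image_ids:
        subject_id = bridge.get(image_id)
        if subject_id is None or subject_id not in metadata:
            continue  # skip images that cannot be linked to a subject record
        rows.append({"image_id": image_id, "subject_id": subject_id, **metadata[subject_id]})
    return rows


# "I00000" is a made-up ID with no bridge entry, so it is dropped.
rows = enrich(["I63897", "I00000"], bridge, metadata)
```

Keeping the bridge table separate from the metadata means clinical, genetic and biochemical tables can each be joined on the subject ID without touching the image files.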
Image Preprocessing
1. Subject division into training (83%), validation (7%) and
test (10%) splits
2. Extraction of sequences of .png axial MRI images/slices
from .nii volumetric data
3. Removal of 40 − 45% of the images at the beginning
of the sequence and 25% at the end of the sequence
❑ effects of AD on brain structure are visible mostly
on the middle part of the head
4. Resizing to 160 × 192 with Lanczos interpolation
Class   train subj (img)   val subj (img)   test subj (img)
AD      89 (~17k)          8 (2378)         16 (4011)
MCI     270 (~72k)         22 (5995)        25 (7276)
CN      151 (~39k)         13 (2194)        20 (6737)
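Step 3 of the preprocessing can be illustrated with a small slice-trimming helper. This is a sketch under stated assumptions: the 40% start cut is one value within the 40-45% range given above, the slices are assumed to be ordered along the axial direction, and the function names are hypothetical.

```python
def central_slice_range(n_slices, drop_start_pct=40, drop_end_pct=25):
    """Return (first, last_exclusive) indices of the axial slices to keep."""
    first = n_slices * drop_start_pct // 100
    last = n_slices - n_slices * drop_end_pct // 100
    return first, last


def keep_central_slices(slices, drop_start_pct=40, drop_end_pct=25):
    """Drop the first and last portions of an ordered slice sequence."""
    first, last = central_slice_range(len(slices), drop_start_pct, drop_end_pct)
    return slices[first:last]


# Stand-in for the .png slices extracted from one .nii volume.
volume = [f"slice_{i:03d}" for i in range(200)]
kept = keep_central_slices(volume)
```

Integer percentages are used instead of float fractions so that the cut points are exact for any slice count.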
DATA EXPLORATION
Data Exploration
❑ Descriptive analysis on subject metadata based on questionnaires
❑ 614 subjects
❑ Only kept attributes with <10% missing or inconsistent values
❑ Divided attributes into categories:
❑ Social
❑ Physical
❑ Biological
❑ Cognitive (target included) and psychological tests
❑ Highlighted relations between the diagnosis and relevant attributes (used
for diagnosing Alzheimer’s disease)
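The attribute filter described above (keep only attributes with less than 10% missing values) could look like the following sketch. The record layout, the attribute names and the treatment of empty strings as missing are assumptions; detecting inconsistent values would need attribute-specific rules that are not shown here.

```python
def keep_complete_attributes(records, max_missing=0.10):
    """Return the attribute names whose share of missing values is below max_missing."""
    attributes = sorted({key for record in records for key in record})
    kept = []
    for name in attributes:
        missing = sum(1 for record in records if record.get(name) in (None, ""))
        if missing / len(records) < max_missing:
            kept.append(name)
    return kept


# Hypothetical questionnaire records: "employment" is missing for 5 of 20 subjects (25%).
records = [
    {"age": 70 + i, "weight": 74.0, "employment": None if i < 5 else "retired"}
    for i in range(20)
]
kept = keep_complete_attributes(records)
```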
Social attributes
❑ 85% of subjects are retired
❑ Only 10% are in a nursing home/assisted living situation
Social attributes
❑ 85% of subjects are married
Physical attributes: gender and ethnicity
❑ 45% female – 55% male
❑ Almost 90% are white
Ethnicity Distribution
Physical attributes: weight and age
❑ Weight: avg = 74 Kg, mostly between 60 Kg and 80 Kg
❑ Over 80% of subjects over 70 years old
Figures: distributions of body weight (Kg) and age (years).
Genetic attributes: APOE gene versions
❑ Some mutations in APOE alleles are known genetic risk factors:
❑ APOE4 → increases risk (2-3 times if one allele is 4, up to 15 times if both alleles are 4)
❑ APOE2 → decreases risk by half
Figures: distributions of APOE gene versions.
Cognitive attributes: Diagnosis
❑ Diagnosis
❑ MCI : Mild Cognitive Impairment
❑ CN: Cognitively normal
❑ AD: Alzheimer’s Disease
❑ <20% subjects have Alzheimer’s Disease
Cognitive attributes: MMSE Scale
❑ MMSE Scale: Score between 0 (lowest) and 30
(highest) that rates the performances of the subject
in the following tasks:
• Orientation to time and place (knowing the
date, the day of the week, the month, the year,
the season, the place where the subject is and
the name of the interviewer)
• Memory (registration of three words)
• Attention and calculation (serial subtraction of
7s from 100)
• Language (naming of two objects, repetition of
a phrase, following a three-stage command,
reading and obeying a written command,
writing a sentence)
Cognitive attributes: GDS Scale
❑ Geriatric Depression Scale (GDS):
• It measures the depression level of the subject from
0 (not depressed) to 15 (severely
depressed).
• Total scores are interpreted as follows:
❑ 0-4: Normal
❑ 5-8: Mildly depressed
❑ 9-11: Moderately depressed
❑ 12-15: Severely depressed
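The score bands can be expressed as a small lookup function. This sketch assumes non-overlapping bands, with a score of 8 assigned to the mild band and the moderate band starting at 9; the function name is hypothetical.

```python
def gds_category(score):
    """Map a GDS total score (0-15) to its depression band."""
    if not 0 <= score <= 15:
        raise ValueError("GDS score must be between 0 and 15")
    if score <= 4:
        return "Normal"
    if score <= 8:
        return "Mildly depressed"  # assumption: 8 belongs to the mild band
    if score <= 11:
        return "Moderately depressed"
    return "Severely depressed"
```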
Cognitive attributes: CDR
❑ CDR (Clinical Dementia Rating): scale from 0 (best)
to 3 (worst).
• Values are 0, 0.5, 1, 2, 3 (can go up to 5 in severe
cases)
❑ rates the severity of dementia in the following
categories:
• Memory
• Orientation
• Judgment and problem-solving
• Community affairs
• Home and hobbies
• Personal care
Target correlation: APOE
❑ Cognitively normal (CN) subjects
very rarely carry two APOE4 alleles
❑ Subjects with Alzheimer’s Disease (AD)
have a much more even distribution of APOE genotypes
Target correlation: MMSE
❑ >95% of cognitively normal (CN) subjects
have a score above 25
❑ >70% of subjects with Alzheimer’s Disease
(AD) have scores lower than 25
Target correlation: CDR
❑ Cognitively Normal (CN) subjects:
❑ >80% have a score of 0
❑ Only two have a score of 1
❑ Subjects with mild cognitive impairment (MCI):
❑ >70% have a score of 0-0.5
❑ 5% have a score of 2
❑ Subjects with Alzheimer’s Disease (AD):
❑ 20% have a score of 2
❑ Only three have a score of 3
Target correlation: GDS
❑ >85% of Cognitively Normal (CN)
subjects have a score under 4 (not
depressed)
Biochemical attributes: p-Tau 181 plasma concentration
❑ Plasma phosphorylated-tau181 (p-tau181) is
a promising biomarker for Alzheimer's
disease (AD) [8]
❑ According to some studies [8] → Higher
plasma tau should be associated with
higher CSF tau and AD dementia
❑ plasma p-Tau 181 levels (pg/mL) are
available for 70% of subjects
❑ No notable differences were found between
the distributions of p-Tau 181 plasma
concentrations in AD, MCI and CN subjects
Biochemical attributes: plasma amyloid-β 1–42/1–40
❑ A reduced amyloid-β (Aβ) 42/40 peptide
concentration ratio in blood plasma is a
peripheral biomarker of the cerebral amyloid
pathology observed in Alzheimer’s disease brains [9]
❑ No notable differences were found between the
distributions of the Aβ 42/40 ratio in MCI
and CN subjects
❑ However, plasma Aβ 42/40 ratios are
available for only 7% of subjects (only CN and MCI
subjects)
BRAIN MRI IMAGES CLASSIFICATION
AND THE CLASS IMBALANCE PROBLEM
Brain Image Classification
❑ GOAL: classify cognitive impairment level by using only brain MRI slices:
❑ CN: Cognitively Normal
❑ MCI: Mild Cognitive Impairment
❑ AD: Alzheimer’s Disease
❑ HOW: using ResNet18, a Convolutional Neural Network, on these images
❑ ATTENTION: the model has to generalize well on new patients → separating subjects
when performing train-validation-test split
                  Train      Validation   Test
n. of subjects    240        21           36
(n. of images)    (~47k)     (~4.5k)      (~11k)
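The subject-level separation mentioned above can be sketched as follows: subjects are split first, and every image then follows its subject, so no patient contributes slices to more than one split. The fractions reuse the 83/7/10 proportions from the preprocessing slide; the subject IDs, image IDs and function names are hypothetical.

```python
import random


def split_subjects(subject_ids, val_frac=0.07, test_frac=0.10, seed=0):
    """Split unique subject IDs into disjoint train/val/test sets."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * val_frac)
    n_test = round(len(ids) * test_frac)
    return {
        "val": set(ids[:n_val]),
        "test": set(ids[n_val:n_val + n_test]),
        "train": set(ids[n_val + n_test:]),
    }


def assign_images(image_to_subject, splits):
    """Send each image to the split that contains its subject."""
    out = {name: [] for name in splits}
    for image_id, subject_id in image_to_subject.items():
        for name, members in splits.items():
            if subject_id in members:
                out[name].append(image_id)
                break
    return out


# Toy data: 100 subjects with 3 slices each.
subjects = [f"S{i:03d}" for i in range(100)]
images = {f"{s}_img{k}": s for s in subjects for k in range(3)}
splits = split_subjects(subjects)
by_split = assign_images(images, splits)
```

Splitting at image level instead would leak near-identical neighbouring slices of the same brain into the test set and inflate the measured performance.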
3-Class Classification Issues
❑ PROBLEM: not enough interclass variability between
the images of the 3 classes
❑ CNNs immediately overfit on the train set and have
poor performances on the test set
❑ Tried different dropout rates, activation functions
and weight initializers
❑ Tried simpler CNNs
❑ Tried to solve class imbalance problem
❑ Removed MCI and switched to a binary classification
task:
❑ CN
❑ AD
CNN Architecture Summary
❑ 2-class classification task using a custom ResNet-18
CNN
❑ Hyperparameters that gave the best performances on
the validation set while limiting overfitting, following [1]
• Epochs: 50
• Batch size: 32
• Loss function: Binary cross entropy
• Optimizer: Adam
• Dropout rate: 0.25
Architecture (layer sequence with output shapes, as labelled in the diagram):
Custom Layer: "Conv2D X, Y" = Conv2D (X filters, filter_size = 3x3, stride = Y) → Batch Normalization → LeakyReLU
Input Layer (MRI slice): 192 x 160 x 1
Conv2D (64 kernels, kernel_size = 7x7, stride = 2) → Batch Normalization → LeakyReLU: 96 x 80 x 64
MaxPooling (kernel_size = 3x3, stride = 2): 48 x 40 x 64
Conv2D 64, 1 → Conv2D 64, 2
Conv2D 128, 2 → Conv2D 128, 2: 24 x 20 x 128
Conv2D 256, 2 → Conv2D 256, 2: 12 x 10 x 256
Conv2D 512, 2 → Conv2D 512, 2: 6 x 5 x 512
AvgPooling (kernel_size = 6x5): 1 x 1 x 512
Flatten: 512
FC (Softmax): 1
Params: 11,190,082
Trainable: 11,180,354
Non-trainable: 9,728
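The spatial sizes listed for the network can be checked with a few lines of arithmetic. The sketch assumes "same" padding (output size = ceil(input / stride)) and that each listed resolution corresponds to one stride-2 downsampling step; neither assumption is stated on the slide, and the function names are hypothetical.

```python
import math


def same_padding_output(size, stride):
    """Output length of a 'same'-padded conv or pooling layer."""
    return math.ceil(size / stride)


def trace_shapes(height=192, width=160, n_downsamplings=5):
    """Follow the spatial size of a slice through successive stride-2 steps."""
    shapes = [(height, width)]
    for _ in range(n_downsamplings):
        height = same_padding_output(height, 2)
        width = same_padding_output(width, 2)
        shapes.append((height, width))
    return shapes


shapes = trace_shapes()
```

The final 6 x 5 map matches the 6x5 kernel of the average-pooling layer, which reduces each of the 512 channels to a single value.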
Class imbalance
❑ Approach of the reference paper [1]:
❑ Only tried to achieve high accuracy, no distinction
between FP and FN
❑ Same (imbalanced) distribution of CN and AD
subjects in train, validation and test sets
❑ Our take:
❑ FN errors (diagnosing as healthy a patient who’s
actually sick) are worse than FP
❑ Creation of a train set balanced on subjects (to
decrease FN) while keeping test and validation sets
unbalanced (to keep a realistic estimate of
performance on new patients)
n. of subjects (n. of images)
         Train          Validation     Test
total    240 (~47k)     21 (~4.5k)     36 (~11k)
CN       151 (~40k)     13 (~2.4k)     20 (~6.7k)
AD       89 (~17k)      8 (~2.1k)      16 (~4k)
Random undersampling
❑ Undersampling the majority class (CN) by
randomly selecting 89 subjects from the CN
subjects to match the number of AD subjects
n. of subjects (n. of images)
         Train            Validation     Test
total    178 (~36.5k)     21 (~4.5k)     36 (~11k)
CN       89 (~19.5k)      13 (~2.4k)     20 (~6.7k)
AD       89 (~17k)        8 (~2.1k)      16 (~4k)
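Random undersampling at subject level can be sketched in a few lines. The subject IDs are placeholders and the function name is hypothetical; only the class sizes (151 CN, 89 AD) come from the table above.

```python
import random


def random_undersample(majority_ids, n_target, seed=0):
    """Randomly keep n_target subjects of the majority class."""
    if n_target > len(majority_ids):
        raise ValueError("n_target exceeds the number of majority subjects")
    return sorted(random.Random(seed).sample(sorted(majority_ids), n_target))


cn_subjects = [f"CN_{i:03d}" for i in range(151)]  # placeholder IDs
ad_subjects = [f"AD_{i:03d}" for i in range(89)]   # placeholder IDs
cn_kept = random_undersample(cn_subjects, len(ad_subjects))
```

Sampling subjects rather than images keeps all slices of a retained subject together, which is why the two classes end up with equal subject counts but slightly different image counts.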
Stratified undersampling
❑ The random undersampling method applied to the train set does
not consider metadata, although some attributes have an
impact on MRI scan images
❑ To obtain a more representative image dataset and
avoid bias, we undersample the CN class subjects
while maintaining the original distribution of the
following attributes:
❑ Age (older subjects’ MRI scans tend to look like AD
brains even if they don’t have it)
❑ Weight (levels of fat influence MRI scans [6])
❑ Sex (MRI scans differ slightly between
sexes)
❑ We divided age and weight into categories to perform
the undersampling
n. of subjects (n. of images)
         Train            Validation     Test
total    178 (~36.5k)     21 (~4.5k)     36 (~11k)
CN       89 (~20k)        13 (~2.4k)     20 (~6.7k)
AD       89 (~17k)        8 (~2.1k)      16 (~4k)
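One possible way to implement the stratified undersampling is proportional allocation per stratum with largest-remainder rounding, sketched below. The age and weight cut-offs, the record layout and the toy data are assumptions for illustration; the slides do not specify the categories that were actually used.

```python
import random
from collections import defaultdict


def stratum(subject):
    """Hypothetical strata: two age bins, two weight bins, and sex."""
    age_bin = "70+" if subject["age"] >= 70 else "<70"
    weight_bin = "75+" if subject["weight"] >= 75 else "<75"
    return (age_bin, weight_bin, subject["sex"])


def stratified_undersample(subjects, n_target, seed=0):
    """Keep n_target subjects while preserving the share of each stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for subject in subjects:
        groups[stratum(subject)].append(subject)
    # Proportional quota per stratum, rounded with the largest-remainder method.
    exact = {key: len(members) * n_target / len(subjects) for key, members in groups.items()}
    quota = {key: int(value) for key, value in exact.items()}
    leftover = n_target - sum(quota.values())
    by_remainder = sorted(exact, key=lambda key: exact[key] - quota[key], reverse=True)
    for key in by_remainder[:leftover]:
        quota[key] += 1
    sample = []
    for key in sorted(groups):
        sample.extend(rng.sample(groups[key], quota[key]))
    return sample


# Toy CN cohort of 151 subjects with synthetic metadata.
cn = [
    {"id": f"CN_{i:03d}", "age": 60 + i % 30, "weight": 55 + (i * 7) % 40, "sex": "F" if i % 2 else "M"}
    for i in range(151)
]
cn_kept = stratified_undersample(cn, 89)
```

With this rounding scheme each stratum's retained count differs from its exact proportional share by less than one subject.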
OVERSAMPLING VIA
SYNTHETIC BRAIN MRI GENERATION
USING A WGAN-GP
Using a WGAN-GP to oversample the minority class via generation
❑ Objectives:
❑ Train a WGAN-GP to synthesize axial brain MRI for
data augmentation purposes
❑ high quality and realistic AD and CN images
❑ Oversampling → avoids the downsides of undersampling
❑ Data scarcity in medical imaging
❑ Fewer Subjects/Images → Reduced variability
→ overfitting and reduced accuracy
❑ Generative Models → avoid the downsides of traditional data
augmentation in medical imaging
❑ “heavy” geometric augmentations → damage the
image’s semantic content
❑ applied one image at a time → no use of information about
the image distribution
Figure: CN and AD train-set sizes with and without WGAN-generated images.
WGAN-GP: Wasserstein GAN with Gradient Penalty
❑ Trained two WGAN-GPs, one for AD and one for CN images
❑ WGAN-GP → extension of the GAN architecture
❑ The Generator → learns the image distribution
❑ Input → 128-D noise vector randomly sampled
from a uniform distribution
❑ Transposed Conv Layers → increase the spatial size of the input
❑ Output → 160x192x1 image
❑ The Critic → CNN that distinguishes fake from real images
❑ Input → 160x192x1 image (real or synthesised)
❑ Output → score of the realness/fakeness of the
image
Diagram: a 128-D noise vector sampled in [0, 1) feeds the Generator, which outputs a fake 160 x 192 x 1 image. Batches of real (160 x 192 x 1) and fake images feed the Critic. The loss function (Wasserstein distance with gradient penalty) is minimized by backpropagation, which updates the weights of both the Generator and the Critic.
WGAN-GP: Training
❑ Loss Function →
❑ Based on the Wasserstein distance → distance between
the distributions of fake and real images
❑ increased stability in GAN training
❑ correlates with image quality
❑ Generator and critic are trained alternately and
adversarially until equilibrium
❑ Generator → minimizes the Wasserstein loss
❑ Critic → maximizes the Wasserstein loss
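For reference, the objectives above can be written in the standard WGAN-GP form. This is the general formulation of the method, not a description confirmed by the slides: the symbols $C$ (critic), $G$ (generator), $P_r$ and $P_g$ (real and generated image distributions) and the penalty weight $\lambda$ are notation introduced here, and the value of $\lambda$ used in the project is not stated.

```latex
% Critic loss: negative Wasserstein estimate plus gradient penalty
L_C = \mathbb{E}_{\tilde{x} \sim P_g}\left[ C(\tilde{x}) \right]
    - \mathbb{E}_{x \sim P_r}\left[ C(x) \right]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \left[ \left( \lVert \nabla_{\hat{x}} C(\hat{x}) \rVert_2 - 1 \right)^2 \right]

% Interpolated samples used by the penalty term
\hat{x} = \epsilon x + (1 - \epsilon) \tilde{x}, \qquad \epsilon \sim U[0, 1]

% Generator loss
L_G = - \mathbb{E}_{z \sim p(z)}\left[ C(G(z)) \right]
```

Minimizing $L_C$ is equivalent to maximizing the estimated Wasserstein distance $\mathbb{E}[C(x)] - \mathbb{E}[C(\tilde{x})]$, while minimizing $L_G$ pushes the generated images towards higher critic scores and so reduces that distance.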
Choosing the best WGAN-GP model weights
1. The first model selection was based on:
❑ ↑ visual quality: qualitative evaluation of sample images generated during each epoch
❑ ↓ critic losses
2. Generated Images Quality Metrics
❑ ↓ Fréchet Inception Distance (FID) → distance between the distributions of real and fake images
❑ ↑ Inception Score (IS) → diversity and quality of generated images
         WGAN (AD)         WGAN (CN)
IS ↑     2.51 (± 1.70)     2.71 (± 0.12)
FID ↓    40.12 (± 0.02)    50.67 (± 8.13)
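For readers unfamiliar with the two metrics, their usual definitions are given below. The notation is introduced here and is not from the slides: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception features for real and generated images, and $p(y \mid x)$ is the Inception class posterior for a generated image $x$.

```latex
% Fréchet Inception Distance (lower is better)
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)

% Inception Score (higher is better)
\mathrm{IS} = \exp\left( \mathbb{E}_{x \sim P_g}
  \left[ D_{\mathrm{KL}}\left( p(y \mid x) \,\Vert\, p(y) \right) \right] \right)
```

FID compares generated images against real ones, whereas IS looks only at the generated images, so the two scores in the table capture complementary aspects of quality.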
Real and synthetic MRI samples
Figure: real and fake axial slices for Alzheimer’s Disease (AD) and Cognitive Normal (CN) subjects.
RESULTS
Results: Metrics
❑ Considered the following metrics:
❑ Accuracy = $\dfrac{\text{n. of subjects correctly classified}}{\text{total subjects}}$
❑ Recall on AD = $\dfrac{\text{n. of AD subjects correctly classified}}{\text{n. of AD subjects}}$
❑ Recall on CN = $\dfrac{\text{n. of CN subjects correctly classified}}{\text{n. of CN subjects}}$
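The three metrics can be computed directly from lists of true and predicted labels, as in this minimal sketch. The labels below are toy values invented for illustration, not project results, and the function names are hypothetical.

```python
def recall(y_true, y_pred, label):
    """Share of items whose true class is `label` that were predicted as `label`."""
    relevant = [pred for true, pred in zip(y_true, y_pred) if true == label]
    if not relevant:
        raise ValueError(f"no items with true label {label!r}")
    return sum(pred == label for pred in relevant) / len(relevant)


def accuracy(y_true, y_pred):
    """Share of items classified correctly, regardless of class."""
    return sum(true == pred for true, pred in zip(y_true, y_pred)) / len(y_true)


# Toy example: 4 AD and 6 CN items.
y_true = ["AD"] * 4 + ["CN"] * 6
y_pred = ["AD", "AD", "CN", "CN"] + ["CN"] * 5 + ["AD"]

ad_recall = recall(y_true, y_pred, "AD")   # 2 of 4 AD items found
cn_recall = recall(y_true, y_pred, "CN")   # 5 of 6 CN items found
acc = accuracy(y_true, y_pred)             # 7 of 10 items correct
```

In the toy data the accuracy looks acceptable even though half of the AD items are missed, which is why recall on AD is reported separately.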
Results: Unbalanced Train Set
Figure: confusion matrix (ground truth vs prediction).
AD Recall   CN Recall   Accuracy
0.35        0.90        0.70
❑ This is the setup adopted by [1], which only aimed for high
accuracy
❑ As expected, the imbalance in the training set
causes a bias: the model predicts CN very often
❑ High CN Recall
❑ Very low AD Recall
❑ n. of subjects (n. of images) per class/split
Class   train subj (img)   val subj (img)   test subj (img)
AD      89 (~17k)          8 (2378)         16 (4011)
CN      151 (~39k)         13 (2194)        20 (6737)
Results: Balanced Train Set (random undersampling)
AD Recall CN Recall Accuracy
0.58 0.86 0.76
❑ Balancing the train set gave the expected result:
❑ Significant improvement in AD Recall
❑ Accuracy is also a little higher
❑ n. of subjects (n. of images) per class/split
Class   train subj (img)   val subj (img)   test subj (img)
AD      89 (~17k)          8 (2378)         16 (4011)
CN      89 (~19.5k)        13 (2194)        20 (6737)
Figure: confusion matrix (ground truth vs prediction).
Results: Balanced Train Set (stratified undersampling)
AD Recall CN Recall Accuracy
0.66 0.91 0.82
❑ The bias reduction obtained by stratified
undersampling of the train set is reflected in overall
higher performance
❑ n. of subjects (n. of images) per class/split
Class   train subj (img)   val subj (img)   test subj (img)
AD      89 (~17k)          8 (2378)         16 (4011)
CN      89 (~20k)          13 (2194)        20 (6737)
Figure: confusion matrix (ground truth vs prediction).
Results: Balanced Train Set (GAN oversampling)
AD Recall CN Recall Accuracy
0.72 0.92 0.85
❑ Balancing the train set by oversampling
the minority class led to the best performance:
❑ Improvement in AD Recall
❑ Best accuracy value among the tested
methods
❑ n. of subjects (n. of images) per class/split
Class   train subj (img)             val subj (img)   test subj (img)
AD      89 (~17k + 22k fake)         8 (2378)         16 (4011)
CN      151 (~39k)                   13 (2194)        20 (6737)
Figure: confusion matrix (ground truth vs prediction).
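The number of synthetic images in the table follows from simple arithmetic on the approximate train-set image counts, as this short sketch shows. The function name is hypothetical and the counts are the rounded figures from the table.

```python
def synthetic_needed(n_majority_images, n_minority_images):
    """Number of generated images required to match the majority class size."""
    return max(0, n_majority_images - n_minority_images)


# Approximate train-set image counts: ~39k for the majority class, ~17k for AD.
n_fake = synthetic_needed(39_000, 17_000)
```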
Future Developments
1. Image Preprocessing:
• Some scans include slices taken too high in the head, which yield images that are unhelpful for classification.
• Proposed solution: Implement an algorithm to discard unhelpful slices.
2. 3D-MRI scan CNN Implementation:
• Explore the use of CNNs on entire scans, not just selected slices.
3. Integrating MRI Scan Slices and Biomarkers in classification:
• Combine MRI images with traditional biomarkers used for identifying Alzheimer's disease.
4. Attribute Stratification for Training set undersampling:
• Currently using Age, Weight, and Gender as attributes for stratifying the training dataset.
• Idea: Include additional attributes for stratification.
• Creation of an even more representative training set, enhancing performance.
❑ All these improvements should specifically aim to improve AD Recall (which we couldn’t get
higher than 0.72) to avoid classifying a patient as healthy when actually affected by
Alzheimer’s
Non-representative slice
Bibliography
1. Konidaris, F., Tagaris, T., Sdraka, M., & Stafylopatis, A. (2019). Generative Adversarial Networks as an Advanced Data Augmentation
Technique for MRI Data: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics
Theory and Applications, 48–59. https://doi.org/10.5220/0007363900480059
2. Alzheimer’s Association. 2023 Alzheimer’s Disease Facts and Figures. Alzheimers Dement 2023;19(4). DOI 10.1002/alz.13016
3. van Oostveen, W. M., & de Lange, E. C. M. (2021). Imaging Techniques in Alzheimer’s Disease: A Review of Applications in Early Diagnosis
and Longitudinal Monitoring. International Journal of Molecular Sciences, 22(4), Article 4. https://doi.org/10.3390/ijms22042110
4. How Is Alzheimer’s Disease Diagnosed? National Institute on Aging. Retrieved June 7, 2023, from
https://www.nia.nih.gov/health/how-alzheimers-disease-diagnosed
5. Data used in preparation of this project were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database
(adni.loni.usc.edu).
6. Anderson Mon, Christoph Abé, Timothy C. Durazzo and Dieter J. Meyerhoff (2015). Potential effects of fat on magnetic resonance signal
intensity and derived brain tissue volumes. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4876040/
7. Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN. arXiv preprint arXiv:1701.07875
8. McGrath ER, Beiser AS, O'Donnell A, Yang Q, Ghosh S, Gonzales MM, Himali JJ, Satizabal CL, Johnson KA, Tracy RP, Seshadri S. Blood
Phosphorylated Tau 181 as a Biomarker for Amyloid Burden on Brain PET in Cognitively Healthy Adults. J Alzheimers Dis. 2022;87(4):1517-1526.
doi: 10.3233/JAD-215639. PMID: 35491781.
9. Schindler SE, Bollinger JG, Ovod V, Mawuenyega KG, Li Y, Gordon BA, Holtzman DM, Morris JC, Benzinger TLS, Xiong C, Fagan AM, Bateman
RJ. High-precision plasma β-amyloid 42/40 predicts current and future brain amyloidosis. Neurology. 2019 Oct 22;93(17):e1647-e1659. doi:
10.1212/WNL.0000000000008081. Epub 2019 Aug 1. PMID: 31371569; PMCID: PMC6946467.

Identification Of Alzheimer's Disease Using A Deep Learning Method Based On T1-w Brain Mri Images

  • 1.
    IDENTIFICATION OF ALZHEIMER'S DISEASEUSING A DEEP LEARNING METHOD BASED ON T1-W BRAIN MRI IMAGES University of Milano-Bicocca Master's Degree in Data Science Big Data in Biotechnology & Biosciences Course Academic Year 2022-2023 Authors: Giorgio CARBONE matr. n° 811974 Emilio LINGENTHAL matr. n° 889111 GitHub Repository
  • 2.
    Alzheimer’s Disease ❑ Alzheimer’sDisease (AD): irreversible progressive neurodegenerative disease ❑ memory loss, cognitive impairment, behavioural changes and eventually death ❑ no prevention methods or treatment ❑ Brain Changes: ❑ Accumulation of the beta-amyloid peptide outside neurons and hyperphosphorylated tau protein aggregates inside neurons ❑ Death of neurons ❑ Inflammation and atrophy of brain tissue Regions of brain affected by Alzheimer’s disease. Source.
  • 3.
    Alzheimer’s Disease diagnosis ❑Review of symptoms, medical history, medication history and interviews with friends and family ❑ Cognitive Assessment/Test: memory, problem solving, attention, counting, and language ❑ Psychiatric evaluation for depression diagnosis ❑ Laboratory Tests: cerebrospinal fluid (CSF) and plasma collection and examination to measure the level of dementia-related biomarkers ❑ β-amyloid peptide and p-tau protein ❑ Brain scans: CT, PET and MRI, the latter used to verify: ❑ hippocampus volume reduction ❑ cortical thickness reduction in certain brain areas ❑ ventricles enlargement Healthy vs Alzheimer's brain disease in T1-W MRI images. Source.
  • 4.
    Project Objectives 1. tobuild a dataset of 2D brain axial images slicing 3D T1-weighted brain MRI images, ❑ Enriched with clinical (demographics, clinical assessments, and cognitive assessments), genetic and biospecimen subjects data ❑ Data exploration of the created dataset 2. to train a Deep Neural Network for the classification of 2D axial T1-W brain images of Cognitive Normal (CN) and Alzheimer’s Disease (AD) subjects 3. to evaluate the effect of different techniques for handling the class imbalance problem on classifier performance and bias ❑ Random Undersampling on largest class subjects ❑ Stratified Undersampling of subjects in the largest class based on metadata ❑ Training of a Generative Model (Wesserstein GAN) for the oversampling of the minority class
  • 5.
    THE ADNI DATABASE, DATAACQUISITION AND ENRICHMENT AND DATA PREPARATION
  • 6.
    Alzheimer's Disease NeuroimagingInitiative: ADNI ❑ Longitudinal multicenter study and open-access data repository ❑ 63 sites in the US and Canada, > 2000 participants ❑ Designed to develop biomarkers for the early detection and tracking of Alzheimer’s disease (AD): ❑ Clinical and Neuropsychological (Demographics, Cognitive assessments, clinical assessments) ❑ Imaging (MRI and PET) ❑ Genetic (APOE and TOMM40 Genotyping) ❑ Biochemical (CSF/plasma tau, β-amyloid) ❑ Eligible patients (age 55-90, speak English or Spanish) are divided through a screening process in: ❑ Alzheimer’s Disease (AD), Mild Cognitive Impairment (MCI), Cognitive Normal (CN)
  • 7.
    Data Acquisition &Data Enrichment ❑ Objective: to combine heterogeneous data using ❑ Bridge Table: subject's unique ID / MRI image ID ❑ Volumetric Axial MRI image acquisition for AD, MCI and CN patients ❑ T1-weighted ❑ MPR; GradWarp; B1 Correction; N3; Scaled ❑ volumetric .nii files CN MCI AD n. of subjects 179 317 118 Image ID Subject ID I63897 123_S_5678 ❑ Metadata Acquisition ❑ Demographics → Age, Ethnicity, Gender, Weight, Dominant Hand, Marital status, Education Level, employment status ❑ Neuropsychological data → MMSE (Mini Mental State Exam), Clinical Dementia Rating (CDR), Geriatric Depression Scale (GDS) ❑ Genetic data → APOE A1, APOE A2 genotyping ❑ Biochemical data → plasma tau and β-amyloid concentrations
  • 8.
    Image Preprocessing 1. Subjectdivision into training (83%), validation (7%) and test (10%) splits 2. Extraction of sequences of .png axial MRI images/slices from .nii volumetric data 3. Removal of 40 − 45% of the images at the beginning of the sequence and 25% at the end of the sequence ❑ effects of AD on brain structure are visible mostly on the middle part of the head 4. Resizing to 160 × 192 with Lanczos interpolation ° train subj (img) val subj (img) test subj (img) AD 89 (~17k) 8 (2378) 16 (4011) MCI 270 (~72k) 22 (5995) 25 (7276) NC 151 (~39k) 13 (2194) 20 (6737)
  • 9.
  • 10.
    Data Exploration ❑ Descriptiveanalysis on subject metadata based on questionnaires ❑ 614 subjects ❑ Only kept attributes with <10% missing or inconsistent values ❑ Divided attributes in categories: ❑ Social ❑ Physical ❑ Biological ❑ Cognitive (target included) and psychological tests ❑ Highlighted relations between the diagnosis and relevant attributes (used for diagnosing Alzheimer’s disease)
  • 11.
    Social attributes ❑ 85%of subjects are retired ❑ Only 10% are in a nursing home/assisted living situation
  • 12.
    Social attributes ❑ 85%of subjects are married
  • 13.
    Physical attributes: genderand ethnicity ❑ 45% female – 55% male ❑ Almost 90% are white Ethnicity Distribution
  • 14.
    Physical attributes: weightand age ❑ Weight: avg = 74 Kg, mostly between 60 Kg and 80 Kg ❑ Over 80% of subjects over 70 years old (Kg) Body (Years Old)
  • 15.
    Genetic attributes: APOEgene versions ❑ Some mutations in APOE alleles are known genetic risk factors: ❑ APOE4 → increases risk (2-3 times if one allele is 4, up to 15 times if both alleles are 4) ❑ APOE2 → decreases risk by half (Gene Version) (Gene Version)
  • 16.
    Cognitive attributes: Diagnosis ❑Diagnosis ❑ MCI : Mild Cognitive Impairment ❑ CN: Cognitively normal ❑ AD: Alzheimer’s Disease ❑ <20% subjects have Alzheimer’s Disease
  • 17.
    Cognitive attributes: MMSEScale ❑ MMSE Scale: Score between 0 (lowest) and 30 (highest) that rates the performances of the subject in the following tasks: • Orientation to time and place (knowing the date, the day of the week, the month, the year, the season, the place where the subject is and the name of the interviewer) • Memory (registration of three words) • Attention and calculation (serial subtraction of 7s from 100) • Language (naming of two objects, repetition of a phrase, following a three-stage command, reading and obeying a written command, writing a sentence)
  • 18.
    Cognitive attributes: GDSScale ❑ Grade Depression Scale(GDS SCALE): • It measures the depression of the subject from 0 (completely happy) to 15 (severely depressed). • It's composed of the following questions: ❑ 0-4: Normal ❑ 5-8: Mildly depressed ❑ 8-11: Moderately depressed ❑ 12-15: Severely depressed
  • 19.
    Cognitive attributes: CDR ❑CDR (Clinical Dementia Rating): scale from 0 (best) to 3 (worst). • Values are 0, 0.5, 1, 2, 3 (can go up to 5 in severe cases) ❑ rates the severity of dementia in the following categories: • Memory • Orientation • Judgment and problem-solving • Community affairs • Home and hobbies • Personal care
  • 20.
    Target correlation: APOE ❑Cognitively normal (CN) subjects have very rarely both APOE4 mutations ❑ Subjects with Alzheimer’s Disease (AD) have a much more even distribution
  • 21.
    Target correlation: MMSE ❑>95% of cognitively normal (CN) subjects have a score above 25 ❑ >70% of subjects with Alzheimer’s Disease (AD) have scores lower than 25
  • 22.
    Target correlation: CDR ❑Cognitively Normal (CN) subjects: ❑ >80% have a score of 0 ❑ Only two have a score o 1 ❑ Subjects with mild cognitive impairment (MCI): ❑ >70% have a score of 0-0.5 ❑ 5% have a score of 2 ❑ Subjects with Alzheimer’s Disease (AD): ❑ 20% have a score of 2 ❑ Only three have a score of 3
  • 23.
    Target correlation: GDS ❑>85% of Cognitively Normal (CN) subjects have a score under 4 (not depressed)
  • 24.
    Biochemical attributes: p-Tau181 plasma concentration ❑ Plasma phosphorylated-tau181 (p-tau181) is a promising biomarker for Alzheimer's disease (AD) [8] ❑ According to some studies [8] → Higher plasma tau should be associated with higher CSF tau and AD dementia ❑ plasma p-Tau 181 levels (pg/mL) are available for 70% of subjects ❑ No particular differences are found between the distributions of p-Tau 181 plasma concentrations in AD, MCI and NC subjects
  • 25.
    Biochemical attributes: plasmaamyloid-β 1–42/1–40 ❑ A reduced amyloid-β (Aβ) 42/40 peptide concentration ratio in blood plasma [9] ❑ represents a peripheral biomarker of the cerebral amyloid pathology observed in Alzheimer’s disease brains. ❑ No particular differences are found between the distributions of amyloid-β (Aβ) 42/40 ratio in MCI and NC subjecs ❑ But plasma amyloid-β (Aβ) 42/40 ratio are available for only 7% of subjects (only CN and MCI subjects)
  • 26.
    BRAIN MRI IMAGESCLASSIFICATION AND THE CLASS IMBALANCE PROBLEM
  • 27.
    Brain Image Classification ❑GOAL: classify cognitive impairment level by using only brain MRI slices: ❑ CN: Cognitively Normal ❑ MCI: Mild Cognitive Impairment ❑ AD: Alzheimer’s Disease ❑ HOW: using ResNet18, a Convolutional Neural Network, on these images ❑ ATTENTION: the model has to generalize well on new patients → separating subjects when performing train-validation-test split Train Validation Test n. of subjects (n. of images) 240 (~47k) 21 (~4.5k) 36 (~11k)
  • 28.
    3-Class Classification Issues ❑PROBLEM: not enough interclass variability between the 3 classes images ❑ CNNs immediately overfit on the train set and have poor performances on the test set ❑ Tried different dropout rates, activation functions and weight initializers ❑ Tried simpler CNNs ❑ Tried to solve class imbalance problem ❑ Removed MCI and switched to a binary classification task: ❑ CN ❑ AD
  • 29.
    CNN Architecture Summary ❑2-class classification task using a custom Resnet-18 CNN ❑ Hyperparameters that gave the best performances on the validation set while limiting overfitting, following [1] • Epochs: 50 • Batch size: 32 • Loss function: Binary cross entropy • Optimizer: Adam • Dropout rate: 0.25 MRI slice 192 x 160 x 1 Input Layer 96 x 80 x 64 48 x 40 x 64 24 x 20 x 128 192 x 160 x 1 Flatten 512 Conv2D (X filters, filter_size = 3x3, stride = Y) Batch Normalization LeakyReLU Conv2D X, Y = Conv2D (64 kernels, kernel_size = 7x7, stride = 2) Batch Normalization LeakyReLU MaxPooling (kernel_size = 3x3, stride = 2) Conv2D 64, 1 Conv2D 64, 2 Conv2D 128, 2 Conv2D 128, 2 Conv2D 256, 2 Conv2D 256, 2 12 x 10 x 256 Conv2D 512, 2 Conv2D 512, 2 6 x 5 x 512 AvgPooling (kernel_size = 6x5) 1 x 1 x 512 FC (Softmax) 1 Custom Layer Params: 11,190,082 Trainable: 11,180,354 Non-trainable: 9,728
  • 30.
    Class imbalance ❑ Mainpaper [1] idea: ❑ Only tried to achieve high accuracy, no distinction between FP and FN ❑ Same (imbalanced) distribution of CN and AD users in train, validation and test set ❑ Our take: ❑ FN errors (diagnosing as healthy a patient who’s actually sick) are worse than FP ❑ Creation of a balanced train set on subjects (to decrease FN) keeping test and validation sets unbalanced (to keep a good guess of real performances on new patients) Train Validation Test n. of subjects (n. of images) total 240 (~47k) 21 (~4.5k) 36 (~11k) CN 151 (~40k) 13 (~2.4k) 20 (~6.7k) AD 89 (~17k) 8 (~2.1k) 16 (~4k)
  • 31.
    Random undersampling ❑ Undersamplingthe majority class (CN) by randomly selecting 89 subjects from the CN subjects to match the number of AD subjects Train Validation Test n. of subjects (n. of images) total 178 (~36.5k) 21 (~4.5k) 36 (~11k) CN 89 (~19.5k) 13 (~2.4k) 20 (~6.7k) AD 89 (~17k) 8 (~2.1k) 16 (~4k)
  • 32.
    Stratified undersampling ❑ Randomundersampling method on the train set did not consider metadata: some attributes have an impact on MRI scans images ❑ To have a more representative image dataset and avoid bias we undersample the CN class subjects while mantaining the original distribution of the following attributes: ❑ Age (older subjects’ MRI scans tend to look like AD brains even if they don’t have it) ❑ Weight (levels of fat influence MRI scans [6]) ❑ Sex (MRI scans are slightly different between genders) ❑ We divided Age and weight into categories to perform the undersampling Train Validation Test n. of subjects (n. of images) total 178 (~36.5k) 21 (~4.5k) 36 (~11k) CN 89 (~20k) 13 (~2.4k) 20 (~6.7k) AD 89 (~17k) 8 (~2.1k) 16 (~4k)
  • 33.
    OVERSAMPLING VIA SYNTHETIC BRAINMRI GENERATION USING A WGAN-GP
  • 34.
    Using a WGAN-GPto oversample the minority class via generation ❑ Objectives: ❑ Train a WGAN-GP to synthesize axial brain MRI for data augmentation purposes ❑ high quality and realistic AD and CN images ❑ Oversampling → the downsides of Undersampling ❑ Data scarcity in medical imaging ❑ Fewer Subjects/Images → Reduced variability → overfitting and reduced accuracy ❑ Generative Models → the downsides of Traditional Data Augmentation in medical imaging ❑ “heavy” geometric augmentations → damage the image’s semantic content ❑ applied one image at a time → no information about images distribution CN AD WGAN if w/ WGAN if w/out WGAN
  • 35.
    WGAN-GP: Wasserstein GAN with Gradient Penalty
    ❑ We trained two WGAN-GPs, one to generate AD images and one to generate CN images
    ❑ WGAN-GP → extension of the GAN architecture
    ❑ The Generator → learns the image distribution
      ❑ Input → 128-D noise vector randomly sampled from a uniform distribution on [0, 1)
      ❑ Transposed convolutional layers → increase the input size
      ❑ Output → 160x192x1 image
    ❑ The Critic → CNN that distinguishes fake from real images
      ❑ Input → 160x192x1 image (real or synthesised)
      ❑ Output → score of the realness/fakeness of the image
    [Diagram: a 128-D noise vector sampled from [0, 1) feeds the Generator, which outputs a 160 x 192 x 1 fake image; batches of real and fake images are scored by the Critic; the Wasserstein distance with gradient penalty is the loss function, backpropagated to update the weights of both Generator and Critic.]
  • 36.
    WGAN-GP: Training
    ❑ Loss function:
      ❑ Based on the Wasserstein distance → distance between the distributions of fake and real images
      ❑ Increased stability in GAN training
      ❑ Correlates with the quality of the generated images
    ❑ Generator and Critic are trained alternately and adversarially until equilibrium
      ❑ Generator → minimizes the Wasserstein loss
      ❑ Critic → maximizes the Wasserstein loss
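For reference, the objective described above can be written out in the standard WGAN-GP formulation (Gulrajani et al., 2017, "Improved Training of Wasserstein GANs", which is not in this deck's bibliography). The symbols below are assumptions rather than the slide's own notation: $D$ is the Critic, $G$ the Generator, $P_r$ the real image distribution, $p(z)$ the noise distribution, and $\lambda$ the penalty weight (10 in the original paper; the value used in this project is not stated).

```latex
% Wasserstein estimate: the Critic maximizes it, the Generator minimizes it
W(D, G) = \mathbb{E}_{x \sim P_r}\big[D(x)\big]
        - \mathbb{E}_{z \sim p(z)}\big[D(G(z))\big]

% Gradient penalty, evaluated on random interpolations of real and fake images
\hat{x} = \epsilon\, x + (1 - \epsilon)\, G(z), \qquad \epsilon \sim U[0, 1]

\mathrm{GP}(D) = \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]

% Alternating updates
\max_{D} \; W(D, G) - \lambda\, \mathrm{GP}(D), \qquad \min_{G} \; W(D, G)
```

The penalty term pushes the Critic's gradient norm towards 1, a soft way of enforcing the Lipschitz constraint that the Wasserstein distance requires.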
  • 37.
    Choosing the best WGAN-GP model weights
    1. The first selection of candidate models was based on:
      ❑ ↑ visual quality: qualitative evaluation of sample images generated at each epoch
      ❑ ↓ critic losses
    2. Quality metrics for the generated images:
      ❑ ↓ Fréchet Inception Distance (FID) → distance between the distributions of real and fake images
      ❑ ↑ Inception Score (IS) → diversity and quality of the generated images
             WGAN (AD)         WGAN (CN)
    IS ↑     2.51 (± 1.70)     2.71 (± 0.12)
    FID ↓    40.12 (± 0.02)    50.67 (± 8.13)
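For reference, the two metrics are usually defined as below. The notation is an assumption, not taken from the slides: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception-network features computed on real and generated images, $p(y \mid x)$ is the Inception class distribution for an image $x$, and $p(y)$ is its average over generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\Big(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\Big)

\mathrm{IS} = \exp\!\Big(\mathbb{E}_{x \sim P_g}\big[\, D_{\mathrm{KL}}\big(p(y \mid x) \,\Vert\, p(y)\big) \big]\Big)
```

FID is zero when the two feature distributions have identical means and covariances, so lower is better; IS grows when each image is classified confidently and the classes are varied, so higher is better.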
  • 38.
    Real and synthetic MRI samples
    [Figure: real vs. fake axial slices for Alzheimer's Disease (AD) and Cognitive Normal (CN).]
  • 39.
  • 40.
    Results: Metrics
    ❑ We considered the following metrics:
      ❑ $\text{Accuracy} = \dfrac{\text{n. of subjects correctly classified}}{\text{total subjects}}$
      ❑ $\text{Recall on AD} = \dfrac{\text{n. of AD subjects correctly classified}}{\text{n. of AD subjects}}$
      ❑ $\text{Recall on CN} = \dfrac{\text{n. of CN subjects correctly classified}}{\text{n. of CN subjects}}$
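The three metrics above can be computed from paired ground-truth and predicted labels as in the sketch below. The function name `recall_and_accuracy` and the toy labels are hypothetical and do not reproduce the project's results.

```python
def recall_and_accuracy(y_true, y_pred, classes=("AD", "CN")):
    """Return overall accuracy and per-class recall for paired label lists."""
    pairs = list(zip(y_true, y_pred))
    out = {"accuracy": sum(t == p for t, p in pairs) / len(pairs)}
    for c in classes:
        n_c = sum(t == c for t, _ in pairs)             # items whose true class is c
        hits = sum(t == c and p == c for t, p in pairs)  # of those, correctly predicted
        out[f"recall_{c}"] = hits / n_c if n_c else float("nan")
    return out

# Toy example: 4 AD and 6 CN items, with one error in each class.
y_true = ["AD"] * 4 + ["CN"] * 6
y_pred = ["AD", "AD", "AD", "CN"] + ["CN"] * 5 + ["AD"]

metrics = recall_and_accuracy(y_true, y_pred)
print(metrics["accuracy"], metrics["recall_AD"])  # 0.8 0.75
```

Because a false negative here is an AD item predicted as CN, recall on AD is the metric that directly tracks the error the project considers most costly.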
  • 41.
    Results: Unbalanced Train Set
    AD Recall   CN Recall   Accuracy
    0.35        0.90        0.70
    ❑ This is the setup adopted by [1], which only aimed for high accuracy
    ❑ As expected, the imbalance in the training set causes a bias: the model predicts CN very often
      ❑ High CN Recall
      ❑ Very low AD Recall
    ❑ n. of subjects (n. of images) per class/split:
          train subj (img)   val subj (img)   test subj (img)
    AD    89 (~17k)          8 (2378)         16 (4011)
    CN    151 (~39k)         13 (2194)        20 (6737)
    [Figure: confusion matrix, ground truth vs. prediction.]
  • 42.
    Results: Balanced Train Set (random undersampling)
    AD Recall   CN Recall   Accuracy
    0.58        0.86        0.76
    ❑ Balancing the train set gave the expected result:
      ❑ Significant improvement in AD Recall
      ❑ Accuracy is also a little higher
    ❑ n. of subjects (n. of images) per class/split:
          train subj (img)   val subj (img)   test subj (img)
    AD    89 (~17k)          8 (2378)         16 (4011)
    CN    89 (~19.5k)        13 (2194)        20 (6737)
    [Figure: confusion matrix, ground truth vs. prediction.]
  • 43.
    Results: Balanced Train Set (stratified undersampling)
    AD Recall   CN Recall   Accuracy
    0.66        0.91        0.82
    ❑ The bias reduction obtained by stratified undersampling of the train set is reflected in higher performance overall
    ❑ n. of subjects (n. of images) per class/split:
          train subj (img)   val subj (img)   test subj (img)
    AD    89 (~17k)          8 (2378)         16 (4011)
    CN    89 (~20k)          13 (2194)        20 (6737)
    [Figure: confusion matrix, ground truth vs. prediction.]
  • 44.
    Results: Balanced Train Set (GAN oversampling)
    AD Recall   CN Recall   Accuracy
    0.72        0.92        0.85
    ❑ Balancing the train set by oversampling the minority class led to the best performance:
      ❑ Improvement in AD Recall
      ❑ Best accuracy among the tested methods
    ❑ n. of subjects (n. of images) per class/split:
          train subj (img)         val subj (img)   test subj (img)
    AD    89 (~17k + 22k fake)     8 (2378)         16 (4011)
    CN    151 (~39k)               13 (2194)        20 (6737)
    [Figure: confusion matrix, ground truth vs. prediction.]
  • 45.
    Future Developments
    1. Image preprocessing:
      • Some scans contained slices taken too high in the head, resulting in images unhelpful for classification.
      • Proposed solution: implement an algorithm to discard unhelpful slices.
    2. 3D-MRI scan CNN implementation:
      • Explore the use of CNNs on entire scans, not just selected slices.
    3. Integrating MRI scan slices and biomarkers in classification:
      • Combine MRI images with the traditional biomarkers used for identifying Alzheimer's disease.
    4. Attribute stratification for train set undersampling:
      • Currently using Age, Weight and Sex as attributes for stratifying the training dataset.
      • Idea: include additional attributes for stratification.
      • This would create an even more representative training set, enhancing performance.
    ❑ All these improvements should specifically aim to improve AD Recall (which we could not get higher than 0.72), to avoid classifying a patient as healthy when actually affected by Alzheimer's
    [Figure: non-representative slice.]
  • 46.
    Bibliography
    1. Konidaris, F., Tagaris, T., Sdraka, M., & Stafylopatis, A. (2019). Generative Adversarial Networks as an Advanced Data Augmentation Technique for MRI Data. Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 48–59. https://doi.org/10.5220/0007363900480059
    2. Alzheimer's Association. 2023 Alzheimer's Disease Facts and Figures. Alzheimers Dement 2023;19(4). doi: 10.1002/alz.13016
    3. van Oostveen, W. M., & de Lange, E. C. M. (2021). Imaging Techniques in Alzheimer's Disease: A Review of Applications in Early Diagnosis and Longitudinal Monitoring. International Journal of Molecular Sciences, 22(4), Article 4. https://doi.org/10.3390/ijms22042110
    4. How Is Alzheimer's Disease Diagnosed? National Institute on Aging. Retrieved June 7, 2023, from https://www.nia.nih.gov/health/how-alzheimers-disease-diagnosed
    5. Data used in preparation of this project were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu).
    6. Mon, A., Abé, C., Durazzo, T. C., & Meyerhoff, D. J. (2015). Potential effects of fat on magnetic resonance signal intensity and derived brain tissue volumes. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4876040/
    7. Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv preprint arXiv:1701.07875
    8. McGrath ER, Beiser AS, O'Donnell A, Yang Q, Ghosh S, Gonzales MM, Himali JJ, Satizabal CL, Johnson KA, Tracy RP, Seshadri S. Blood Phosphorylated Tau 181 as a Biomarker for Amyloid Burden on Brain PET in Cognitively Healthy Adults. J Alzheimers Dis. 2022;87(4):1517–1526. doi: 10.3233/JAD-215639. PMID: 35491781.
    9. Schindler SE, Bollinger JG, Ovod V, Mawuenyega KG, Li Y, Gordon BA, Holtzman DM, Morris JC, Benzinger TLS, Xiong C, Fagan AM, Bateman RJ. High-precision plasma β-amyloid 42/40 predicts current and future brain amyloidosis. Neurology. 2019 Oct 22;93(17):e1647–e1659. doi: 10.1212/WNL.0000000000008081. Epub 2019 Aug 1. PMID: 31371569; PMCID: PMC6946467.