Machine Learning models for classification and prediction on osteoporotic spinal fractures
1. Machine Learning Models for Classification and Prediction on Osteoporotic Spinal Fractures
Erennio Iannotta, UP919761
MSc Information Systems
2. The Journey Plan
01 Problem Domain
A brief introduction to the problem of osteoporosis, then to Machine Learning: its purposes, techniques and evaluation methods, focusing on the techniques applied in the project.
02 Project Work
Presentation of the project development tools and workflow, going into the details of the main steps.
03 Conclusions and Future Developments
A final evaluation with personal conclusions regarding the whole project work, with hints for future developments.
04 Questions & Answers
Any further questions? Just ask :)
Erennio Iannotta – UP919761 MSc Information Systems – 2018/2019
3. The Journey Plan
01 Problem Domain
4. Problem Domain
Osteoporosis
Osteoporosis is a progressive condition characterized by a reduction of Bone Mineral Density (BMD), leading to greater bone fragility.
[Figure: bone density comparison, healthy bone vs. osteoporotic bone]
5. Problem Domain
Osteoporosis
Spinal Fractures
Consequences:
• Pain
• Difficulty walking
• Paralysis
• Death
6. Machine Learning
What Is It?
7. Machine Learning
Neural Networks
A neural network is a type of machine learning model that is loosely modeled after the human brain.
Through a learning algorithm, this artificial neural network allows the computer to learn by incorporating new data.
Made of:
• Nodes
Two ways of learning:
• Supervised
• Unsupervised
Evaluation criteria:
• ROC-AUC Curve
• Confusion Matrix
8. Machine Learning
ROC-AUC Curve
True Positive Rate (TPR) vs False Positive Rate (FPR)
[Figure: ROC curve, perfect separability]
True Positive Rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$
False Positive Rate: $\mathrm{FPR} = \frac{FP}{TN + FP}$
9. Machine Learning
ROC-AUC Curve
True Positive Rate (TPR) vs False Positive Rate (FPR)
[Figure: ROC curve, good separability]
True Positive Rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$
False Positive Rate: $\mathrm{FPR} = \frac{FP}{TN + FP}$
10. Machine Learning
ROC-AUC Curve
True Positive Rate (TPR) vs False Positive Rate (FPR)
[Figure: ROC curve, no separability]
True Positive Rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$
False Positive Rate: $\mathrm{FPR} = \frac{FP}{TN + FP}$
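The three separability cases can be reproduced with a few lines of plain Python. This is only a minimal sketch: the function names and the toy labels and scores are illustrative assumptions, not taken from the project code. It sweeps a decision threshold over the scores, computes TPR and FPR at each step, and integrates the curve with the trapezoidal rule.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (tn + fp)
    return tpr, fpr


def roc_points(y_true, scores):
    """ROC points (FPR, TPR) obtained by lowering the threshold step by step."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
        tpr, fpr = rates(tp, fn, fp, tn)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (made up for illustration): 1 = positive class, 0 = negative class.
perfect = auc(roc_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
good = auc(roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
none = auc(roc_points([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))
print(perfect, good, none)
```

With these toy scores the AUC is 1.0 for perfect separability, 0.75 for good separability and 0.5 when the scores carry no information.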
11. Machine Learning
The Confusion Matrix
                      Predicted Positive (1)   Predicted Negative (0)
Actual Positive (1)   True Positive            False Negative
Actual Negative (0)   False Positive           True Negative
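As a small illustration, the four cells can be counted directly from actual and predicted labels. The function name and the example labels below are assumptions made for this sketch; 1 stands for the positive class and 0 for the negative one.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = positive)."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1  # positive case correctly detected
        elif actual == 1 and predicted == 0:
            counts["FN"] += 1  # positive case missed
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1  # negative case flagged by mistake
        else:
            counts["TN"] += 1  # negative case correctly cleared
    return counts


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))
```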
12. Machine Learning
Models' Confusion-Matrix Evaluation
False Positive: "You are pregnant!"
False Negative: "You are not pregnant!"
"Wait… What?!"
13. The Journey Plan
02 Project Work
14. Project Workflow
Data Preprocessing
Understand the data in order to improve its quality and give the Machine Learning algorithms a better knowledge base.
Modeling
Apply Machine Learning algorithms to learn from the previously prepared data and predict new information.
Final Evaluation
Cost-analysis-based evaluation to find the best of the analyzed methodologies.
15. Development Tools
R + R-Studio
Used for:
• Exploratory Data Analysis during the Data Understanding step
Python + Jupyter Notebook
Used for:
• Data preparation
• Modeling and local Evaluation
• Final Evaluation
16. Data Preprocessing
Data Understanding
A phase of information extraction, meant to gain the best insight into the composition of the data before manipulating it in the next steps.
Data Preparation
In this phase the data is prepared, following the insight given by the understanding phase, to make it the best possible knowledge base.
17. Data Description
Data Understanding
Data source:
• UK Biobank (http://www.ukbiobank.ac.uk/)
Data Composition:
• Reduced from 680 to 29 variables
Data Acquisition:
• Supervised
• Answering survey
• Analysis of biological samples (blood, saliva)
18. Data Description
Data Understanding
All the variables (29):
• Eid
• Sex (gender)
• Age
• Ethnic
• Weight
• Height
• BMI
• Waist
• BMI_Category
• Waist_Category
• VBX
• HIPX
• Menopause
• HRT
• Smoking
• ReumathoidArthrits
• SecondaryOsteoporosis
• Alcohol
• Alcohol24
• VitaminD
• Calcium
• Dose_Walk
• Dose_moderate
• Dose_vigorous
• Dose_pleasure
• Dose_sport
• Dose_exercise
• Dose_lightDIY
• Dose_heavyDIY
19. Data Description
Data Understanding
All the variables: 29
Main variables kept after the analysis (17):
• Sex (gender)
• Age
• Weight
• Height
• VBX
• HIPX
• Menopause
• HRT
• Smoking
• ReumathoidArthrits
• SecondaryOsteoporosis
• Alcohol
• VitaminD
• Calcium
• Dose_Walk
• Dose_moderate
• Dose_vigorous
20. Data Description
Data Understanding
All the variables: 29
Main variables kept after the analysis (17):
• Sex (gender)
• Age
• Weight
• Height
• Class
• HIPX
• Menopause
• HRT
• Smoking
• ReumathoidArthrits
• SecondaryOsteoporosis
• Alcohol
• VitaminD
• Calcium
• Dose_Walk
• Dose_moderate
• Dose_vigorous
21. Missing Data Analysis
Data Understanding
Number of patients without missing values:
• 74,708
Number of patients without a spinal fracture:
• 74,554
Number of patients affected by spinal fracture:
• 154
22-24. Distribution of Missing Data per Feature
Data Understanding
[Figures: distribution of missing data per feature]
25. Missing Data Analysis
Data Understanding
Number of patients without missing values:
• 153,884
Number of patients without a spinal fracture:
• 153,606
Number of patients affected by spinal fracture:
• 278
[Figure: patients without fracture vs. patients affected by fracture]
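A quick arithmetic check makes the class imbalance explicit. The sketch below only reuses the patient counts reported in the two Missing Data Analysis slides (21 and 25); the dictionary keys and the function name are illustrative assumptions.

```python
# Patient counts as reported on the two "Missing Data Analysis" slides.
counts = {
    "slide_21": {"total": 74_708, "no_fracture": 74_554, "fracture": 154},
    "slide_25": {"total": 153_884, "no_fracture": 153_606, "fracture": 278},
}


def fracture_share(group):
    """Fraction of patients in the group that are affected by a spinal fracture."""
    return group["fracture"] / group["total"]


for name, group in counts.items():
    # The two classes must add up to the reported total.
    assert group["no_fracture"] + group["fracture"] == group["total"]
    print(f"{name}: {fracture_share(group):.3%} of patients have a fracture")
```

In both cases the fractured patients are roughly 0.2% of the total, which is why the classes need rebalancing before training.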
26. Separability Analysis Through t-SNE
Data Understanding
t-SNE, which stands for t-distributed Stochastic Neighbour Embedding, is a nonlinear dimensionality reduction technique designed for visualizing high-dimensional datasets in a low-dimensional space (usually two or three dimensions).
27. Separability Analysis Through t-SNE
Data Understanding
• All the patients together
• Male patients
• Female patients
• Female patients affected by menopause
• Female patients not affected by menopause
28. Separability Analysis Through t-SNE
Data Understanding
All Patients
[Figures: all the observations; sampled observations]
29. Separability Analysis Through t-SNE
Data Understanding
Male Patients
[Figures: all the observations; sampled observations]
30. Separability Analysis Through t-SNE
Data Understanding
All Female Patients
[Figures: all the observations; sampled observations]
31. Separability Analysis Through t-SNE
Data Understanding
Female Patients Affected by Menopause
[Figures: all the observations; sampled observations]
32. Separability Analysis Through t-SNE
Data Understanding
Female Patients Not Affected by Menopause
[Figures: all the observations; sampled observations]
33. Separability Analysis Through t-SNE
Data Understanding
Results
Female patients in menopause condition + all other patients
34. Data Preparation
The phase in which the dataset becomes a good knowledge base
1. Data Standardization
2. Data Splitting
Because of the previous t-SNE considerations, the dataset has to be split into 3 different groups of data:
   1. Complete data
   2. Female patients in menopause condition / other patients
   3. Male patients / female patients in menopause / female patients not in menopause
Then, for each subset, the data has to be split into the Training, Validation and Test sets.
3. Data Undersampling
The subsets have to be undersampled to reduce the imbalance between the two classes of patients.
The technique used is random undersampling.
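The three preparation steps can be sketched in plain Python as follows. This is an illustrative sketch, not the project code: the function names, the 60/20/20 split proportions and the fixed random seeds are assumptions. The ratio is read as fractured / not fractured after undersampling, which matches the 100-vs-200 (ratio 0.5) and 100-vs-100 (ratio 1.0) example given in the Training Algorithm slide.

```python
import random
from statistics import mean, pstdev


def standardize(column):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - mu) / sigma for x in column]


def split(rows, train=0.6, val=0.2, seed=0):
    """Shuffle and split rows into Training, Validation and Test sets.

    The 60/20/20 proportions are an assumption; the slides do not state them.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def random_undersample(rows, label_of, ratio, seed=0):
    """Keep every fractured row (label 1) and a random subset of the others
    (label 0) so that fractured / not fractured equals `ratio`."""
    minority = [r for r in rows if label_of(r) == 1]
    majority = [r for r in rows if label_of(r) == 0]
    n_keep = min(len(majority), round(len(minority) / ratio))
    return minority + random.Random(seed).sample(majority, n_keep)


# Made-up rows of the form (patient_id, label) for illustration.
rows = [(i, 1) for i in range(100)] + [(i, 0) for i in range(100, 5100)]
balanced = random_undersample(rows, label_of=lambda r: r[1], ratio=0.5)
print(len(balanced))  # 100 fractured + 200 not fractured
```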
35. Modeling
Neural Network Training
Neural Network Local Evaluation
36. Modeling
Training Algorithm
For each group previously defined:
1. Two base trainings on the original dataset, undersampled with a ratio of 0.5 and of 1.0 between the two classes (for instance, 100 fractured patients and 200 non-fractured ones with a ratio of 0.5; 100 fractured patients and 100 non-fractured ones with a ratio of 1.0).
2. A search for the best Neural Network among those with one or two hidden layers, each composed of a number of neurons that goes from 1 to double the input size, compared using the AUC score.
3. Once the best Neural Network has been found, all the False Negatives produced by this model are classified and ranked through a formula obtained by Professor Lee in one of his publications.
4. Five new datasets are then built for each of the starting bases: 10, 20, 30, 40 and 50 percent of the ranked patients are added to the datasets, and a new Neural Network is searched for on each of the new datasets, with ratios of 0.5 and 1.0.
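Steps 2 and 4 can be sketched without committing to any training library. In the sketch below every name is hypothetical: `auc_of` stands for whatever routine trains a network with the given hidden-layer sizes and returns its validation AUC, and `ranked_patients` is assumed to be already ordered by the ranking formula, which is not reproduced here.

```python
from itertools import product


def candidate_architectures(input_size):
    """All hidden-layer layouts with one or two layers and
    1 .. 2 * input_size neurons per layer."""
    sizes = range(1, 2 * input_size + 1)
    one_layer = [(n,) for n in sizes]
    two_layers = list(product(sizes, repeat=2))
    return one_layer + two_layers


def best_architecture(candidates, auc_of):
    """Pick the layout with the highest AUC; `auc_of` is a hypothetical
    callable that trains the network and returns its AUC score."""
    return max(candidates, key=auc_of)


def top_fraction(ranked_patients, percent):
    """First `percent` % of an already ranked list of False Negative patients."""
    n = round(len(ranked_patients) * percent / 100)
    return ranked_patients[:n]


candidates = candidate_architectures(input_size=16)
print(len(candidates))  # 32 one-layer + 32 * 32 two-layer layouts

# The five enhancement levels described in step 4, on a made-up ranked list.
ranked = [f"patient_{i}" for i in range(50)]
extras = {p: top_fraction(ranked, p) for p in (10, 20, 30, 40, 50)}
```

With 16 input features the search space contains 1,056 layouts, which is small enough to be explored exhaustively.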
37. Modeling
Training Algorithm Steps
1. Base Neural Networks training.
2. Search for the best Neural Network among all the possible ones.
3. False Negative evaluation.
4. Dataset enhancing.
38. Modeling
Local Evaluation
39. Modeling
Local Evaluation
Best Neural Network specifics:
• Trained with a base ratio of 0.5
• Enhanced with 20% of the False Negative patients
• Trained with a local ratio of 0.5
Once the best Neural Network has been found, we use the Test set to get the model ready for the final evaluation.
40. A Cost-Matrix Analysis
Final Evaluation
41. Final Evaluation
Cost Matrix Analysis
Confusion Matrix:
True Positive    False Negative
False Positive   True Negative
Cost Matrix:
£ 0.00           £ 47.00
£ 453.00         £ 0.00
• Data source for costs: Southampton Hospital
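The total cost of a model follows from multiplying each confusion-matrix cell by the matching cost cell. The sketch below assumes the two matrices line up position by position, so that a False Negative costs £47.00 and a False Positive £453.00; that pairing is read off the slide layout and should be checked against the thesis. The patient counts in the example are made up.

```python
def total_cost(tp, fn, fp, tn,
               cost_tp=0.00, cost_fn=47.00, cost_fp=453.00, cost_tn=0.00):
    """Total misclassification cost in pounds.

    Default costs assume the cost matrix maps cell by cell onto the
    confusion matrix (TP, FN / FP, TN); this mapping is an assumption.
    """
    return tp * cost_tp + fn * cost_fn + fp * cost_fp + tn * cost_tn


# Made-up counts for illustration: 10 missed fractures, 30 false alarms.
print(total_cost(tp=20, fn=10, fp=30, tn=940))  # 10 * 47 + 30 * 453
```

Keeping the four costs as parameters makes it easy to rerun the comparison if the cost figures or their assignment to the cells change.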
42. Final Evaluation
Cost Matrix Analysis
Group 1: All the Patients
• Total Costs: £ 202,277.00
Group 2: t-SNE Division
• Total Costs: £ 226,965.00
Group 3: Complete Division
• Total Costs: £ 295,974.00
43. The Journey Plan
03 Conclusions and Future Developments
44. Conclusions and Future Developments
The Best Model
• 2 hidden layers
• 16 neurons on input layer
• 32 neurons on first hidden layer
• 29 neurons on second hidden layer
• 1 neuron on output layer
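To give a sense of the size of this network, the number of trainable parameters can be counted from the layer sizes alone. The sketch assumes fully connected layers with one bias per neuron, which the slides do not state explicitly; the function name is illustrative.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network
    (dense layers with biases are an assumption)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


best_model = [16, 32, 29, 1]  # input, first hidden, second hidden, output
print(count_parameters(best_model))
```

Under that assumption the best model has 1,531 parameters: 544 between the input and the first hidden layer, 957 between the two hidden layers, and 30 into the output neuron.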
45. Conclusions and Future Developments
The Best Model
• 13% of the healthy patients will be sent for a precautionary check
• 1 ill patient out of 3 needs a second check to detect the fracture
46. Conclusions and Future Developments
In Conclusion…
• t-SNE gave us better results for single Networks, but worse results in the overall cost analysis.
• Even with a misclassification of 1 out of 3, Neural Networks are a good tool to deal with this kind of issue.
47. Conclusions and Future Developments
Future Developments
• Deep Learning with Deep Neural Networks could be applied with this same workflow, to find the differences from the Machine Learning networks used here and compare the two kinds of Neural Networks, finding the best approach to this problem.
48. The Journey Plan
04 Questions & Answers
49. Any Questions?
50. Thank You for the Attention
Project data available at:
https://github.com/TimeParadox89/MSc-Thesis
Slides available at:
https://www.slideshare.net/ErennioIannotta