Machine Learning Methods Used in Disease Prediction
Mustafa Salih Oğuz
Outline
● Preventable Diseases & Deaths
● Disease Prediction Studies
● Future of Disease Prediction
Preventable Diseases & Deaths
Preventable Diseases
Vaccine-Preventable Diseases
Any disease that we can take precautions against is preventable.
75% of healthcare costs are spent on preventable diseases*
*in the US
Preventable Deaths
World Health Organization
Cancer Survival Rate
Preventive Interventions for Children
Disease Prediction Technology
Predicting Septic Shock
Paper: A targeted real-time early warning score for septic shock
Data Collection
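The study split roughly 16,000 ICU patient records into a development set (about 13,000) and a validation set (about 3,000), which is effectively a train/test split. A minimal sketch follows, assuming a simple random split by patient ID; the paper's exact splitting procedure is not given here, and `split_patients` and its `seed` parameter are hypothetical names.

```python
import random

def split_patients(patient_ids, n_validation, seed=0):
    """Randomly split patient IDs into development (training) and validation (test) sets.

    Splitting patient IDs rather than individual records keeps all of one
    patient's measurements on the same side of the split.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    validation = ids[:n_validation]
    development = ids[n_validation:]
    return development, validation

# Hypothetical patient IDs standing in for the ~16,000 ICU records.
patients = range(16_000)
development, validation = split_patients(patients, n_validation=3_000)
print(len(development), len(validation))  # 13000 3000
```

The development set is used to fit the score and the validation set is held back to measure how well it generalizes to unseen patients.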
ML Algorithms and Techniques
Regression, Cox proportional hazards model
54 features
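A Cox proportional hazards model scores risk as a baseline hazard scaled by exp(β · x), so the survival curve of a patient with features x is S(t | x) = S0(t) ** exp(β · x). The sketch below evaluates an already-fitted model; the three coefficients, the feature values, and the baseline survival of 0.9 are hypothetical (the study used 54 features), and fitting β, which is done by maximizing the Cox partial likelihood, is omitted.

```python
import math

def linear_predictor(coefs, features):
    """beta . x: weighted sum of one patient's features."""
    return sum(b * x for b, x in zip(coefs, features))

def hazard_ratio(coefs, features):
    """exp(beta . x): hazard relative to a baseline patient whose features are all 0."""
    return math.exp(linear_predictor(coefs, features))

def survival_probability(baseline_survival, coefs, features):
    """S(t | x) = S0(t) ** exp(beta . x) under the proportional hazards assumption.

    baseline_survival is S0(t), the baseline survival at one chosen time t.
    """
    return baseline_survival ** hazard_ratio(coefs, features)

coefs = [0.5, -0.2, 0.1]   # hypothetical fitted coefficients
patient = [2.0, 1.0, 0.0]  # hypothetical feature values for one patient
print(hazard_ratio(coefs, patient))               # exp(0.8), about 2.2
print(survival_probability(0.9, coefs, patient))  # below the baseline 0.9
```

A hazard ratio above 1 pulls the patient's survival probability below the baseline, which is what lets the model rank patients by risk at each time t.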
Results
Predicting Hospitalization
Paper: Prediction of hospitalization due to heart diseases by supervised learning methods
Data Collection
Total of 200 features
ML Algorithms and Techniques
AdaBoost, SVM, Logistic Regression, Naive Bayes
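A comparison of these four algorithm families can be sketched with scikit-learn. This assumes scikit-learn is installed and uses synthetic data from `make_classification` (200 features, matching the slide's feature count) in place of the study's health records; the hyperparameters are illustrative choices, not the paper's, and `compare_models` is a hypothetical helper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for patient records: 200 features and a binary label
# (hospitalized or not). Real health-record data would replace this.
X, y = make_classification(n_samples=1000, n_features=200,
                           n_informative=20, random_state=0)

models = {
    # AdaBoost's default base learner is a depth-1 decision tree (a "stump").
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    # SVMs and logistic regression are sensitive to feature scale, so standardize first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Naive Bayes": GaussianNB(),
}

def compare_models(models, X, y, folds=5):
    """Mean cross-validated accuracy per model; every model sees the same folds."""
    return {name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
            for name, model in models.items()}

for name, acc in compare_models(models, X, y).items():
    print(f"{name:20s} {acc:.3f}")
```

Because every model is scored on identical folds, differences in mean accuracy reflect the models rather than the split. When the scores come out close, the data, not the algorithm, is the likely bottleneck.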
Results
82% Success Rate
AdaBoost with trees
Easier to Explain
K-Likelihood Ratio Test (K-LRT)
Predicting Alzheimer’s Disease
Paper: Identification of clusters of rapid and slow decliners among subjects at risk for Alzheimer’s disease
Data Collection
Alzheimer’s Disease Neuroimaging Initiative Studies
ML Algorithms and Techniques
Multilayer Clustering
1. Create Similarity Table
2. Find Optimal Solution
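The two steps above can be illustrated with a much simpler stand-in. This sketch does not implement the paper's multilayer clustering: it assumes a similarity of 1 / (1 + Euclidean distance), uses a hypothetical `threshold` parameter, and links any two subjects whose similarity reaches it. Like multilayer clustering as described for this study, it does not require the number or size of clusters to be fixed in advance. The subject data are made up for illustration.

```python
import math

def similarity_table(points):
    """Step 1: pairwise similarity 1 / (1 + Euclidean distance) for every pair of subjects."""
    n = len(points)
    return [[1.0 / (1.0 + math.dist(points[i], points[j])) for j in range(n)]
            for i in range(n)]

def threshold_clusters(sim, threshold):
    """Step 2 (stand-in): group subjects linked by similarity >= threshold.

    Uses union-find, so the number and size of clusters follow from the data
    and the threshold rather than being chosen up front.
    """
    parent = list(range(len(sim)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(sim)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical two-feature scores for six subjects.
subjects = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
            (2.0, 2.1), (2.2, 1.9), (2.1, 2.0)]
print(threshold_clusters(similarity_table(subjects), threshold=0.5))
# [[0, 1, 2], [3, 4, 5]]
```

The weakness the notes point out is visible here: a lower threshold merges everything into one cluster and a higher one splits every subject apart, which is why a method that finds the cluster structure automatically is attractive.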
Results
1 Real-Time
2 Non-Real-Time
2 Supervised
1 Unsupervised
0 Reinforcement
Future of Disease Prediction
Widespread Wearable Devices
A constant flow of data will make prediction much easier.
Wearable Device Adoption
Sci-Fi Part (or Reality of the Future)
Complete Digital Twins
Learn what is going wrong and what may go wrong. Take precautions against the dangers that await you and make good choices for yourself.
iCarbonX
Babylon Health
Advanced ML Algorithms
Experimenting on digital twins. Discovering new causal relationships and treatments.
Deep Reinforcement Learning
DeepMind’s AlphaGo Zero
Deep Reinforcement Learning
Some Major Challenges
Data & Human Body
It is possible to save millions of lives through disease prediction using machine learning.*
*if you can find the data
Thank you for your attention.

Machine Learning for Disease Prediction

Editor's Notes

  • #2 Topic definition: why is it interesting and/or important?
  • #3 First, I am going to tell you about preventable diseases and deaths: why they are preventable and how we may prevent them.
  • #5 I want to start with the diseases that we can prevent with vaccines, which are called, interestingly enough, vaccine-preventable diseases. This chart shows how vaccination affects the population. (Explain the chart.)
  • #6 These are 25 diseases that are preventable with vaccines. Some of them caused a great deal of suffering in previous centuries, but now we rarely see them, thanks to vaccination. Also, I know vaccination is a political debate in the United States, but hopefully not in Turkey.
  • #7 Apart from vaccine-preventable diseases, any disease that we can take precautions against is preventable. We only need to know that we are at risk of that disease. Sometimes the precaution can be as simple as exercising or eating a healthier diet.
  • #8 It is not just precious human lives we are losing; it is also money coming out of taxpayers’ pockets. 75% of US healthcare costs are spent on preventable diseases, which amounts to many billions of dollars.
  • #9 Now this is a heavier topic: preventable deaths. Can we actually prevent death? Isn’t it predestined? Can we change destiny? I will leave the philosophical debate to you and explain what I mean. Preventing someone’s death indefinitely is impossible. However, by getting an early signal in some of the most common cases, we can extend people’s life expectancy. This means we can prevent early deaths and help the average person live longer. In the chart you can see the top 10 causes of death according to the World Health Organization. The first one is ischaemic heart disease, which is caused by narrowing or blockage of the arteries that supply the heart. If we can monitor a patient’s arteries frequently, we can notice the changes and act accordingly; it doesn’t happen overnight. Strokes, however, happen suddenly, right? Actually, strokes also result from a series of events that we can follow. Even though a stroke is sudden, like a shock, it is possible to predict it, as we will see in a minute.
  • #10 When I say preventing death, I don’t mean that no one is going to die from cancer at all. That may well happen in the future, but what I mean here is early diagnosis. If we can tell that a cancer is getting worse, we can predict that someone is going to die within a certain time and intervene before it is too late.
  • #11 If we know what is coming, we can prevent it with different interventions. These are some preventive interventions used to reduce child deaths. It turns out breastfeeding saves children’s lives (not surprising), but zinc and vitamin A supplementation are interesting: we can give these to a child if we can predict the danger, which is death, before it comes.
  • #12 The most widely used technique in this field is machine learning. OK, now let’s look at what people have done. Of course, this is a new field, so you won’t see most of the things I talked about yet, but there is very promising work out there.
  • #13 In the first study, scientists successfully predicted septic shock. Current equipment can detect severe septic shock when it happens, but none can predict that it is going to happen. Sepsis is organ damage caused by infection, and septic shock occurs when sepsis gets worse. These patients are kept in intensive care units and carefully watched. So these researchers used ICU data to produce a score that predicts septic shock hours before it happens.
  • #14 They gathered the data from this publicly available database. It contains the electronic health records of 16,000 intensive care unit patients. They split the dataset into development and validation sets: the development set was used to calculate the score and the validation set to evaluate its performance. These are effectively training and test sets, with 13,000 samples for training and 3,000 for testing.
  • #15 They used a supervised learning algorithm, more specifically regression, although they do not name the exact algorithm. They also used a Cox proportional hazards model with 54 features to calculate the risk at each time t. This model is used in medicine to estimate a person’s survival probability from given data.
  • #17 The next study predicts whether a person is going to be hospitalized based on their past health records, so that the hospitalization can be prevented. One of the project’s goals is to lower the spending of state hospitals.
  • #19 They experimented with these algorithms to find the best fit, using two criteria: which one is more accurate, of course, and which one gives results that are easier to explain to doctors.
  • #20 Although there are small differences, all algorithms gave very similar accuracy. The authors think this is the limit of prediction with the available data, which suggests that the data matters far more than the choice of algorithm.
  • #21 Alzheimer’s disease is so diverse in nature that diagnosing it is very hard. In this paper, however, the authors managed to cluster patients into two groups and were able to predict Alzheimer’s disease 4 to 5 years before onset, so that people could use medication and lifestyle changes to delay it further.
  • #22 They took data from two Alzheimer’s Disease Neuroimaging Initiative studies. The data include 5 years of outcomes and biomarker data from 550 subjects with mild cognitive impairment (MCI).
  • #23 In this project they used unsupervised learning, specifically an algorithm called multilayer clustering. Clustering algorithms are rarely used in this kind of work, because different algorithms and parameter choices can produce very different clusters. They chose multilayer clustering because it determines the size and the number of clusters automatically.
  • #24 They found two groups, rapid decliners and slow decliners. The rapid decliners were at greater risk.
  • #25 I showed you one real-time and two non-real-time predictions, as well as two supervised approaches and one unsupervised approach. There are many more studies to explore, and people use all kinds of algorithms except reinforcement learning: I could not find any application of reinforcement learning in this field. I will come back to this when we talk about the future.
  • #28 unit shipments of wearable devices in millions
  • #30 Simulation on it?
  • #31 iCarbonX is a Chinese unicorn, meaning a privately held company valued at more than one billion dollars. They are building digital copies of people and want to help people make the right choices in their lives. They gather vast amounts of data from their users and use it for machine learning. In his TED Talk, the founder says digital twins may cost millions of dollars for now, but he estimates that in about 3 to 5 years they will be very cheap, much like DNA sequencing. They use both biological data and lifestyle data to copy a human.
  • #35 Of course, there are some MAJOR challenges in the way of disease prediction applications. The biggest one is access to patient data. Data is the fuel of ML, and without it the best algorithms are useless. Prediction may require continuous labeled data, which is not only unavailable but also very hard to collect, even when patients are willing to share their data. The other challenge is the human body. The human body is EXTREMELY complex. Many different systems work together, and a seemingly unrelated problem can affect other systems and organs. Not only that, there are 7 billion different human organisms on Earth. No two humans are the same, so a solution that works for one patient may not solve another patient’s problem. Doctors have a saying for this: “there is no disease, there is the patient,” meaning you don’t look for the disease, you try to get to know the person, because from family history to environment, from past … Other problems, including computational power and the need for more powerful machine learning algorithms, will probably be solved in the future.