Hidden Markov Models for Detecting
Changes in Health Outcomes and
Comparing Groups of Subjects
ZHANG ZEYANG
SEPTEMBER 2015
Acknowledgement
Final project for the degree
MSc. Computational Statistics and Machine Learning at UCL.
Many thanks to my supervisors:
Dr. David Barber (UCL)
Dr. Steven Barrett (GSK) and Dr. Maria Costa (GSK)
Background: COPD disease
Chronic Obstructive Pulmonary Disease (COPD) can be summarized as:
• A collection of lung diseases, such as chronic bronchitis and emphysema.
• A long-term condition that causes inflammation in the lungs, damaged lung tissue and
a narrowing of the airways, making breathing difficult.
• A life-threatening respiratory disease that is commonly seen both in the UK and
worldwide.
Background: COPD Exacerbations
An (acute) exacerbation of COPD is characterized as:
• A worsening of COPD symptoms (dyspnea, cough and/or sputum) beyond normal day-to-day
variation, usually lasting for a few days.
• There is no standardized, consistent and commonly accepted definition.
• Studies of the efficacy of new COPD therapies have been hampered by the difficulty of
identifying and quantifying exacerbations.
• New approaches are being sought to better recognize and understand exacerbations.
Introduction
A patient-reported instrument was used to monitor the health of COPD patients during a
clinical trial lasting around 6 months. The participating patients were divided into two
treatment groups (Drug A and Drug B), and each patient answered 14 questions in an
electronic survey every day, covering their
• Chest symptoms
• Cough and sputum symptoms
• Breathlessness
• General well-being.
For each question, the patient assigns a score, where a higher score indicates a more
severe symptom.
Dataset
The 14 scores for each study day are summed into a daily total, which forms a time series
for each patient. The clinical exacerbations of each patient over the same period are also
recorded, together with other individual information such as the number of historical
exacerbations and the treatment group.
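A minimal sketch of how the daily totals could be assembled. It assumes each patient's responses are stored as a mapping from study-day index to a list of 14 item scores; both that storage format and the choice to mark absent or incomplete diary days as None are assumptions for illustration, since the handling of missing data is not described here.

```python
from typing import Dict, List, Optional

N_QUESTIONS = 14  # number of questions in the daily electronic survey


def daily_totals(responses: Dict[int, List[int]], n_days: int) -> List[Optional[int]]:
    """Sum the 14 item scores for each study day into one daily total.

    `responses` maps a 0-based study-day index to that day's item scores.
    Days with no response, or an incomplete one, are returned as None (assumption).
    """
    series: List[Optional[int]] = []
    for day in range(n_days):
        scores = responses.get(day)
        if scores is None or len(scores) != N_QUESTIONS:
            series.append(None)  # treat absent/incomplete diaries as missing
        else:
            series.append(sum(scores))  # higher total = more severe symptoms
    return series


if __name__ == "__main__":
    demo = {0: [1] * 14, 1: [2] * 14, 3: [0] * 14}
    print(daily_totals(demo, 4))  # [14, 28, None, 0]
```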
Objectives
In a nutshell, this project aims to
1. Construct an accurate yet computationally efficient model for detecting COPD
exacerbations based on patients’ self-reported health scores.
2. Develop a systematic method to evaluate the model and benchmark the
detected results against the clinical exacerbations.
3. Based on the detected exacerbations, compare the treatment efficacies of
Drug A and Drug B at cohort level.
Hidden Markov Model (HMM)
The Hidden Markov Model (HMM), which can be represented as a Directed Acyclic Graph, is
the unsupervised machine learning model used in this project to find exacerbations.
Our HMM consists of:
• A hidden variable h, which represents the exacerbation status of an individual on each day.
• An observed variable v, which represents the individual's reported health score on each day.
The most likely sequence of exacerbation statuses for an individual can be inferred with
the Viterbi algorithm.
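The following is a minimal sketch of Viterbi decoding for a basic two-state HMM (state 0 = no exacerbation, state 1 = exacerbation), working in log space to avoid numerical underflow. It assumes Gaussian emissions for the daily total score; the emission family, the function names and the parameter values in the example are illustrative assumptions, not the project's fitted model.

```python
import math
from typing import List, Sequence


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of N(mean, sd^2) at x (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(v: Sequence[float], init: Sequence[float],
            trans: Sequence[Sequence[float]],
            means: Sequence[float], sds: Sequence[float]) -> List[int]:
    """Most likely hidden path h_1..h_T given observed scores v_1..v_T.

    trans[i][j] = p(h_t = j | h_{t-1} = i); init[j] = p(h_1 = j).
    """
    n_states, T = len(init), len(v)
    # delta[t][j]: best log-probability of any path ending in state j at day t
    delta = [[0.0] * n_states for _ in range(T)]
    back = [[0] * n_states for _ in range(T)]
    for j in range(n_states):
        delta[0][j] = math.log(init[j]) + gaussian_logpdf(v[0], means[j], sds[j])
    for t in range(1, T):
        for j in range(n_states):
            # best predecessor state i for being in state j on day t
            best_i = max(range(n_states),
                         key=lambda i: delta[t - 1][i] + math.log(trans[i][j]))
            back[t][j] = best_i
            delta[t][j] = (delta[t - 1][best_i] + math.log(trans[best_i][j])
                           + gaussian_logpdf(v[t], means[j], sds[j]))
    # backtrack from the most probable final state
    path = [max(range(n_states), key=lambda j: delta[T - 1][j])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    scores = [10, 11, 9, 30, 32, 31, 12, 10]  # stable baseline with a 3-day spike
    print(viterbi(scores, [0.9, 0.1], [[0.95, 0.05], [0.2, 0.8]],
                  [10.0, 30.0], [3.0, 5.0]))  # [0, 0, 0, 1, 1, 1, 0, 0]
```

The recursion costs O(T·K²) for T days and K hidden states, so decoding stays cheap for daily diaries of a few months.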
Evolution of Models & Results
By restructuring the HMM, we can adapt the model to different assumptions about
exacerbations and obtain more satisfactory results.
Evaluation: Precision, Recall Measures
Precision and Recall, measures borrowed from Information Retrieval, are used to evaluate
the performance of the HMM in detecting exacerbations.
                         Clinical exacerbation    Not clinical exacerbation    Total
Detected by HMM          a (true positives)       b (false positives)          a + b
Not detected by HMM      c (false negatives)      d (true negatives)           c + d
Total                    a + c                    b + d                        a + b + c + d
Counting in days, we have
$$\text{Recall} = \frac{a}{a+c} = P(\text{detected by model} \mid \text{clinical exacerbation})$$
$$\text{Precision} = \frac{a}{a+b} = P(\text{clinical exacerbation} \mid \text{detected by model})$$
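The day-level counts in the table can be computed directly from two 0/1 sequences: one decoded by the HMM and one marking clinically recorded exacerbation days. This is a minimal sketch; the function name and the choice to return NaN when a denominator is zero are assumptions.

```python
from typing import Sequence, Tuple


def day_level_recall_precision(detected: Sequence[int],
                               clinical: Sequence[int]) -> Tuple[float, float]:
    """Return (Recall, Precision) counted in days.

    detected[t] = 1 if the HMM labels day t as an exacerbation day, else 0.
    clinical[t] = 1 if day t falls within a clinically recorded exacerbation, else 0.
    """
    if len(detected) != len(clinical):
        raise ValueError("both sequences must cover the same study days")
    pairs = list(zip(detected, clinical))
    a = sum(1 for det, clin in pairs if det and clin)      # true positives
    b = sum(1 for det, clin in pairs if det and not clin)  # false positives
    c = sum(1 for det, clin in pairs if clin and not det)  # false negatives
    recall = a / (a + c) if a + c else float("nan")        # P(detected | clinical)
    precision = a / (a + b) if a + b else float("nan")     # P(clinical | detected)
    return recall, precision
```

For detected = [0, 1, 1, 1, 0, 0, 1, 0] and clinical = [0, 0, 1, 1, 1, 0, 0, 0], we get a = 2, b = 2 and c = 1, so Recall = 2/3 and Precision = 1/2.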
Evaluation: Composite Measure - F
High Recall and high Precision are both desirable, but there is generally a trade-off
between the two: a sensitive model tends to over-identify exacerbations, giving high Recall
but low Precision, while a conservative model does the opposite.
Therefore, we combine Recall (R) and Precision (P) into a composite measure, a weighted
harmonic mean,
$$F = \frac{1}{\pi\frac{1}{R} + (1-\pi)\frac{1}{P}},$$
which strikes a balance between the two indices, where π is the weight assigned to Recall.
A higher F-measure indicates better performance of the model in detecting exacerbations.
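A direct translation of the F-measure formula, written as a sketch. Setting π = 0.5 recovers the usual F1 score (the unweighted harmonic mean of Recall and Precision), and a larger π rewards Recall more heavily. Returning 0 when a weighted index is 0 (the limit of the formula) is an assumption about edge-case handling.

```python
def f_measure(recall: float, precision: float, pi: float) -> float:
    """F = 1 / (pi * (1/R) + (1 - pi) * (1/P)), with pi the weight on Recall."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    denom = 0.0
    if pi > 0.0:
        if recall == 0.0:
            return 0.0  # 1/R -> infinity, so F -> 0
        denom += pi / recall
    if pi < 1.0:
        if precision == 0.0:
            return 0.0  # 1/P -> infinity, so F -> 0
        denom += (1.0 - pi) / precision
    return 1.0 / denom


if __name__ == "__main__":
    print(f_measure(0.8, 0.5, 0.5))  # equals F1 = 2RP / (R + P)
    print(f_measure(0.8, 0.5, 0.8))  # weighting Recall more raises F here
```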
Evaluation: Parameters
When restructuring and designing the HMM, we introduced a series of
parameters.
The parameters govern the sensitivity of the model in
detecting exacerbations and embody some basic
assumptions about what an exacerbation is.
For instance, α represents our belief about how likely a
patient is, in general, to enter an exacerbation from a
non-exacerbation day. A Recall-Precision curve
can be plotted by varying α.
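A sketch of how such a curve could be traced. It assumes α enters a two-state transition matrix [[1 - α, α], [β, 1 - β]], where β is a hypothetical parameter for leaving an exacerbation, and that emissions are Gaussian; the project's actual model has more structure, so this only illustrates the mechanics. Each value of α yields one decoded path and hence one (Recall, Precision) point.

```python
import math
from typing import List, Sequence, Tuple


def _log_gauss(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def decode(v: Sequence[float], alpha: float, beta: float,
           means: Sequence[float], sds: Sequence[float]) -> List[int]:
    """Viterbi path for a 2-state HMM with transitions [[1-alpha, alpha], [beta, 1-beta]]."""
    log_a = [[math.log(1 - alpha), math.log(alpha)],
             [math.log(beta), math.log(1 - beta)]]
    # assumed initial distribution: (1 - alpha, alpha)
    delta = [math.log(p) + _log_gauss(v[0], means[j], sds[j])
             for j, p in enumerate((1 - alpha, alpha))]
    back: List[List[int]] = []
    for x in v[1:]:
        ptr, new = [], []
        for j in range(2):
            i = max(range(2), key=lambda k: delta[k] + log_a[k][j])
            ptr.append(i)
            new.append(delta[i] + log_a[i][j] + _log_gauss(x, means[j], sds[j]))
        back.append(ptr)
        delta = new
    state = max(range(2), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers to day 0
        state = ptr[state]
        path.append(state)
    return path[::-1]


def recall_precision(detected: Sequence[int], clinical: Sequence[int]) -> Tuple[float, float]:
    tp = sum(1 for d, c in zip(detected, clinical) if d and c)
    fp = sum(1 for d, c in zip(detected, clinical) if d and not c)
    fn = sum(1 for d, c in zip(detected, clinical) if c and not d)
    recall = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return recall, precision


def sweep_alpha(v: Sequence[float], clinical: Sequence[int], alphas: Sequence[float],
                beta: float = 0.2, means: Sequence[float] = (10.0, 30.0),
                sds: Sequence[float] = (3.0, 5.0)) -> List[Tuple[float, float, float]]:
    """One (alpha, Recall, Precision) point per value of alpha (beta etc. are hypothetical)."""
    curve = []
    for alpha in alphas:
        r, p = recall_precision(decode(v, alpha, beta, means, sds), clinical)
        curve.append((alpha, r, p))
    return curve


if __name__ == "__main__":
    scores = [10, 11, 9, 30, 32, 31, 12, 10]
    labels = [0, 0, 0, 1, 1, 1, 0, 0]
    for point in sweep_alpha(scores, labels, [0.01, 0.05, 0.2]):
        print(point)
```

On this cleanly separated toy series every α recovers the labels exactly; on real, noisier scores, lowering α generally makes the model more reluctant to enter the exacerbation state, moving the point toward higher Precision and lower Recall.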
Evaluation: Parameters Tuning
In this project, we do not treat the clinically identified exacerbations as the
‘universal truth’. A significant number of exacerbations may have been missed by
clinicians, as patients do not always seek care promptly when their symptoms
deteriorate.
Therefore, we are more interested in a model that successfully identifies most
clinically found exacerbations (high Recall), while being more tolerant of over-detection
of exacerbations (low Precision).
We thus tune our HMM to this ‘ideal sensitivity’ by setting the parameters to the
values that give the highest F-measure for π=0.80 and π=0.85, respectively.
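Given a Recall-Precision curve, choosing the ‘ideal sensitivity’ amounts to taking the parameter setting with the largest weighted F-measure. A minimal sketch; the function names are hypothetical, and the (parameter, Recall, Precision) values in the example are made up purely to illustrate the selection rule. They are not results from this project.

```python
from typing import Iterable, Tuple


def weighted_f(recall: float, precision: float, pi: float) -> float:
    """F = 1 / (pi / R + (1 - pi) / P); assumes 0 < pi < 1."""
    if recall <= 0.0 or precision <= 0.0:
        return 0.0
    return 1.0 / (pi / recall + (1.0 - pi) / precision)


def best_setting(curve: Iterable[Tuple[float, float, float]],
                 pi: float) -> Tuple[float, float, float, float]:
    """Return (param, Recall, Precision, F) for the row with the highest F at weight pi."""
    scored = [(param, r, p, weighted_f(r, p, pi)) for param, r, p in curve]
    return max(scored, key=lambda row: row[3])


if __name__ == "__main__":
    # made-up illustrative curve: (alpha, Recall, Precision), NOT project results
    curve = [(0.01, 0.60, 0.50), (0.05, 0.80, 0.35), (0.10, 0.90, 0.25)]
    for pi in (0.5, 0.80, 0.85):
        print(pi, best_setting(curve, pi))
```

With these made-up numbers, π = 0.5 selects α = 0.01, whereas π = 0.80 and π = 0.85 both select the more sensitive α = 0.05, which is the behaviour the weighting toward Recall is meant to produce.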
Evaluation: Comparison of Performances
Compared with an existing method (Method X), the performance of our HMM
appears encouraging.
Cohort Level Analysis: Regression
Based on the exacerbations detected by our HMM (at π=0.80 and π=0.85 respectively), we can
analyze the impact of different factors (treatment group, historical exacerbations) on the
exacerbation frequency of COPD patients.
It is natural to assume that the number of exacerbations Y a patient experiences over a
follow-up period of length t follows a Poisson distribution, Y ~ Poisson(t·λ), where λ is the
patient's exacerbation rate. Hence, a Poisson regression is fitted to the data. Using a log
link, we have
$$\log \mathbb{E}[Y] = \log(t) + \log(\lambda) = \log(t) + \theta'\beta,$$
where θ' represents the predictor variables (independent variables, or regressors), such as
historical exacerbations and treatment group, β is the vector of regression coefficients,
and log(t) enters the model as an offset.
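A minimal sketch of fitting this model with NumPy by iteratively reweighted least squares (IRLS, i.e. Newton's method for the Poisson GLM with log link), with log(t) as an offset. The function name, the starting values and the simulated covariates (a Drug B indicator and a count of historical exacerbations) are hypothetical. In practice a library routine, such as statsmodels' GLM with a Poisson family and an `offset` argument, would normally be used instead.

```python
import numpy as np


def fit_poisson_offset(X, y, t, n_iter: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Fit log E[y] = log(t) + X @ beta by IRLS.

    X: (n, k) design matrix including a column of ones for the intercept.
    y: (n,) exacerbation counts.  t: (n,) follow-up times, entering as offset log(t).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.log(np.asarray(t, dtype=float))
    # starting values: least squares on log((y + 0.5) / t), avoiding log(0)
    beta = np.linalg.lstsq(X, np.log(y + 0.5) - offset, rcond=None)[0]
    for _ in range(n_iter):
        eta = offset + X @ beta           # linear predictor including the offset
        mu = np.exp(eta)                  # fitted means t * lambda
        z = X @ beta + (y - mu) / mu      # working response with the offset removed
        xtw = X.T * mu                    # IRLS weights are mu for the log link
        beta_new = np.linalg.solve(xtw @ X, xtw @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    drug_b = rng.integers(0, 2, n)        # hypothetical 0/1 treatment indicator
    history = rng.poisson(2, n)           # hypothetical historical exacerbation count
    t = rng.uniform(0.3, 0.5, n)          # hypothetical follow-up time, e.g. in years
    X = np.column_stack([np.ones(n), drug_b, history])
    y = rng.poisson(t * np.exp(X @ np.array([0.5, -0.3, 0.2])))
    print(fit_poisson_offset(X, y, t))    # should be close to [0.5, -0.3, 0.2]
```

Under this model, exp(β) for the treatment indicator is the rate ratio of Drug B relative to Drug A, holding the other regressors fixed.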
Cohort Level Analysis: Output for π=0.8
Cohort Level Analysis: Output for π=0.85
Further Improvements
• Overcome the constraints of the data.
• Improve the design of the HMM (e.g. a third-order HMM, a heterogeneous transition matrix).
• When evaluating, measure performance in terms of exacerbation events rather than
individual days, for better accuracy.
Q & A
Thank you!


Editor's Notes

  1. People are normally diagnosed in their 40s or 50s.
  2. From point 2 to point 3: the lack of a standard definition makes it difficult to test new drugs or monitor the progression of COPD. We need to better identify and better quantify exacerbations.
  3. I will skip the details of the algorithms. The Viterbi algorithm takes into account both the transition and the emission distributions.
  4. However, the diagram above is just a basic HMM. In this project, we adapt and redesign the HMM.
  5. Bear in mind that these results are generated from assumed parameter values in the Viterbi algorithm.
  6. To what extent are they in agreement with clinical exacerbations?
  7. It is up to us how sensitive we want the model to be, so a composite measure is needed.
  8. This is the trick!