Thesis Defense
Investigating the Use of Novel Data Mining and Machine Learning Methods in Healthcare Data Sources of Multiple Nature
Roberto Batista
Content
• Introduction
• Part I - Survey Data
  • Overview
  • Literature review
  • Methods
  • Exploratory Data Analysis
  • Data Transformation
  • Conclusions
• Part II - Electronic Health Record
  • Overview
  • Literature review
  • Methods
  • Exploratory Data Analysis
  • Data Transformation
  • Conclusions
Introduction
• Non-federal hospitals with basic systems: 9.4% (2008), 84% (2015).
Source: The Office of the National Coordinator for Health Information Technology (ONC)
gender, ethnicity, religion, age, social, finance, exams
Health Data Sources
• Survey
• Electronic Medical Record (EMR)
• Claim Data
• Vital Signs Data (e.g., SpO2)
• Electronic Health Record (EHR)
National Library of Medicine (NIH)
Thesis Components
• Part I: Survey
• Part II: Electronic Health Record (EHR)
Part I
Survey Data
How to identify personality trait groups in the Health and Retirement Study survey data?
Health and Retirement Study (HRS)
HRS Overview
[Infographic: 3 surveys; 6 aspects; 5 aspects; 5 aspects; 22,000; >50 yo; 9 aspects; 4 derived datasets; 58.54%; Medical Ethics Training]
Literature Review
Gould et al., 2015:
Assesses symptoms of anxiety and depression in veterans and non-veterans using CES-D and BAI.
Seligman et al., 2018:
Machine learning improves the understanding of social determinants of health.
Hülür et al., 2015:
Investigates the association between subjective memory, subjective age, and personality traits.
Fehrman et al., 2015:
Correlates personality with individuals' consumption of eight psychoactive drugs.
Aschwanden et al., 2019:
Associates personality traits with the probability of having a preventive cancer screening.
Five Personality Traits (OCEAN):
• Openness
• Conscientiousness
• Extraversion
• Agreeableness
• Neuroticism
Machine Learning Studies
HRS Datasets Overview
Datasets (each 1992-2016): HRS - RAND, HRS Core, HRS Exit, HRS Post-Exit
• Adult ADHD
• Financial
• Material Hardship
• Long-term Care
• Medication Non-Adherence
• Religious

• Proxy informant
• Health
• Family
• Finance

• Proxy informant
• Unresolved financial situations
HRS Datasets of Interest
Datasets: HRS - RAND, HRS Core, HRS Exit, HRS Post-Exit
Years: 2006, 2008, 2010, 2012
HRS - RAND
HRS Core - Section LB - Leave-Behind
Subjective well-being, lifestyle and experience of stress, quality of social ties, personality traits, work-related beliefs, and self-related beliefs.
HRS Core - Section D - Cognition
Immediate and delayed free recall, working memory and mental processing, vocabulary, mental status, and self-rated memory.
Data Conversion
HRS
Data Transformation
HRS:
• RAND
• Core D
• Core LB
Methods
Cloud of Individuals: stars represent individuals
Cloud of Variables: points represent variables

      A    B    C
1     a1   b2   c1
⋮     ⋮    ⋮    ⋮
i     a2   b2   c3
i'    a1   b1   c1
⋮     ⋮    ⋮    ⋮
N     a4   b2   c2

• Unsupervised Machine Learning
  • Multiple Correspondence Analysis (MCA)
  • Clustering
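To make the individuals-by-variables table concrete, here is a minimal Python sketch (the thesis itself does not show code, so everything here is illustrative). It treats the four rows visible on the slide as if they were the whole dataset, which is an assumption. It builds the indicator (disjunctive) matrix that MCA works on and computes the squared MCA distance between two individuals, (1/Q) Σ_k (z_ik − z_i'k)² / f_k, where Q is the number of variables and f_k is the relative frequency of category k.

```python
from collections import Counter

# Toy data: only the four rows visible on the slide, treated here as the
# complete dataset (an assumption made for illustration).
rows = {
    "1":  {"A": "a1", "B": "b2", "C": "c1"},
    "i":  {"A": "a2", "B": "b2", "C": "c3"},
    "i'": {"A": "a1", "B": "b1", "C": "c1"},
    "N":  {"A": "a4", "B": "b2", "C": "c2"},
}
variables = ["A", "B", "C"]

# One indicator column per observed (variable, category) pair.
categories = sorted({(v, r[v]) for r in rows.values() for v in variables})

def indicator(row):
    """Return the 0/1 indicator vector of one individual."""
    return [1 if row[v] == c else 0 for (v, c) in categories]

Z = {name: indicator(r) for name, r in rows.items()}

# Relative frequency of each category among the individuals.
counts = Counter((v, r[v]) for r in rows.values() for v in variables)
freq = [counts[cat] / len(rows) for cat in categories]

def mca_sq_distance(a, b):
    """Squared MCA distance between individuals a and b."""
    q = len(variables)
    return sum((x - y) ** 2 / f for x, y, f in zip(Z[a], Z[b], freq)) / q
```

Individuals that disagree on rare categories end up farther apart than individuals that disagree on common ones, which is what spreads the cloud of individuals.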
[Figure: MCA cloud of variable categories, Dim1 (8.1%) vs. Dim2 (4.7%), shown in four regions]
Region 1: sophist_A lot, sophist_Some, bminded_A lot, curious_A lot, intellig_A lot, imagina_A lot, creative_A lot, sympath_A lot, softheart_A lot, caring_A lot, warm_A lot, helpful_A lot, talkactive_A lot, active_A lot, lively_A lot, friendly_A lot, outgoing_A lot, careless_A lot, careless_Not at all, thorough_A lot, hardworker_A lot, responsible_A lot, organized_A lot, calm_A lot, nervous_Not at all, worry_Not at all, moody_Not at all
Region 2: sophist_A little, bminded_Some, curious_Some, intellig_Some, imagina_A little, creative_A little, creative_Some, sympath_Some, softheart_Some, caring_Some, warm_Some, helpful_Some, talkactive_A little, talkactive_Some, active_Some, lively_Some, friendly_Some, outgoing_Some, careless_A little, careless_Some, thorough_Some, hardworker_Some, organized_Some, calm_Some, nervous_A little, nervous_Some, worry_A little, worry_Some, moody_A little, moody_Some
Region 3: sophist_A little, sophist_Not at all, bminded_A little, bminded_Not at all, curious_A little, intellig_A little, imagina_A little, imagina_Not at all, creative_Not at all, sympath_A little, softheart_A little, caring_A little, caring_Some, warm_A little, helpful_A little, talkactive_A little, talkactive_Not at all, active_A little, lively_A little, friendly_A little, friendly_Some, outgoing_A little, outgoing_Not at all, thorough_A little, hardworker_A little, responsible_A little, responsible_Some, organized_A little, organized_Not at all, calm_A little, calm_Some, nervous_A lot, worry_A lot, moody_A lot
Region 4: bminded_Not at all, curious_Not at all, intellig_Not at all, imagina_Not at all, sympath_Not at all, softheart_Not at all, warm_Not at all, helpful_Not at all, active_Not at all, lively_Not at all, friendly_Not at all, thorough_Not at all, hardworker_Not at all, calm_Not at all
Conclusions
[Figure: clusters]
• Hierarchical clustering applied to the low-dimensional representation of participants produced by MCA suggested a reasonable separation of respondent profiles characterized by a personality scale.
• This can be applied to survey design and sampling procedures.
• This can support correlation studies with other physical and mental health indicators.
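The first conclusion describes hierarchical clustering on low-dimensional MCA coordinates. A rough Python sketch of that idea follows. The slides do not state the linkage criterion or the number of clusters, so average linkage, k = 3, and the (Dim1, Dim2) coordinates below are all assumptions made for illustration.

```python
import math

def agglomerative(points, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(c1, c2):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical (Dim1, Dim2) coordinates for six respondents.
coords = [(-0.8, 0.9), (-0.7, 1.0), (0.1, -0.3), (0.2, -0.2), (1.6, 1.8), (1.7, 2.0)]
groups = agglomerative(coords, 3)
```

Each entry of `groups` lists the indices of the respondents assigned to one cluster.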
Paper Presented and Published
18th IEEE International Conference on Machine Learning and Applications - ICMLA 2019
December 16-19, Boca Raton, Florida, USA
Part II
Electronic Health Record
How to predict Intensive Care Unit (ICU) Length of Stay (LOS) using Machine Learning models?
Medical Information Mart for Intensive Care - III (MIMIC-III)
MIMIC-III Overview
[Infographic: NB, 15 >; 2.1 days; 7.76%; 380 meas.; 11.5%; 44.1%; 53,423 adm; 6.9 days; EHR; 7,870; 38,597]
Beth Israel Deaconess Medical Center
CareVue DB
MetaVision DB
MIMIC-III
Literature Review
Azari et al., 2012:
Approached LOS prediction by identifying similar groups. Reached an accuracy of 74.3%.
Van Houdenhoven et al., 2007:
LOS prediction for elective esophagectomy with reconstruction for carcinoma, using the presence of gastroesophageal reflux disease and respiratory minute volume (transthoracic). R² of 45%.
Clark & Ryan, 2002:
Patients younger than 55 years old reached the highest accuracy, 69%; those between 55 and 70 years old reached 13%, and those older than 70 years old reached 17%.
Gustafson, 1968:
Uses five different methodologies for predicting the LOS of inguinal herniotomy patients.
Afrin et al., 2019:
Predicts LOS using three classifications, focused on the age and death outcome of the patients. Accuracy of 54.8% (RF and LR).
Intensive Care Unit (ICU) Length of Stay (LOS)
• Wait time for ICU admission
• ICU management
• Important predictor for death rate
• ICU cost
Data Access
Data or Specimens Only Research training - CITI Program:
1. Belmont Report and Its Principles (ID 1127)
2. History and Ethics of Human Subjects Research (ID 498)
3. Basic Institutional Review Board (IRB) Regulations and Review Process (ID 2)
4. Records-Based Research (ID 5)
5. Genetic Research in Human Populations (ID 6)
6. Populations in Research Requiring Additional Considerations and/or Protections (ID 16680)
7. Conflicts of Interest in Human Subjects Research (ID 17464)
Exploratory Data Analysis
CSV to SQLite conversion: 26 CSV files loaded into SQLite
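The slides do not show how the CSV files were converted, so the following is only one possible approach, sketched with Python's standard `csv` and `sqlite3` modules. The helper name `load_csv` and the sample rows are hypothetical; a real run would open each of the 26 files instead of the in-memory string used here.

```python
import csv
import io
import sqlite3

def load_csv(conn, table, csv_file):
    """Create a table named after the file and insert every CSV row."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first line holds the column names
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()

# Made-up sample standing in for one CSV file.
sample = io.StringIO("ROW_ID,SUBJECT_ID,GENDER\n1,100,F\n2,101,M\n")
conn = sqlite3.connect(":memory:")
load_csv(conn, "PATIENTS", sample)
n_rows = conn.execute('SELECT COUNT(*) FROM "PATIENTS"').fetchone()[0]
```

Columns are created without declared types, so every value is stored as text; casting to numeric or date types would be a later transformation step.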
Data Transformation
CSV PATIENTS: ROW_ID, SUBJECT_ID, GENDER, DOB, DOD, DOD_HOSP, DOD_SSN, EXPIRE_FLAG
CSV STAYS: ROW_ID, SUBJECT_ID, HADM_ID, ICUSTAY_ID, DBSOURCE, FIRST_CAREUNIT, LAST_CAREUNIT, FIRST_WARDID, LAST_WARDID, INTIME, OUTTIME, LOS
CSV DIAGNOSIS: ROW_ID, SUBJECT_ID, HADM_ID, SEQ_NUM, ICD9_CODE
CSV ADMISSIONS: ROW_ID, SUBJECT_ID, HADM_ID, ADMITTIME, DISCHTIME, DEATHTIME, ADMISSION_TYPE, ADMISSION_LOCATION, DISCHARGE_LOCATION, INSURANCE, LANGUAGE, RELIGION, MARITAL_STATUS, ETHNICITY, EDREGTIME, EDOUTTIME, DIAGNOSIS, HOSPITAL_EXPIRE_FLAG, HAS_CHARTEVENTS_DATA
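The four tables share the keys SUBJECT_ID and HADM_ID, so one plausible way to flatten them into a single modeling table is a SQL join. The sketch below is an assumption about how that could look, not the thesis's actual pipeline: the rows are made up, only a few columns are kept, and using the first-listed diagnosis (SEQ_NUM = 1) per admission is a simplification chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PATIENTS (SUBJECT_ID INTEGER, GENDER TEXT);
CREATE TABLE ADMISSIONS (SUBJECT_ID INTEGER, HADM_ID INTEGER,
                         ADMISSION_TYPE TEXT, ETHNICITY TEXT);
CREATE TABLE STAYS (SUBJECT_ID INTEGER, HADM_ID INTEGER, ICUSTAY_ID INTEGER,
                    FIRST_CAREUNIT TEXT, LOS REAL);
CREATE TABLE DIAGNOSIS (SUBJECT_ID INTEGER, HADM_ID INTEGER,
                        SEQ_NUM INTEGER, ICD9_CODE TEXT);
-- Made-up rows for illustration only.
INSERT INTO PATIENTS VALUES (1, 'F'), (2, 'M');
INSERT INTO ADMISSIONS VALUES (1, 10, 'EMERGENCY', 'WHITE'), (2, 20, 'ELECTIVE', 'ASIAN');
INSERT INTO STAYS VALUES (1, 10, 100, 'SICU', 3.5), (2, 20, 200, 'MICU', 1.2);
INSERT INTO DIAGNOSIS VALUES (1, 10, 1, '486'), (1, 10, 2, '4019'), (2, 20, 1, '41401');
""")

# One row per ICU stay, enriched with patient, admission, and diagnosis data.
query = """
SELECT s.ICUSTAY_ID, p.GENDER, a.ADMISSION_TYPE, a.ETHNICITY,
       s.FIRST_CAREUNIT, s.LOS, d.ICD9_CODE
FROM STAYS s
JOIN PATIENTS p   ON p.SUBJECT_ID = s.SUBJECT_ID
JOIN ADMISSIONS a ON a.HADM_ID = s.HADM_ID
JOIN DIAGNOSIS d  ON d.HADM_ID = s.HADM_ID AND d.SEQ_NUM = 1
ORDER BY s.ICUSTAY_ID
"""
flat = conn.execute(query).fetchall()
```

Restricting the join to one diagnosis per admission keeps one row per ICU stay; joining all diagnoses would duplicate stays and inflate the LOS sample.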
Data Transformation
[Figure: data transformation diagram, steps 1 to 7]
Methods
Tidymodels framework:
• rsample (data sampling)
• recipes (data preprocessing)
• parsnip (machine learning modeling)
• yardstick (performance evaluation)
• Algorithm families: Decision Trees, Random Forest, Boosted Trees, SVM, and Linear Regression
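The workflow listed above (sample, preprocess, model, evaluate) was implemented in R with tidymodels. The sketch below is not tidymodels code; it is a plain-Python analogue of the sampling and modeling stages, using synthetic data and a one-predictor least-squares fit, meant only to illustrate how the stages hand data to each other. The function names are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it (analogue of the rsample step)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_line(train):
    """Ordinary least squares with one predictor (analogue of the parsnip step)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data lying exactly on y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data)
slope, intercept = fit_line(train)
predictions = [slope * x + intercept for x, _ in test]
```

Fitting only on `train` and predicting only on `test` is the point of the split: the held-out rows give an estimate of performance on unseen stays.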
Methods
Predictors
• Ethnicity
• Respiratory diagnosis
Subset
• ICU: SICU
• Admission Type: Urgency
Linear Regression
• Adjusted R²: 63.75%
• RMSE: 9.56
Classifier
• Accuracy: 92.7%
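For readers unfamiliar with the three reported metrics, the following Python sketch shows their standard definitions. It does not reproduce the thesis numbers; the inputs used with these functions are made-up values, and the thesis itself computed its metrics in R.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (here, days)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def adjusted_r2(y_true, y_pred, n_predictors):
    """R² penalized for the number of predictors in the model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

def accuracy(labels, predicted):
    """Share of class labels predicted correctly."""
    return sum(l == p for l, p in zip(labels, predicted)) / len(labels)
```

RMSE and adjusted R² apply to the regression model that predicts LOS as a number, while accuracy applies to the classifier that predicts a LOS category.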
Conclusions
• LOS prediction is a very specific, case-oriented task, and it is unlikely that one model can generalize to every case.
• It was possible to create a specific prediction model for:
  • Surgical Intensive Care Unit
  • Admitted from Emergency
  • Patients diagnosed with respiratory disease
• The novel R framework tidymodels enables the use of multiple ML libraries under a unified collection of packages for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the modern data science tools in the tidyverse.
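The case-specific subset named in the second conclusion could be selected along the following lines. This Python sketch rests on several assumptions not stated in the slides: that "respiratory disease" means the ICD-9 chapter for diseases of the respiratory system (categories 460 to 519), that codes are stored without a decimal point, and that the admission type value is spelled 'EMERGENCY'. The rows are made up.

```python
def is_respiratory(icd9_code):
    """True when the 3-digit ICD-9 category falls in 460-519 (assumed definition)."""
    prefix = icd9_code[:3]
    return prefix.isdigit() and 460 <= int(prefix) <= 519

# Made-up stays; column names follow the tables listed earlier in the deck.
stays = [
    {"FIRST_CAREUNIT": "SICU", "ADMISSION_TYPE": "EMERGENCY", "ICD9_CODE": "486"},
    {"FIRST_CAREUNIT": "SICU", "ADMISSION_TYPE": "ELECTIVE", "ICD9_CODE": "486"},
    {"FIRST_CAREUNIT": "MICU", "ADMISSION_TYPE": "EMERGENCY", "ICD9_CODE": "51881"},
    {"FIRST_CAREUNIT": "SICU", "ADMISSION_TYPE": "EMERGENCY", "ICD9_CODE": "V3000"},
]

# Keep only surgical ICU stays admitted as emergencies with a respiratory code.
subset = [
    s for s in stays
    if s["FIRST_CAREUNIT"] == "SICU"
    and s["ADMISSION_TYPE"] == "EMERGENCY"
    and is_respiratory(s["ICD9_CODE"])
]
```

Codes that start with a letter, such as V and E codes, are excluded by the `isdigit` check rather than raising an error.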
Next Steps
• Format a paper and submit to machine learning conferences/journals.
  • ACM-BCB ’20: 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
• Apply the unsupervised learning techniques used in Part I to the MIMIC-III dataset.
• Create MIMIC-III subsets with Lab Exams for further investigation.
Thanks to
Friends at
Icons: http://www.flaticons.com
More Related Content

Similar to Investigating the use of novel data mining and machine learning methods in healthcare data sources of multiple natures.

Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Kees van Bochove
 
decentralization: a trend in biomedical research
decentralization: a trend in biomedical researchdecentralization: a trend in biomedical research
decentralization: a trend in biomedical researchBrian Bot
 
Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Peter Embi
 
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-finalPeter Embi
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxDeliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxrandyburney60861
 
Leveraging Medical Health Record Data for Identifying Research Study Particip...
Leveraging Medical Health Record Data for Identifying Research Study Particip...Leveraging Medical Health Record Data for Identifying Research Study Particip...
Leveraging Medical Health Record Data for Identifying Research Study Particip...SC CTSI at USC and CHLA
 
Meeting the Computational Challenges Associated with Human Health
Meeting the Computational Challenges Associated with Human HealthMeeting the Computational Challenges Associated with Human Health
Meeting the Computational Challenges Associated with Human HealthPhilip Bourne
 
Utility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsUtility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsBedirhan Ustun
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Paolo Missier
 
Personalized health knowledge graph ckg workshop - iswc 2018 (2)
Personalized health knowledge graph   ckg workshop - iswc 2018 (2)Personalized health knowledge graph   ckg workshop - iswc 2018 (2)
Personalized health knowledge graph ckg workshop - iswc 2018 (2)Amélie Gyrard
 
Open Educational Resources for Big Data Science
Open Educational Resources for Big Data ScienceOpen Educational Resources for Big Data Science
Open Educational Resources for Big Data ScienceWilliam Hersh, MD
 
Health Care Processes and Decision Making_Lecture 4_slides
Health Care Processes and Decision Making_Lecture 4_slidesHealth Care Processes and Decision Making_Lecture 4_slides
Health Care Processes and Decision Making_Lecture 4_slidesCMDLearning
 
Lecture C
Lecture CLecture C
Lecture CCMDLMS
 
Vojtech huser-2009-amia-clinical-research-informatics-panel-eligibility-v011
Vojtech huser-2009-amia-clinical-research-informatics-panel-eligibility-v011Vojtech huser-2009-amia-clinical-research-informatics-panel-eligibility-v011
Vojtech huser-2009-amia-clinical-research-informatics-panel-eligibility-v011Vojtech Huser
 
Evaluation methods in heathcare systems
Evaluation methods in heathcare systemsEvaluation methods in heathcare systems
Evaluation methods in heathcare systemsMarsa Gholamzadeh
 

Similar to Investigating the use of novel data mining and machine learning methods in healthcare data sources of multiple natures. (20)

Connecting eh rdataquad12
Connecting eh rdataquad12Connecting eh rdataquad12
Connecting eh rdataquad12
 
Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019
 
decentralization: a trend in biomedical research
decentralization: a trend in biomedical researchdecentralization: a trend in biomedical research
decentralization: a trend in biomedical research
 
Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024
 
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-Review
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-final
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxDeliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
 
Leveraging Medical Health Record Data for Identifying Research Study Particip...
Leveraging Medical Health Record Data for Identifying Research Study Particip...Leveraging Medical Health Record Data for Identifying Research Study Particip...
Leveraging Medical Health Record Data for Identifying Research Study Particip...
 
Meeting the Computational Challenges Associated with Human Health
Meeting the Computational Challenges Associated with Human HealthMeeting the Computational Challenges Associated with Human Health
Meeting the Computational Challenges Associated with Human Health
 
Utility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsUtility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information Systems
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Innovative project1
Innovative project1Innovative project1
Innovative project1
 
Personalized health knowledge graph ckg workshop - iswc 2018 (2)
Personalized health knowledge graph   ckg workshop - iswc 2018 (2)Personalized health knowledge graph   ckg workshop - iswc 2018 (2)
Personalized health knowledge graph ckg workshop - iswc 2018 (2)
 
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
 
Open Educational Resources for Big Data Science
Open Educational Resources for Big Data ScienceOpen Educational Resources for Big Data Science
Open Educational Resources for Big Data Science
 
Health Care Processes and Decision Making_Lecture 4_slides
Health Care Processes and Decision Making_Lecture 4_slidesHealth Care Processes and Decision Making_Lecture 4_slides
Health Care Processes and Decision Making_Lecture 4_slides
 
Lecture C
Lecture CLecture C
Lecture C
 
Vojtech huser-2009-amia-clinical-research-informatics-panel-eligibility-v011
Vojtech huser-2009-amia-clinical-research-informatics-panel-eligibility-v011Vojtech huser-2009-amia-clinical-research-informatics-panel-eligibility-v011
Vojtech huser-2009-amia-clinical-research-informatics-panel-eligibility-v011
 
Evaluation methods in heathcare systems
Evaluation methods in heathcare systemsEvaluation methods in heathcare systems
Evaluation methods in heathcare systems
 

More from Roberto Williams Batista (8)

Investigating the Use Of Novel Data Mining And Machine Learning Methods in He...
Investigating the Use Of Novel Data Mining And Machine Learning Methods in He...Investigating the Use Of Novel Data Mining And Machine Learning Methods in He...
Investigating the Use Of Novel Data Mining And Machine Learning Methods in He...
 
Songdo Demographics
Songdo DemographicsSongdo Demographics
Songdo Demographics
 
Robbiot intro
Robbiot introRobbiot intro
Robbiot intro
 
Introduction to Data Science in IoT Projects.
Introduction to Data Science in IoT Projects.Introduction to Data Science in IoT Projects.
Introduction to Data Science in IoT Projects.
 
Project Luckie
Project LuckieProject Luckie
Project Luckie
 
Robbio intro
Robbio introRobbio intro
Robbio intro
 
ROBBIoT
ROBBIoTROBBIoT
ROBBIoT
 
Introdução a Wearables
Introdução a WearablesIntrodução a Wearables
Introdução a Wearables
 

Recently uploaded

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

Recently uploaded (20)

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Investigating the use of novel data mining and machine learning methods in healthcare data sources of multiple natures.

  • 1. Thesis Defense Investigating the Use Of Novel Data Mining And Machine Learning Methods in Healthcare Data Sources Of Multiple Nature Roberto Batista
  • 2. Content • Introduction • Part I - Survey Data • Overview • Literature review • Methods • Exploratory Data Analysis • Data Transformation • Conclusions • Part II - Electronic Health Record • Overview • Literature review • Methods • Exploratory Data Analysis • Data Transformation • Conclusions 2
  • 4. 2008 2015 9.4% 84% Introduction • Non-federal hospitals with basic systems. The Office of the National Coordinator for Health Information Technology (ONC) 4 gender ethnicity religion age social finance exams
  • 5. Health Data Sources National Library of Medicine (NIH) EMR $ $ $ $ $ $ SpO2 EHR 5 Survey Electronic Medical Record Claim Data Vital Signs Data Electronic Health Record
  • 6. PART I PART II Thesis Components EHR 6 Survey Electronic Health Record
  • 7. Part I Survey Data -How to identify personality traits groups in the Health and Retirement Study survey data? 7
  • 8. Health and Retirement Study (HRS) 8
  • 9. HRS Overview 3 surveys 6 aspects 5 aspects 5 aspects 22,000 >50 yo 9 aspects 4 Derived Datasets 58.54% Medical Ethics Training 9
  • 10. Literature Review Gould et al., 2015: Verifies the symptoms of anxiety and depression in veterans and non-veterans using CES-D and BAI. Seligman et Al., 2018: Machine Learning improves the understanding of social determinants of health. Hülür et al., 2015: Investigates association between subjective memory, subjective age and personality traits. Fehrman et al., 2015: Personality correlation with the consumption of eight psychoactive drugs and its consumption by individuals. Aschwanden et al., 2019: Personality traits associations with the probability of having a preventive screening for cancer. Five personality Traits (OCEAN): • Openness • Conscientiousness • Extraversion • Agreeableness • Neuroticism 10 Machine Learning Studies
  • 11. HRS Datasets Overview 11 HRS - RANDHRS Core HRS Exit HRS Post-Exit • Adult ADHD • Financial • Material Hardship • Long-term Care • Medication Non- Adherence • Religious • Proxy informant • Health • Family • Finance • Proxy informant • Unresolved financial situations 1992 | 2016 1992 | 2016 1992 | 2016 1992 | 2016
  • 12. HRS Datasets of Interest 12 HRS - RANDHRS Core HRS Exit HRS Post-Exit
  • 13. HRS Datasets of Interest 2006, 2008, 2010, 2012 HRS - RAND HRS Core - Section LB - Left-Behind Subjective well-being, lifestyle and experience of stress, quality of Social ties, personality traits, work-related beliefs, and self- related beliefs. HRS Core - Section D - Cognition Immediate and delayed free recall, working memory and mental processing, vocabulary, mental status, and self-rated memory. 13 HRS - RANDHRS Core HRS Exit HRS Post-Exit 2006, 2008, 2010, 2012
  • 15. Data Transformation HRS: • RAND • Core D • Core LB 15
  • 16. Methods Cloud of Individuals: Stars represents individuals Cloud of Variables: Points represents variables A B C 1 a1 b2 c1 ⋮ ⋮ ⋮ ⋮ i a2 b2 c3 i’ a1 b1 c1 ⋮ ⋮ ⋮ ⋮ N a4 b2 c2 • Unsupervised Machine Learning • Multiple Correspondence Analysis (MCA) • Clustering 16
  • 17. [Figure: category labels plotted on Dim1 (8.1%) vs. Dim2 (4.7%), in four regions; axis ticks and point markers omitted]
      • Region 1: sophist_A lot, sophist_Some, bminded_A lot, curious_A lot, intellig_A lot, imagina_A lot, creative_A lot, sympath_A lot, softheart_A lot, caring_A lot, warm_A lot, helpful_A lot, talkactive_A lot, active_A lot, lively_A lot, friendly_A lot, outgoing_A lot, careless_A lot, careless_Not at all, thorough_A lot, hardworker_A lot, responsible_A lot, organized_A lot, calm_A lot, nervous_Not at all, worry_Not at all, moody_Not at all
      • Region 2: sophist_A little, bminded_Some, curious_Some, intellig_Some, imagina_A little, creative_A little, creative_Some, sympath_Some, softheart_Some, caring_Some, warm_Some, helpful_Some, talkactive_A little, talkactive_Some, active_Some, lively_Some, friendly_Some, outgoing_Some, careless_A little, careless_Some, thorough_Some, hardworker_Some, organized_Some, calm_Some, nervous_A little, nervous_Some, worry_A little, worry_Some, moody_A little, moody_Some
      • Region 3: sophist_A little, sophist_Not at all, bminded_A little, bminded_Not at all, curious_A little, intellig_A little, imagina_A little, imagina_Not at all, creative_Not at all, sympath_A little, softheart_A little, caring_A little, caring_Some, warm_A little, helpful_A little, talkactive_A little, talkactive_Not at all, active_A little, lively_A little, friendly_A little, friendly_Some, outgoing_A little, outgoing_Not at all, thorough_A little, hardworker_A little, responsible_A little, responsible_Some, organized_A little, organized_Not at all, calm_A little, calm_Some, nervous_A lot, worry_A lot, moody_A lot
      • Region 4: bminded_Not at all, curious_Not at all, intellig_Not at all, imagina_Not at all, sympath_Not at all, softheart_Not at all, warm_Not at all, helpful_Not at all, active_Not at all, lively_Not at all, friendly_Not at all, thorough_Not at all, hardworker_Not at all, calm_Not at all
  • 19. Conclusions
      • Hierarchical clustering applied to the low-dimensional representation of participants produced by MCA suggested a reasonable separation of respondent profiles characterized by a personality scale.
      • This can be applied to survey design and sampling procedures.
      • This can support correlation studies with other physical and mental health indicators.
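The first conclusion describes clustering respondents in the low-dimensional MCA space. The sketch below illustrates the idea with a naive agglomerative procedure using average linkage. The linkage choice, the function name `average_linkage`, and the six (Dim1, Dim2) coordinates are assumptions made for illustration; the slides do not state which linkage or software the thesis used.

```python
import math

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage (illustrative only)."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point

    def dist(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        _, x, y = min(
            (dist(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical (Dim1, Dim2) coordinates for six respondents.
coords = [(0.1, 0.0), (0.2, 0.1), (0.15, -0.05), (2.0, 1.9), (2.1, 2.0), (1.9, 2.1)]
groups = average_linkage(coords, n_clusters=2)
```

With these toy coordinates the two well-separated groups of respondents are recovered; real survey data would need a more efficient implementation and a principled choice of the number of clusters.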
  • 20. Paper Presented and Published: 18th IEEE International Conference on Machine Learning and Applications (ICMLA 2019), December 16-19, Boca Raton, Florida, USA
  • 21. Part II: Electronic Health Record. How can Intensive Care Unit (ICU) Length of Stay (LOS) be predicted using machine learning models?
  • 22. Medical Information Mart for Intensive Care - III (MIMIC-III)
  • 23. MIMIC-III Overview (infographic values): NB, 15 >; 2.1 days; 7.76%; 380 meas.; 11.5%; 44.1%; 53,423 adm; 6.9 days; EHR; 7,870; 38,597
  • 24. Beth Israel Deaconess Medical Center: CareVue DB; MetaVision DB; MIMIC-III
  • 25. Literature Review
      • Azari et al., 2012: Approached LOS prediction by identifying similar groups; reached an accuracy of 74.3%.
      • Van Houdenhoven et al., 2007: LOS prediction for elective esophagectomy with reconstruction for carcinoma, with presence of gastroesophageal reflux disease and respiratory minute volume, transthoracic. R² of 45%.
      • Clark & Ryan, 2002: Patients younger than 55 years reached the highest accuracy, 69%; those aged 55 to 70 reached 13%, and those older than 70 reached 17%.
      • Gustafson, 1968: Uses five different methodologies for predicting the LOS of inguinal herniotomy patients.
      • Afrin et al., 2019: Predicts LOS using three classifications, focusing on the age and death outcome of the patients. Accuracy of 54.8% (RF and LR).
      • Intensive Care Unit (ICU) Length of Stay (LOS): wait time for ICU admission; ICU management; important predictor for death rate; ICU cost
  • 26. Data Accessing: Data or Specimens Only Research training - CITI Program:
      1. Belmont Report and Its Principles (ID 1127)
      2. History and Ethics of Human Subjects Research (ID 498)
      3. Basic Institutional Review Board (IRB) Regulations and Review Process (ID 2)
      4. Records-Based Research (ID 5)
      5. Genetic Research in Human Populations (ID 6)
      6. Populations in Research Requiring Additional Considerations and/or Protections (ID 16680)
      7. Conflicts of Interest in Human Subjects Research (ID 17464)
  • 27. Exploratory Data Analysis: 26 CSV files converted to SQLite (CSV to SQLite Conversion)
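The CSV to SQLite conversion named on this slide could look roughly like the sketch below, which uses only Python's standard `csv` and `sqlite3` modules. The helper name `load_csv`, the in-memory database, and the two-row sample are assumptions for illustration; the slides do not say which tool performed the actual conversion.

```python
import csv
import io
import sqlite3

def load_csv(conn, table, csv_file):
    """Create `table` from a CSV stream, using the header row as column names."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    # Remaining rows are inserted as-is (values are stored as text).
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()

# Hypothetical two-row sample; a real file would be opened with open(path, newline="").
sample = io.StringIO("ROW_ID,SUBJECT_ID,GENDER\n1,249,F\n2,250,F\n")
conn = sqlite3.connect(":memory:")
load_csv(conn, "PATIENTS", sample)
n_rows = conn.execute('SELECT COUNT(*) FROM "PATIENTS"').fetchone()[0]
```

Columns are created without declared types, so numeric fields would need casting or an explicit schema before analysis.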
  • 28. Data Transformation
      • CSV PATIENTS: ROW_ID, SUBJECT_ID, GENDER, DOB, DOD, DOD_HOSP, DOD_SSN, EXPIRE_FLAG
      • CSV STAYS: ROW_ID, SUBJECT_ID, HADM_ID, ICUSTAY_ID, DBSOURCE, FIRST_CAREUNIT, LAST_CAREUNIT, FIRST_WARDID, LAST_WARDID, INTIME, OUTTIME, LOS
      • CSV DIAGNOSIS: ROW_ID, SUBJECT_ID, HADM_ID, SEQ_NUM, ICD9_CODE
      • CSV ADMISSIONS: ROW_ID, SUBJECT_ID, HADM_ID, ADMITTIME, DISCHTIME, DEATHTIME, ADMISSION_TYPE, ADMISSION_LOCATION, DISCHARGE_LOCATION, INSURANCE, LANGUAGE, RELIGION, MARITAL_STATUS, ETHNICITY, EDREGTIME, EDOUTTIME, DIAGNOSIS, HOSPITAL_EXPIRE_FLAG, HAS_CHARTEVENTS_DATA
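The four tables on this slide share the keys SUBJECT_ID and HADM_ID, so one plausible transformation is a join that yields one row per ICU stay. The sketch below is an assumption about how such a join might be written. The table names follow the slide labels, only a few columns are kept, all rows are made up, and restricting to `SEQ_NUM = 1` (first listed diagnosis) is an illustrative choice, not something the slides state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PATIENTS (SUBJECT_ID INTEGER, GENDER TEXT);
CREATE TABLE ADMISSIONS (SUBJECT_ID INTEGER, HADM_ID INTEGER, ADMISSION_TYPE TEXT, ETHNICITY TEXT);
CREATE TABLE STAYS (SUBJECT_ID INTEGER, HADM_ID INTEGER, ICUSTAY_ID INTEGER, FIRST_CAREUNIT TEXT, LOS REAL);
CREATE TABLE DIAGNOSIS (SUBJECT_ID INTEGER, HADM_ID INTEGER, SEQ_NUM INTEGER, ICD9_CODE TEXT);
-- Hypothetical rows for illustration only.
INSERT INTO PATIENTS VALUES (1, 'F'), (2, 'M');
INSERT INTO ADMISSIONS VALUES (1, 10, 'EMERGENCY', 'WHITE'), (2, 20, 'ELECTIVE', 'ASIAN');
INSERT INTO STAYS VALUES (1, 10, 100, 'SICU', 3.5), (2, 20, 200, 'MICU', 1.2);
INSERT INTO DIAGNOSIS VALUES (1, 10, 1, '51881'), (2, 20, 1, '4019');
""")

# One row per ICU stay, restricted to the surgical ICU.
rows = conn.execute("""
    SELECT s.ICUSTAY_ID, p.GENDER, a.ADMISSION_TYPE, a.ETHNICITY, d.ICD9_CODE, s.LOS
    FROM STAYS s
    JOIN PATIENTS p ON p.SUBJECT_ID = s.SUBJECT_ID
    JOIN ADMISSIONS a ON a.HADM_ID = s.HADM_ID
    JOIN DIAGNOSIS d ON d.HADM_ID = s.HADM_ID AND d.SEQ_NUM = 1
    WHERE s.FIRST_CAREUNIT = 'SICU'
""").fetchall()
```

Joining on HADM_ID rather than SUBJECT_ID for admissions and diagnoses keeps each stay tied to its own hospital admission when a patient has several.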
  • 30. Methods
      • Tidymodels framework: rsample (data sampling), recipes (data preprocessing), parsnip (machine learning modeling), yardstick (performance evaluation)
      • Algorithm families: Decision Trees, Random Forest, Boosted Trees, SVM, and Linear Regression
  • 31. Methods
      • Predictors: Ethnicity; Respiratory diagnosis
      • Subset: ICU: SICU; Admission Type: Urgency
      • Linear Regression: Adjusted R²: 63.75%; RMSE: 9.56
      • Classifier: Accuracy: 92.7%
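The two regression metrics reported on this slide have standard definitions: RMSE is the square root of the mean squared error, and adjusted R² is 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors. The sketch below computes both in plain Python. The function names and the five toy values are illustrative assumptions and are unrelated to the thesis results, which were obtained in R with tidymodels.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 7.0]
error = rmse(y_true, y_pred)
adj = adjusted_r2(y_true, y_pred, n_predictors=2)
```

RMSE is expressed in the units of the target (days of stay, if LOS is measured in days), while adjusted R² is unitless.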
  • 32. Conclusions
      • LOS prediction is a very specific, case-oriented task, and it is unlikely that one model can generalize to every case.
      • It was possible to create a specific prediction model for: Surgical Intensive Care Unit; admitted from Emergency; patients diagnosed with respiratory disease.
      • The novel R library tidymodels enables the use of multiple ML libraries under a unifying collection of packages for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the modern data science tools in the tidyverse.
  • 33. Next Steps
      • Format a paper and submit it to machine learning conferences/journals.
      • ACM-BCB ’20: 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
      • Apply the unsupervised learning techniques used in Part I to the MIMIC-III dataset.
      • Create MIMIC-III subsets with Lab Exams for further investigation.