Presentation by Prof. Mendel Singer of Case Western Reserve University, on the issue of "big data" in health care and policy research. Presented at the Myers-JDC-Brookdale Institute in Jerusalem.
Prof Mendel Singer Big Data Meets Public Health and Medicine 2018 12-22
1. Mendel E. Singer, PhD MPH
Associate Professor
Vice Chair for Education
Dept. of Population and Quantitative Health Sciences
Case School of Medicine
Case Western Reserve University
Cleveland, Ohio, USA
2. PhD in Operations Research
• (Applied Math/Stat)
Biostatistics
Health Services Research
Community and Public Health
• Did MPH program
Work ranges from mathematical modeling studies and
biostatistics to claims data and EMR to semi-structured
interviews and focus groups
Heavy in education administration,
transitioning to more research
3. It’s a lot easier to use with consumer
behavior
• Tons of data, all usable and well categorized
Manufacturer, cost, color, size, type of goods, etc…
• Data is mostly objective and easily measured and
tracked
• Interest drives purchasing
• Distance from data to profit is short, and easily and
quickly tested
Methods well developed because of
business applications
4. Reportable diseases
• Infectious disease tracking
Surveillance, Epidemiology and End Results (SEER)
Program
• Details about the cancer (e.g. tumor size, location, histology)
• Details about treatment and mortality
• Linked to Medicare claims data
Justice system
Census data – sociodemographics by neighborhood
Death certificates
Emergency department visits for drug overdose
Medical Claims Data (Medicare/Medicaid,
commercial)
5. If the rate of fatal opioid overdose in my
home county was applied to the USA, it
would be the 3rd leading cause of death.
Medical Examiner investigates each
accidental overdose.
• Toxicology – what drugs in system
• Data on injury (e.g. others present), history
(rehabilitation, incarceration, medical reason for
drug) and sociodemographics
Public data – but only for those who died
How do we identify those at risk?
6. State prescription database
• Need identifiers; download one patient at a time
State emergency dept. data for overdoses
• Need identifiers; download one patient at a time
• Where do identifiers come from?
Incarceration data
• Access difficult for people alive
Medicare/Medicaid claims data
• Long lag time; access issues to data
Emergency medical services (EMS) data updated every 15
minutes
• Not available for researchers
Toxicology data from medical examiner’s lab (untapped)
• Fatalities, police lab
• Potential for de-identified data
Police records
• Also difficult for people who are alive
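The recurring obstacle in the list above is linking the same person across sources without circulating names. A minimal sketch of one possible approach, assuming each data custodian can compute the same keyed hash of normalized identifiers before sharing records. The field names, the secret key, and the records are all hypothetical, and exact matching like this would miss typos or name changes.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodians, never by researchers.
SECRET_KEY = b"shared-secret-held-by-data-custodians"

def link_token(first, last, dob):
    """Return a keyed hash of normalized identifiers (name + date of birth)."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def link(source_a, source_b):
    """Join two lists of records on their link tokens (exact match only)."""
    index = {rec["token"]: rec for rec in source_a}
    return [(index[rec["token"]], rec) for rec in source_b if rec["token"] in index]

# Made-up records for illustration only.
prescriptions = [
    {"token": link_token("Jane", "Doe", "1980-01-02"), "opioid_rx_count": 4},
    {"token": link_token("John", "Roe", "1975-05-06"), "opioid_rx_count": 1},
]
ed_visits = [
    # Different spacing and capitalization still normalize to the same token.
    {"token": link_token(" jane", "DOE ", "1980-01-02"), "overdose_visits": 2},
]

linked = link(prescriptions, ed_visits)
```

Researchers would only ever see the tokens, which is one way the "de-identified data" idea could work in practice; access and approval issues remain.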
7. ClearPath is the merging of electronic
health records from the 3 main
hospital/health systems in Northeast Ohio
• Cleveland Clinic
• University Hospitals
• MetroHealth Medical Center
Nearly all hospitals in Northeast Ohio
(43/45)
All integrated health care delivery systems
• Full service outpatient, urgent care, emergency,
hospital
8. Almost 5 years in ….
Data from one system is almost ready
Many complications in actually getting
the data, process dragging
• Lots of delays
• Many levels of approval (lots of legal folks
involved)
• Turnover at hospitals
• Concern over being compared to other hospitals
• Benefit to hospital not sufficiently clear
9. All hospitalizations for most of the
patients
• And lots of outpatient data
Out of system use
• Not all physicians in one of these systems
• Data on Rx’s written, but not Rx’s filled
Different electronic health record
systems means data not coded
consistently across the health systems
10. Social Media
• Individual postings, blogs, organization page
Internet Searching
• Tracking outbreaks
• Side-effects/complications of treatment
Grocery store purchases
• Effectiveness of programs to increase fruit and vegetable
consumption
E.g. changes in displays, in-store marketing
Prescription produce programs
Wearable electronic devices, e.g. FitBit
Sensors - detect movement and electricity
Continuous data often results in substantial
computing and storage issues
11. Insurance billing has been the driving force
behind computerized medical records in the U.S.
• Ease of filing
• Maximizing reimbursement
• Minimizing time to reimbursement (e.g. minimizing
rejection of claims)
Track utilization of services
Link outpatient, inpatient, prescription drug data
Diagnosis and procedure codes
Reimbursement (proxy for cost)
Data available for purchase
12. Advantages – Know what is being done
• Record of service utilization, including type, location,
reimbursement
• Diagnosis codes – for creating and following cohorts
• Hospital discharge and DRG codes
Disadvantages – Don’t know details, limited
outcomes
• Tests are done – don’t know the result
• No knowledge of physician exam
• Don’t know about symptoms
• Crude proxies for severity (e.g. hospitalization,
multi-drug therapy)
13. Benefits
• Utilization
• Test results, not just those taken
• Tests ordered but not taken
• Private as well as public insurance
• Many hospitals use the same system
E.g. Epic is a very popular system in US Hospitals and
health care systems
• MIMIC database – free to the public
De-identified intensive care unit (ICU) records
50,000 ICU stays over 12 years
Has all chart events, test results
Other data also available – physionet.org
14. Challenges
• Physician notes and text reports – other free text
fields
Fields often avoided – but that’s the largely untapped
potential
• Information on Rx’s written, but not those filled
• Few HMOs left in US
Kaiser downsized tremendously
HMOs in Israel, but lots of inpatient out-of-system use
• Lack of connectedness – out of system use
Doctors in different health care systems (or private
practice)
15. Difficulty connecting better care to profits
• Value-based care a step in the right direction (fee for
service is bad business)
• Profit drives the software and analysis
• How to document/market better care and outcomes?
Public hasn’t differentiated well between systems
Failure of Report Cards
Each clinical problem needs to be addressed
separately
• Can’t use a single algorithm with unique parameters,
like internet store
• Best data mining will be specific to institution
The insurer is the one with the largest profit motive
Better for HMO where insurer=provider
17. Clinicians wary of computer guys treating their patients
Americans and their docs don’t want anyone else telling
them how to practice
• “You need evidence, but I know” (a doc once told me this)
Fear of being replaced by computers, not just aided by them
• Some specialties are at risk of being downsized
• Image recognition very advanced - already being used for cancer
detection
Many data scientists have no concept of clinical impact
• Get excited about incredibly small increase in accuracy
• May not make the effort to understand clinicians
Data scientists' tendency to believe you really don’t have
to know anything about the problem
• Data will tell us, without us getting in the way
• IBM Watson Health – algorithm is great, but need specific knowledge
for most applications. Layoffs.
• Algorithms without clinical input work for some things
• Often requires clinician conceptualization of variables
Based on how they think in practice
20. Categorize data
• Effects often aren’t linear
• Easier to interpret and apply
Buckets of patients by severity, risk
• Labels aligned to treatment guidelines
• Consistency across studies
Categories arbitrary, then evidence fixes the
categories (nearly forever)
• Data can guide the creation of categories
Continuous data – no loss of information
• Think more in terms of probabilities, not risk group
• Ways of modeling non-linear effects
• Software has handled this easily for years
Done more in biological applications
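The contrast between risk groups and probabilities can be shown with a small sketch. The cutoffs and logistic coefficients below are made up for illustration and do not come from any guideline or from the talk. Two patients near opposite edges of one bucket get the same label, while a continuous model separates them.

```python
import math

def risk_bucket(score, cutoffs=(30.0, 60.0)):
    """Assign a risk label using fixed cutoffs (hypothetical values)."""
    low, high = cutoffs
    if score < low:
        return "low"
    if score < high:
        return "moderate"
    return "high"

def risk_probability(score, intercept=-4.0, slope=0.08):
    """Logistic curve P = 1 / (1 + exp(-(intercept + slope * score))).

    The coefficients are invented; a real model would estimate them from data.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))

# Two hypothetical patients at opposite ends of the same bucket.
patients = {"A": 31.0, "B": 59.0}
buckets = {name: risk_bucket(score) for name, score in patients.items()}
probs = {name: risk_probability(score) for name, score in patients.items()}
```

Both patients are labeled "moderate", yet under these assumed coefficients their predicted probabilities are roughly 0.18 and 0.67. That gap is the information a risk group discards.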
23. Physician note fields largely useless
• Free text – not categorized
• Using it would be subjective
• Not practical – labor intensive
• Wouldn’t be reproducible
• Symptom not present OR not checked
• Wide practice variation in the visit
Natural Language Processing (NLP)
• Despite above problems, it works great on
physician notes.
• Validates reasonably well across institutions
• Very well developed in English
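To make the "symptom not present OR not checked" distinction concrete, here is a toy rule-based sketch. It is not the NLP the slide refers to; production systems are far more sophisticated. The cue list and the note text are hypothetical.

```python
import re

# Hypothetical and far from complete list of negation cues.
NEGATION_CUES = ("no", "denies", "without", "negative for")

def symptom_status(note, symptom):
    """Classify a symptom as 'present', 'absent', or 'not mentioned' in a note."""
    status = "not mentioned"
    for sentence in re.split(r"[.;\n]+", note.lower()):
        position = sentence.find(symptom.lower())
        if position == -1:
            continue  # this sentence does not mention the symptom
        before = sentence[:position]
        negated = any(
            re.search(r"\b" + re.escape(cue) + r"\b", before)
            for cue in NEGATION_CUES
        )
        if not negated:
            return "present"  # any affirmative mention wins
        status = "absent"
    return status

# Made-up note for illustration.
note = "Pt reports chest pain since morning. Denies fever; no shortness of breath."
statuses = {
    s: symptom_status(note, s)
    for s in ("chest pain", "fever", "shortness of breath", "nausea")
}
```

Here "nausea" comes back as "not mentioned" rather than "absent", which preserves the difference between a symptom that was ruled out and one that was never documented.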
25. Large number of missing values
makes variable useless
Might be systematically missing
Imputation methods help to a point
Machine learning can deal with
missing values
It can work in practice
If variable reliably predicts bad
outcomes, who cares if it’s missing for
most?
• E.g. Rovsing’s sign in appendicitis
If present, always appendicitis. But 1/10,000
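A minimal sketch of the point above: a finding that is rarely recorded can still be worth keeping if it is highly predictive when present. The data are invented, and the three-level encoding is one simple assumption about how a model could keep rows with a missing value instead of dropping them.

```python
def summarize_sign(records):
    """Summarize a clinical sign from (sign, outcome) pairs.

    sign is True, False, or None (not recorded); outcome is True or False.
    """
    recorded = [(sign, outcome) for sign, outcome in records if sign is not None]
    positives = [outcome for sign, outcome in recorded if sign]
    return {
        "fraction_recorded": len(recorded) / len(records),
        # Share of sign-positive patients who had the outcome.
        "ppv_when_present": sum(positives) / len(positives) if positives else None,
    }

def encode_sign(sign):
    """Three-level encoding so rows with a missing sign stay in the model."""
    if sign is None:
        return "missing"
    return "present" if sign else "absent"

# Made-up data: the sign is recorded for only 5 of 100 patients.
records = (
    [(None, False)] * 90
    + [(None, True)] * 5
    + [(False, False)] * 3
    + [(True, True)] * 2
)
summary = summarize_sign(records)
encoded = [encode_sign(sign) for sign, _ in records]
```

In this toy data the sign is recorded for 5% of patients, but every patient with a positive sign had the outcome, so discarding the variable for missingness would throw away a strong signal.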
27. Concern with overfitting models
Need many observations per
variable in the model
• # Observations >> # Variables
Machine learning can work when
• #Variables > # Observations
28. Natural Language Processing for:
• Analyzing open-ended questions
• Analyzing text reports (police, criminal justice,
medical records)
Voice Assistant (Siri, Alexa) for
conducting interviews
• Trained to respond to voice commands
• Conduct semi-structured interviews!
30. MIMIC III – De-Identified ICU EMR
Patients admitted to hospital for kidney
injury/kidney failure
Data at 12 hours after admission to the
intensive care unit (ICU)
4 Clinical variables, age and sex
Outcomes:
• Mortality
• Dialysis
• Ventilator
31. Compared Traditional Logistic Regression
with:
• Logistic with LASSO
• Classification Trees
• Random Forest
Random Forest did much better than the
others – quite well with few measures
• Area Under Curve ~ .9
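For readers unfamiliar with the measure, area under the ROC curve can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. A minimal sketch of that definition, using made-up scores and labels rather than any results from the study:

```python
def auc(scores, labels):
    """Chance that a random positive case outscores a random negative case.

    Ties count as half a win. labels are 1 (outcome) or 0 (no outcome).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Invented predicted probabilities and outcomes for six patients.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, giving an AUC of about 0.89. A value of 0.5 would mean the model ranks no better than chance.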
Most common summary measures:
• Current Values
• # consecutive increases, decreases
Some models used range, mean, std dev.,
change
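The summary measures listed above could be derived from a series of repeated lab values along these lines. This is a sketch under assumptions: the slide does not define "consecutive increases" precisely, so the code counts the run ending at the most recent value, and the creatinine readings are invented.

```python
import statistics

def trailing_run(values, direction):
    """Count consecutive increases (direction=+1) or decreases (direction=-1)
    ending at the most recent value."""
    run = 0
    for i in range(len(values) - 1, 0, -1):
        step = values[i] - values[i - 1]
        if step * direction > 0:
            run += 1
        else:
            break
    return run

def summarize(values):
    """Turn a time-ordered list of measurements into model features."""
    return {
        "current": values[-1],
        "consecutive_increases": trailing_run(values, +1),
        "consecutive_decreases": trailing_run(values, -1),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "change": values[-1] - values[0],
    }

# Hypothetical creatinine readings (mg/dL) over the first 12 ICU hours.
creatinine = [1.1, 1.0, 1.4, 1.9, 2.3]
features = summarize(creatinine)
```

Features like these compress a variable-length series into a fixed set of inputs, which is what models such as logistic regression or random forests need.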
33. What data sources can work?
How do you identify drug addicts?
How do we know if they have mental
health challenges?
How can you determine if they have
received mental health treatment?
What are appropriate outcomes?
How can you determine their outcomes?
What are the obstacles and limitations?
34. What data sources can work?
How do you identify drug addicts?
• Police/criminal justice, rehabilitation center
How do we know if they have mental
health challenges?
• Electronic medical records, Prescriptions
• Psychiatric hospitals
How can you determine if they have
received mental health treatment?
• Health care claims data
• Electronic medical records, Prescriptions
35. What data sources can work?
What are appropriate outcomes?
• Mental health visits
• Adherence with medication (Rx refills)
• Psychiatric hospitalization
• Suicide attempts
How can you determine their outcomes?
• Electronic medical records
• Rx refills
What are the obstacles and limitations?
• Access to various data sets
Any data, getting identifiers
Linking to HMO data
Funding for study – including EMR data
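One common way to turn Rx refill records into the adherence outcome mentioned above is the proportion of days covered: the share of days in a study window on which the patient had medication on hand. The slides do not name a specific measure, so this is an illustrative assumption, and the fill dates are made up. Overlapping supply is counted once rather than carried forward.

```python
from datetime import date, timedelta

def proportion_of_days_covered(fills, start, end):
    """Share of days in [start, end] covered by at least one fill.

    fills is a list of (fill_date, days_supply) tuples.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)  # a set counts overlapping days only once
    total_days = (end - start).days + 1
    return len(covered) / total_days

# Hypothetical refill history: a gap in early February, an early refill in March.
fills = [
    (date(2018, 1, 1), 30),
    (date(2018, 2, 15), 30),
    (date(2018, 3, 10), 30),
]
pdc = proportion_of_days_covered(fills, date(2018, 1, 1), date(2018, 3, 31))
```

For this invented history, 75 of the 90 days are covered, a proportion of about 0.83. Note that this only measures prescriptions filled, which is why EMR data on prescriptions written is not enough on its own.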