Learning from MIMIC
Leo Anthony Celi MD MS MPH
Institute for Medical Engineering and Science, MIT
Beth Israel Deaconess Medical Center,
Harvard Medical School
Disclosure
• No conflict of interest for this presentation
• Laboratory of Computational Physiology receives
research funding from
– Philips
– SAP
– Amazon
– Microsoft
– AthenaHealth
• Sana received funding from
– Vodafone
Crowdsourcing Knowledge Discovery
Medical Information Mart for Intensive Care
Lesson No. 1
Causality can never be established by
an observational study.
• Climate change is partly brought about by
human activities.
• Smoking causes cancer and heart disease.
• Vaccination does not lead to autism.
Significant correlation (Spearman coefficient =
0.75) between treatment effects estimated in RCTs
and in observational studies across 45 diverse
topics in general internal medicine
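A minimal sketch of how such a rank correlation could be computed, assuming one pooled effect estimate per topic from each study type. The numbers below are invented for illustration and are not the data behind the 0.75 figure; the helper names (`ranks`, `pearson`, `spearman`) are hypothetical.

```python
# Spearman correlation = Pearson correlation of the ranks.
# All effect estimates below are made up for illustration only.

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Hypothetical log relative risks for six clinical questions.
rct_effects = [-0.40, -0.10, 0.05, -0.25, 0.30, -0.60]
obs_effects = [-0.35, -0.20, 0.10, -0.05, 0.25, -0.50]

rho = spearman(rct_effects, obs_effects)
print(f"Spearman coefficient: {rho:.2f}")
```

A high coefficient only says the two study types order the treatments similarly; it does not by itself show that the effect sizes agree.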
No significant difference in effect estimates
between RCTs and observational studies
regardless of the observational study design or
heterogeneity.
A study matched 18 unique propensity score studies in
the ICU setting, each with at least one RCT evaluating the
same clinical question, and found a high degree of
agreement between their estimates of relative risk and
effect size.
There are way too many questions – it is impossible to
perform an RCT for each and every one of them.
Food for Thought
• Is it a matter of re-framing the question?
• Rather than “Is treatment A better than
treatment B or no treatment for disease X?”,
we ask “Can we predict who will respond to
treatment A?”.
• Rather than “Will X harm or benefit Y?”, look
at patients who are similar to Y and compare
those who got X with those who did not.
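The last reframing can be sketched as a matched comparison. This is a minimal illustration under strong assumptions, not a method taken from the slides: similarity is reduced to a single hypothetical severity score (real analyses typically match on many covariates or a propensity score), the patient records are invented, and `match_and_compare` and `caliper` are names introduced here.

```python
# Compare patients who got treatment X with similar patients who did not.
# Similarity = distance on one hypothetical severity score (an assumption).

def match_and_compare(patients, caliper=2.0):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns (mean outcome difference treated - control, number of pairs).
    Treated patients with no control within the caliper are left unmatched.
    """
    treated = [p for p in patients if p["treated"]]
    controls = [p for p in patients if not p["treated"]]
    used = set()
    pairs = []
    for t in treated:
        best, best_dist = None, None
        for idx, c in enumerate(controls):
            if idx in used:
                continue
            dist = abs(t["severity"] - c["severity"])
            if dist <= caliper and (best is None or dist < best_dist):
                best, best_dist = idx, dist
        if best is not None:
            used.add(best)
            pairs.append((t, controls[best]))
    if not pairs:
        return None, 0
    diff = sum(t["died"] - c["died"] for t, c in pairs) / len(pairs)
    return diff, len(pairs)

# Invented records: severity score, whether X was given, death (1) or not (0).
patients = [
    {"severity": 10, "treated": True, "died": 0},
    {"severity": 20, "treated": True, "died": 0},
    {"severity": 30, "treated": True, "died": 1},
    {"severity": 50, "treated": True, "died": 1},
    {"severity": 11, "treated": False, "died": 0},
    {"severity": 19, "treated": False, "died": 1},
    {"severity": 31, "treated": False, "died": 1},
    {"severity": 40, "treated": False, "died": 1},
    {"severity": 12, "treated": False, "died": 0},
]

risk_difference, n_pairs = match_and_compare(patients)
print(f"{n_pairs} matched pairs, risk difference {risk_difference:.2f}")
```

Matching of this kind balances only the characteristics that were measured and included; unmeasured differences between the groups can still bias the comparison.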
Lesson No. 2
A single-centre database is not
particularly useful.
Lesson No. 3
A movement calling for data sharing and
continuous and transparent peer review has
accompanied the rise of big data, but at the
moment, these remain a pipe dream.
“Why should I share MY data?”
is sometimes disguised as concern
for patient privacy and security.
Lesson No. 4
Unreliable research, not cancer or AIDS,
is the biggest problem
that our generation needs to fix,
but most clinicians seem unfazed by the issue.
MIMIC: More than Just Open Data
• Software that captures the communication
within research teams and documents what was
learned, which can be shared once the paper
is published
• Platform to support sharing of queries, code,
patient cohorts, etc., so that other research
groups can easily replicate or modify the
study design
Lesson No. 5
There are lots of new approaches in the field
of causal inference but traditional
methodologists seem quick to dismiss them.
Lesson No. 6
Given the cost of RCTs, our colleagues in
low- and middle-income countries are
“stuck” with applying research
performed in high-income countries.
Lesson No. 7
A divide persists between
data scientists/engineers
and clinicians.
• Preponderance of papers showcasing
algorithms developed on isolated benchmark
datasets
• Evaluation metrics (e.g. AUC, RMSE) tell us
nothing about the real-world impact of
differences in performance
• Results not publicized to relevant community
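One way to see the point about evaluation metrics is a toy case in which two models have identical AUC yet behave very differently once a fixed alert threshold is applied. The labels, scores, and the 0.5 threshold below are all invented for illustration; `auc` and `missed_cases` are hypothetical helper names.

```python
# Two hypothetical models that rank patients identically (same AUC)
# but are scaled differently, so a fixed alert threshold treats them differently.

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties in score count as half a correct pair.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

def missed_cases(labels, scores, threshold):
    """Number of true positives that would not trigger an alert."""
    return sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)

# Invented data: 1 = patient deteriorated, 0 = did not.
labels   = [0, 0, 0, 1, 0, 1, 1, 1]
scores_a = [0.05, 0.10, 0.20, 0.40, 0.45, 0.60, 0.80, 0.90]
scores_b = [0.02, 0.05, 0.10, 0.20, 0.22, 0.30, 0.40, 0.45]  # same ordering as A

auc_a, auc_b = auc(labels, scores_a), auc(labels, scores_b)
missed_a = missed_cases(labels, scores_a, threshold=0.5)
missed_b = missed_cases(labels, scores_b, threshold=0.5)
print(f"AUC A={auc_a:.4f}, AUC B={auc_b:.4f}")
print(f"Missed at threshold 0.5: A={missed_a}, B={missed_b}")
```

The summary metric is the same for both models, while the number of deteriorating patients who would go unflagged is not, which is the kind of consequence clinicians care about.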
Take Home Points
• The value of large amounts of data hinges on
the ability of researchers to share data,
methodologies, and findings in an open
setting.
• If empirical value is to be had from the
analysis of retrospective data, a more
continuous peer-review process must be created
by groups working together on similar problems.
Datathon Model
A new research model that brings together the required
experts from different fields in a venue that fosters
constructive collaboration, group learning, error
checking, and methodological review during the initial
design and subsequent phases of research.
What, after all, has maintained the
human race on this old globe despite all
the calamities of nature and all the
tragic failings of mankind, if not the faith
in new possibilities and the courage to
advocate them?
