Big Data in Pediatric Critical Care by Mohit Mehra

BIG DATA IN PEDIATRIC CRITICAL CARE
By Mohit Mehra
Lead Data Engineer, VPICU
August 5th, 2017

VPICU Mission
• Formed in 1998
• Help doctors make accurate, data-driven decisions using
advanced computational techniques and AI
• Find similarities between a patient at bedside and
historical populations with known outcomes
• What kind of decisions:
• Diagnosis (sepsis, ARDS, etc.)
• Severity of illness
• Physiologic trajectory
• Re-admissions
• Treatment recommendations
https://youtu.be/f2OqkRhj5uI
THE LAURA P. AND LELAND K. WHITTIER VIRTUAL PICU

But where is pediatric healthcare?
Source: http://www.lorienpratt.com/are-machine-learning-and-big-data-all-about-just-advertising-and-marketing/

Brief History of IoT in the ICU
• 1940s – 1990s:
• Several patient monitoring devices/systems
• Creation of EMR with data aggregation
• LAN/Web based connectivity
• HIPAA (Health Insurance Portability and
Accountability Act of 1996)
• Health Level 7 (HL7) standard
• 2000s:
• Clinical information systems
• Philips CareVue, GE Centricity
• RDBMS based data storage
• Sharing of high level information via
proprietary standards
• EHR (Cerner, Epic), Meaningful Use

EHR Adoption
https://dashboard.healthit.gov/evaluations/data-briefs/non-federal-acute-care-hospital-ehr-adoption-2008-2015.php

Remaining Challenges
• Interoperability:
• Medical Device industry still uses proprietary data formats and protocols
• Some EHR vendors do not adhere to open standards
• Liability:
• HIPAA privacy and security regulations have led to a protectionist culture
• Hospitals are reluctant to share data to address research needs
• Velocity/Volume of Big Data:
• High frequency waveforms generate petabytes of data
• Storage on traditional platforms is not scalable
• Skills: Healthcare IT has a skills gap in big data technologies

Hadoop/Big Data Journey
• Reason: Single distributed data repository
• Timeline: 3-year journey, started with a small cluster
• POC:
• HDP 2.2/Ambari 2.1
• Services: HDFS, Hive, Ranger, Oozie
• Growing Pains:
• HDP/Ambari upgrades
• Deploying custom libraries
• Audit Logs space
• Hardware failure
• User/group multi-tenancy

On-Prem Infrastructure
Batch Data
• Monitor Data
• EHR Charted Data
• Labs Data
• Medications Data
• Intubation Data
• Ventilation Data

Streaming HL7 Data
• Monitor Data
• ADT Data
• Labs Data

Oozie Jobs
• Hourly, Daily, Weekly
• Sqoop, REST API, Jars
• Pushes to HDFS
• Pushes to Hive ORC

Mirth Engine
• Frequency: 15s - 4h
• TCP, File IO channel
• HL7 → JSON
• Queuing capability
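
The HL7-to-JSON conversion above can be sketched in a few lines. This is not Mirth channel code; it is a minimal Python illustration of the shape of the transformation. The sample message, the chosen fields, and the output key names are all assumptions made for the example.

```python
import json

# Hypothetical HL7 v2 observation message; segments are separated by carriage returns.
SAMPLE_HL7 = "\r".join([
    "MSH|^~\\&|MONITOR|PICU|RECEIVER|HOSP|20170805120000||ORU^R01|MSG0001|P|2.3",
    "PID|1||12345",
    "OBX|1|NM|HR^Heart Rate||120|bpm",
    "OBX|2|NM|SPO2^Oxygen Saturation||98|%",
])


def hl7_to_dict(message: str) -> dict:
    """Flatten a pipe-delimited HL7 v2 message into a JSON-friendly dict."""
    record = {"observations": []}
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == "MSH":
            # MSH-1 is the field separator itself, so MSH-n sits at index n-1.
            record["timestamp"] = fields[6]       # MSH-7: message date/time
            record["message_type"] = fields[8]    # MSH-9: message type
        elif fields[0] == "PID":
            record["patient_id"] = fields[3]      # PID-3: patient identifier
        elif fields[0] == "OBX":
            code, _, name = fields[3].partition("^")  # OBX-3: observation identifier
            record["observations"].append({
                "code": code,
                "name": name,
                "value": float(fields[5]),        # OBX-5: value (numeric in this sample)
                "units": fields[6],               # OBX-6: units
            })
    return record


print(json.dumps(hl7_to_dict(SAMPLE_HL7), indent=2))
```

A real feed would also need to handle repeating fields, escape sequences, and non-numeric OBX values, which this sketch ignores.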

Nifi/Kafka
• Reads JSON
• Pushes to Kafka queue
• Pushes to HDFS
• Pushes to HBase

Spark Data Pipeline
• Data cleansing
• Data munging
• Data tagging
• Data aggregation
• Data de-identification
• Outputs ML dataset
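
As an illustration of the de-identification stage, here is a minimal plain-Python sketch (not Spark code, and not necessarily the method used by VPICU). It assumes two common techniques: replacing the patient identifier with a keyed hash, and shifting every timestamp for a patient by the same offset so that intervals between events are preserved. The key, the field names, and the list of identifiers are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

# Hypothetical secret; in practice it would come from a managed key store.
SECRET_KEY = b"example-key-not-for-production"
# Hypothetical direct identifiers to drop from every row.
DIRECT_IDENTIFIERS = {"patient_id", "name", "mrn"}


def _digest(purpose: bytes, patient_id: str) -> bytes:
    """Keyed hash, so pseudonyms cannot be recomputed without the secret."""
    return hmac.new(SECRET_KEY, purpose + patient_id.encode(), hashlib.sha256).digest()


def pseudonym(patient_id: str) -> str:
    """Stable, non-reversible replacement for the patient identifier."""
    return _digest(b"id:", patient_id).hex()[:16]


def time_shift(patient_id: str, max_days: int = 365) -> timedelta:
    """Per-patient offset of 1..max_days days, identical for all of that patient's rows."""
    days = 1 + int.from_bytes(_digest(b"shift:", patient_id)[:4], "big") % max_days
    return timedelta(days=days)


def deidentify(row: dict) -> dict:
    """Drop direct identifiers, add a pseudonymous key, and shift the timestamp."""
    pid = row["patient_id"]
    clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    clean["patient_key"] = pseudonym(pid)
    clean["timestamp"] = row["timestamp"] - time_shift(pid)
    return clean


rows = [
    {"patient_id": "12345", "name": "Jane Doe", "timestamp": datetime(2017, 8, 5, 12, 0), "hr": 120},
    {"patient_id": "12345", "name": "Jane Doe", "timestamp": datetime(2017, 8, 5, 12, 30), "hr": 118},
]
shifted = [deidentify(r) for r in rows]
```

Because the offset depends only on the patient, the 30-minute gap between the two example rows survives the shift, which matters for time-series models.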

Single patient data
[Figure: a single patient's vitals, labs, interventions, and drugs plotted against time]

Data Science Stack
• Architecture: NVIDIA GPUs
• ML Algos: Recurrent Neural Networks
• ML Frameworks: Keras + Theano

How much data is enough?
Performance (on the same holdout set) increases with the amount of data
available for training

Conclusions
• Our goal is to save lives by making timely and accurate predictions
• More data and better algorithms can lead to breakthroughs
• Data science/Machine Learning has a great potential in our space
• Our team continues to make progress in using big data technologies
to enable research and development

VPICU Research Team
Clinical Researchers
• Randall Wetzel, MD
• Roby Khemani, MD
• Sareen Shah, MD
Computer Scientists - CHLA
• Melissa Aczon, PhD - Senior Data Scientist
• Brett Bailey - Software Engineer
• Alysia Flynn, PhD - Data Ninja
• Alec Gunney - Data Scientist
• Long Van Ho - Data Scientist
• David Ledbetter - Senior Data Scientist
• Mohit Mehra - Lead Data Engineer
• Mike Reilly - Infrastructure
• Paul Vee - Senior Program Manager
Human-computer interaction, visualization
• Jeff Heer - Stanford
• Diana Maclean - Stanford
• Pia Pal - Stanford
Machine learning, similarity search
• Ben Marlin - UMass Amherst
Artificial intelligence, probabilistic models
• Christian Shelton - UC Riverside
• Busra Celikkaya - UC Riverside
Large-scale statistical analysis
• Amy Braverman - NASA JPL, UCLA
Data systems and software architecture
• Dan Crichton - NASA JPL
• Chris Mattmann - NASA JPL, USC

DS Challenges
• How do we generate features
• Irregular, sparsely sampled time-series measurements (HR vs BP vs
oxygenation vs medications)
• Missing data (should we impute or not, unit conversion)
• Noise in the data (fidelity in human charted vs machine generated)
• Hard NLP problems (Is MAP = mean arterial pressure or mean airway
pressure?, heart rate = hr = pulse)
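
Two of these challenges can be made concrete with a small sketch. It assumes one possible approach, not necessarily the team's: map synonymous chart labels to a canonical name while refusing to guess on ambiguous ones, and resample an irregular series onto a fixed grid by carrying the last observation forward, keeping a flag that records whether each grid value was actually observed. The alias table, function names, and sample values are hypothetical.

```python
# Hypothetical alias table: several chart labels map to one canonical feature name.
ALIASES = {"heart rate": "heart_rate", "hr": "heart_rate", "pulse": "heart_rate"}
# Labels that cannot be resolved without context (e.g. MAP: arterial vs airway pressure).
AMBIGUOUS = {"map"}


def normalize_label(label: str):
    """Return the canonical feature name, or None if the label is ambiguous or unknown."""
    key = label.strip().lower()
    if key in AMBIGUOUS:
        return None
    return ALIASES.get(key)


def resample_locf(samples, start, end, step):
    """Resample sorted (time, value) pairs onto a fixed grid.

    Each grid point gets the most recent value at or before it (last observation
    carried forward) plus a flag that is True only if a sample arrived in the
    interval (t - step, t]. Times are plain numbers, e.g. minutes since admission.
    """
    grid = []
    i, last, t = 0, None, start
    while t <= end:
        observed = False
        while i < len(samples) and samples[i][0] <= t:
            last = samples[i][1]
            observed = observed or samples[i][0] > t - step
            i += 1
        grid.append((t, last, observed))
        t += step
    return grid


# Heart rate charted at irregular times (minutes since admission).
hr = [(0, 120), (7, 118), (65, 125)]
for point in resample_locf(hr, start=0, end=90, step=30):
    print(point)
```

Keeping the observed flag alongside the filled value lets a downstream model distinguish a fresh measurement from a carried-forward one, rather than treating imputed values as real.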