BIG DATA IN PEDIATRIC CRITICAL CARE
By Mohit Mehra
Lead Data Engineer, VPICU
August 5th, 2017
Data is critical
VPICU Mission
• Formed in 1998
• Help doctors make accurate decisions using advanced computational techniques and data-driven AI
• Find similarities between a patient at the bedside and historical populations with known outcomes
• Kinds of decisions:
• Diagnosis (sepsis, ARDS, etc.)
• Severity of illness
• Physiologic trajectory
• Re-admissions
• Treatment recommendations
https://youtu.be/f2OqkRhj5uI
THE LAURA P. AND LELAND K. WHITTIER VIRTUAL PICU
An evolving landscape
But where is pediatric healthcare?
Source: http://www.lorienpratt.com/are-machine-learning-and-big-data-all-about-just-advertising-and-marketing/
Brief History of IoT in the ICU
• 1940s – 1990s:
• Several patient monitoring devices/systems
• Creation of EMRs with data aggregation
• LAN/web-based connectivity
• HIPAA (Health Insurance Portability and
Accountability Act of 1996)
• Health Level 7 (HL7) standard
• 2000s:
• Clinical information systems
• Philips CareVue, GE Centricity
• RDBMS-based data storage
• Sharing of high-level information via proprietary standards
• EHRs (Cerner, Epic), Meaningful Use
EHR Adoption
Source: https://dashboard.healthit.gov/evaluations/data-briefs/non-federal-acute-care-hospital-ehr-adoption-2008-2015.php
Remaining Challenges
• Interoperability:
• The medical device industry still uses proprietary data formats and protocols
• Some EHR vendors do not adhere to open standards
• Liability:
• HIPAA privacy and security regulations have led to a protectionist culture
• Hospitals are reluctant to share data to address research needs
• Velocity/Volume of Big Data:
• High-frequency waveforms generate petabytes of data
• Storage on traditional platforms does not scale
• Skills: Healthcare IT has a skills gap in big data technologies
VPICU Data Engineering
Hadoop/Big Data Journey
• Reason: a single distributed data repository
• Timeline: a 3-year journey so far, on a small cluster
• Proof of concept (POC):
• HDP 2.2/Ambari 2.1
• Services: HDFS, Hive, Ranger, Oozie
• Growing pains:
• HDP/Ambari upgrades
• Deploying custom libraries
• Disk space consumed by audit logs
• Hardware failures
• User/group multi-tenancy
On-Prem Infrastructure
Batch Data
• Monitor Data
• EHR Charted Data
• Labs Data
• Medications Data
• Intubation Data
• Ventilation Data
Streaming HL7 Data
• Monitor Data
• ADT Data
• Labs Data
Oozie Jobs
• Hourly, Daily, Weekly
• Sqoop, REST APIs, JARs
• Pushes to HDFS
• Pushes to Hive (ORC format)
Mirth Engine
• Frequency: 15 s to 4 h
• TCP and file I/O channels
• HL7 to JSON conversion
• Queuing capability
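Mirth performs the HL7-to-JSON step with its own configurable channels and transformers. As a rough illustration of what that conversion involves, here is a minimal pure-Python sketch for HL7 v2 messages; it is not Mirth's implementation, and the function name `hl7_to_dict`, the `SEG.N` key scheme and the sample message are hypothetical. It assumes the default encoding characters (`|` for fields, `^` for components, segments separated by carriage returns) and ignores repetitions, escape sequences and sub-components.

```python
import json
from typing import Dict, List


def hl7_to_dict(message: str) -> Dict[str, List[dict]]:
    """Convert a pipe-delimited HL7 v2 message into a JSON-friendly dict.

    Sketch only (hypothetical helper): assumes default delimiters and does
    not handle repetitions (~), escapes (\\) or sub-components (&).
    """
    result: Dict[str, List[dict]] = {}
    for raw in message.replace("\r\n", "\r").replace("\n", "\r").split("\r"):
        if not raw.strip():
            continue  # skip blank lines between segments
        fields = raw.split("|")
        seg_id = fields[0]
        if seg_id == "MSH":
            # The first "|" after MSH is itself field MSH-1, so shift numbering by one.
            fields = ["MSH", "|"] + fields[1:]
        parsed = {}
        for i, value in enumerate(fields[1:], start=1):
            if value == "":
                continue  # omit empty fields
            if seg_id == "MSH" and i in (1, 2):
                comps = [value]  # MSH-1/MSH-2 hold the delimiters; never split them
            else:
                comps = value.split("^")
            parsed[f"{seg_id}.{i}"] = comps[0] if len(comps) == 1 else comps
        # OBX and other segments repeat, so each segment id maps to a list.
        result.setdefault(seg_id, []).append(parsed)
    return result


if __name__ == "__main__":
    # Hypothetical monitor message: one patient, two observations.
    sample = (
        "MSH|^~\\&|MONITOR|PICU|HDFS|VPICU|201708050830||ORU^R01|123|P|2.3\r"
        "PID|1||MRN001\r"
        "OBX|1|NM|HR^Heart Rate||142|bpm\r"
        "OBX|2|NM|SPO2^Oxygen Saturation||97|%"
    )
    print(json.dumps(hl7_to_dict(sample), indent=2))
```

Mapping every segment id to a list matters because OBX (observation) segments repeat, typically one per vital sign or lab value in the message.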
NiFi/Kafka
• Reads JSON
• Pushes to Kafka queue
• Pushes to HDFS
• Pushes to HBase
Spark Data Pipeline
• Data cleansing
• Data munging
• Data tagging
• Data aggregation
• Data de-identification
• Outputs ML dataset
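The Spark stages are listed without detail. The sketch below shows, in plain Python rather than Spark, what three of them (cleansing, de-identification and aggregation) might look like on a list of (MRN, timestamp, signal, value) records. The plausibility limits, the keyed-hash pseudonyms and the per-patient date shift are assumptions chosen for illustration, not a description of the VPICU pipeline, and all names here are hypothetical. On a cluster the same logic would become DataFrame filters, UDFs and a `groupBy` over an hourly window.

```python
import hashlib
import hmac
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical placeholder key; in practice load it from a secrets store.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical plausibility limits used by the cleansing step.
PLAUSIBLE = {"hr": (20, 300), "spo2": (50, 100)}


def pseudonym(mrn: str) -> str:
    """Keyed hash, so the same MRN always maps to the same opaque ID."""
    return hmac.new(SECRET_KEY, mrn.encode(), hashlib.sha256).hexdigest()[:16]


def date_shift(mrn: str) -> timedelta:
    """Per-patient offset of 1 to 365 days, derived from a keyed hash.

    Shifting all of a patient's timestamps by the same amount hides real
    dates while preserving the intervals between that patient's events.
    """
    digest = hmac.new(SECRET_KEY, b"shift:" + mrn.encode(), hashlib.sha256).digest()
    return timedelta(days=1 + int.from_bytes(digest[:4], "big") % 365)


def build_ml_rows(records):
    """records: iterable of (mrn, iso_timestamp, signal, value) tuples.

    Returns sorted (patient_id, hour, signal, hourly_mean) rows.
    """
    buckets = defaultdict(list)
    for mrn, ts, signal, value in records:
        low, high = PLAUSIBLE.get(signal, (float("-inf"), float("inf")))
        if value is None or not low <= value <= high:
            continue  # cleansing: drop missing or implausible readings
        shifted = datetime.fromisoformat(ts) - date_shift(mrn)  # de-identify dates
        hour = shifted.replace(minute=0, second=0, microsecond=0)
        buckets[(pseudonym(mrn), hour, signal)].append(value)  # de-identify MRN
    # Aggregation: one hourly mean per (patient, hour, signal).
    return sorted(
        (pid, hour.isoformat(), signal, mean(vals))
        for (pid, hour, signal), vals in buckets.items()
    )
```

Deriving the date shift from a keyed hash keeps it stable across batch runs, so records for the same patient that arrive on different days still line up on one shifted timeline.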
Single patient data (figure: vitals, labs, interventions and drugs plotted against time)
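The figure stacks vitals, labs, interventions and drugs on one time axis, but each stream is sampled at a different, irregular rate. One common way to turn that into model input is to resample every signal onto a shared grid. The sketch below assumes an hourly grid, last-value-per-hour binning, and forward-fill with an explicit observed/imputed flag; these are illustrative choices rather than the team's documented method, and `to_hourly_grid` is a hypothetical name.

```python
import math
from typing import Dict, List, Optional, Tuple

Observation = Tuple[float, float]  # (hours since admission, value)


def to_hourly_grid(
    series: Dict[str, List[Observation]], n_hours: int
) -> Dict[str, List[Tuple[Optional[float], int]]]:
    """Align irregularly sampled signals on a common hourly grid.

    For each signal and hour, keep the latest observation in that hour; if
    there is none, carry the previous value forward. Each cell is
    (value, observed_flag) so a model can tell measured values from
    imputed ones. Hours before the first measurement stay (None, 0).
    """
    grid = {}
    for name, obs in series.items():
        last_in_hour: Dict[int, float] = {}
        for t, value in sorted(obs):
            hour = math.floor(t)
            if 0 <= hour < n_hours:
                last_in_hour[hour] = value  # later readings overwrite earlier ones
        row: List[Tuple[Optional[float], int]] = []
        carried: Optional[float] = None
        for hour in range(n_hours):
            if hour in last_in_hour:
                carried = last_in_hour[hour]
                row.append((carried, 1))  # measured in this hour
            else:
                row.append((carried, 0))  # forward-filled, or still missing
        grid[name] = row
    return grid
```

For example, `to_hourly_grid({"hr": [(0.2, 150), (0.9, 148), (2.5, 160)], "lactate": [(1.1, 2.3)]}, 4)` gives `[(148, 1), (148, 0), (160, 1), (160, 0)]` for `hr` and `[(None, 0), (2.3, 1), (2.3, 0), (2.3, 0)]` for `lactate`. Keeping the flag lets a downstream model learn from missingness itself instead of treating carried-forward values as fresh measurements.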
Cloud Infrastructure
Data Science Stack
• Architecture: NVIDIA GPUs
• ML algorithms: recurrent neural networks
• ML frameworks: Keras + Theano
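Recurrent neural networks suit sequences of per-hour patient measurements because they carry a hidden state from one time step to the next. The sketch below shows only the forward pass of a single-layer Elman RNN in plain Python, with tiny hand-set, hypothetical weights; it is not the team's model, and training (which Keras handles in practice) is omitted. Keras's `SimpleRNN` layer computes the same kind of recurrence, h_t = tanh(W_x x_t + W_h h_{t-1} + b), with learned weights.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(
    xs: Sequence[Sequence[float]],
    w_x: Matrix,
    w_h: Matrix,
    b: Sequence[float],
    w_out: Sequence[float],
    b_out: float,
) -> List[float]:
    """Run an Elman RNN over a sequence and return one score per step.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    y_t = sigmoid(w_out . h_t + b_out)
    """
    h = [0.0] * len(b)  # initial hidden state
    scores = []
    for x in xs:
        pre = [a + c + d for a, c, d in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(p) for p in pre]  # new hidden state mixes input and history
        logit = sum(w * hi for w, hi in zip(w_out, h)) + b_out
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid output
    return scores


if __name__ == "__main__":
    # Hypothetical weights: 2 input features (scaled HR, scaled SpO2), 2 hidden units.
    w_x = [[0.8, -0.5], [0.3, 0.9]]
    w_h = [[0.5, 0.0], [0.0, 0.5]]
    b = [0.0, 0.0]
    w_out, b_out = [1.2, -0.7], -0.1
    hours = [[0.1, 0.9], [0.6, 0.7], [1.0, 0.4]]
    print([round(s, 3) for s in rnn_forward(hours, w_x, w_h, b, w_out, b_out)])
```

Because the hidden state feeds back into each step, two sequences that end with the same hourly input can still produce different final scores, which is exactly what lets the model use a patient's trajectory rather than a single snapshot.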
How much data is enough?
Performance (on the same holdout set) increases with the amount of data available for training
Conclusions
• Our goal is to save lives by making timely and accurate predictions
• More data and better algorithms can lead to breakthroughs
• Data science and machine learning have great potential in pediatric critical care
• Our team continues to make progress in using big data technologies to enable research and development
VPICU Research Team
Clinical Researchers
• Randall Wetzel, MD
• Roby Khemani, MD
• Sareen Shah, MD
Computer Scientists - CHLA
• Melissa Aczon, PhD - Senior Data Scientist
• Brett Bailey - Software Engineer
• Alysia Flynn, PhD - Data Ninja
• Alec Gunney - Data Scientist
• Long Van Ho - Data Scientist
• David Ledbetter - Senior Data Scientist
• Mohit Mehra - Lead Data Engineer
• Mike Reilly - Infrastructure
• Paul Vee - Senior Program Manager
Human-computer interaction, visualization
• Jeff Heer (Stanford)
• Diana Maclean (Stanford)
• Pia Pal (Stanford)
Machine learning, similarity search
• Ben Marlin (UMass Amherst)
Artificial intelligence, probabilistic models
• Christian Shelton (UC Riverside)
• Busra Celikkaya (UC Riverside)
Large-scale statistical analysis
• Amy Braverman (NASA JPL, UCLA)
Data systems and software architecture
• Dan Crichton (NASA JPL)
• Chris Mattmann (NASA JPL, USC)
Q&A
Backups
DS Challenges
• How do we generate features?
• Irregular, sparsely sampled time-series measurements (HR vs. BP vs. oxygenation vs. medications)
• Missing data (whether to impute; unit conversion)
• Noise in the data (fidelity of human-charted vs. machine-generated values)
• Hard NLP problems (does MAP mean mean arterial pressure or mean airway pressure? heart rate = hr = pulse)
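The MAP ambiguity, the `heart rate = hr = pulse` synonyms and the unit-conversion problem can be partly handled with a curated mapping before any model sees the data. The sketch below is a hypothetical rule-based normalizer: the synonym table, the concept names and the rule that resolves MAP by its unit (mmHg for mean arterial pressure, cmH2O for mean airway pressure) are assumptions for illustration. Labels it cannot resolve return `None` rather than a guess.

```python
import re
from typing import Optional, Tuple

# Hypothetical synonym table; a real one would be curated with clinicians.
SYNONYMS = {
    "heart rate": "heart_rate",
    "hr": "heart_rate",
    "pulse": "heart_rate",
    "temp": "temperature_c",
    "temperature": "temperature_c",
}


def normalize_label(label: str) -> str:
    """Lower-case and collapse punctuation/whitespace, so 'Heart-Rate' == 'heart rate'."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def canonicalize(label: str, value: float, unit: str) -> Optional[Tuple[str, float]]:
    """Map a charted (label, value, unit) triple to (concept, value in canonical unit).

    Returns None when the label cannot be resolved, so ambiguous rows are
    surfaced for review instead of silently guessed.
    """
    key = normalize_label(label)
    unit = unit.strip().lower()
    if key == "map":
        # Assumption: arterial pressure is charted in mmHg, airway pressure in cmH2O.
        if unit == "mmhg":
            return ("mean_arterial_pressure", value)
        if unit == "cmh2o":
            return ("mean_airway_pressure", value)
        return None  # ambiguous without a usable unit
    concept = SYNONYMS.get(key)
    if concept is None:
        return None
    if concept == "temperature_c" and unit in ("f", "degf", "°f"):
        value = (value - 32.0) * 5.0 / 9.0  # Fahrenheit to Celsius
    return (concept, value)
```

A rule table like this only covers the cases someone has anticipated, which is why returning `None` for unknown or ambiguous labels, and reviewing those rows, is safer than falling back to a default concept.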
28

More Related Content

What's hot

FAIR publishing of research outputs in Australia 20180526
FAIR publishing of research outputs in Australia 20180526FAIR publishing of research outputs in Australia 20180526
FAIR publishing of research outputs in Australia 20180526Keith Russell
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsPerficient, Inc.
 
Converged IT and Data Commons
Converged IT and Data CommonsConverged IT and Data Commons
Converged IT and Data CommonsSimon Twigger
 
Use of data in safe havens: ethics and reproducibility issues
Use of data in safe havens: ethics and reproducibility issuesUse of data in safe havens: ethics and reproducibility issues
Use of data in safe havens: ethics and reproducibility issuesLouise Corti
 
NMR Automatic Structure Verficiation
NMR Automatic Structure VerficiationNMR Automatic Structure Verficiation
NMR Automatic Structure VerficiationPistoia Alliance
 
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraWorkflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraIlkay Altintas, Ph.D.
 
Managing and sharing confidential data in Australian social science
Managing and sharing confidential data	in Australian social scienceManaging and sharing confidential data	in Australian social science
Managing and sharing confidential data in Australian social scienceARDC
 
FAIR for the future: embracing all things data
FAIR for the future: embracing all things dataFAIR for the future: embracing all things data
FAIR for the future: embracing all things dataARDC
 
Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-supportSherry Lake
 
Big Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical DevicesBig Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical DevicesPremNarayanan6
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
Why quality control and quality assurance is important for the legacy of GEOT...
Why quality control and quality assurance is important for the legacy of GEOT...Why quality control and quality assurance is important for the legacy of GEOT...
Why quality control and quality assurance is important for the legacy of GEOT...Adam Leadbetter
 
OpenID Foundation Workshop at EIC 2018 - HEART Working Group Update
OpenID Foundation Workshop at EIC 2018 - HEART Working Group UpdateOpenID Foundation Workshop at EIC 2018 - HEART Working Group Update
OpenID Foundation Workshop at EIC 2018 - HEART Working Group UpdateMikeLeszcz
 
Digital Transformation: Big Data and Data Science Learning Path
Digital Transformation: Big Data and Data Science Learning PathDigital Transformation: Big Data and Data Science Learning Path
Digital Transformation: Big Data and Data Science Learning PathChulalongkorn University
 
Exploring New Methods for Protecting and Distributing Confidential Research ...
Exploring New Methods for Protecting and Distributing Confidential Research ...Exploring New Methods for Protecting and Distributing Confidential Research ...
Exploring New Methods for Protecting and Distributing Confidential Research ...Bryan Beecher
 
Data management profiles workshop
Data management profiles workshopData management profiles workshop
Data management profiles workshoplindahauck
 
Big data big rewards meeting 3
Big data big rewards meeting 3Big data big rewards meeting 3
Big data big rewards meeting 3szarinammd
 
Modern Healthcare Information Technology
Modern Healthcare Information TechnologyModern Healthcare Information Technology
Modern Healthcare Information TechnologyJeffrey Paulette
 

What's hot (20)

FAIR publishing of research outputs in Australia 20180526
FAIR publishing of research outputs in Australia 20180526FAIR publishing of research outputs in Australia 20180526
FAIR publishing of research outputs in Australia 20180526
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
 
Converged IT and Data Commons
Converged IT and Data CommonsConverged IT and Data Commons
Converged IT and Data Commons
 
Use of data in safe havens: ethics and reproducibility issues
Use of data in safe havens: ethics and reproducibility issuesUse of data in safe havens: ethics and reproducibility issues
Use of data in safe havens: ethics and reproducibility issues
 
NMR Automatic Structure Verficiation
NMR Automatic Structure VerficiationNMR Automatic Structure Verficiation
NMR Automatic Structure Verficiation
 
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraWorkflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
 
Managing and sharing confidential data in Australian social science
Managing and sharing confidential data	in Australian social scienceManaging and sharing confidential data	in Australian social science
Managing and sharing confidential data in Australian social science
 
FAIR for the future: embracing all things data
FAIR for the future: embracing all things dataFAIR for the future: embracing all things data
FAIR for the future: embracing all things data
 
Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-support
 
Big Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical DevicesBig Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical Devices
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
Why quality control and quality assurance is important for the legacy of GEOT...
Why quality control and quality assurance is important for the legacy of GEOT...Why quality control and quality assurance is important for the legacy of GEOT...
Why quality control and quality assurance is important for the legacy of GEOT...
 
Epidemic info 3 min
Epidemic info 3 minEpidemic info 3 min
Epidemic info 3 min
 
OpenID Foundation Workshop at EIC 2018 - HEART Working Group Update
OpenID Foundation Workshop at EIC 2018 - HEART Working Group UpdateOpenID Foundation Workshop at EIC 2018 - HEART Working Group Update
OpenID Foundation Workshop at EIC 2018 - HEART Working Group Update
 
Digital Transformation: Big Data and Data Science Learning Path
Digital Transformation: Big Data and Data Science Learning PathDigital Transformation: Big Data and Data Science Learning Path
Digital Transformation: Big Data and Data Science Learning Path
 
Henderson "Institutional Identifiers"
Henderson "Institutional Identifiers"Henderson "Institutional Identifiers"
Henderson "Institutional Identifiers"
 
Exploring New Methods for Protecting and Distributing Confidential Research ...
Exploring New Methods for Protecting and Distributing Confidential Research ...Exploring New Methods for Protecting and Distributing Confidential Research ...
Exploring New Methods for Protecting and Distributing Confidential Research ...
 
Data management profiles workshop
Data management profiles workshopData management profiles workshop
Data management profiles workshop
 
Big data big rewards meeting 3
Big data big rewards meeting 3Big data big rewards meeting 3
Big data big rewards meeting 3
 
Modern Healthcare Information Technology
Modern Healthcare Information TechnologyModern Healthcare Information Technology
Modern Healthcare Information Technology
 

Similar to Big Data in Pediatric Critical Care by Mohit Mehra

Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit
 
Starting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchStarting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchDataWorks Summit/Hadoop Summit
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...CTSI at UCSF
 
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...CTSI at UCSF
 
PRISM Project Update
PRISM Project UpdatePRISM Project Update
PRISM Project Updateimgcommcall
 
Medidata AMUG Meeting / Presentation 2013
Medidata AMUG Meeting / Presentation 2013Medidata AMUG Meeting / Presentation 2013
Medidata AMUG Meeting / Presentation 2013Brock Heinz
 
Big data's impact on healthcare
Big data's impact on healthcareBig data's impact on healthcare
Big data's impact on healthcareRené Kuipers
 
The MedRed Ontology for Representing Clinical Data Acquisition Metadata
The MedRed Ontology for Representing Clinical Data Acquisition MetadataThe MedRed Ontology for Representing Clinical Data Acquisition Metadata
The MedRed Ontology for Representing Clinical Data Acquisition MetadataJean-Paul Calbimonte
 
Elmallah june27 11am_room230_a
Elmallah june27 11am_room230_aElmallah june27 11am_room230_a
Elmallah june27 11am_room230_aDataWorks Summit
 

Similar to Big Data in Pediatric Critical Care by Mohit Mehra (20)

Hadoop Enabled Healthcare
Hadoop Enabled HealthcareHadoop Enabled Healthcare
Hadoop Enabled Healthcare
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short Time
 
Starting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchStarting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer Research
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Providing support for JC Bradleys vision of open science using RSC cheminform...
Providing support for JC Bradleys vision of open science using RSC cheminform...Providing support for JC Bradleys vision of open science using RSC cheminform...
Providing support for JC Bradleys vision of open science using RSC cheminform...
 
How to Architect Smarter Systems for Healthcare
How to Architect Smarter Systems for HealthcareHow to Architect Smarter Systems for Healthcare
How to Architect Smarter Systems for Healthcare
 
Medical image analysis, retrieval and evaluation infrastructures
Medical image analysis, retrieval and evaluation infrastructuresMedical image analysis, retrieval and evaluation infrastructures
Medical image analysis, retrieval and evaluation infrastructures
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Medical image analysis and big data evaluation infrastructures
Medical image analysis and big data evaluation infrastructuresMedical image analysis and big data evaluation infrastructures
Medical image analysis and big data evaluation infrastructures
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
 
Big data analystics
Big data analysticsBig data analystics
Big data analystics
 
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
 
PRISM Project Update
PRISM Project UpdatePRISM Project Update
PRISM Project Update
 
Future is Now
Future is NowFuture is Now
Future is Now
 
Dr. Eliot Siegel: Watson and Deep QA Software in Pursuit of Personalized Medi...
Dr. Eliot Siegel: Watson and Deep QA Software in Pursuit of Personalized Medi...Dr. Eliot Siegel: Watson and Deep QA Software in Pursuit of Personalized Medi...
Dr. Eliot Siegel: Watson and Deep QA Software in Pursuit of Personalized Medi...
 
Medidata AMUG Meeting / Presentation 2013
Medidata AMUG Meeting / Presentation 2013Medidata AMUG Meeting / Presentation 2013
Medidata AMUG Meeting / Presentation 2013
 
Big data's impact on healthcare
Big data's impact on healthcareBig data's impact on healthcare
Big data's impact on healthcare
 
The MedRed Ontology for Representing Clinical Data Acquisition Metadata
The MedRed Ontology for Representing Clinical Data Acquisition MetadataThe MedRed Ontology for Representing Clinical Data Acquisition Metadata
The MedRed Ontology for Representing Clinical Data Acquisition Metadata
 
Elmallah june27 11am_room230_a
Elmallah june27 11am_room230_aElmallah june27 11am_room230_a
Elmallah june27 11am_room230_a
 

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

Big Data in Pediatric Critical Care by Mohit Mehra

  • 1. BIG DATA IN PEDIATRIC CRITICAL CARE By Mohit Mehra, Lead Data Engineer, VPICU. August 5th, 2017
  • 3. VPICU Mission • Formed in 1998 • Assist doctors in making accurate, data-based decisions using advanced computational techniques and AI • Find similarities between a patient at the bedside and historical populations with known outcomes • What kind of decisions: • Diagnosis (sepsis, ARDS, etc.) • Severity of illness • Physiologic trajectory • Re-admissions • Treatment recommendations https://youtu.be/f2OqkRhj5uI THE LAURA P. AND LELAND K. WHITTIER VIRTUAL PICU
  • 5. But where is pediatric healthcare? Source: http://www.lorienpratt.com/are-machine-learning-and-big-data-all-about-just-advertising-and-marketing/
  • 6. Brief History of IoT in the ICU • 1940s – 1990s: • Several patient monitoring devices/systems • Creation of EMRs with data aggregation • LAN/web-based connectivity • HIPAA (Health Insurance Portability and Accountability Act of 1996) • Health Level 7 (HL7) standard • 2000s: • Clinical information systems • Philips CareVue, GE Centricity • RDBMS-based data storage • Sharing of high-level information via proprietary standards • EHRs (Cerner, Epic), Meaningful Use
  • 8. Remaining Challenges • Interoperability: • The medical device industry still uses proprietary data formats and protocols • Some EHR vendors do not adhere to open standards • Liability: • HIPAA privacy and security regulations have led to a protectionist culture • Hospitals are reluctant to share data to address research needs • Velocity/Volume of Big Data: • High-frequency waveforms generate petabytes of data • Storage on traditional platforms does not scale • Skills: Healthcare IT has a skills gap when it comes to big data technologies
  • 10. Hadoop/Big Data Journey • Reason: a single distributed data repository • Timeline: a three-year journey with a small cluster • POC: • HDP 2.2/Ambari 2.1 • Services: HDFS, Hive, Ranger, Oozie • Growing Pains: • HDP/Ambari upgrades • Deploying custom libraries • Disk space for audit logs • Hardware failure • User/group multi-tenancy
  • 12. On-Prem Infrastructure: Batch Data • Monitor Data • EHR Charted Data • Labs Data • Medications Data • Intubation Data • Ventilation Data
  • 13. On-Prem Infrastructure: Streaming HL7 Data • Monitor Data • ADT Data • Labs Data
  • 14. On-Prem Infrastructure: Oozie Jobs • Hourly, Daily, Weekly • Sqoop, REST API, Jars • Pushes to HDFS • Pushes to Hive ORC
  • 15. On-Prem Infrastructure: Mirth Engine • Frequency: 15 s to 4 h • TCP, File IO channel • HL7 → JSON • Queuing capability
  • 16. On-Prem Infrastructure: NiFi/Kafka • Reads JSON • Pushes to Kafka queue • Pushes to HDFS • Pushes to HBase
  • 18. On-Prem Infrastructure: Spark Data Pipeline • Data cleansing • Data munging • Data tagging • Data aggregation • Data de-identification • Outputs ML dataset
  • 20. On-Prem Infrastructure: Single-patient data over time • Vitals • Labs • Interventions • Drugs
  • 21. On-Prem Infrastructure: Data Science Stack • Architecture: NVIDIA GPUs • ML algorithms: Recurrent Neural Networks • ML frameworks: Keras + Theano
  • 23. How much data is enough? Performance (on the same holdout set) increases with the amount of data available for training
  • 24. Conclusions • Our goal is to save lives by making timely and accurate predictions • More data and better algorithms can lead to breakthroughs • Data science/machine learning has great potential in our space • Our team continues to make progress in using big data technologies to enable research and development
  • 25. VPICU Research Team. Clinical Researchers: Randall Wetzel, MD; Roby Khemani, MD; Sareen Shah, MD. Computer Scientists (CHLA): Melissa Aczon, PhD (Senior Data Scientist); Brett Bailey (Software Engineer); Alysia Flynn, PhD (Data Ninja); Alec Gunney (Data Scientist); Long Van Ho (Data Scientist); David Ledbetter (Senior Data Scientist); Mohit Mehra (Lead Data Engineer); Mike Reilly (Infrastructure); Paul Vee (Senior Program Manager). Human-computer interaction, visualization: Jeff Heer (Stanford), Diana Maclean (Stanford), Pia Pal (Stanford). Machine learning, similarity search: Ben Marlin (UMass Amherst). Artificial intelligence, probabilistic models: Christian Shelton (UC Riverside), Busra Celikkaya (UC Riverside). Large-scale statistical analysis: Amy Braverman (NASA JPL, UCLA). Data systems and software architecture: Dan Crichton (NASA JPL), Chris Mattmann (NASA JPL, USC)
  • 28. DS Challenges • How do we generate features? • Irregular, sparsely sampled time-series measurements (HR vs. BP vs. oxygenation vs. medications) • Missing data (should we impute or not? unit conversion) • Noise in the data (fidelity of human-charted vs. machine-generated data) • Hard NLP problems (is MAP mean arterial pressure or mean airway pressure? heart rate = hr = pulse)
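Slide 15 says the Mirth engine converts HL7 to JSON before NiFi pushes the messages onward. To illustrate what that conversion involves, here is a minimal hand-rolled sketch for HL7 v2 pipe-delimited messages. It is not Mirth's transformer: the function name `parse_hl7_v2`, the JSON layout, and the sample message are hypothetical, and only the default `|` and `^` delimiters are handled.

```python
import json
from typing import Any, Dict, List


def parse_hl7_v2(message: str) -> List[Dict[str, Any]]:
    """Convert a pipe-delimited HL7 v2 message into JSON-friendly dicts.

    Field keys follow HL7 numbering (PID-5 -> "5"). Only the default
    delimiters are handled: '|' between fields and '^' between components.
    Repetitions (~), escapes (\\) and sub-components (&) stay as raw text.
    """
    segments: List[Dict[str, Any]] = []
    # Segments end with a carriage return; tolerate \n line endings too.
    for raw in message.replace("\r\n", "\r").replace("\n", "\r").split("\r"):
        if not raw:
            continue
        parts = raw.split("|")
        seg_id = parts[0]
        if seg_id == "MSH":
            # MSH-1 is the field separator itself, so re-insert it to keep numbering.
            parts = ["MSH", "|"] + parts[1:]
        fields: Dict[str, Any] = {}
        for idx, value in enumerate(parts[1:], start=1):
            if value == "":
                continue  # omit empty fields
            is_encoding = seg_id == "MSH" and idx in (1, 2)  # delimiters, not data
            fields[str(idx)] = value.split("^") if "^" in value and not is_encoding else value
        segments.append({"segment": seg_id, "fields": fields})
    return segments


if __name__ == "__main__":
    # Hypothetical monitor message with made-up values.
    msg = ("MSH|^~\\&|MONITOR|PICU|||20170805120000||ORU^R01|MSG0001|P|2.3\r"
           "PID|||12345||DOE^JANE\r"
           "OBX|1|NM|HR^Heart Rate||142|bpm")
    print(json.dumps(parse_hl7_v2(msg), indent=2))
```

Keying fields by their HL7 position (OBX-5 becomes `"5"`) keeps the JSON traceable back to the specification. A production feed would also need repetition, escape-sequence, and sub-component handling, which is better left to the integration engine or a dedicated HL7 library.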
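Slide 18 lists data de-identification as a Spark pipeline stage but does not say how it is done. One common pattern, sketched here in plain Python as an assumption rather than the team's method, replaces the MRN with a keyed hash and shifts every timestamp of a patient by the same secret offset, so intervals within a stay survive. The column names, `DIRECT_IDENTIFIERS`, and the helper functions are hypothetical, and this alone does not satisfy HIPAA Safe Harbor, which covers 18 categories of identifiers.

```python
import datetime as dt
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical column names treated as direct identifiers and dropped outright.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonym(mrn: str, secret: bytes) -> str:
    """Stable patient key: HMAC-SHA256 of the MRN, not reversible without the secret."""
    return hmac.new(secret, mrn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def date_shift_days(mrn: str, secret: bytes, max_days: int = 365) -> int:
    """Per-patient offset in [-max_days, max_days], derived from the same secret."""
    digest = hmac.new(secret, b"shift:" + mrn.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def deidentify(record: Dict[str, Any], secret: bytes) -> Dict[str, Any]:
    """Replace the MRN with a pseudonym, drop direct identifiers, shift dates."""
    mrn = record["mrn"]
    shift = dt.timedelta(days=date_shift_days(mrn, secret))
    out: Dict[str, Any] = {"patient_key": pseudonym(mrn, secret)}
    for key, value in record.items():
        if key == "mrn" or key in DIRECT_IDENTIFIERS:
            continue
        if isinstance(value, dt.date):  # also matches datetime, a subclass of date
            value = value + shift  # same offset for every timestamp of this patient
        out[key] = value
    return out


if __name__ == "__main__":
    secret = b"replace-with-a-managed-secret"  # hypothetical; store outside the de-identified zone
    row = {"mrn": "12345", "name": "DOE^JANE",
           "admit_time": dt.datetime(2017, 8, 5, 8, 0),
           "charted_time": dt.datetime(2017, 8, 5, 12, 0), "heart_rate": 142}
    print(deidentify(row, secret))
```

Because both the key and the offset are derived from the MRN, records for the same patient from different sources (monitor, labs, medications) still join after de-identification. In Spark, a per-row function like `deidentify` could run inside a map over the ingested records.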
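Slide 21 names recurrent neural networks as the model family. To show what "recurrent" means for bedside time series, here is the forward pass of a plain Elman RNN in pure Python that emits a score at every time step. It is a generic textbook cell with hand-set weights and no training loop, not the team's Keras model; `rnn_risk_scores` and all weight values are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rnn_risk_scores(inputs: Sequence[Sequence[float]],
                    w_xh: Matrix, w_hh: Matrix, b_h: Sequence[float],
                    w_hy: Sequence[float], b_y: float) -> List[float]:
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = sigmoid(w_hy . h_t + b_y).

    Returns one score per time step, so a prediction is available
    as soon as each new observation arrives.
    """
    h = [0.0] * len(b_h)  # h_0 = 0
    scores = []
    for x_t in inputs:
        pre = [a + b + c for a, b, c in zip(matvec(w_xh, x_t), matvec(w_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        scores.append(sigmoid(sum(w * hi for w, hi in zip(w_hy, h)) + b_y))
    return scores


if __name__ == "__main__":
    # Two features per step (e.g. scaled heart rate and lactate), two hidden units.
    # All weights are arbitrary illustration values, not trained parameters.
    seq = [[0.1, 0.0], [0.4, 0.2], [0.9, 0.7]]
    print(rnn_risk_scores(seq, w_xh=[[0.5, -0.3], [0.8, 0.6]], w_hh=[[0.2, 0.0], [0.1, 0.3]],
                          b_h=[0.0, 0.0], w_hy=[1.0, 1.5], b_y=-1.0))
```

The `w_hh` term is what lets earlier observations influence later scores; setting it to zero turns the model into an independent per-time-step classifier.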
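Slide 28 lists several feature-generation problems. The sketch below addresses three of them under stated assumptions: mapping synonymous labels (heart rate = hr = pulse) to one variable, converting units, and placing irregular samples on a regular grid with last-observation-carried-forward (LOCF) plus a missingness mask. LOCF is only one possible answer to "impute or not"; the `SYNONYMS` table, the `max_age` cutoff, and the function names are hypothetical. Ambiguous labels such as MAP are deliberately not guessed.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical label table; a real one would be curated with clinicians.
SYNONYMS = {
    "hr": "heart_rate", "heart rate": "heart_rate", "pulse": "heart_rate",
    "temp c": "temperature_c", "temp f": "temperature_c",
}
# "MAP" may mean mean arterial or mean airway pressure, so the label alone is not enough.
AMBIGUOUS = {"map"}


def canonical_name(label: str) -> Optional[str]:
    """Canonical variable for a charted label, or None if unknown or ambiguous."""
    key = label.strip().lower()
    return None if key in AMBIGUOUS else SYNONYMS.get(key)


def normalize(rows: Sequence[Tuple[float, str, float]]) -> Dict[str, List[Tuple[float, float]]]:
    """Group (time_hours, label, value) rows by canonical variable, converting units."""
    series: Dict[str, List[Tuple[float, float]]] = {}
    for t, label, value in rows:
        name = canonical_name(label)
        if name is None:
            continue  # skipped here; a real pipeline might queue these for review
        if label.strip().lower() == "temp f":
            value = (value - 32.0) * 5.0 / 9.0  # Fahrenheit to Celsius
        series.setdefault(name, []).append((t, value))
    for points in series.values():
        points.sort()
    return series


def resample_locf(points: Sequence[Tuple[float, float]], grid: Sequence[float],
                  max_age: float) -> Tuple[List[Optional[float]], List[int]]:
    """Carry the last observation forward onto an ascending time grid.

    A value older than max_age hours counts as missing. The mask marks grid
    points where a (possibly carried-forward) value was available.
    """
    values: List[Optional[float]] = []
    mask: List[int] = []
    j, last = 0, None
    for t in grid:
        while j < len(points) and points[j][0] <= t:
            last = points[j]
            j += 1
        if last is not None and t - last[0] <= max_age:
            values.append(last[1])
            mask.append(1)
        else:
            values.append(None)
            mask.append(0)
    return values, mask
```

Feeding the mask to the model alongside the values lets it learn how much to trust carried-forward readings, rather than treating them as fresh measurements.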

Editor's Notes

  1. Our mission is to assist doctors and clinicians in making accurate decisions based on DATA
  2. Lots of big data companies
  3. But they are focused on customer analytics, etc., and not on saving lives. Why is pediatric healthcare not on this map?
  4. 1940s – 1990s: lots of specialized medical devices. 2000s: much larger EMRs storing lots of data
  5. 2008 – 2015 saw steady growth in the EHR adoption rate
  6. BUT there are still challenges remaining
  7. How are we addressing some of these challenges? Hadoop plays a big part in our data engineering
  8. 1. We chose Hadoop because of the promise of creating a single distributed data repository 2. A three-year journey with a small cluster, resulting in a data lake that houses different sources
  9. Current Hadoop stack: HDP 2.5/Ambari 2.2; 2 NameNodes, 4 DataNodes, 2 edge nodes (MySQL, other)
  10. Zeppelin and PySpark integration: adding libraries such as matplotlib is straightforward
  11. The goal is to produce a dataset that machine learning models can ingest. This dataset is de-identified and in “long format”, where all variables are stacked at each time interval
  12. Since putting PHI in the cloud is a challenge, we use the AWS cluster primarily for de-identified, anonymized data, using Hortonworks HDC to run Spark jobs that ingest and transform de-identified/anonymized data
  13. This is a really important slide as far as we are concerned. Data split: 50% training, 25% validation, 25% test
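Note 11 describes the target as a de-identified "long format" dataset in which all variables are stacked at each time interval. A minimal sketch of that reshaping, assuming hourly bins, averaging of repeated values, and a plain tuple layout (all hypothetical choices, not the team's schema), looks like this:

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float, str, float]  # (patient_key, time_hours, variable, value)
Row = Tuple[str, float, str, float]    # (patient_key, interval_start, variable, mean_value)


def to_long_format(events: Iterable[Event], interval_hours: float = 1.0) -> List[Row]:
    """Bin events into fixed intervals and emit one row per (patient, interval, variable).

    Several values of one variable inside the same interval are averaged.
    """
    buckets: Dict[Tuple[str, float, str], List[float]] = defaultdict(list)
    for patient, t, variable, value in events:
        start = math.floor(t / interval_hours) * interval_hours
        buckets[(patient, start, variable)].append(value)
    return sorted((p, start, var, sum(vals) / len(vals))
                  for (p, start, var), vals in buckets.items())


if __name__ == "__main__":
    events = [("p1", 0.2, "heart_rate", 120.0), ("p1", 0.7, "heart_rate", 130.0),
              ("p1", 0.5, "lactate", 2.0), ("p1", 1.1, "heart_rate", 125.0)]
    for row in to_long_format(events):
        print(row)
```

The same grouping maps naturally onto a Spark `groupBy` over (patient, interval, variable) followed by an average, which is how it would scale beyond a single machine.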
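Note 13 gives a 50% training / 25% validation / 25% test split, and slide 23 measures performance on the same holdout set as training data grows. The notes do not say whether rows or patients were split; the sketch below assumes a patient-level split (a common precaution so that no patient appears in both training and test) using a seeded shuffle, plus nested training subsets for a learning curve. `split_patients` and `training_subsets` are hypothetical names.

```python
import random
from typing import Dict, Iterable, List, Sequence


def split_patients(patient_keys: Iterable[str], seed: int = 0) -> Dict[str, List[str]]:
    """Split unique patients 50/25/25 into train/validation/test.

    Splitting by patient rather than by row keeps every row of one patient
    in a single split, so the holdout set never contains a training patient.
    """
    unique = sorted(set(patient_keys))  # sort first so the seed alone fixes the result
    random.Random(seed).shuffle(unique)
    n = len(unique)
    cut1, cut2 = n // 2, (3 * n) // 4
    return {"train": unique[:cut1], "validation": unique[cut1:cut2], "test": unique[cut2:]}


def training_subsets(train: List[str],
                     fractions: Sequence[float] = (0.25, 0.5, 1.0)) -> Dict[float, List[str]]:
    """Nested training subsets for a learning curve; validation and test stay fixed."""
    return {f: train[: max(1, int(len(train) * f))] for f in fractions}


if __name__ == "__main__":
    splits = split_patients([f"p{i}" for i in range(100)], seed=7)
    print({name: len(keys) for name, keys in splits.items()})
    print({f: len(keys) for f, keys in training_subsets(splits["train"]).items()})
```

Because each subset is a prefix of the same shuffled training list, larger subsets contain the smaller ones, so differences along the learning curve reflect the added data rather than a different sample.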