Predicting Patient Outcomes in Real-Time at HCA

Data Scientist Allison Baker and Development Manager of Data Products Cody Hall work with a talented team of data scientists, software engineers, and web developers. Together they are building the framework and infrastructure to support a real-time prediction application that can scale across the entire company. Paramount to these efforts has been the ability to integrate the software production architecture with the predictive models generated by H2O. This talk reviews how HCA is building a pipeline to predict patient outcomes in real time, relying heavily on H2O’s POJO scoring API, with data processing implemented in Clojure. #h2ony

- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

Published in: Data & Analytics

  1. Predicting Patient Outcomes in Real-Time at HCA
     Presentation by Allison Baker and Cody Hall
     Hospital Corporation of America
     Department of Data and Analytics, Clinical Services Group
     July 20, 2016
  2. Overview
     CONFIDENTIAL - Contains proprietary information. Not intended for external distribution.
     • Introduction to HCA
     • Introduction to our team
     • Data science pipeline
     • Near real-time architecture
     • Real-time architecture
     • Current POC goals
  3. Hospital Corporation of America
     “Above all else, we are committed to the care and improvement of human life. In recognition of this commitment, we strive to deliver high-quality, cost-effective healthcare in the communities we serve.” – HCA Mission Statement
     • Hospital Corporation of America (HCA) is the leading healthcare provider in the country
       – 169 hospitals
       – 116 freestanding surgery centers in 20 states and the U.K.
     • Approximately 233,000 employees across the company
     • Over 26 million patient encounters each year
     • More than 8 million emergency room visits each year
     • About 2 million inpatients treated annually
  4. Where We Are
  5. Data Science and Data Products Teams
     • Dr. Edmund Jackson, Chief Data Scientist, VP of Data and Analytics
     • Dr. Jesse Spencer-Smith, Director of Data Science
     • Cody Hall, Development Manager of Data Products
     • Dr. Martin Tobias, Data Scientist
     • Sandeepkumar Kothiwale, Data Scientist
     • Allison Baker, Data Scientist
     • Dr. Nan Chen, Data Scientist
     • Kunal Marwah, Data Scientist
     • Gerardo Castro, Data Scientist
     • Chris Cate, Data Scientist
     • Igor Ges, Data Product Engineer
     • Warren Sadler, Data Product Engineer
     • Josh Wolter, BI Developer
     • Nick Selleh, Application Engineer
  6. CRISP-DM and Data Science
  7. Problem Definition and Data Sourcing
     • Begin by asking stakeholders and business owners “What business decisions will be made with the analysis results?”
     • Document all project and product features, timelines and code using GitHub
     • Source historical data using Teradata SQL
     • Log all data sourcing and data extract steps using DRAKE
     • Options
       – Continuous integration
       – Jenkins to monitor DRAKE builds
  8. Data Manipulation
     • Run preliminary visualization
     • QA data testing for coverage, outliers, abnormalities, format and structural issues, frequency, duplication and accuracy
     • Pre-process data
       – Balance outcomes
       – Filter patients
       – Remove non-data
     • Engineer features
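
The slide does not say how outcomes are balanced. One common option is to downsample every outcome class to the size of the rarest one; the sketch below illustrates that idea only. The function name, the `outcome` key, and the toy records are all hypothetical and are not taken from the talk.

```python
import random
from collections import defaultdict

def balance_outcomes(rows, label_key="outcome", seed=0):
    """Downsample every outcome class to the size of the rarest class.

    Assumption: rows are dicts and label_key names the outcome field.
    """
    if not rows:
        return []
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: 2 encounters with the outcome, 8 without.
encounters = [{"patient_id": i, "outcome": 1 if i < 2 else 0} for i in range(10)]
balanced = balance_outcomes(encounters)
```

Downsampling discards data from the majority class, so in practice it would be weighed against alternatives such as class weights or oversampling.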
  9. Modeling
     • Analytic server
       – 64 cores
       – 4 Terabytes of hard disk
       – 1.5 Terabytes of RAM
     • Iterate models
     • Evaluate statistics
  10. Interpretation and Reporting
      • Consider
        – Re-defining the problem
        – Additional modeling
        – Additional data sourcing
      • Discuss results with clinical owners and business stakeholders
        – Consider additional features
  11. What Now?
      • We can effectively engineer thousands of clinically and statistically relevant features.
      • We can successfully build accurate, complex and sophisticated predictive models.
      • How do we take these models to the patient bedside?
  12. Delivering Value to the Business
  13. Near Real-Time Tool
      • Consists of 3 main components
        – Data source (different from the historical training source)
        – Scoring engine
        – User interface
      • Shows early value using a minimum viable product (MVP) approach
      • Phases the POC to include development time for the real-time architecture
      • Updates in 15 minute batches
      • Provides near real-time predictions
      • Solicits feedback from facilities, focusing on accuracy and usefulness
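
The 15 minute batch cadence can be pictured as flooring each event timestamp to the start of its window. The slides do not describe how batches are actually formed, so the following is a minimal sketch under assumptions: events are dicts with a hypothetical `time` key, and windows are aligned to midnight.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BATCH = timedelta(minutes=15)  # cadence taken from the slide

def batch_start(ts, batch=BATCH):
    """Floor a timestamp to the start of the batch window that contains it."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    windows_elapsed = (ts - midnight) // batch  # whole windows since midnight
    return midnight + windows_elapsed * batch

def group_into_batches(events):
    """Group events (dicts with a hypothetical 'time' key) by batch window."""
    batches = defaultdict(list)
    for event in events:
        batches[batch_start(event["time"])].append(event)
    return dict(sorted(batches.items()))

# Hypothetical events, for illustration only.
events = [
    {"patient_id": "A", "time": datetime(2016, 7, 20, 10, 2)},
    {"patient_id": "B", "time": datetime(2016, 7, 20, 10, 14, 59)},
    {"patient_id": "A", "time": datetime(2016, 7, 20, 10, 15)},
]
batches = group_into_batches(events)
```

Under this scheme a prediction can lag the underlying event by up to one full window, which is the practical difference between the near real-time tool and a streaming design.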
  14. Data Sources are Constantly Changing
  15. Prediction Product
      [Architecture diagram; only the labels survive in the transcript]
      Systems: Near Real-Time System, Real-Time System
      Components: Facility + Team, Patient, Kafka Topic, OpenGate, MS SQL, PostgreSQL, Analytic Store, HDFS Cluster, Data Persistence
      Flow labels: HL-7, Measures + Events, Measures + Events & Prediction, Prediction, HL7QL (Spark), EDN
      Measures + Events: vitals, lab results, orders, demographics, surgery times, nursing documentation
      Data Source
        • 15 minute batches
        • SQL defined
      Data Source
        • Streaming
        • HL7QL defined
      Predictive Model
        • Single POJO .jar
        • Clojure (FE library)
      ETL
        • Independent SQL process
      Predictive Model + ETL
        • Clojure (FE library)/Spark job
        • PowderKeg
      Supporting Infrastructure
        • GitHub & Nexus
        • Jenkins
        • Tableau
        • PostgreSQL administration & monitoring
        • Docker with Node.js (UI)
      User Interface (UI)
        • Displays measures + events
        • Notifications of predictions
        • Prompt for acknowledgement or dismissal
        • On acknowledgement, disable notifications for 12 hours
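
The user interface rule "on acknowledgement, disable notifications for 12 hours" can be sketched as a small per-patient gate. This is an illustration only: the class name, the per-patient keying, and the choice to resume notifying at exactly the 12 hour mark are assumptions, and the behaviour on dismissal is not specified in the slides, so it is left out.

```python
from datetime import datetime, timedelta

SNOOZE = timedelta(hours=12)  # duration taken from the slide

class NotificationGate:
    """Tracks, per patient, until when notifications are muted (hypothetical design)."""

    def __init__(self, snooze=SNOOZE):
        self.snooze = snooze
        self._muted_until = {}  # patient_id -> datetime when muting ends

    def acknowledge(self, patient_id, now):
        """Record an acknowledgement and mute this patient's notifications."""
        self._muted_until[patient_id] = now + self.snooze

    def should_notify(self, patient_id, now):
        """Return True unless the patient is inside an active mute window."""
        muted_until = self._muted_until.get(patient_id)
        return muted_until is None or now >= muted_until

# Illustrative usage with hypothetical patient identifiers.
gate = NotificationGate()
t0 = datetime(2016, 7, 20, 8, 0)
gate.acknowledge("patient-1", t0)
still_muted = not gate.should_notify("patient-1", t0 + timedelta(hours=6))
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the rule easy to test.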
  16. Real-Time Infrastructure
      • Continuously consumes HL7 messages from a Kafka topic and parses them via Spark and HL7QL
      • Processes (producers) publish messages to Kafka topics (categories), and consumers subscribe to those topics to process the message feeds
      • Apache Spark is the application interface for distributed (cluster) computing
      • HL7 Query Language (HL7QL) parses the messages
      • Scores (predicts) on new streaming information
        – Runs a .jar file, compiled from Clojure code and the H2O POJO, via a Spark process
      • Deploys with Docker
        – Container-based application architecture
      • Continuously monitors with Jenkins
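
The publish, subscribe, parse, and score flow can be illustrated with a toy in-memory stand-in. Nothing here uses Kafka, Spark, HL7QL, or H2O: `InMemoryBroker`, `parse_observations`, and `toy_score` are hypothetical names, the parser handles only a simplified pipe-delimited HL7 v2 message (PID-3 as patient identifier, OBX-3 and OBX-5 as observation code and numeric value), and the scoring function is an arbitrary placeholder with no clinical meaning.

```python
from collections import defaultdict

class InMemoryBroker:
    """Toy stand-in for a message broker: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)
        self._offsets = defaultdict(int)  # (consumer, topic) -> next unread index

    def publish(self, topic, message):
        self._logs[topic].append(message)

    def poll(self, consumer, topic):
        """Return messages this consumer has not seen yet and advance its offset."""
        log = self._logs[topic]
        start = self._offsets[(consumer, topic)]
        self._offsets[(consumer, topic)] = len(log)
        return log[start:]

def parse_observations(hl7_message):
    """Extract (patient_id, {code: value}) from a simplified HL7 v2 message."""
    patient_id, values = None, {}
    for segment in hl7_message.split("\r"):  # segments end with carriage return
        fields = segment.split("|")
        if fields[0] == "PID":
            patient_id = fields[3]  # PID-3, treated as a plain string here
        elif fields[0] == "OBX":
            values[fields[3]] = float(fields[5])  # OBX-3 code, OBX-5 value
    return patient_id, values

def toy_score(values):
    """Arbitrary placeholder for the exported model; NOT a clinical rule."""
    return min(1.0, values.get("HR", 0.0) / 200.0)

broker = InMemoryBroker()
# Producer side: a hypothetical, heavily simplified HL7 v2 message.
broker.publish("hl7", "MSH|^~\\&|LAB\rPID|1||12345\rOBX|1|NM|HR||120")

# Consumer side: parse each new message, score it, publish the prediction.
for raw in broker.poll("scorer", "hl7"):
    patient_id, values = parse_observations(raw)
    broker.publish("predictions", {"patient_id": patient_id, "score": toy_score(values)})

predictions = broker.poll("ui", "predictions")
```

Keeping a separate offset per consumer is what lets the scoring job and the user interface read the same stream independently, which is the property the producer and consumer bullet describes.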
  17. [Slide with no extracted text]
  18. A Proof of Concept Use Case and Goals
      Primary:
      1. Assess clinical workflow to identify how the model can support the current clinical processes for treating negative patient outcomes
      2. Determine the model’s capability to extract meaningful information from existing and available patient data and identify patterns that predict the outcome
      3. Determine the usefulness of an early prediction model within a clinical workflow
      Secondary:
      1. Improve the prediction model through incorporation of feedback provided by the clinical team
      2. Maximize the utility of the prediction tool to improve a clinical workflow for the facility staff
  19. Summary
  20. Questions
