Uploaded by DataWorks Summit

Real-Time Clinical Analytics

This document discusses real-time clinical analytics at Mercy, a large Catholic health system. It describes how Mercy is using Hadoop to process real-time data streams and merge them with batch data to enable near real-time updates and faster analytics. This allows them to reuse existing SQL skills and data models while gaining the benefits of real-time data. Potential use cases mentioned include free-text search on lab results, inventory archiving, medical documentation improvement, and EMR auditing.

Real-Time Clinical Analytics
Paul Boal & Adam Doyle

Paul Boal
Director – Data Management
Adam Doyle
Lead Developer

Mercy is the 5th largest Catholic health system in the U.S., serving in 140 communities over a multi-state footprint through several touch points including outreach ministries and virtual care.
35 Acute Care Hospitals
700 Clinic and Outpatient Facilities
2,100 Mercy Clinic Physicians
4,231 Acute Licensed Beds
40,000 Co-workers
22,486 Births
158,768 Acute Inpatient Discharges
150,595 Surgeries (In/Outpatient)
650,702 Emergency Visits
8,361,683 Outpatient Visits
$4.48 billion Operating Revenue

We do Hadoop

Why?

What’s wrong with you?
Well…processing…processing…

Architecture

Electronic Medical Record
Traditional Reporting Database
Real-Time Data Stream
Nightly Batch Copy
bit of magic
more magic
no special magic here…

RDBMS Sync Utility
CLARITY,oracle,etluser,
MDW,oracle,dwuser,abchd
EPSI,sqlserver,finuser,
EPSI
MDW
CLARITY
patient,pat_id,2015-01-
pat_enc,pat_enc_csn,201
order_med,order_med_id,

TRANSLATION
CONVERSION
TRANSMISSION
Storm Magic

Merging Batch and Real-Time
CREATE EXTERNAL TABLE
...
STORED BY ‘Hbase’
...
CREATE TABLE
...
ORCFILE
...
CREATE VIEW
COALESCE(rt, batch)
FROM rt FULL OUTER JOIN
batch ON...

What it Does for Us

Looks like the same reporting database…
smells a lot better!
• Use existing skills (SQL)
• Reuse data model expertise
• Large batch jobs run faster
• Large analytics run faster
• Near real-time updates

Use Cases & Future

Free text search on lab results
Inventory data archival
Medical documentation improvement
EMR audit trail archival and reporting
o Text extraction and analytics
o Device data
o Real-time alerting
o New Integrated Patient Database

Questions?

Editor's Notes

#3 Here we are, Paul & Adam. Paul’s been with Mercy for more than 8 years and has worked in data warehousing, business intelligence, analytics, etc., for more than 15 years. He’s currently serving as a director in the data engineering and analytics group, focusing his energy on Mercy’s big data implementation and analytics strategy. Adam is Mercy’s technical lead for big data projects. He’s been with Mercy for more than 2 years, and has been in software development and consulting for more than 15 years. Paul and Adam both live in St. Louis, MO.
#5 Here are some facts about who Mercy is. You might know the name Mercy from other places around the country, but if you aren’t near one of these dots, then it’s a different organization. We have a common heritage, but don’t have any business relationships related to the name. The point is that we’re a fairly large healthcare system. In fact, we’re in the very top tier of customers for our EHR vendor, Epic. We’re large enough that we have to have three separate installations of their software to support our size, and one of those is the largest single installation that Epic had ever done at the time (7 years ago). Virtual Care is one of our largest initiatives right now, building on a successful history with the nation’s largest centralized electronic ICU monitoring service. Remote monitoring and virtual access to specialists is a significant part of Mercy’s growth and commercialization strategy… which will obviously lead to even more data for us to play with.
#6 We’re here to talk about Hadoop. We do that. We’re a year into our first real Hadoop project. We spent a couple of years before that doing proofs of concept and looking for the first solid business case.
#7 Why do we do that? Major problems we’re trying to address: EHR gives us data too late. Many of the use cases involved TODAY’S data, not yesterday’s. We didn’t feel like we could do that on our existing data and reporting systems without huge investment. Hadoop gave us a way to move into the real-time space at a lower cost.
#8 It’s because our current analytical systems are simply too slow both to get data and to analyze it. We have a pretty traditional assortment of operational data store and data mart kinds of data structures. Our largest single database is a structure provided by our EHR vendor. And we have well over 100 report developers across the ministry who have been trained on that data model and write SQL or Crystal Reports against that database. But that data is always going to be at least a day behind reality. And over time, that database hasn’t been able to keep up with the increasing demands. It’s over 26 TB (with the largest single dataset, audit trail, already pulled out onto a different platform). Users regularly wait 15+ minutes for their standard reports to run. Those factors and the cost of continuing to grow and improve performance make it an unsustainable long-term solution. We felt like Hadoop gave us a place to build and scale solutions much more affordably, and more flexible tools for bringing in low-latency data. Sepsis example
#10 The batch side of our architecture is
#11 To make it easier to add new tables into our RDBMS synchronization process, we built a configuration-driven utility that works off of a few assumptions. 1) We can sqoop data out of source tables (or receive data files from the source) either in total or using a last update timestamp. 2) Every table has a primary key. 3) If we have deletes, they come to us in a separate file. 4) Otherwise we can do a pseudo-upsert (using delete/insert, or merge/replace). Challenges with this process: there is no upsert in Hive; that kind of merge/replace takes a lot of extra space; and not all tools support the right data file types (ORC) to make this really efficient.
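The pseudo-upsert in assumption 4 can be sketched in a few lines of Python. This is only an illustration of the delete/insert idea under the stated assumptions (every table has a primary key, deletes arrive separately); it is not Mercy's utility, which runs on Sqoop and Hive. The function name pseudo_upsert and the name column are hypothetical; pat_id is taken from the configuration shown on the slide. The sketch also assumes that a delete wins when the same key is both changed and deleted in one increment.

```python
from typing import Dict, Hashable, Iterable, List, Mapping

Row = Dict[str, object]


def pseudo_upsert(
    base: Iterable[Mapping[str, object]],
    changes: Iterable[Mapping[str, object]],
    deleted_keys: Iterable[Hashable],
    key: str,
) -> List[Row]:
    """Rebuild a table snapshot from the previous snapshot plus one increment.

    Rows are matched on the primary-key column `key`. A changed row replaces
    the old row with the same key (delete/insert style), new keys are added,
    and every key listed in `deleted_keys` is dropped.
    """
    merged: Dict[Hashable, Row] = {row[key]: dict(row) for row in base}
    for row in changes:
        merged[row[key]] = dict(row)  # insert a new key or replace the old row
    for k in deleted_keys:
        merged.pop(k, None)  # deletes come separately; unknown keys are ignored
    return list(merged.values())


# Hypothetical data: previous snapshot, one increment, one delete.
base = [
    {"pat_id": 1, "name": "A"},
    {"pat_id": 2, "name": "B"},
    {"pat_id": 3, "name": "C"},
]
changes = [{"pat_id": 2, "name": "B2"}, {"pat_id": 4, "name": "D"}]
snapshot = pseudo_upsert(base, changes, deleted_keys=[3], key="pat_id")
```

Because the whole snapshot is rebuilt rather than updated in place, the old and new copies exist side by side for a while, which is the extra-space cost the note mentions.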
#12 The real-time process is a bit more complicated. After receiving the data from the EMR (as small batch files), the process has three phases. First is the translation of the data format. What we receive is a somewhat complex variable-length / variable-format record type. So, we have to have several rules for interpreting incoming records. Second is the conversion of data from the EMR semantics into the reporting database semantics. There are thousands of surrogate key translations and foreign key mappings that happen. Luckily, the metadata on how these should occur is maintained in database tables by the EMR vendor. Finally, the data is mapped into the appropriate target tables and stored in corresponding HBase tables.
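The three phases can be pictured as three small functions chained together. The Python sketch below is purely illustrative and rests on loud assumptions: the pipe-delimited record format, the SURROGATE_KEYS lookup, and the function names translate, convert, and transmit are all invented here, since the notes describe the real feed only as variable-length and variable-format. A plain dictionary stands in for HBase, and the Storm topology itself is not reproduced.

```python
from typing import Dict, List, Tuple

# Hypothetical wire format, invented for this sketch:
#   "TABLE|source_key|field=value|field=value..."
# The real EMR feed format is not described in the notes.

# Hypothetical surrogate-key metadata: (table, source key) -> reporting key.
SURROGATE_KEYS: Dict[Tuple[str, str], str] = {("patient", "Z100"): "1001"}

Cell = Tuple[str, str, str, str]  # (table, row key, column, value)
Store = Dict[str, Dict[str, Dict[str, str]]]  # table -> row key -> column -> value


def translate(line: str) -> Tuple[str, str, Dict[str, str]]:
    """Phase 1: parse one raw record into (table, source key, changed fields)."""
    table, source_key, *pairs = line.strip().split("|")
    fields = dict(pair.split("=", 1) for pair in pairs)
    return table, source_key, fields


def convert(table: str, source_key: str, fields: Dict[str, str]) -> List[Cell]:
    """Phase 2: swap the source key for the reporting key, one cell per field."""
    row_key = SURROGATE_KEYS[(table, source_key)]
    return [(table, row_key, column, value) for column, value in fields.items()]


def transmit(cells: List[Cell], store: Store) -> None:
    """Phase 3: write each cell into its target table (a dict stands in for HBase)."""
    for table, row_key, column, value in cells:
        store.setdefault(table, {}).setdefault(row_key, {})[column] = value


store: Store = {}
for raw in ["patient|Z100|city=St. Louis", "patient|Z100|zip=63101"]:
    transmit(convert(*translate(raw)), store)
```

Each incoming record carries only the fields that changed, so the store accumulates individual cells per row rather than full records.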
#13 It’s important to note that our real-time updates only come as cell-level information. The source system doesn’t transmit full records, only those fields that have been updated or populated. So, to use our real-time data store, we have to merge the full records from the nightly batch with the individual cell updates from the real-time process. So, we use Hive/HBase integration to bring the real-time data into Hive. Then we use Hive views to merge the two datasets together. Not shown here, we also have a complicated way to distinguish that a field has been set to NULL versus a field that simply hasn’t seen a real-time update.
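The column-by-column merge can be illustrated outside Hive with a short Python sketch. It mirrors the COALESCE(rt, batch) idea from the slide: a real-time cell wins when one exists, otherwise the batch value is used, and a row may exist on only one side, as with a full outer join. The notes do not say how Mercy tells an explicit NULL from a missing update, so the EXPLICIT_NULL sentinel below is an assumption made for illustration, not their mechanism; merge_row and the sample column names are likewise hypothetical.

```python
from typing import Dict, Optional

# Hypothetical sentinel meaning "the real-time feed explicitly set this field
# to NULL". This is one possible approach, not the one used at Mercy.
EXPLICIT_NULL = "\x00NULL"


def merge_row(
    batch: Optional[Dict[str, Optional[str]]],
    realtime: Optional[Dict[str, str]],
) -> Dict[str, Optional[str]]:
    """Overlay real-time cell updates on the nightly batch row.

    Either side may be None (the row exists on one side only). For each
    column, a real-time cell takes precedence over the batch value; a cell
    holding EXPLICIT_NULL becomes None instead of falling back to batch.
    """
    batch = batch or {}
    realtime = realtime or {}
    merged: Dict[str, Optional[str]] = {}
    for column in batch.keys() | realtime.keys():
        if column in realtime:
            value = realtime[column]
            merged[column] = None if value == EXPLICIT_NULL else value
        else:
            merged[column] = batch[column]
    return merged


# Hypothetical data: one batch row and two real-time cell updates for it.
batch_row = {"pat_id": "1001", "city": "Springfield", "phone": "555-0100"}
rt_cells = {"city": "St. Louis", "phone": EXPLICIT_NULL}
merged = merge_row(batch_row, rt_cells)
```

A plain COALESCE cannot express the phone case above, because a NULL real-time value would silently fall back to the batch value; that is why some extra marker is needed.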