11© 2017 MapR Technologies
Big Data in Healthcare
Carol McDonald
@caroljmcdonald
The Motivation for Big Data: Poor ROI
•  USA spends a lot more per capita
•  US Health System ranks last among eleven countries (OECD)
–  healthy lives, access, quality, efficiency
Who Knew Healthcare could be so complicated?
Value Based Care & Value Based Reimbursement
Incentives for Technology:
•  Improve coordination and outcomes
•  Shift from fee-for-service to value-based, data-driven incentives
The Data
Where is the Big Data Opportunity?
McKinsey Global Institute
Where is the Big Data Opportunity?
According to the McKinsey Global Institute, the big data opportunity lies in:
•  Claims
–  utilization of care
•  Pharmaceutical
–  clinical trials
•  Clinical Data
–  Electronic Medical Records
•  Patient Behavior and
Population Health
lab
EMR / EHR
Doctor’s notes
Claims
images
HL7
Social Media
Building a Healthcare Data Lake on MapR
Data Lake sources:
•  Claims: historical procedures, co-morbidities (prof & inst.)
•  Clinical: lab results, vital signs, ...
•  Pharmacy: prescriptions, adherence
•  EMR: Electronic Medical Records, images & text
•  Logs and Notes: Dr. notes, customer call logs, emails
•  3rd Party: licensing, death master, …
•  Additional Data: CB header data, social, ...
Big Data Use Cases
Patient Data Management
Analyzed
Unstructured Data
Patient 360 View
Lab
EMR / EHR
Analysts
Doctor’s notes
Claims
Images
HL7
Social Media
Providers
MapR Converged Data Platform
Reducing Fraud Waste and Abuse with Big Data Analytics
•  Healthcare fraud: more than $60 billion per year
•  UnitedHealthcare:
–  2200% ROI using MapR for fraud
•  Medicare/Medicaid prevented more than $210.7 million in fraud in one year
EDI Claim -> Machine Learning Model -> Fraud Score
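The claim-to-score flow can be sketched as a small scoring function. This is only an illustration: the feature names, weights, and bias below are hypothetical and do not come from the slides or from any production model, which would learn its parameters from labeled claims.

```python
import math

# Hypothetical weights for illustration only; a real model learns these from data.
WEIGHTS = {
    "billed_amount_zscore": 1.2,
    "procedures_per_visit": 0.4,
    "provider_prior_flags": 0.9,
}
BIAS = -3.0


def fraud_score(claim_features):
    """Return a score in (0, 1) from a logistic model over numeric claim features."""
    z = BIAS + sum(w * claim_features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up claims: one typical, one with unusual values on every feature.
typical = {"billed_amount_zscore": 0.1, "procedures_per_visit": 1, "provider_prior_flags": 0}
unusual = {"billed_amount_zscore": 3.5, "procedures_per_visit": 6, "provider_prior_flags": 2}

print(round(fraud_score(typical), 3), round(fraud_score(unusual), 3))
```

Claims whose score exceeds a chosen threshold would be routed for review rather than rejected outright.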
Predictive Analytics to Improve Outcomes
• Early Diagnosis of sepsis, CHF
• Predicting risk of readmission
• Matching treatments
Early Detection of Congestive Heart Failure
Sun, Jimeng, Large-scale Patient Similarity Learning for health analytics, Georgia Tech
Predictive Analytics/ Machine Learning
•  Aetna Labs predict future risk of metabolic syndrome
–  https://www.healthcare-informatics.com/article/how-aetna-using-big-data-give-patients-personalized-care
•  Optum Labs data from 150 million patient records gives insight about
what works best
–  http://www.modernhealthcare.com/article/20150926/MAGAZINE/309269979
Real Time Monitoring and Alerts
Medical Devices
Stream
Stream
Stream Dashboards
Global Analytics &
Alerting
Why combine IoT with Machine Learning?
•  Cheaper sensors and machine learning are making it possible for doctors to rapidly apply smart medicine to their patients’ cases
–  https://www.wsj.com/articles/the-smart-medicine-solution-to-the-health-care-crisis-1499443449
Why combine IoT with Machine Learning?
•  A Stanford team has shown that a machine-learning model can identify arrhythmias from an EKG better than an expert
–  https://www.technologyreview.com/s/608234/the-machines-are-getting-ready-to-play-doctor/
Applying Machine Learning to Live Patient Data
–  https://www.healthitoutcomes.com/doc/applying-machine-learning-to-live-data-0001
Real Time Monitoring Potential
•  CDC: chronic diseases—such as heart disease—are the major
causes of sickness and health care costs in the nation
•  McKinsey: Better management of congestive heart failure could
reduce treatment costs by a billion dollars annually
Why combine IoT with Machine Learning?
•  Connected care ensuring quicker sepsis treatment:
–  Blood pressure, pulse rates and oxygen levels from monitoring devices combined with machine learning to provide alerts
–  http://www.computerweekly.com/news/450422258/Putting-sepsis-algorithms-into-electronic-patient-records
Solution Architecture
What Do We Need to Do?
Data Sources -> Collect Data -> Process Data -> Store Data -> Serve Data
Collect the Data with NFS mounted on MapR-XD
•  Data Ingest:
–  File Based: NFS with MapR-FS
•  Move hot data to $$ storage
•  Move cold data to cheaper MapR-XD
[Diagram labels: Data Sources (images, RDBMS, Data Warehouse), NFS, $$$ Storage, Collect Data, MapR-FS, Unlimited Inexpensive Storage]
Collect the Events with MapR Streams
[Diagram: Producers -> Kafka API -> MapR-FS -> Kafka API -> Consumers]
Collect Data
Batch processing
MapR-FS
Process Data
•  Spark: parallel processing, high throughput, fast
•  Hive, Pig, MapReduce: slower, but can be simpler for batch file processing
Apache Spark Distributed Datasets
[Diagram: a distributed dataset partitioned across executors on three nodes (P4; P1 and P3; P2)]
Partition 1: 8213034705, 95, 2.927373, jake7870, 0……
Partition 2: 8213034705, 115, 2.943484, Davidbresler2, 1….
Partition 3: 8213034705, 100, 2.951285, gladimacowgirl, 58…
Partition 4: 8213034705, 117, 2.998947, daysrus, 95….
•  Data read into Memory Cache
•  Partitioned across a cluster
•  Operated on in parallel
•  Cached in memory for iterations
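The partition-then-process idea can be sketched in plain Python. This is not Spark code: the helper names are hypothetical, and a thread pool stands in for executors on separate nodes purely to show the shape of the computation. The sample values are the second field of the four rows shown above.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into num_partitions contiguous slices of near-equal size."""
    size, extra = divmod(len(records), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def max_in_partition(part):
    """Work done independently on each partition."""
    return max(part) if part else None


def parallel_max(records, num_partitions=4):
    """Compute a per-partition result in parallel, then combine the partial results."""
    parts = partition(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(max_in_partition, parts))
    return max(p for p in partials if p is not None)


values = [95, 115, 100, 117]  # second field of the four sample rows
print(parallel_max(values))
```

Each partition is processed without reference to the others, which is what lets the work spread across a cluster; only the small partial results are combined at the end.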
Streaming Data
Stream processing
Process Data
•  Scalable, high-throughput stream processing of live data
raw -> enriched -> alerts
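The raw, enriched, and alerts stages can be sketched with Python generators, so each event flows through the pipeline as it arrives. Everything here is an assumption made for illustration: the field names, the ward lookup, and especially the thresholds, which are placeholders and not clinical values.

```python
def enrich(raw_events, patient_wards):
    """Add reference data (here a hypothetical ward lookup) to each raw event."""
    for event in raw_events:
        yield {**event, "ward": patient_wards.get(event["patient_id"], "unknown")}


def alerts(enriched_events, max_pulse=120, min_spo2=90):
    """Pass through only events that breach the placeholder thresholds."""
    for event in enriched_events:
        if event["pulse"] > max_pulse or event["spo2"] < min_spo2:
            yield event


# Made-up device readings.
raw = [
    {"patient_id": "p1", "pulse": 72, "spo2": 98},
    {"patient_id": "p2", "pulse": 131, "spo2": 95},
    {"patient_id": "p3", "pulse": 88, "spo2": 86},
]
patient_wards = {"p1": "ward-a", "p2": "ward-b"}

flagged = list(alerts(enrich(raw, patient_wards)))
for event in flagged:
    print(event)
```

In a deployed pipeline each stage would read from one stream topic and write to the next, and the fixed thresholds would typically be replaced by a trained model.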
Streaming Analytics
Store the Data with MapR-DB
[Diagram: three tables, each holding one key range of rows with columns Key, colB, colC]
Fast reads and writes by key! Data is automatically partitioned by key range!
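Key-range partitioning can be sketched with a sorted list of split keys and a binary search. The class below is a toy, not the MapR-DB API; the class name, split keys, and rows are all hypothetical.

```python
import bisect


class KeyRangeTable:
    """Toy table whose rows are routed to partitions by sorted key range."""

    def __init__(self, split_keys):
        self.split_keys = sorted(split_keys)
        # One more partition than split keys: (-inf, k0), [k0, k1), ..., [kn, +inf)
        self.partitions = [dict() for _ in range(len(self.split_keys) + 1)]

    def partition_for(self, key):
        """Binary search over the split keys finds the owning partition."""
        return bisect.bisect_right(self.split_keys, key)

    def put(self, key, row):
        self.partitions[self.partition_for(key)][key] = row

    def get(self, key):
        return self.partitions[self.partition_for(key)].get(key)


table = KeyRangeTable(split_keys=["h", "p"])
table.put("alice", {"colB": 1, "colC": 2})
table.put("mallory", {"colB": 3, "colC": 4})
table.put("zoe", {"colB": 5, "colC": 6})

print(table.partition_for("alice"), table.partition_for("mallory"), table.partition_for("zoe"))
```

A read or write by key touches exactly one partition, so lookups stay fast as the number of partitions grows.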
Store Lots of Data with NoSQL MapR-DB
Storage model:
•  RDBMS: normalized schema -> joins for queries can cause a bottleneck
•  MapR-DB: de-normalized schema -> data that is read together is stored together
[Diagram: example tables with columns Key, colB, colC]
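The difference between the two storage models can be sketched with plain Python data structures. The patient and lab records below are made up for illustration, and the dictionaries merely stand in for tables; this is not MapR-DB or RDBMS code.

```python
# Normalized: patients and lab results live in separate "tables",
# so reading a patient's labs requires a join at query time.
patients = {1: {"name": "Patient A"}}
lab_results = [
    {"patient_id": 1, "test": "glucose", "value": 5.4},
    {"patient_id": 2, "test": "glucose", "value": 6.1},
    {"patient_id": 1, "test": "sodium", "value": 140},
]


def labs_by_join(patient_id):
    """Scan the lab table and match rows to the patient (a join)."""
    patient = patients[patient_id]
    labs = [
        {"test": r["test"], "value": r["value"]}
        for r in lab_results
        if r["patient_id"] == patient_id
    ]
    return {"name": patient["name"], "labs": labs}


# De-normalized: everything read together is stored together under one key.
patient_docs = {
    1: {
        "name": "Patient A",
        "labs": [
            {"test": "glucose", "value": 5.4},
            {"test": "sodium", "value": 140},
        ],
    }
}


def labs_by_key(patient_id):
    """A single lookup by key returns the whole document."""
    return patient_docs[patient_id]


print(labs_by_join(1) == labs_by_key(1))
```

Both functions return the same record, but the second does so with one key lookup instead of a scan and match, at the cost of duplicating data when it is shared across documents.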
What is Drill?
•  SQL engine on “everything”
•  Files: JSON, CSV, Parquet
•  Structured formats, e.g. Parquet
•  Ecosystem components: HBase, MapR-DB, Hive
•  Schema optional
•  Interactive response times
Apache Drill Architecture
•  massively parallel processing execution engine
•  distributed query processing
What Do We Need to Do?
Data Sources -> Collect Data -> Process Data -> Store Data -> Serve Data
[Diagram labels: MapR-FS, Stream, Topic]
Customer Data Lakes
MapR Healthcare Customers
•  Delivers clinical intelligence to healthcare providers
•  Sepsis control based on real-time patient data
•  Genomic data platform
•  Research grant analysis
•  80+ use cases; FWA, …
•  Genomics analysis
•  Radiology analytics
•  Customized solutions for value-based care
•  MRI manufacturer
•  Novartis
MapR Healthcare Architecture
Data Lake Architectures
•  Agile, self-service data exploration
•  ETL into operational reporting formats (e.g., Parquet)
•  Multi-tenancy: job/data placement control, volumes
•  Access controls: file, table, column, column family, doc, sub-doc levels
•  Sources: Labs, Claims, pharmacy, EHR
•  Auditing: compliance, analyze user accesses
•  Snapshots: track data lineage and history
•  Table Replication: global multi-master, business continuity
MapR Converged Data Platform
•  Enterprise Storage: MapR-FS
•  Database: MapR-DB
•  Event Streaming: MapR Streams
•  MapR-DB: time series, structured data, JSON
•  MapR-XD: unstructured data, NFS/raw files
•  MapR Event Streams: real-time event data
Valence Health
Population Health SaaS for 85,000 doctors and 135 hospitals
•  3,000 inbound data feeds
–  Labs, EHR, claims…
Business Problem:
•  ETL for 20 million lab records took 22 hours to process.
Solution with MapR:
•  With NFS, 20 million lab records now take 20 minutes with less hardware
•  https://www.cioreview.com/news/valence-health-cuts-down-processing-time-and-drives-customer-satisfaction-with-mapr-nid-11084-cid-15.html
UnitedHealthcare Optum
MapR Data Lake: a single platform to analyze claims, prescriptions, ...
•  NFS to ingest 1 million claims, 10 terabytes per day
•  2200% ROI from machine learning for Payment Integrity
•  Machine learning for improving outcomes: diabetes, reducing readmissions…
Baptist Health South Florida
Problem:
•  Oracle too expensive for big data
•  Need a common data platform for patient history
Solution:
1.  MapR data lake
2.  Offload cold data from expensive Oracle storage to MapR over NFS
3.  Integration with EMR
4.  Admission/readmission prediction
5.  Early sepsis detection/notification
6.  Real-time monitoring
Use Case: Streaming System of Record for Healthcare
•  Objective:
–  Build a flexible, secure healthcare information exchange
Challenges:
•  Many different data models
•  Security and privacy issues
•  HIPAA compliance
Solution: Streaming System of Record for Healthcare
•  Solution:
–  Streaming system of record
•  secure
•  immutable
•  rewindable
•  auditable
•  Materialized views continuously computed
•  Selective cross data center replication
[Diagram labels: Stream, Topic, Records (6 5 4 3 2 1), Streaming System of Record, Materialized Views (Search, Graph DB, JSON, HBase), Micro Services, API, Applications]
Streaming System of Record for Healthcare
Case Study: Liaison Technologies
[Diagram labels: Apps, API, pre-processor, Document Log (MapR-FS), Raw Data, CEP, and workflows feeding four materialized views (Key/Value MapR-DB, Search Engine, Graph DB, Time Series DB), each served to apps by microservices]
MapR-ES as Immutable Log
MapR Event Streams (MapR-ES)
•  Immutable log for all data ingested or consumed.
•  Events become the system of record, processed by consumers based on their permissions.
MapR-ES powers compliance-ready lineage:
•  Immutability. MapR-ES throws no data away.
•  Auditing. Who wrote/read events?
•  Rewind. What was the status of the data two days ago?
•  Replay. Rebuild derivative data stores.
Auditors want to see:
•  Data lineage. Where data came from, how it got there.
•  Audit logging. Who wrote to, updated, or read the data.
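The four properties above (immutability, auditing, rewind, replay) can be sketched with an in-memory append-only log. This is a conceptual toy, not the MapR-ES or Kafka API; the class, method names, and event fields are hypothetical.

```python
class EventLog:
    """Toy append-only log with an audit trail of every write and read."""

    def __init__(self):
        self._events = []  # events are only ever appended, never changed or removed
        self.audit = []    # (actor, action, offset) tuples

    def append(self, actor, event):
        offset = len(self._events)
        self._events.append(dict(event))  # store a copy so callers cannot mutate it
        self.audit.append((actor, "write", offset))
        return offset

    def read(self, actor, from_offset=0, to_offset=None):
        end = len(self._events) if to_offset is None else to_offset
        self.audit.append((actor, "read", from_offset))
        return [dict(e) for e in self._events[from_offset:end]]


def build_view(events):
    """Derived store (materialized view): latest status per record id."""
    view = {}
    for event in events:
        view[event["id"]] = event["status"]
    return view


log = EventLog()
log.append("ingest", {"id": "rec-1", "status": "received"})
log.append("ingest", {"id": "rec-2", "status": "received"})
log.append("workflow", {"id": "rec-1", "status": "processed"})

current = build_view(log.read("reporting"))               # replay: rebuild from all events
earlier = build_view(log.read("reporting", to_offset=2))  # rewind: state after two events

print(current, earlier, log.audit)
```

Because the views are computed from the log rather than updated in place, any derived store can be discarded and rebuilt, and the audit list answers who wrote or read which events.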
Q&A
@mapr
https://www.mapr.com/blog/author/carol-mcdonald
Engage with us!
mapr-technologies

How Big Data is Reducing Costs and Improving Outcomes in Health Care
