Real-World Deployments of Data Streaming with Apache Kafka across the Healthcare Value Chain using open source and cloud-native technologies and serverless SaaS:
1) Legacy Modernization and Hybrid Cloud: Optum (UnitedHealth Group), Centene, Bayer
2) Streaming ETL (Bayer, Babylon Health)
3) Real-time Analytics (Cerner, Celmatix, CDC/Centers for Disease Control and Prevention)
4) Machine Learning and Data Science (Recursion, Humana)
5) Open API and Omnichannel (Care.com, Invitae)
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies? – Kai Wähner
The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems.
Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for the wrong use cases by vendors. Let's explore this dilemma in this presentation.
The slides cover technologies such as Apache Kafka, Apache Spark, Confluent, Databricks, Snowflake, Elasticsearch, AWS Redshift, GCP with Google BigQuery, and Azure Synapse.
How Apache Spark and Apache Hadoop are being used to keep banking regulators ... – DataWorks Summit
The global financial crisis showed that banks' traditional IT systems were ill-equipped to monitor and manage a daily-changing risk landscape. The sheer amount of data that needed to be crunched meant that many banks were days behind in calculating, understanding, and reporting their risk positions. Post-crisis, a regulatory review led to new legislation, BCBS 239 (Principles for Effective Risk Data Aggregation and Risk Reporting), which requires banks to meet more stringent timeliness requirements when aggregating and reporting their quickly-changing risk positions, or risk fines running into millions of dollars. To meet these new requirements, banks have been forced to rethink traditional IT architectures that cannot cope with the sheer volume of risk data, and are instead turning to Apache Hadoop and Apache Spark to build out the next generation of risk systems. In this talk you will discover how some of the leading banks in the world are leveraging Apache Hadoop and Apache Spark to meet the BCBS 239 regulation.
Speaker
Kunal Taneja
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry – Kai Wähner
Agenda:
1) Defence, Modern Warfare, and Cybersecurity in 202X
2) Data in Motion with Apache Kafka as Defence Backbone
3) Situational Awareness
4) Threat Intelligence
5) Forensics and AI / Machine Learning
6) Air-Gapped and Zero Trust Environments
7) SIEM / SOAR Modernization
Technologies discussed in the presentation include Apache Kafka, Kafka Streams, ksqlDB, Kafka Connect, Elasticsearch, Splunk, IBM QRadar, Zeek, NetFlow, PCAP, TensorFlow, AWS, Azure, GCP, Sigma, and Confluent Cloud.
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry – Kai Wähner
Use Cases, Architectures, and Real-World Examples for data in motion and real-time event streaming powered by Apache Kafka across the supply chain and logistics. Case studies and deployments include Baader, Walmart, Migros, Albertsons, Domino's Pizza, Instacart, Grab, Royal Caribbean, and more.
Your Roadmap for An Enterprise Graph Strategy – Neo4j
Speaker: Michael Moore, Ph.D., Executive Director, Knowledge Graphs + AI, EY National Advisory
Abstract: Knowledge graphs have enormous potential for delivering superior customer experiences, advanced analytics and efficient data management.
Learn valuable tips from a leading practitioner on how to position, organize and implement your first enterprise graph project.
Apache Kafka in Transportation and Logistics – Kai Wähner
Event streaming with Apache Kafka in transportation and logistics.
Track & Trace, Real-time Locating System, Customer 360, Open API, and more…
Examples include Swiss Post, SBB, Deutsche Bahn, Hermes, Migros, Here Technologies, Otonomo, Lyft, Uber, Free Now, Lufthansa, Air France, Singapore Airlines, Amadeus Group, and more.
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse – Kai Wähner
Live commerce combines instant purchasing of a featured product and audience participation.
This talk explores the need for real-time data streaming with Apache Kafka between applications to enable live commerce across online stores and brick & mortar stores across regions, countries, and continents in any retail business.
The discussion covers several building blocks of a live commerce enterprise architecture, including transactional data processing, omnichannel, natural language processing, augmented reality, edge computing, and more.
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka – Kai Wähner
If there were a buzzword of the hour, it would certainly be "data mesh"! This new architectural paradigm unlocks analytic data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios.
As such, the data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a data mesh infrastructure must be real-time, decoupled, reliable, and scalable.
This presentation explores how Apache Kafka, as an open and scalable decentralized real-time platform, can be the basis of a data mesh infrastructure and, complemented by many other data platforms like a data warehouse, data lake, and lakehouse, solve real business problems.
There is no silver bullet or single technology, product, or cloud service for implementing a data mesh. The key outcome of a data mesh architecture is the ability to build data products with the right tool for the job.
A good data mesh combines data streaming technology like Apache Kafka or Confluent Cloud with cloud-native data warehouse and data lake architectures from Snowflake, Databricks, Google BigQuery, et al.
Kafka for Real-Time Replication between Edge and Hybrid Cloud – Kai Wähner
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae... – HostedbyConfluent
Legacy migration is a journey. Mainframes cannot be replaced in a single project. A big bang will fail. This has to be planned long-term.
Mainframe offloading and replacement with Apache Kafka and its ecosystem can be used to keep a more modern data store in real-time sync with the mainframe, while at the same time persisting the event data on the bus to enable microservices, and deliver the data to other systems such as data warehouses and search indexes.
This session walks through the different steps some companies have already gone through. Technical options like Change Data Capture (CDC), MQ, and third-party tools for mainframe integration, offloading, and replacement are explored.
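The CDC-based offloading pattern described above can be sketched in miniature. The following is an illustrative Python sketch, not a real Kafka Connect connector: it shows how change events captured from a legacy system might be applied to keep a modern key-value store in sync. The event shape (`op`, `key`, `value` fields) is an assumption for illustration.

```python
# Hypothetical sketch: applying CDC-style change events (as they might arrive
# from a mainframe offloading pipeline via Kafka) to a modern key-value store.
# The event field names are illustrative assumptions, not a real connector API.

def apply_change_event(store: dict, event: dict) -> dict:
    """Apply a change event with op 'c' (create), 'u' (update), or 'd' (delete)."""
    op, key = event["op"], event["key"]
    if op in ("c", "u"):
        store[key] = event["value"]
    elif op == "d":
        store.pop(key, None)
    return store

store = {}
events = [
    {"op": "c", "key": "cust-1", "value": {"name": "Alice", "balance": 100}},
    {"op": "u", "key": "cust-1", "value": {"name": "Alice", "balance": 250}},
    {"op": "c", "key": "cust-2", "value": {"name": "Bob", "balance": 50}},
    {"op": "d", "key": "cust-2", "value": None},
]
for e in events:
    apply_change_event(store, e)

print(store)  # cust-1 reflects the latest update; cust-2 was deleted
```

In a real deployment, a CDC tool would emit such events into Kafka, and downstream consumers (search indexes, data warehouses, microservices) would each apply them independently.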
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture – Kai Wähner
Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable.
Unfortunately, operating these frameworks is a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden.
This session explores different architectures to build serverless Apache Kafka and Apache Spark multi-cloud architectures across regions and continents.
We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern Data Lakehouse.
Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.
A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas... – Health Catalyst
Without the pressure of a one-on-one demo, you can join a crowd of peers to ‘kick the tires’ as you listen to Jared Crapo, a sought-after healthcare strategist, talk about what a data-first strategy is and its strategic components. A data-first strategy employs a data operating system: a breakthrough engineering approach that combines the features of data warehousing, clinical data repositories, and health information exchanges in a single, common-sense technology platform that turns data into actionable assets used for all types of outcomes improvements.
Lest you worry about too much ‘pie in the sky’ strategy talk with few results to show, Sam Turman, Senior Solution Architect, will provide tangible solution demonstrations that are driving material results. Even if you aren’t in the market for Health Catalyst solutions and services, you will be able to:
Think with more clarity through your approach to overcoming the current market challenges.
Reconsider the strategy you are employing to build cross-organizational awareness and support, putting a data-first plan at the center of your strategy.
Define action you can take today to assess your gaps, understand your options, and accelerate your progress to drive outcomes improvements.
Jared is the kind of thinker many pay good money to hear, and it is our good fortune to have 60 minutes with him to think deeply about moving healthcare forward, one patient at a time. We hope you can join us.
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot – Altinity Ltd
While the demands for real-time analytics are growing by leaps and bounds, analytics software must rely on streaming platforms to ingest high volumes of data traveling at lightning speed down the pipeline. We will take a look at two powerful open-source Apache platforms, Pulsar and Pinot, that work hand-in-hand to deliver analytical results that bring great value to your systems.
Presenters: Mary Grygleski - Streaming Developer Advocate &
Mark Needham - Developer Relations Engineer at StarTree
Note: This webinar will be recorded and later posted on our Webinar page (https://altinity.com/webinarspage/) or Altinity official Youtube channel (https://www.youtube.com/@Altinity).
When NOT to Use Apache Kafka? – Kai Wähner
Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you qualify Kafka out as not the right tool for the job?
This session explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
No matter if you think about open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies... – HostedbyConfluent
Microservices became the new black in enterprise architectures. APIs provide functions to other applications or end users. Even if your architecture uses a pattern other than microservices, such as SOA (Service-Oriented Architecture) or client-server communication, APIs are used between the different applications and end users.
Apache Kafka plays a key role in modern microservice architectures to build open, scalable, flexible and decoupled real time applications. API Management complements Kafka by providing a way to implement and govern the full life cycle of the APIs.
This session explores how event streaming with Apache Kafka and API Management (including API Gateway and Service Mesh technologies) complement and compete with each other, depending on the use case and the point of view of the project team. The session concludes by exploring the vision of event streaming APIs instead of RPC calls.
Developing Custom Transformations in Kafka Connect to Minimize Data Redund... – HostedbyConfluent
Compacted topics grow over time and often utilize high-performance, low-latency, and relatively expensive storage solutions. Reducing duplicated data plays a critical role in the size of compacted topics: with less data on the topics, the Kafka cluster consumes less disk space, which in turn leads to lower operating cost.
In this use-case-driven talk, we are going to demonstrate how our team at UnitedHealth Group leveraged existing transformers to extract data from the message metadata in the topic, as well as how we developed our own custom transformers to minimize the amount of duplicated data in each message in the topic.
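To illustrate the idea (this is a hedged sketch, not UnitedHealth Group's actual transformer), a deduplicating transform can drop fields whose values are unchanged since the last message for the same key, so that compacted topics store only deltas. The class name and record shape below are assumptions.

```python
# Illustrative sketch of a Kafka Connect-style single message transform that
# strips fields unchanged since the previous record for the same key.
# This is a stand-in for a custom SMT, not a real Kafka Connect API.

class DedupTransform:
    def __init__(self):
        self._last_seen = {}  # key -> last full record observed

    def apply(self, key, record: dict) -> dict:
        previous = self._last_seen.get(key, {})
        # Keep only fields that are new or whose values changed.
        slimmed = {f: v for f, v in record.items() if previous.get(f) != v}
        self._last_seen[key] = record
        return slimmed

t = DedupTransform()
print(t.apply("member-1", {"plan": "gold", "state": "MN"}))  # first message: all fields kept
print(t.apply("member-1", {"plan": "gold", "state": "CA"}))  # only the changed field survives
```

Note that a real transform of this kind must consider consumer-side reconstruction: downstream readers need the last full record (or topic compaction semantics) to rebuild complete state from deltas.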
Introduction to Apache NiFi (DWS DC 2019) – Timothy Spann
A quick introduction to Apache NiFi and its ecosystem, plus a hands-on demo of using processors, examining provenance, ingesting REST feeds, XML, cameras, and files, running TensorFlow and Apache MXNet, integrating with Spark and Kafka, and storing to HDFS, HBase, Phoenix, Hive, and S3.
Data modelling is considered a staple in the world of data management. The skill of the data modeler and their knowledge of the business plays a large role in successful Enterprise Information Management across many organizations. Data modeling requires formal accountability, attention to metadata and getting the business heavily involved in data requirement development. These are all traits of solid Data Governance programs.
Join Bob Seiner and a special guest modeler extraordinaire in this month’s installment of Real-World Data Governance to discuss data modeling as a form of data governance. Learn how to use the skillfulness of the data modeler to advance data-as-an-asset and governance agendas while conveying the importance and value of both disciplines.
In this webinar Bob and a special guest will talk about:
•Data Modeling as Art or Science
•Role of Data Modeler in a Governance Program
•Data Modeler Skills as Governance Skills
•Modeling and Governance Best Practices
•Leveraging the Model as a Governance Artifact
The Connected Consumer – Real-time Customer 360 – Capgemini
With Business Data Lake technologies based on EMC's Big Data portfolio, it becomes possible to move away from channel-specific analytics towards a 360-degree customer view.
This presentation shows how technologies like Spark, Hadoop, and Kafka help companies gain a real-time view of everything their customers do and make changes to customer touchpoints, whether mobile, web, in-store, direct marketing, or existing transactional systems.
Presented by Steve Jones, Vice President, Insights & Data, Capgemini at EMC World 2016
http://www.capgemini.com/emc
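The Customer 360 idea above can be reduced to a minimal sketch: folding events arriving from different channels into a single profile per customer. The event and profile fields here are assumptions for illustration, not Capgemini's actual data model.

```python
# Minimal sketch of a real-time Customer 360 view: events from multiple
# channels (web, store, mobile) are merged into one profile per customer.
from collections import defaultdict

profiles = defaultdict(lambda: {"channels": set(), "purchases": 0})

def update_profile(event: dict):
    """Fold one channel event into the customer's unified profile."""
    p = profiles[event["customer_id"]]
    p["channels"].add(event["channel"])
    if event.get("type") == "purchase":
        p["purchases"] += 1

for ev in [
    {"customer_id": "c1", "channel": "web", "type": "view"},
    {"customer_id": "c1", "channel": "store", "type": "purchase"},
    {"customer_id": "c1", "channel": "mobile", "type": "purchase"},
]:
    update_profile(ev)

print(profiles["c1"])  # one unified view across three channels
```

In a streaming deployment, the same fold would run continuously over a Kafka topic (e.g. via Kafka Streams or Spark Structured Streaming) rather than over an in-memory list.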
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect... – HostedbyConfluent
To remain competitive, organizations need to democratize access to fast analytics, not only to gain real-time insights into their business but also to power smart apps that need to react in the moment. In this session, you will learn how Kafka and SingleStore enable a modern yet simple data architecture to analyze both fast-paced incoming data and large historical datasets. In particular, you will understand why SingleStore is well suited to process data streams coming from Kafka.
Data Ingest Self Service and Management Using NiFi and Kafka – DataWorks Summit
We’re feeling the growing pains of maintaining a large data platform. Last year we went from 50 to 150 unique data feeds by adding them all by hand. In this talk we will share the best practices developed to handle our 300% increase in feeds through self service. Self-service capabilities will increase your team's velocity and decrease your time to value and insight.
* Self-service data feed design and ingest
* Configuration management
* Automatic debugging
* Lightweight data governance
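A core building block of such self service is validating a user-submitted feed configuration before registration, instead of adding feeds by hand. The sketch below is a hypothetical illustration; the required fields and allowed source types are assumptions, not the talk's actual schema.

```python
# Hypothetical sketch: validating a self-service feed configuration before
# it is registered on the data platform. Field names are illustrative.

REQUIRED_FIELDS = {"feed_name", "source_type", "schedule", "owner"}

def validate_feed_config(config: dict) -> list:
    """Return a list of validation errors; an empty list means the feed can be registered."""
    errors = [f"missing required field: {f}"
              for f in sorted(REQUIRED_FIELDS - config.keys())]
    if config.get("source_type") not in (None, "file", "rest", "kafka"):
        errors.append(f"unsupported source_type: {config['source_type']}")
    return errors

print(validate_feed_config({"feed_name": "claims", "source_type": "rest",
                            "schedule": "hourly", "owner": "data-eng"}))  # []
print(validate_feed_config({"feed_name": "claims"}))  # three missing-field errors
```

Gating feed onboarding behind a check like this is what turns hand-added feeds into a scalable self-service workflow: bad configurations fail fast with actionable errors rather than breaking ingestion later.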
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!) – Kai Wähner
Decentralized finance with crypto and NFTs is a huge topic these days. It becomes a powerful combination with the coming metaverse platforms across industries. This session explores the relationship between crypto technologies and modern enterprise architecture.
I discuss how data streaming and Apache Kafka help build innovative and scalable real-time applications for a future metaverse. Let's skip the buzz (and the NFT bubble) and instead review existing real-world deployments in the crypto and blockchain world powered by Kafka and its ecosystem.
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser... – Kai Wähner
The Rise of Data in Motion in the Public Sector powered by event streaming with Apache Kafka.
Citizen Services:
- Health services, e.g. hospital modernization, track & trace, Covid distance control
- Public administration - reduce bureaucracy, data democratization across government departments
- eGovernment - Efficient and digital citizen engagement, e.g. personal ID application process
Smart City:
- Smart driving, parking, buildings, environment
- Waste management
- Open exchange, e.g. mobility services (1st and 3rd party)
Energy:
- Smart grid and utilities infrastructure (energy distribution, smart home, smart meters, smart water, etc.)
National Security:
- Law enforcement, surveillance, police/interior security data exchange
- Defense and military (border control, intelligent soldier)
- Cybersecurity for situational awareness and threat intelligence
Apache Kafka in the Healthcare Industry – Kai Wähner
The Rise of Data in Motion in the Healthcare Industry: use cases, architectures, and examples powered by Apache Kafka.
Use Cases for Data in Motion in the Healthcare Industry:
- Know Your Patient (= “Customer 360”)
- Operations (Healthcare 4.0 including Drug R&D, Patient Care, etc.)
- IT Perspective (Cybersecurity, Mainframe Offload, Hybrid Cloud, Streaming ETL, etc.)
Real-world examples include Covid-19 Electronic Lab Reporting, Cerner, Optum, Centene, Humana, Invitae, Bayer, Celmatix, Care.com.
Machine Learning with Apache Kafka in Pharma and Life Sciences – Kai Wähner
Blog Post:
https://www.kai-waehner.de/apache-kafka-event-streaming-pharmaceuticals-pharma-life-sciences-use-cases-architecture
Video Recording:
https://youtu.be/t2IH0brwGTg
AI/machine learning and the Apache Kafka ecosystem are a great combination for training, deploying, and monitoring analytic models at scale in real time. They show up in more and more projects but often still feel like buzzwords and hype confined to science projects.
See how to connect the dots!
--How are Kafka and Machine Learning related?
--How can they be combined to productionize analytic models in mission-critical and scalable real-time applications?
--We will discuss a step-by-step approach to build a scalable and reliable real-time infrastructure for drug discovery doing data integration, feature engineering, image processing, model scoring and processing orchestration.
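The step-by-step pipeline above (integrate an event, engineer features, score it with a model) can be sketched as follows. This is a hedged illustration, not the talk's actual infrastructure: the event fields are invented, and a simple weighted sum stands in for a real trained model served behind Kafka.

```python
# Illustrative sketch of a real-time scoring pipeline: data integration ->
# feature engineering -> model scoring. The "model" is a stub weighted sum
# standing in for a real TensorFlow model deployed behind Kafka.

def extract_features(event: dict) -> list:
    # Feature engineering: normalize assumed raw measurements.
    return [event["intensity"] / 100.0, float(event["replicates"])]

def score(features: list) -> float:
    # Stub model: a fixed weighted sum in place of a trained analytic model.
    weights = [0.8, 0.2]
    return sum(w * f for w, f in zip(weights, features))

# One incoming event, as it might arrive from a data integration layer.
event = {"compound_id": "cmp-42", "intensity": 75, "replicates": 3}
features = extract_features(event)
print(round(score(features), 2))  # prints 1.2
```

In production, each stage would be a decoupled consumer/producer on Kafka topics, which is what makes the pipeline scalable and independently deployable.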
Use Cases:
R&D Engineering
Sales & Marketing
Manufacturing & Quality Assurance
Supply Chain
Product Monitoring & After Sales Support
VoC (Voice of Customer)
Single View Customer
Yield/Quality Optimization
Improved Drug Yield
Proactive Service Scheduling
Testing & Simulation
Drug Diversion
Process/Quality Monitoring
Inventory & Supply Chain Optimization
Proactive Service Offers
Patent Research and Analytics
Personalized Offers / Ads
EDW Offload
Supply Chain Network Design/Risk Management
Product Predictive Maintenance
Clinical Trials
Customer Segmentation
Smart Products
Serialization & e-Pedigree
Product Usage Tracking
GTM
Global Facilities
Inventory and Logistics Visibility
Warranty & Recall Management
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
If there were a buzzword of the hour, it would certainly be "data mesh"! This new architectural paradigm unlocks analytic data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios.
As such, the data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a data mesh infrastructure must be real-time, decoupled, reliable, and scalable.
This presentation explores how Apache Kafka, as an open and scalable decentralized real-time platform, can be the basis of a data mesh infrastructure and - complemented by many other data platforms like a data warehouse, data lake, and lakehouse - solve real business problems.
There is no silver bullet or single technology/product/cloud service for implementing a data mesh. The key outcome of a data mesh architecture is the ability to build data products; with the right tool for the job.
A good data mesh combines data streaming technology like Apache Kafka or Confluent Cloud with cloud-native data warehouse and data lake architectures from Snowflake, Databricks, Google BigQuery, et al.
Kafka for Real-Time Replication between Edge and Hybrid CloudKai Wähner
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...HostedbyConfluent
Legacy migration is a journey. Mainframes cannot be replaced in a single project. A big bang will fail. This has to be planned long-term.
Mainframe offloading and replacement with Apache Kafka and its ecosystem can be used to keep a more modern data store in real-time sync with the mainframe, while at the same time persisting the event data on the bus to enable microservices, and deliver the data to other systems such as data warehouses and search indexes.
This session walks through the different steps some companies are already gone through. Technical options like Change Data Capture (CDC), MQ, and third-party tools for mainframe integration, offloading and replacement are explored.
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureKai Wähner
Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable.
Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden.
This session explores different architectures to build serverless Apache Kafka and Apache Spark multi-cloud architectures across regions and continents.
We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern data Data Lakehouse.
Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.
A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...Health Catalyst
Without the pressure of a one-on-one demo, you can join a crowd of peers to ‘kick the tires’ if you will, as you listen to Jared Crapo—a sought after healthcare strategist—talk about what a data-first strategy is, and the strategic components to a data-first strategy employing a data operating system, a breakthrough engineering approach that combines the features of data warehousing, clinical data repositories, and health information exchanges in a single, common-sense technology platform that turns data into actionable assets used for all types of outcomes improvements.
Lest you worry about too much ‘pie in the sky’ strategy talk with few results to show, Sam Turman, Senior Solution Architect, will provide tangible solution demonstrations that are driving material results. Even if you aren’t in the market for Health Catalyst solutions and services, you will be able to:
Think with more clarity through your approach to overcoming the current market challenges.
Reconsider the strategy you are employing to build cross-organizational awareness and support to put a data-first plan at the center of your plan.
Define action you can take today to assess your gaps, understand your options, and accelerate your progress to drive outcomes improvements.
Join us and you won’t be disappointed. Jared is one of those types of thinkers that many pay big money to listen to and it is our fortune to have 60 minutes with him to think deeply about moving healthcare forward, one patient at a time. We hope you can join us.
Building a Real-Time Analytics Application with Apache Pulsar and Apache PinotAltinity Ltd
Building a Real-Time Analytics Application with
Apache Pulsar and Apache Pinot
While the demands for real-time analytics are growing in leaps and bounds, the analytics software must rely on streaming platforms for ingesting high volumes of data that's traveling in lightning speed down the pipeline. We will take a look at 2 powerful open source Apache platforms: Pulsar and Pinot, that work hand-in-hand together to deliver the analytical results which bring great value to your systems.
Presenters: Mary Grygleski - Streaming Developer Advocate &
Mark Needham - Developer Relations Engineer at StarTree
Note: This webinar will be recorded and later posted on our Webinar page (https://altinity.com/webinarspage/) or Altinity official Youtube channel (https://www.youtube.com/@Altinity).
Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job?
This session explores the DOs and DONTs. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
No matter if you think about open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...HostedbyConfluent
Microservices became the new black in enterprise architectures. APIs provide functions to other applications or end users. Even if your architecture uses another pattern than microservices, like SOA (Service-Oriented Architecture) or Client-Server communication, APIs are used between the different applications and end users.
Apache Kafka plays a key role in modern microservice architectures to build open, scalable, flexible and decoupled real time applications. API Management complements Kafka by providing a way to implement and govern the full life cycle of the APIs.
This session explores how event streaming with Apache Kafka and API Management (including API Gateway and Service Mesh technologies) complement and compete with each other depending on the use case and point of view of the project team. The session concludes exploring the vision of event streaming APIs instead of RPC calls.
Developing custom transformation in the Kafka connect to minimize data redund...HostedbyConfluent
Compacted topics grow over time and often sit on high-performance, low-latency, and relatively expensive storage. Reducing duplicated data plays a critical role in the size of compacted topics: with less data on the topics, the Kafka cluster consumes less disk space, which in turn lowers operating cost.
In this use-case-driven talk, we demonstrate how our team at UnitedHealth Group leveraged existing transformers to extract data from the message metadata in the topic, as well as how we developed custom transformers to minimize the amount of duplicated data in each message.
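The dedupe idea described above can be sketched in a few lines. This is a hypothetical Python analogue, not UnitedHealth Group's actual transformer: real Kafka Connect Single Message Transforms implement a Java `Transformation` interface, and the record layout and field names here are purely illustrative.

```python
# Illustrative sketch of a Kafka Connect-style single message transform that
# drops payload fields whose values merely duplicate the record's metadata,
# shrinking each message before it lands on a compacted topic.

def dedupe_transform(record, redundant_fields=("topic", "partition", "key")):
    """Return a copy of the record whose payload omits metadata duplicates."""
    meta = record["metadata"]
    payload = {
        field: value
        for field, value in record["payload"].items()
        if not (field in redundant_fields and meta.get(field) == value)
    }
    return {"metadata": meta, "payload": payload}

record = {
    "metadata": {"topic": "members", "partition": 3, "key": "m-42"},
    "payload": {"key": "m-42", "topic": "members", "name": "Jane", "plan": "gold"},
}
slim = dedupe_transform(record)
# slim["payload"] keeps only the fields not already present in the metadata
```

In a real deployment the same pruning would run per record inside the Connect worker, so every copy written to the compacted topic is smaller.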
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
A quick introduction to Apache NiFi and its ecosystem, plus a hands-on demo of using processors, examining provenance, ingesting REST feeds, XML, cameras, and files, running TensorFlow and Apache MXNet, integrating with Spark and Kafka, and storing to HDFS, HBase, Phoenix, Hive, and S3.
Data modelling is considered a staple in the world of data management. The skill of the data modeler and their knowledge of the business plays a large role in successful Enterprise Information Management across many organizations. Data modeling requires formal accountability, attention to metadata and getting the business heavily involved in data requirement development. These are all traits of solid Data Governance programs.
Join Bob Seiner and a special guest modeler extraordinaire in this month’s installment of Real-World Data Governance to discuss data modeling as a form of data governance. Learn how to use the skillfulness of the data modeler to advance data-as-an-asset and governance agendas while conveying the importance and value of both disciplines.
In this webinar Bob and a special guest will talk about:
•Data Modeling as Art or Science
•Role of Data Modeler in a Governance Program
•Data Modeler Skills as Governance Skills
•Modeling and Governance Best Practices
•Leveraging the Model as a Governance Artifact
The Connected Consumer – Real-time Customer 360Capgemini
With Business Data Lake technologies based on EMC’s Big Data portfolio, it becomes possible to move away from channel-specific analytics towards a 360-degree customer view.
This presentation will show how technologies like Spark, Hadoop, and Kafka help companies gain a real-time view of everything their customers do and make changes to customer touch points whether mobile, web, in-store, direct marketing or existing transactional systems.
Presented by Steve Jones, Vice President, Insights & Data, Capgemini at EMC World 2016
http://www.capgemini.com/emc
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
To remain competitive, organizations need to democratize access to fast analytics, not only to gain real-time insights on their business but also to power smart apps that need to react in the moment. In this session, you will learn how Kafka and SingleStore enable a modern, yet simple data architecture to analyze both fast-paced incoming data and large historical datasets. In particular, you will understand why SingleStore is well suited to process data streams coming from Kafka.
Data Ingest Self Service and Management using Nifi and KafkaDataWorks Summit
We’re feeling the growing pains of maintaining a large data platform. Last year we went from 50 to 150 unique data feeds by adding them all by hand. In this talk we will share the best practices we developed to handle our 300% increase in feeds through self-service. Self-service capabilities will increase your team's velocity and decrease your time to value and insight.
* Self-service data feed design and ingest
* Configuration management
* Automatic debugging
* Lightweight data governance
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Kai Wähner
Decentralized finance with crypto and NFTs is a huge topic these days. It becomes a powerful combination with the coming metaverse platforms across industries. This session explores the relationship between crypto technologies and modern enterprise architecture.
I discuss how data streaming and Apache Kafka help build innovation and scalable real-time applications of a future metaverse. Let's skip the buzz (and NFT bubble) and instead review existing real-world deployments in the crypto and blockchain world powered by Kafka and its ecosystem.
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Kai Wähner
The Rise of Data in Motion in the Public Sector powered by event streaming with Apache Kafka.
Citizen Services:
- Health services, e.g. hospital modernization, track & trace - Covid distance control
- Public administration - reduce bureaucracy, data democratization across government departments
- eGovernment - Efficient and digital citizen engagement, e.g. personal ID application process
Smart City:
- Smart driving, parking, buildings, environment
- Waste management
- Open exchange – e.g. mobility services (1st and 3rd party)
Energy:
- Smart grid and utilities infrastructure (energy distribution, smart home, smart meters, smart water, etc.)
National Security:
- Law enforcement, surveillance, police/interior security data exchange
- Defense and military (border control, intelligent soldier)
- Cybersecurity for situational awareness and threat intelligence
The Rise of Data in Motion in the Healthcare Industry - Use Cases, Architectures and Examples powered by Apache Kafka.
Use Cases for Data in Motion in the Healthcare Industry:
- Know Your Patient (= “Customer 360”)
- Operations (Healthcare 4.0 including Drug R&D, Patient Care, etc.)
- IT Perspective (Cybersecurity, Mainframe Offload, Hybrid Cloud, Streaming ETL, etc)
Real-world examples include Covid-19 Electronic Lab Reporting, Cerner, Optum, Centene, Humana, Invitae, Bayer, Celmatix, Care.com.
Machine Learning with Apache Kafka in Pharma and Life SciencesKai Wähner
Blog Post:
https://www.kai-waehner.de/apache-kafka-event-streaming-pharmaceuticals-pharma-life-sciences-use-cases-architecture
Video Recording:
https://youtu.be/t2IH0brwGTg
AI/Machine learning and the Apache Kafka ecosystem are a great combination for training, deploying, and monitoring analytic models at scale in real time. They are showing up more and more in projects, but still feel like buzzwords and hype reserved for science projects.
See how to connect the dots!
--How are Kafka and Machine Learning related?
--How can they be combined to productionize analytic models in mission-critical and scalable real-time applications?
--We will discuss a step-by-step approach to build a scalable and reliable real-time infrastructure for drug discovery doing data integration, feature engineering, image processing, model scoring and processing orchestration.
Use Cases:
R&D Engineering
Sales & Marketing
Manufacturing & Quality Assurance
Supply Chain
Product Monitoring & After Sales Support
VoC (Voice of Customer)
Single View Customer
Yield/Quality Optimization
Improved Drug Yield
Proactive Service Scheduling
Testing & Simulation
Drug Diversion
Process/Quality Monitoring
Inventory & Supply Chain Optimization
Proactive Service Offers
Patent Research and Analytics
Personalized Offers / Ads
EDW Offload
Supply Chain Network Design/Risk Management
Product Predictive Maintenance
Clinical Trials
Customer Segmentation
Smart Products
Serialization & e-Pedigree
Product Usage Tracking
GTM
Global Facilities
Inventory and Logistics Visibility
Warranty & Recall Management
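The step-by-step streaming pipeline described above (data integration, feature engineering, model scoring, processing orchestration) can be sketched as a consume-transform-produce loop. This is an illustrative simulation, not the actual drug-discovery pipeline: the broker is replaced by plain lists and the model by a trivial stand-in function, so only the shape of the flow is shown.

```python
# Sketch of a streaming ML scoring pipeline: consume an event, engineer
# features, score with a model, and produce the result downstream.
# Event fields ("compound", "readings") and the scoring function are
# hypothetical placeholders.

def extract_features(event):
    # Toy feature engineering: normalize raw assay readings by their peak.
    readings = event["readings"]
    peak = max(readings)
    return [r / peak for r in readings]

def score(features):
    # Stand-in for a trained model (e.g. one served next to Kafka Streams).
    return sum(features) / len(features)

def run_pipeline(input_topic):
    output_topic = []
    for event in input_topic:                 # consume
        features = extract_features(event)    # feature engineering
        output_topic.append(                  # produce the scored event
            {"compound": event["compound"], "score": round(score(features), 3)}
        )
    return output_topic

scored = run_pipeline([
    {"compound": "cpd-1", "readings": [2.0, 4.0, 4.0]},
    {"compound": "cpd-2", "readings": [1.0, 1.0, 2.0]},
])
```

In a real deployment the input and output lists would be Kafka topics, and each stage could scale independently as a consumer group.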
Apache Kafka for Smart Grid, Utilities and Energy ProductionKai Wähner
The energy industry is changing from system-centric to smaller-scale and distributed smart grids and microgrids. A smart grid requires a flexible, scalable, elastic, and reliable cloud-native infrastructure for real-time data integration and processing. This post explores use cases, architectures, and real-world deployments of event streaming with Apache Kafka in the energy industry to implement smart grids and real-time end-to-end integration.
Blog Post with more details:
https://www.kai-waehner.de/apache-kafka-smart-grid-energy-production-edge-iot-oil-gas-green-renewable-sensor-analytics
Apache Kafka in Financial Services - Use Cases and ArchitecturesKai Wähner
The Rise of Event Streaming in Financial Services - Use Cases, Architectures and Examples powered by Apache Kafka.
The New FinServ Enterprise Reality: Every company is a software company. Innovate OR be Disrupted. Learn how Event Streaming with Apache Kafka and its ecosystem help...
More details:
https://www.kai-waehner.de/apache-kafka-financial-services-industry-banking-finserv-payment-fraud-middleware-messaging-transactions
https://www.kai-waehner.de/blog/2020/04/15/apache-kafka-machine-learning-banking-finance-industry/
https://www.kai-waehner.de/blog/2020/04/24/mainframe-offloading-replacement-apache-kafka-connect-ibm-db2-mq-cdc-cobol/
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Kai Wähner
The manufacturing industry is moving away from just selling machinery, devices, and other hardware. Software and services increase revenue and margins. Equipment-as-a-Service (EaaS) even outsources the maintenance to the vendor.
This paradigm shift is only possible with reliable and scalable real-time data processing leveraging an event streaming platform such as Apache Kafka. This talk explores how Kafka-native Condition Monitoring and Predictive Maintenance help with this innovation.
More details:
https://www.kai-waehner.de/blog/2021/10/25/apache-kafka-condition-monitoring-predictive-maintenance-industrial-iot-digital-twin/
Video recording:
https://youtu.be/tfOuN5KeI9w
Supply Chain Optimization with Apache KafkaKai Wähner
Supply Chain optimization leveraging Event Streaming with Apache Kafka. See real-world use cases and architectures from Walmart, BMW, Porsche, and other enterprises to improve the Supply Chain Management (SCM) processes. Automation, robustness, flexibility, real-time, decoupling, data integration, and hybrid deployments...
Video recording: https://youtu.be/dUkgungBmPs
Blog post: https://www.kai-waehner.de/apache-kafka-supply-chain-management-scm-optimization-scor-six-sigma-real-time
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Kai Wähner
Hybrid cloud architectures are the new black for most companies. A cloud-first strategy is evident for many new enterprise architectures, but some use cases require resiliency across edge sites and multiple cloud regions. Data streaming with the Apache Kafka ecosystem is a perfect technology for building resilient and hybrid real-time applications at any scale. This talk explores different architectures and their trade-offs for transactional and analytical workloads. Real-world examples include financial services, retail, and the automotive industry.
Video recording:
https://qconlondon.com/london2022/presentation/resilient-real-time-data-streaming-across-the-edge-and-hybrid-cloud
Building a Secure, Tamper-Proof & Scalable Blockchain on Top of Apache Kafka ...confluent
Apache Kafka is an open source event streaming platform. It is often used to complement or even replace existing middleware to integrate applications and build microservice architectures. Apache Kafka is already used in various projects in almost every bigger company today. Understood, battle-tested, highly scalable, reliable, real-time.
Blockchain is a different story. This technology is often in the news, especially in relation to cryptocurrencies like Bitcoin. But what is the added value for software architectures? Is blockchain just hype that adds complexity? Or will everybody use it in the future, like a web browser or mobile app today? And how does it relate to an integration architecture and an event streaming platform?
This session explores use cases for blockchains and discusses different alternatives such as Hyperledger, Ethereum and a Kafka-native tamper-proof blockchain implementation. Different architectures are discussed to understand when blockchain really adds value and how it can be combined with the Apache Kafka ecosystem to integrate blockchain with the rest of the enterprise architecture to build a highly scalable and reliable event streaming infrastructure.
Speakers:
Kai Waehner, Technology Evangelist, Confluent
Stephen Reed, CTO, Co-Founder, AiB
The rise of data in motion in the insurance industry is visible across all lines of business, including life, healthcare, travel, vehicle, and others. Apache Kafka changes how enterprises rethink data. This blog post explores use cases and architectures for event streaming. Real-world examples from Generali, Centene, Humana, and Tesla show innovative insurance-related data integration and stream processing in real-time.
The Fourth Industrial Revolution (also known as Industry 4.0) is the ongoing automation of traditional manufacturing and industrial practices, using modern smart technology.
Event Streaming with Apache Kafka plays a massive role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way integrating with various legacy and modern data sources and sinks.
In this presentation, I want to give you an overview of existing use cases for event streaming technology in a connected world across supply chains, industries and customer experiences that come along with these interdisciplinary data intersections:
• The Automotive Industry (and it’s not only Connected Cars)
• Mobility Services across verticals (transportation, logistics, travel industry, retailing, …)
• Smart Cities (including citizen health services, communication infrastructure, …)
These industries and sectors do not have fundamentally new characteristics and requirements. They need data integration, data correlation, and real decoupling, to name a few, but are now facing massively increased volumes of data.
Real-time messaging solutions have existed for many years. Hundreds of platforms exist for data integration (including ETL and ESB tooling and specific IIoT platforms). Proprietary monoliths have monitored plants, telco networks, and other infrastructure in real time for decades. But now, Kafka combines all the above characteristics in an open, scalable, and flexible infrastructure to operate mission-critical workloads at scale in real time, and it is taking over the world of connecting data.
Kai Waehner [Confluent] | Real-Time Streaming Analytics with 100,000 Cars Usi...InfluxData
Kai Waehner [Confluent] | Real-Time Streaming Analytics with 100,000 Cars Using MQTT, Kafka and InfluxDB 2.0 on Kubernetes | InfluxDays Virtual Experience London 2020
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
This face to face talk about Apache Flink in Sao Paulo, Brazil is the first event of its kind in Latin America! It explains how Apache Flink 1.0 announced on March 8th, 2016 by the Apache Software Foundation (link), marks a new era of Big Data analytics and in particular Real-Time streaming analytics. The talk maps Flink's capabilities to real-world use cases that span multiples verticals such as: Financial Services, Healthcare, Advertisement, Oil and Gas, Retail and Telecommunications.
In this talk, you learn more about:
1. What is Apache Flink Stack?
2. Batch vs. Streaming Analytics
3. Key Differentiators of Apache Flink for Streaming Analytics
4. Real-World Use Cases with Flink for Streaming Analytics
5. Who is using Flink?
6. Where do you go from here?
Apache Kafka® and Analytics in a Connected IoT Worldconfluent
Apache Kafka® and Analytics in a Connected IoT World, Kai Waehner, Sr. Solutions Engineer Advanced Technology Group, Confluent
https://www.meetup.com/Berlin-Apache-Kafka-Meetup-by-Confluent/events/273166575/
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?Kai Wähner
Understand how event streaming with Kafka and Confluent complements tools and frameworks such as Kong, Mulesoft, Apigee, Envoy, Istio, Linkerd, Software AG, TIBCO Mashery, IBM, Axway, etc.
A Streaming API Data Exchange provides streaming replication between business units and companies. API Management with REST/HTTP is not appropriate for streaming data.
With businesses today needing to store a lot more data and for longer periods of time, while also empowering their customers to analyze it in real time and in unexpected ways, analytical workloads are fast exceeding what traditional databases and data warehouses are capable of. In this session, MariaDB's Shane Johnson describes modern analytics requirements and employs real-world use cases to show how MariaDB ColumnStore is helping customers meet these new requirements.
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka.
Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, Paypal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track&trace, live betting, and much more.
Apache Kafka in the Airline, Aviation and Travel IndustryKai Wähner
Aviation and travel are notoriously vulnerable to social, economic, and political events, as well as the ever-changing expectations of consumers. Coronavirus is just a piece of the challenge.
This presentation explores use cases, architectures, and references for Apache Kafka as event streaming technology in the aviation industry, including airline, airports, global distribution systems (GDS), aircraft manufacturers, and more.
Examples include Lufthansa, Singapore Airlines, Air France Hop, Amadeus, and more. Technologies include Kafka, Kafka Connect, Kafka Streams, ksqlDB, Machine Learning, Cloud, and more.
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareKai Wähner
Enterprise integration is more challenging than ever before. The IT evolution requires the integration of more and more technologies. Applications are deployed across the edge, hybrid, and multi-cloud architectures. Traditional middleware such as MQ, ETL, ESB does not scale well enough or only processes data in batch instead of real-time.
This presentation explores why Apache Kafka is the new black for integration projects, how Kafka fits into the discussion around cloud-native iPaaS (Integration Platform as a Service) solutions, and why event streaming is a new software category.
A concrete real-world example shows the difference between event streaming and a traditional integration platform or cloud-native iPaaS.
Video Recording of this presentation:
https://www.youtube.com/watch?v=I8yZwKg_IJc&t=2842s
Blog post about this topic:
https://www.kai-waehner.de/blog/2021/11/03/apache-kafka-cloud-native-ipaas-versus-mq-etl-esb-middleware/
Apache Kafka Landscape for Automotive and ManufacturingKai Wähner
Today, in 2022, Apache Kafka is the central nervous system of many applications in various areas related to the automotive and manufacturing industry for processing analytical and transactional data in motion across edge, hybrid, and multi-cloud deployments.
This presentation explores the automotive event streaming landscape, including connected vehicles, smart manufacturing, supply chain optimization, aftersales, mobility services, and innovative new business models.
Afterwards, many real-world examples are shown from companies such as Audi, BMW, Porsche, Tesla, Uber, Grab, and FREENOW.
More detail in the blog post:
https://www.kai-waehner.de/blog/2022/01/12/apache-kafka-landscape-for-automotive-and-manufacturing/
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers.
This video explores why a single real-time pipeline, called Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter explore the benefits of Kappa but also show how batch processing fits into this discussion positively without the need for a Lambda architecture.
The main focus of the discussion is on Apache Kafka (and its ecosystem) as the de facto standard for event streaming to process data in motion (the key concept of Kappa), but the video also compares various technologies and vendors such as Confluent, Cloudera, IBM/Red Hat, Apache Flink, Apache Pulsar, AWS Kinesis, Amazon MSK, Azure Event Hubs, Google Pub/Sub, and more.
Video recording of this presentation:
https://youtu.be/j7D29eyysDw
Further reading:
https://www.kai-waehner.de/blog/2021/09/23/real-time-kappa-architecture-mainstream-replacing-batch-lambda/
https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
https://www.kai-waehner.de/blog/2021/05/09/kafka-api-de-facto-standard-event-streaming-like-amazon-s3-object-storage/
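The core Kappa idea above can be shown in miniature: one stream-processing codebase serves both live traffic and historical reprocessing, simply by replaying the durable log from the beginning instead of maintaining a separate Lambda-style batch layer. This is a toy in-memory simulation, with a plain list standing in for a Kafka topic and hypothetical event fields.

```python
# Minimal Kappa-architecture illustration: the same pipeline code handles
# live events and full replays of the log.

log = []  # stands in for a durable, replayable Kafka topic

def append(event):
    log.append(event)

def process(events):
    """The single pipeline: running totals per user."""
    totals = {}
    for e in events:
        totals[e["user"]] = totals.get(e["user"], 0) + e["amount"]
    return totals

# Live: events arrive and are appended to the log as they are processed.
for e in [{"user": "a", "amount": 10},
          {"user": "b", "amount": 5},
          {"user": "a", "amount": 7}]:
    append(e)

live_view = process(log)

# "Batch" reprocessing (e.g. after a bug fix or model change) is just a
# replay of the same log through the same code, not a second codebase.
replayed_view = process(log)
```

Because replay and live processing share one code path, there is no batch/real-time drift to reconcile, which is the main operational argument for Kappa over Lambda.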
The Top 5 Apache Kafka Use Cases and Architectures in 2022Kai Wähner
I see the following topics coming up more regularly in conversations with customers, prospects, and the broader Kafka community across the globe:
Kappa Architecture: Kappa goes mainstream to replace Lambda and Batch pipelines (that does not mean that there is no batch processing anymore). Examples: Kafka-powered Kappa architectures from Uber, Disney, Shopify, and Twitter.
Hyper-personalized Omnichannel: Retail and customer communication across online and offline channels becomes the new black, including context-specific upselling, recommendations, and location-based services. Examples: Omnichannel Retail and Customer 360 in Real-Time with Apache Kafka.
Multi-Cloud Deployments: Business units and IT infrastructures span across regions, continents, and cloud providers. Linking clusters for bi-directional replication of data in real-time becomes crucial for many business models. Examples: Global Kafka deployments.
Edge Analytics: Low latency requirements, cost efficiency, or security requirements enforce the deployment of (some) event streaming use cases at the far edge (i.e., outside a data center), for instance, for predictive maintenance and quality assurance on the shop floor level in smart factories. Examples: Edge analytics with Kafka.
Real-time Cybersecurity: Situational awareness and threat intelligence need to process massive data in real-time to defend against cyberattacks successfully. The many successful ransomware attacks across the globe in 2021 were a warning for most CIOs. Examples: Cybersecurity for situational awareness and threat intelligence in real-time.
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesKai Wähner
Technical thought leadership presentation to discuss how leading organizations move to real-time architecture to support business growth and enhance customer experience. This is a forum to discuss use cases with your peers to understand how other digital-native companies are utilizing data in motion to drive competitive advantage.
Agenda:
- Data in Motion with Event Streaming and Apache Kafka
- Streaming ETL Pipelines
- IT Modernisation and Hybrid Multi-Cloud
- Customer Experience and Customer 360
- IoT and Big Data Processing
- Machine Learning and Analytics
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Kai Wähner
The Era of Telco 4.0: Embracing Digital Transformation with Data in Motion. Learn about Payment and FinServ Integration for Data in Motion with 5G and Apache Kafka.
1) The rise of Telco 4.0 and the future forward
2) Data in Motion in the Telco industry
3) Real-world Fintech and Payment examples powered by Data in Motion
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationKai Wähner
Data in Motion powered by the Apache Kafka ecosystem for Situational Awareness, Threat Detection, Forensics, Zero Trust Zones and Air-Gapped Environments.
Agenda:
1) Cybersecurity in 202X
2) Data in Motion as Cybersecurity Backbone
3) Situational Awareness
4) Threat Intelligence
5) Forensics
6) Air-Gapped and Zero Trust Environments
7) SIEM / SOAR Modernization
More details in the "Kafka for Cybersecurity" blog series:
https://www.kai-waehner.de/blog/2021/07/02/kafka-cybersecurity-siem-soar-part-1-of-6-data-in-motion-as-backbone/
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Kai Wähner
Connect all the things: An intro to event streaming for the automotive industry including connected cars, mobility services, and manufacturing / industrial IoT.
Video recording of this talk: https://www.youtube.com/watch?v=rBfBFrcO-WU
The Fourth Industrial Revolution (also known as Industry 4.0) is the ongoing automation of traditional manufacturing and industrial practices, using modern smart technology. Event Streaming with Apache Kafka plays a massive role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way, integrating with various legacy and modern data sources and sinks.
Other industries—retail, healthcare, government, financial services, energy, and more—also lean into Industry 4.0 technology to take advantage of IoT devices, sensors, smart machines, robotics, and connected data. The variety of these deployments goes from disconnected edge use cases across hybrid architectures to global multi-cloud deployments.
In this presentation, I want to give you an overview of existing use cases for event streaming technology in a connected world across supply chains, industries and customer experiences that come along with these interdisciplinary data intersections:
- The Automotive Industry (and it’s not only Connected Cars)
- Mobility Services across verticals (transportation, logistics, travel industry, retailing, …)
- Smart Cities (including citizen health services, communication infrastructure, …)
Real-world examples include use cases from car makers such as Audi, BMW, Porsche, Tesla, plus many examples from mobility services such as Uber, Lyft, Here Technologies, and more.
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureKai Wähner
AWS Data Lake / Lake House + Confluent Cloud for Serverless Apache Kafka. Learn about use cases, architectures, and features.
Data must be continuously collected, processed, and reactively used in applications across the entire enterprise - some in real time, some in batch mode. In other words: As an enterprise becomes increasingly software-defined, it needs a data platform designed primarily for "data in motion" rather than "data at rest."
Apache Kafka is now mainstream when it comes to data in motion! The Kafka API has become the de facto standard for event-driven architectures and event streaming. Unfortunately, the cost of running it yourself is very often too expensive when you add factors like scaling, administration, support, security, creating connectors...and everything else that goes with it. Resources in enterprises are scarce: this applies to both the best team members and the budget.
The cloud - as we all know - offers the perfect solution to such challenges.
Most likely, fully-managed cloud services such as AWS S3, DynamoDB or Redshift are already in use. Now it is time to implement "fully-managed" for Kafka as well - with Confluent Cloud on AWS.
- Building a central integration layer that doesn't care where or how much data is coming from
- Implementing scalable data stream processing to gain real-time insights
- Leveraging fully managed connectors (like S3, Redshift, Kinesis, MongoDB Atlas & more) to quickly access data
Confluent Cloud in action? Let's show how ao.com made it happen!
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaKai Wähner
The Rise of Data in Motion powered by Event Streaming - Use Cases and Architecture for IBM Cloud Pak with Confluent Platform. Including screenshots of the live demo (integration between IBM and Kafka via Confluent Platform and Kafka Connect connectors).
Learn about the integration capabilities of IBM Cloud Pak for Integration, now with the industry’s leading event streaming platform from Confluent Platform powered by Apache Kafka.
Apache Kafka and MQTT - Overview, Comparison, Use Cases, ArchitecturesKai Wähner
Apache Kafka and MQTT are a perfect combination for many IoT use cases. This presentation covers the pros and cons of both technologies. Various use cases across industries, including connected vehicles, manufacturing, mobility services, and smart city are explored. The examples use different architectures, including lightweight edge scenarios, hybrid integrations, and serverless cloud solutions.
Blog series with more details here:
https://www.kai-waehner.de/blog/2021/03/15/apache-kafka-mqtt-sparkplug-iot-blog-series-part-1-of-5-overview-comparison/
Connected Vehicles and V2X with Apache KafkaKai Wähner
This session discusses use cases leveraging the Apache Kafka open source ecosystem as a streaming platform to process IoT data.
See use cases, architectural alternatives and a live demo of how devices connect to Kafka via MQTT. Learn how to analyze the IoT data either natively on Kafka with Kafka Streams/KSQL, or on an external big data cluster like Spark, Flink or Elastic leveraging Kafka Connect, and how to leverage TensorFlow for Machine Learning.
The focus is on connected cars / connected vehicles, V2X use cases, and mobility services.
A live demo shows how to build a cloud-native IoT infrastructure on Kubernetes to connect and process streaming data in real time from 100,000 cars to do predictive maintenance at scale.
Code for the live demo on Github:
https://github.com/kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference
Can and should Apache Kafka replace a database? How long can and should I store data in Kafka? How can I query and process data in Kafka? These are common questions that come up more and more. This session explains the idea behind databases and different features like storage, queries, transactions, and processing to evaluate when Kafka is a good fit and when it is not.
The discussion includes different Kafka-native add-ons like Tiered Storage for long-term, cost-efficient storage and ksqlDB as event streaming database. The relation and trade-offs between Kafka and other databases are explored to complement each other instead of thinking about a replacement. This includes different options for pull and push-based bi-directional integration.
Key takeaways:
- Kafka can store data forever in a durable and highly available manner
- Kafka has different options to query historical data
- Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more powerful than ever before to store and process data
- Kafka does not provide transactions, but exactly-once semantics
- Kafka is not a replacement for existing databases like MySQL, MongoDB or Elasticsearch
- Kafka and other databases complement each other; the right solution has to be selected for a problem
- Different options are available for bi-directional pull and push-based integration between Kafka and databases to complement each other
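One reason Kafka can back query use cases despite being a log: with log compaction, the broker keeps the latest record per key, so any consumer can rebuild a key/value view by replaying the topic from the beginning. A minimal in-memory sketch of that replay, with invented keys and payloads:

```python
# A compacted topic retains the latest record per key; a consumer rebuilds a
# key/value view ("table") by replaying the log from offset 0.
log = [
    ("patient-42", {"status": "admitted"}),
    ("patient-17", {"status": "admitted"}),
    ("patient-42", {"status": "discharged"}),
]

def materialize(log):
    """Replay the log into a point-in-time key/value view."""
    view = {}
    for key, value in log:
        view[key] = value  # later records overwrite earlier ones, like compaction
    return view

view = materialize(log)
print(view["patient-42"])  # {'status': 'discharged'}
```

This is exactly what ksqlDB does when it materializes a table from a topic, just with fault tolerance and distribution on top.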
Video Recording:
https://youtu.be/7KEkWbwefqQ
Blog post:
https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/
Accelerate Enterprise Software Engineering with Platformless – WSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
First Steps with Globus Compute Multi-User Endpoints – Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Large Language Models and the End of Programming – Matt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Designing for Privacy in Amazon Web Services – KrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the presentation demonstrates the various challenges: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed us to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... – Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Strategies for Successful Data Migration Tools.pptx – varshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data migration tools like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
Your Digital Assistant.
Making a complex approach simple: a straightforward process saves time, with no more waiting to connect with the people who matter to you. Safety first is not a cliché – information is securely protected in cloud storage to prevent any third party from accessing data.
Would you rather make your visitors feel burdened by making them wait, or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industry, including factories, societies, government institutes, and warehouses. It is a new-age, contactless way of logging information about visitors, employees, packages, and vehicles. As a digital logbook, VizMan eliminates the unnecessary use of paper and space, since there is no need for bundles of registers left to collect dust in a corner of a room. It records visitors' essential details, helps schedule meetings between visitors and employees, and assists in supervising employee attendance. With VizMan, visitors don't need to wait for hours in long queues; VizMan handles visitors with the value they deserve, because we know time is important to you.
Feasible Features
One Subscription, Four Modules – the Admin, Employee, Receptionist, and Gatekeeper modules ensure confidentiality and prevent data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user-friendly database manager that records, filters, and tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
Quarkus Hidden and Forbidden Extensions – Max Andersen
Quarkus has a vast extension ecosystem and is known for its supersonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting – quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll know how to organize and improve your code review process.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... – Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
top nidhi software solution freedownload – vrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Prosigns: Transforming Business with Tailored Technology Solutions – Prosigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
How Recreation Management Software Can Streamline Your Operations.pptx – wottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
SOCRadar Research Team: Latest Activities of IntelBroker – SOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... – Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
1. The Rise of Data in Motion in the Healthcare Industry
Use Cases, Architectures and Examples powered by Apache Kafka
Kai Waehner
Field CTO
kai.waehner@confluent.io
linkedin.com/in/kaiwaehner
@KaiWaehner
confluent.io
kai-waehner.de
2. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Healthcare includes many topics…
https://isilanguagesolutions.com/2019/02/25/what-are-the-differences-between-health-care-medical-life-science-and-pharmaceutical-translations/
3. Healthcare Value Chain
https://www.researchgate.net/publication/265654743_The_business_of_healthcare_innovation_in_the_Wharton_School_curriculum
4. The world is changing.
5. Covid Increases the Pressure
“Pandemic drives digital adoption forward 5 years in a span of 8 weeks.” – Digital adoption through COVID and beyond, McKinsey
6. Digital Health Ecosystem Disruption
Digital health ecosystems: A payer perspective – McKinsey article, August 2019
7. This transformation is happening everywhere
8. Doctors become Software
9. Medical Research becomes Software
10. Patient Data becomes Software
11. Security becomes Software
12. Healthcare Companies and Organizations
13. What enables this transformation?
14. Real-time Data beats Slow Data.
15. Real-time Data beats Slow Data.
Emergency: real-time sensor diagnostics, intelligent routing, ETA updates
Patient Care: diagnosis, treatment, connected health
Insurance: member enrollment, claim processing, omnichannel patient experience
Cybersecurity: threat detection, incident response, data privacy protection
16. This is a fundamental paradigm shift...
Cloud: infrastructure as code – the future of the datacenter
Event Streaming: data in motion as continuous streams of events – the future of data
17. What is Data in Motion?
18. An ‘event’ is what happens in your business
Transportation: the GPS in the ambulance sends the ETA to the hospital at 5:11am.
Insurance Claim: Alice filed a healthcare insurance claim Friday at 7:34pm.
Patient Interaction: the doctor updates Sabine’s case status at 9:10am.
Each of these events flows through Kafka.
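Events like these can be modeled as immutable, timestamped records appended to named topics. A minimal sketch, in which the topic names and payload fields are invented for illustration:

```python
import json

topics = {}  # topic name -> append-only list of serialized records

def publish(topic, timestamp, payload):
    """Append an immutable event record; events are facts, never updated in place."""
    record = {"timestamp": timestamp, **payload}
    topics.setdefault(topic, []).append(json.dumps(record))

publish("transportation", "05:11", {"vehicle": "ambulance-7", "eta_min": 4})
publish("insurance-claim", "19:34", {"claimant": "Alice", "type": "healthcare"})
publish("patient-interaction", "09:10", {"patient": "Sabine", "action": "case status updated"})

print(len(topics), sum(len(v) for v in topics.values()))  # 3 3
```

The key property, as in Kafka, is that the record is appended rather than overwritten: the history of what happened is itself the data.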
19. Data in Motion in the Healthcare Industry
Your business as streams of events, powered by Kafka: emergency situation → ambulance → patient diagnosis → surgery → contact relatives → insurance claim processing.
20. An Event Streaming Platform is the Underpinning of an Event-driven Architecture
Producers (MES, ERP, sensors, mobile apps) publish streams of real-time events – supplier, alert, forecast, inventory, customer, order – through connectors and stream processing apps; consumers such as a Customer 360, a real-time alerting system, and a data warehouse subscribe via further stream processing apps and connectors.
21. With Data in Motion…
A universal event pipeline connects data stores, logs, 3rd-party apps, and custom apps/microservices (Hadoop, device logs, mainframes, data warehouse, Splunk, …) to contextual event-driven applications such as supply chain management, medical fraud detection, patient & beneficiary 360, disease spread modeling, and HL7 data transformation.
22. Public Health Data Automation in Confluent
Source connectors (CDC, MQ, REST Proxy, EDI/batch input) bring claims and clinical data from legacy data storage and processing into Kafka; Schema Registry and ksqlDB / Kafka Streams handle processing; HL7-FHIR microservices and sink connectors deliver the results to analytics sinks.
23. Example: Benefits application process
Software-using (weeks): the beneficiary submits a benefits application through form intake, a case manager performs the application review, and the application is approved or denied.
Software-defined (seconds): the beneficiary applies through a benefits app UI, and a benefits service, a risk/fraud service, and an external agency service process the application in real time before it is approved or denied.
24. Use Cases for Data in Motion in the Healthcare Industry
Know Your Patient (= “Customer 360”)
● Digital Transformation
● eCommerce Optimization
● Product Catalog Optimization
● Product-Inventory Profiling and Filtering by Customer or Persona
● Real-time Pricing Models
● Next Best Offer / Cross-Sell / Recommendations
● Omni-Channel Experience
● Customer Profile Updates
● …
Operations (Healthcare 4.0 including Drug R&D, Patient Care, etc.)
● Supply Chain Optimization
● Shipment Notifications/Delays
● Inventory Processing and Oversight
● Predictive Inventory Management
● Connected Health
● Improved Care
● Proactive Patient Care
● Patient Notifications
● Pharma Modernization
● M&A Rapid Integration
● …
IT Perspective
● Cybersecurity / SIEM Optimization
● Mainframe Offload
● Hybrid Cloud Integration / Bridge to Cloud
● Middleware / Messaging Modernization
● Streaming ETL & Analytics
● …
25. Real World Deployments
26. Data in Motion across the Healthcare Value Chain
1. Legacy Modernization and Hybrid Cloud
2. Streaming ETL
3. Real-time Analytics
4. Machine Learning and Data Science
5. Open API and Omnichannel
27. Data in Motion across the Healthcare Value Chain: 1. Legacy Modernization and Hybrid Cloud
28. Optum – Self-Service Kafka
American pharmacy benefit manager and health care provider (subsidiary of UnitedHealth Group).
Kafka as a Service within UnitedHealth Group, centrally managed and utilized by over 200 internal application teams.
A repeatable, scalable, cost-efficient way to standardize data – from mainframe via CDC into modern data processing and analytics tools.
29. Centene – Integration and Data Processing at Scale in Real Time
Health insurer that acts as an intermediary for both government-sponsored and privately insured health care programs.
Largest Medicaid and Medicare managed care provider in the US.
https://www.confluent.io/online-talks/building-an-enterprise-eventing-framework-on-demand/
30. Bayer AG – Hybrid Real-Time Data Flow
Adopted a cloud-first strategy and started a multi-year transition to the cloud.
A Kafka-based cross-datacenter DataHub was created to facilitate the migration and to drive the shift to real-time stream processing.
Strong enterprise adoption, supporting a myriad of use cases.
https://www.confluent.io/kafka-summit-sf18/bringing-streaming-data-to-the-masses
31. Data in Motion across the Healthcare Value Chain: 2. Streaming ETL
32. Bayer AG – Data Integration and Processing in R&D
Analysis of clinical trials, patents, reports, news, literature, etc.: 250M documents, 7TB of raw text from 30 data sources.
A variety of document streams with different formats and schemas flows through several text processing and enrichment steps.
Scalable, reliable Kafka pipelines built with Kafka Streams (Java) and Faust (Python) replaced custom, error-prone, non-scalable scripts.
https://www.kafka-summit.org/sessions/bayer-document-stream-pipelines
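The shape of such a pipeline – consume, transform, enrich, produce – can be sketched with chained Python generators. This is only an illustrative stand-in for a Kafka Streams or Faust topology, with a made-up `tagger` step in place of real text mining:

```python
# Hypothetical document-enrichment pipeline as chained generators, mirroring the
# consume -> transform -> enrich -> produce shape of a streaming ETL topology.
def parse(raw_docs):
    """Normalize raw documents into structured records."""
    for doc in raw_docs:
        yield {"text": doc.strip().lower()}

def enrich(docs, tagger):
    """Attach tags produced by a text-processing step to each record."""
    for doc in docs:
        yield {**doc, "tags": tagger(doc["text"])}

def tagger(text):
    # Stand-in for a real text-mining step over trials, patents, reports, ...
    return [w for w in ("trial", "patent") if w in text]

raw = ["  Phase III TRIAL results ", "New PATENT filing "]
out = list(enrich(parse(raw), tagger))
print(out[0]["tags"])  # ['trial']
```

In a real Faust or Kafka Streams deployment each stage would read from and write to Kafka topics, which is what makes the pipeline restartable and scalable, unlike the one-shot scripts it replaced.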
33. Babylon Health – Secure and Agile Integration
Connectivity plus an agile microservice architecture.
GDPR- and PII-compliant security.
https://www.confluent.io/kafka-summit-lon19/one-key-to-rule-them-all
34. Data in Motion across the Healthcare Value Chain: 3. Real-time Analytics
35. Cerner – Sepsis Alerting
Supplier of health information technology services, devices, and hardware.
~30% of all US healthcare data lives in a Cerner solution.
Central event streaming platform for real-time sepsis alerting to save lives.
36. Celmatix – Reproductive Health Care
Preclinical-stage biotech company that provides digital tools and genetic insights focused on fertility.
Personalized information to disrupt how women approach their lifelong reproductive health journey.
Real-time aggregation of heterogeneous data collected from Electronic Medical Records (EMRs) and genetic data collected from partners through their Personalized Reproductive Medicine (PReM) Initiative.
Proactive reproductive health decisions by leveraging real-time genomics data and applying technologies such as big data analytics, machine learning, AI, and whole-genome DNA sequencing.
Data governance for security and compliance.
https://www.confluent.io/customers/celmatix/
37. Centers for Disease Control and Prevention (CDC): COVID-19 Electronic Lab Reporting
CELR (COVID Electronic Lab Reporting): case notifications, lab reporting, and healthcare interoperability in real time.
Track the threat of the COVID-19 virus to provide comprehensive data for local, state, and federal response, and better understand locations with an increase in incidence.
Rapidly aggregate, validate, transform, and distribute laboratory testing data submitted by public health departments and other partners.
https://www.confluent.io/resources/kafka-summit-2020/flattening-the-curve-with-kafka/
38. Data in Motion across the Healthcare Value Chain: 4. Machine Learning and Data Science
39. Recursion – Discovering Drugs in Real Time
Accelerates drug discovery: finds drug treatments by processing biological images in a massively parallel system.
Combines experimental biology, artificial intelligence, automation, and real-time event streaming.
https://www.confluent.io/customers/recursion
https://www.confluent.io/kafka-summit-san-francisco-2019/discovering-drugs-with-kafka-streams
40. Humana – Real-Time Integration and Analytics
Interoperability platform to transition from an insurance company with elements of health to truly a health company with elements of insurance.
Consumer-centric, health-plan agnostic, provider agnostic; cloud resilient and elastic; event-driven and real-time.
Inter-organization data sharing (aka data exchange).
Use cases include real-time updates of health information (connecting HCPs to pharmacies), reducing pre-authorizations from 20–30 minutes to 1 minute, and real-time home healthcare assistant communication.
https://www.confluent.io/resources/kafka-summit-2020/levi-bailey-keynote-humana-improving-health-with-event-driven-architectures/
41. Data in Motion across the Healthcare Value Chain: 5. Open API and Omnichannel
42. Care.com – Trusted Caregivers
Online marketplace for a range of care services, including senior care and housekeeping.
The Bravo Platform provides a simple, unified IT architecture to streamline go-to-market initiatives.
From a monolithic architecture to a truly decoupled, scalable microservices platform.
Migration from Confluent Platform to Confluent Cloud to focus on business problems.
Data governance with Schema Registry across different runtimes (Java, .NET, Go, etc.).
“Care APIs” (inspired by Google APIs) define all of their data and service contracts with Protobuf.
Enhanced security for PII data with fine-grained RBAC and data lineage.
https://www.confluent.io/customers/care-com/
43. Invitae – Data Science and 24/7 Production
Biotechnology company that provides DNA-based testing for the detection of genetic abnormalities beyond what can be identified through traditional methodologies.
Gene panels and single-gene testing for a broad range of clinical areas, including hereditary cancer, cardiology, neurology, pediatric genetics, metabolic disorders, immunology, and hematology.
Brings comprehensive genetic information into mainstream medical practice to improve the quality of healthcare for billions of people.
Omnichannel: genetic results are often just the beginning – Invitae’s interactive, educational portal and caring genetic counselors help patients understand their results and what to do next.
Truly decoupled infrastructure that enables others to join in and consume the data.
Paradigm shift: building an application entirely of streams.
https://www.confluent.io/kafka-summit-san-francisco-2019/from-zero-to-streaming-healthcare-in-production
44. What is Data Streaming with the Apache Kafka Ecosystem?
45. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Kafka: The Trinity of Event Streaming
01 Publish & Subscribe to Streams of Events
02 Store your Event Streams
03 Process & Analyze your Event Streams
46. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Kafka Makes Your Business Real-time
CREATE STREAM payments (user VARCHAR, amount INT)
  WITH (kafka_topic = 'all_payments', value_format = 'avro');

CREATE TABLE credit_scores AS
  SELECT user, updateScore(p.amount) AS credit_score
  FROM payments AS p
  GROUP BY user
  EMIT CHANGES;

(ksqlDB streams feeding the credit service and the risk service)
47. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Why can’t I do this with my existing data platforms?
• Databases
• Messaging
• ETL / Data Integration
• Data Warehouse
48. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Enterprise Data Platform Requirements Are Shifting
1. Built for Historical Data → Built for Real-Time Events
   Value: Trigger real-time workflows (i.e. real-time order management)
2. Scalable for Transactional Data → Scalable for ALL Data
   Value: Scale across the enterprise (i.e. customer 360)
3. Transient Raw Data → Persistent + Durable
   Value: Build mission-critical apps with zero data loss (i.e. instant payments)
4. Raw Data → Enriched Data
   Value: Add context & situational awareness (i.e. ride sharing ETA)
49. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Only Event Streaming Has All 4 Requirements
50. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Only Event Streaming Has All 4 Requirements
(Built for Real-Time Events · Scalable for ALL Data · Persistent & Durable · Capable of Enrichment)
• Databases: good for transactional applications
• Messaging: good for ultra low-latency, fire-and-forget use cases
• ETL / Data Integration: good for batch data integration
• Data Warehouse: good for historical analytics and reporting
• Event Streaming: meets all four requirements as a platform for event-driven transformation
  (Scalable Messaging + Real-Time Data Integration + Stream Processing)
51. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Project Example:
Drug Discovery
52. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Use Case: Drug Discovery
“On average, it takes at least ten
years for a new medicine to
complete the journey from initial
discovery to the marketplace”
PhRMA
http://phrma-docs.phrma.org/sites/default/files/pdf/rd_brochure_022307.pdf
53. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Recursion – Discovering Drugs in Real-Time
Accelerate drug discovery.
Find drug treatments by processing biological images.
Massively parallel system.
Combines experimental biology, artificial intelligence,
automation and real-time event streaming.
https://www.confluent.io/customers/recursion
https://www.confluent.io/kafka-summit-san-francisco-2019/discovering-drugs-with-kafka-streams
54. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Image and Video Processing
… (at a high level) is “just” pixels (arrays of 0s and 1s) and matrix multiplication
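To make the “pixels and matrix multiplication” point concrete, here is a minimal, self-contained sketch with illustrative values: a grayscale image as a 2-D array, and an edge-detecting convolution expressed as repeated element-wise matrix math over sliding windows.

```python
import numpy as np

# A grayscale "image" is just a 2-D array of pixel intensities.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

# A 2x2 edge-detection kernel; convolution is element-wise
# multiplication plus summation over every sliding window.
kernel = np.array([[1, -1],
                   [-1, 1]], dtype=float)

def convolve2d(img, k):
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

edges = convolve2d(image, kernel)
print(edges.shape)  # (3, 3); large magnitudes mark edges in the image
```

Real pipelines use optimized libraries (OpenCV, TensorFlow) for the same operation, but the underlying arithmetic is exactly this.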
55. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Drug Discovery
in manual, slow, bursty batch mode: not scalable
56. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Drug Discovery
in automated, scalable, reliable real-time mode
57. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Digital Image Processing for Drug Discovery
Find drug treatments by processing biological images:
• ML models can be trained to distinguish healthy cells from diseased cells with problematic genes
• Grow healthy cells and diseased cells in the lab
• Apply different drugs → make diseased cells look healthy again
58. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Kafka, ksqlDB and TensorFlow for Drug Discovery in Real Time at Scale

Confluent Platform:
• Kafka Client (.NET / C++) in the laboratory (Windows machines), producing images
• Confluent Server with Tiered Storage
• Kafka Connect (incl. Oracle CDC connector for historical drugs data)
• Streaming ETL (ksqlDB)
• Stateful Workflow Orchestration (Kafka Streams)

Other Components:
• Digital Image Processing (OpenCV SaaS service via REST API), consuming images and producing processed images
• Model Training and Scoring (Python client + TensorFlow)
• Database (MySQL)
• Batch Reporting Platform and BI Dashboard, consuming all data
• Human Intelligence reviewing results
59. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Ingestion of Images
Laboratory → Kafka Connect → replication to the central cluster via Cluster Linking
60. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Data Preprocessing
Preprocessing with Kafka Streams:
filter, transform, anonymize, extract features,
reduce noise, enhance brightness / contrast
→ data ready for model training
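A minimal sketch of the preprocessing stage, assuming illustrative operations and values: each raw image is denoised with a mean filter and contrast-stretched before being published to the "ready for training" topic. The function names and the 3x3 filter are stand-ins, not the production code.

```python
import numpy as np

def preprocess(image):
    """Denoise and contrast-stretch one raw image (illustrative steps)."""
    img = image.astype(float)
    # Reduce noise with a simple 3x3 mean filter (edges padded).
    padded = np.pad(img, 1, mode="edge")
    denoised = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            denoised[i, j] = padded[i:i + 3, j:j + 3].mean()
    # Enhance contrast by stretching intensities to the full [0, 1] range.
    lo, hi = denoised.min(), denoised.max()
    return (denoised - lo) / (hi - lo) if hi > lo else denoised

raw = np.array([[10, 12, 11],
                [13, 50, 12],   # 50 is a bright noise spike
                [11, 12, 10]])
ready = preprocess(raw)
print(ready.min(), ready.max())  # 0.0 1.0
```

In the streaming version, this function body becomes the transformation applied per record; the stateless nature of the steps is what makes them easy to express in ksqlDB or Kafka Streams.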
61. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Data Processing with ksqlDB

SELECT image_id, experiment_id, image_details
FROM image_channel i
LEFT JOIN experiment_database e
  ON i.experiment_id = e.experiment_id
WHERE e.image_type = 'black_and_white';
62. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Streaming Ingestion and Model Training with TensorFlow I/O
Direct streaming ingestion for model training and / or scoring
with the TensorFlow I/O Kafka plugin
(no additional data storage like S3 or HDFS required!)
Producer → distributed commit log → models A and B consume over time
https://github.com/tensorflow/io
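The "train directly from the stream, no intermediate storage" pattern can be simulated without a Kafka broker or the tensorflow-io plugin. In this sketch a generator stands in for Kafka mini-batches and a linear model is updated incrementally; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)  # linear model weights, updated batch by batch

def stream_batches(n_batches, batch_size=32):
    """Stand-in for a Kafka topic: yields (features, labels) mini-batches."""
    true_w = np.array([2.0, -1.0])  # the relationship the model should learn
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 2))
        yield X, X @ true_w

# Incremental training: consume each batch once, update, discard.
# No batch is ever written to S3/HDFS first.
for X, y in stream_batches(200):
    grad = X.T @ (X @ w - y) / len(y)  # squared-error gradient
    w -= 0.1 * grad

print(np.round(w, 2))  # converges toward [2.0, -1.0]
```

With tensorflow-io, the generator is replaced by a Kafka-backed dataset, but the training loop keeps the same one-pass, storage-free shape.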
63. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Confluent Tiered Storage for Kafka
64. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Use Cases for Reprocessing Historical Events
“Give me all events from time A to time B”
• New consumer application
• Error-handling
• Compliance / regulatory processing
• Query and analyze existing events
• Schema changes in analytics platform
• Model training
(The real-time producer and real-time consumers keep running while a consumer of historical data replays the log.)
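A toy sketch of “give me all events from time A to time B”, simulating the log in memory; the event payloads and timestamps are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: int  # epoch millis, as carried on a Kafka record
    payload: str

# Kafka retains the log (with Tiered Storage, cheaply and long-term),
# so a new consumer can replay any time window after the fact.
log = [Event(1000, "img-1"), Event(2000, "img-2"),
       Event(3000, "img-3"), Event(4000, "img-4")]

def replay(log, start_ms, end_ms):
    """Return payloads of all events whose timestamps fall in [A, B]."""
    return [e.payload for e in log if start_ms <= e.timestamp <= end_ms]

print(replay(log, 2000, 3000))  # ['img-2', 'img-3']
```

Against a real cluster, the equivalent is to look up the offsets for the start timestamp (e.g. with the consumer's offsets_for_times lookup in the Confluent Python client) and consume forward until records pass time B.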
65. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Separation of Model Training and Model Inference
• Model training in the cloud
• Model deployment at the edge
• Local predictions with the analytic model
66. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Stream Processing with External Model and RPC
The stream processing application (e.g. Kafka Streams) takes an input event, sends a prediction request to the model serving layer (e.g. TensorFlow Serving) via gRPC / HTTP, and continues processing with the response.
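TensorFlow Serving's REST predict endpoint has the documented form http://&lt;host&gt;:8501/v1/models/&lt;model_name&gt;:predict with a JSON body of {"instances": [...]}. The sketch below only constructs the request; the host, port and model name are placeholders, and the actual HTTP call (which a stream processor would make per event) is left out.

```python
import json

def build_predict_request(model_name, instances, host="tf-serving", port=8501):
    """Build the URL and JSON body for a TensorFlow Serving REST predict call.

    host / model_name are illustrative placeholders; in production they
    come from the deployment's service discovery and model registry.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body

# One prediction request for two (toy) feature rows:
url, body = build_predict_request("cell_classifier", [[0.1, 0.9], [0.8, 0.2]])
print(url)
```

The trade-off versus the embedded-UDF approach on the next slide: RPC keeps model serving independently scalable and versioned, at the cost of a network hop per event.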
67. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Embedded Model Deployment with Apache Kafka, ksqlDB and TensorFlow

User Defined Function (UDF):
CREATE STREAM ImageAnalysis AS
  SELECT image_id, analyzeImage(image_details)
  FROM image_channel;
68. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Model Training and Scoring
with the same ML Pipeline (or even in the same Application)
• Data Science team responsible for the whole model lifecycle
• Beloved Python tool stack (Pandas, scikit-learn, TensorFlow, Jupyter, …)
• 24/7 production scale with Confluent Python Client (e.g. deployed in Docker containers on Kubernetes)
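A sketch of the configuration a 24/7 Python scoring service might pass to the Confluent Python client's Consumer. The group id is an assumption made for illustration; the keys themselves are standard librdkafka settings.

```python
def scoring_consumer_config(bootstrap_servers):
    """Consumer settings for an always-on scoring service (illustrative)."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": "drug-discovery-scoring",  # hypothetical group id
        "enable.auto.commit": False,           # commit only after a score is produced
        "auto.offset.reset": "earliest",       # replay history on first start
    }

config = scoring_consumer_config("broker-1:9092")
print(sorted(config))
```

Disabling auto-commit and committing manually after each successful score is what gives the "zero data loss" behavior the deck emphasizes: a crashed container on Kubernetes simply resumes from the last committed offset.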
69. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Kafka, ksqlDB and TensorFlow for Drug Discovery in Real Time at Scale

Confluent Platform:
• Kafka Client (.NET / C++) in the laboratory (Windows machines), producing images
• Confluent Server with Tiered Storage
• Kafka Connect (incl. Oracle CDC connector for historical drugs data)
• Streaming ETL (ksqlDB)
• Stateful Workflow Orchestration (Kafka Streams)

Other Components:
• Digital Image Processing (external SaaS service + REST), consuming images and producing processed images
• Model Training and Scoring (Python client + TensorFlow)
• Database (MySQL)
• Batch Reporting Platform and BI Dashboard, consuming all data
• Human Intelligence reviewing results
70. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Data in Motion Is The Future Of Data
• Cloud: infrastructure as code is the future of the datacenter
• Event Streaming: data in motion as continuous streams of events is the future of data
71. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Why Confluent?
72. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
The Rise of Data in Motion
• 2010: Apache Kafka created at LinkedIn by Confluent’s founders
• 2014: Confluent founded
• 2020: 80% of the Fortune 100 companies trust and use Apache Kafka
73. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Event Streaming Maturity Model (value grows with investment & time)
1. Initial Awareness / Pilot (1 Kafka cluster)
2. Start to Build Pipeline / Deliver 1 New Outcome (1 Kafka cluster)
3. Mission-Critical Deployment (stretched, hybrid, multi-region)
4. Build Contextual Event-Driven Apps (stretched, hybrid, multi-region)
5. Central Nervous System (global Kafka)
Supported by product, support, training, partners, technical account management, …
74. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Car engine (Apache Kafka) → complete car (Confluent Platform) → self-driving car (Confluent Cloud)
Confluent completes Apache Kafka. Cloud-native. Everywhere.