Healthcare directories underpin most healthcare systems around the world and are often a core component enabling initiatives like ‘Care Coordination’.
Building a Federated Data Directory Platform for Public Health
2. Building a Federated Data Directory Platform for Public Health
Mark Paul
Engineering Manager
Anshul Bajpai
Data Engineering Lead
3. Agenda
1. Problems with Centralised Data Directories
2. Solution: Federated Data Directory Platform
3. Design Patterns
4. Intelligent System of Record Ranking
5. Architecture Patterns
4. ▪ Australian digital health infrastructure
▪ National directory of health services and the practitioners who provide them
▪ National, government-owned, not-for-profit organization
▪ Trusted health information and advice for all Australians
#1 Australian health information website
4.8m community connections each month
6. Healthcare Directories - Critical Healthcare Infrastructure
▪ Enables Care Coordination
▪ Single Point-of-Failure
▪ Bad Data Quality = Clinical Risk to Patients
7. Healthcare Directories - Problems
▪ Basic, centrally managed databases
▪ Data updated via Content Management Systems and call centres
▪ Data volatility (high frequency of change to data)
This model is reactive and inefficient!
9. Federating Data is a Powerful Concept
Federated Database:
▪ Maps multiple autonomous database systems into a single federated datastore
Federated Data Platform:
▪ Controlled aggregation to create “gold-standard data” by using multiple Autonomous Origin Data Sources
▪ Data Aggregation via Event Sourcing pipelines
10. Building the Federated Data “Puzzle”
Federal, State, Public/Private Hospitals, EMR, and other Commercial Vendors participate as Systems of Record
12. Source Classification
▪ System of Record (SoR):
▪ Identify your Authoritative SoRs
▪ SoRs have Role/s:
▪ Source of Truth - Authoritative owner of a subset of data
▪ Source of Validation - Improve Data Quality
▪ Source of Notification - Increase “data currency”
Entity / Channel Setup
▪ Gold Entities - your final entity models (e.g. Healthcare Service, Organisation, Practitioner)
▪ Raw Entities - Raw (Source) entities in the pre-mapping stage that are eventually mapped to your gold entities
▪ Source Channels - pipeline channels that transition Raw Entities into new versions of Gold Entities
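The source classification above can be captured in a small registry. A minimal Python sketch, assuming illustrative source names and role assignments (this is not the production configuration):

```python
# Illustrative registry of sources and the SoR role(s) each plays.
# Role codes: SoT = Source of Truth, SoV = Source of Validation,
# SoN = Source of Notification.
SOURCE_ROLES = {
    "vendor_software": {"SoT"},
    "medicare": {"SoV"},
    "healthscope": {"SoN"},
}

def roles_for(source_id):
    """Look up which role(s) a registered source plays (empty if unknown)."""
    return SOURCE_ROLES.get(source_id, set())
```

A source can hold more than one role, which is why the registry stores a set per source.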
13. Attribute Sourcing - Data Federation
Example Gold “Healthcare Service” entity, with attribute groups federated from multiple sources - Internal, Vendor Software (SoT), Medicare (SoV), and Healthscope (SoN):
{
  Id: "561f10e4-0109-b99f-a2df-c059f9dc4a9b",
  name: "Cottesloe Medical Centre",
  bookingProviders: [
    { Id: hotdoc, providerIdentifier: cottesloe-medical-centre },
    { Id: healthengine, providerIdentifier: ctl-m-re }
  ],
  practitionerRelations: [
    { pracId: c618860e-a69a, type: providerNumber, value: 2djfkdn3k34 },
    { pracId: hsjfk3e-53vd, type: providerNumber, value: dsfh4kslfls }
  ],
  Calendar: { openRules: […], closedRules: […] },
  Contacts: { Email: sss@gmail.com, Website: www.tt.com, Phone: 3242343 }
}
The Practitioner Relation attributes carry details about the practitioners who work at a service.
15. Pre-Processing Layer
Automated pre-processing via notebooks (origin API, or offline data extracts via SFTP / S3 pickup folders)
▪ Generates a “Source Data” Event Object:
{
  DataPayload: <type> Raw Entity Model,
  Provenance: <type> Provenance
}
▪ DataPayload holds the “Raw Entity” (source-specific model)
▪ Provenance is used for source / origin identification
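The event object above could be modelled as plain dataclasses. A minimal sketch, assuming field names beyond DataPayload and Provenance (source_id, channel, extracted_at are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Provenance:
    """Identifies the origin of a payload (field names are illustrative)."""
    source_id: str      # which origin system produced the extract
    channel: str        # e.g. "api", "sftp", "s3"
    extracted_at: str   # extraction timestamp, ISO-8601

@dataclass
class SourceDataEvent:
    """The 'Source Data' Event Object emitted by pre-processing."""
    data_payload: dict       # the Raw (source-specific) entity model
    provenance: Provenance   # used for source / origin identification
```

Keeping the raw payload opaque (a dict) at this stage matches the deck's design: the source-specific model is only mapped to the gold model later, in the Stage layer.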
16. Raw Processing Layer (Bronze)
Picks up the Source Data Event from the “Pre-Processing Output”
▪ Performs routine, high-level parsing / cleansing
▪ Generates a “Core Data” Event Object, which carries through to each downstream layer and captures transition/operational changes at each layer
▪ Generates an Event Trace ID - the end-to-end traceability identifier
▪ The Data Lineage Object “begins” capturing operational outcomes for events
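One way the Bronze step could wrap a source event into a Core Data Event is sketched below; the field names and the use of a UUID for the trace id are assumptions, not the deck's exact implementation:

```python
import uuid

def begin_core_event(source_event):
    """Wrap a Source Data Event into a Core Data Event that is carried
    through every downstream layer (shape is illustrative)."""
    return {
        "trace_id": str(uuid.uuid4()),           # end-to-end traceability id
        "payload": source_event["data_payload"],
        "provenance": source_event["provenance"],
        "lineage": [],  # each layer appends its operational outcomes here
    }
```

The empty lineage list is the “begins” step: downstream layers append their operational outcomes to it rather than creating their own tracking structures.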
17. Stage Processing Layer (Silver)
Picks up the Core Data Event from the “Raw Output”
▪ Mapping Operation - convert from the Raw (Source) Entity model to the Gold Entity Model
▪ Referencing Operation - enrichment using Reference Data lookups
▪ Merging Operation - a new Gold version is created
▪ Validation Operation - final validation against Gold Model validation rules
18. Stage Processing Layer - Merging Operation
▪ Matching by “primary key”
▪ Merging (based on the last version) / Delta Determination
▪ Version Incrementation
▪ Metadata Attribution generated and appended
▪ Logs every change to every attribute on every event
▪ Individual Data Lineage Objects store all operational outcomes on the event (attribute exceptions, violations, status changes, etc.)
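The merge steps above (match, delta determination, version incrementation, attribute-level logging) can be sketched in a few lines of Python. This is a simplified stand-in, not the real merge engine:

```python
def merge_gold(previous, incoming_attributes):
    """Merge mapped attributes onto the last gold version, determine the
    delta, and increment the version (illustrative sketch)."""
    previous = previous or {"version": 0, "attributes": {}}
    # Delta determination: which attributes actually changed vs the last version.
    changed = sorted(
        k for k, v in incoming_attributes.items()
        if previous["attributes"].get(k) != v
    )
    return {
        "version": previous["version"] + 1,                    # version incrementation
        "attributes": {**previous["attributes"], **incoming_attributes},
        "changed": changed,  # every changed attribute is recorded per event
    }
```

Recording the per-attribute delta on every event is what later enables attribute-level lineage reporting and the automatic source ranking described further on.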
19. Gold Processing Layer
Picks up the Core Data Event from the “Stage Output”
▪ Entity Relationship Validation - ensures entity relationships are “intact” and prevents orphans
▪ Re-Processing & Replay - replay latest versions (so that new reference data and business / validation logic can apply)
▪ Data Science Layer - Data Quality Benchmarks
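The orphan-prevention check could look like the sketch below, reusing the practitionerRelations shape from the Attribute Sourcing example; the function name and signature are hypothetical:

```python
def find_orphan_relations(service, known_practitioner_ids):
    """Return practitioner ids referenced by a service that do not exist
    in the gold store (illustrative entity-relationship validation)."""
    return [
        rel["pracId"]
        for rel in service.get("practitionerRelations", [])
        if rel["pracId"] not in known_practitioner_ids
    ]
```

A non-empty result means the event would create an orphaned relation, so the gold layer can reject or quarantine it instead of publishing a broken entity graph.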
23. Problem - “Dedicated System of Record (SoR)”
A dedicated SoR has full update authority over your data attributes:
1. Data Quality Regressions flow into your System
2. Low Frequency of Change (Low Data Currency)
Solution - “Candidate Systems of Record (CSoR)”
Alternate “SoRs” compete to update the same data - e.g. a Healthcare Service’s { Opening Hours, Contact Details } attributes can be updated by SoR A, CSoR B, or CSoR C.
24. Manual Ranking
▪ Ranking assigned based on “business priority”
Healthcare Service Entity Attributes:
Source | Opening Hours | Contact Details
SoR A  | Priority 1    | Priority 1
CSoR B | -             | Priority 2
CSoR C | -             | Priority 3
In the same MicroBatch, SoR A wins over CSoR B and CSoR C.
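The manual tie-break can be sketched as a lookup over a per-attribute priority table; the table values mirror the slide, while the function name and batch shape are hypothetical:

```python
# Per-attribute priority table (lower number = higher priority).
# A source absent from an attribute's entry may not update that attribute.
PRIORITY = {
    "opening_hours": {"SoR A": 1},
    "contact_details": {"SoR A": 1, "CSoR B": 2, "CSoR C": 3},
}

def winning_update(attribute, batch_updates):
    """Given competing updates to one attribute in the same micro-batch,
    keep the value from the highest-priority source (sketch only)."""
    ranked = PRIORITY.get(attribute, {})
    candidates = [
        (ranked[source], value)
        for source, value in batch_updates.items()
        if source in ranked
    ]
    return min(candidates)[1] if candidates else None
```

Because the table is keyed per attribute, one source can win Contact Details while a different source wins Opening Hours on the same entity.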
25. Automatic Ranking
▪ Data Lineage outcomes aggregated over the last 30 days
▪ “Priority boosted” based on “Recent Performance” of Sources
Healthcare Service Entity Lineage Events:
Source | Total Updates | Contact Details - Lineage Warnings | Contact Details - Lineage Errors
SoR A  | 10 | 4 | 2
CSoR B | 8  | 1 | 0
CSoR C | 2  | 1 | 1
Contact Details priority: Original - SoR A (1), CSoR B (2), CSoR C (3); Updated - CSoR B (1), SoR A (2), CSoR C (3).
29. Logical Data Zones using Databricks DELTA
▪ Data Control Plane: LANDING, RAW, STAGE, GOLD (i.e. Bronze, Silver, Gold)
▪ Used DELTA Cache for performance optimisation (stream and batch workloads)
▪ Runs on AWS Accounts under our Security Policy and Regulatory Compliance
▪ Operational Control Plane: Cluster Administration and management functions like Access Control, Jobs and Schedules
31. Continuous Streaming Applications
▪ Enables “True Event Sourcing” via Streaming Input: Kinesis, S3, and DELTA
▪ Running micro-batches leads to smaller and more manageable data volumes
▪ Recoverability through Checkpoints and reliability through Streaming Sinks to DELTA tables
32. Data Issue Problem Statement:
A downstream Health Integrator is complaining that unanticipated special Unicode characters in the service description are breaking their integration.
33. Restore & Recover Data Versions Seamlessly
▪ During data quality issues, we can rewind to previous versions
▪ Using Metadata Attribution, Provenance, and Data Lineage features, we can trace the root cause to a specific origin source with up to millisecond precision
▪ Complete audit trail and the ability to provide SoR Data Quality Reporting
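The rewind behaviour can be illustrated with a toy versioned store; in the platform itself this is handled by DELTA table versioning, so the class below is purely a conceptual stand-in:

```python
class VersionedEntityStore:
    """Toy stand-in for version rewind; the platform relies on
    Databricks DELTA table versions for this in practice."""

    def __init__(self):
        self._history = []  # ordered snapshots, one per written version

    def write(self, snapshot):
        """Append a new version and return its 1-indexed version number."""
        self._history.append(dict(snapshot))
        return len(self._history)

    def restore(self, version):
        """Rewind to an earlier version after a data-quality issue."""
        return dict(self._history[version - 1])
```

In the Unicode incident from the previous slide, the fix is exactly this pattern: restore the last version written before the offending source update, then use provenance to trace and correct the origin.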
34. Questions & Feedback
Mark Paul - @ThisIsMarkPaul