Building a Federated Data Directory
Platform for Public Health
Mark Paul
Engineering Manager
Anshul Bajpai
Data Engineering Lead
Agenda
1. Problems with Centralised Data Directories
2. Solution: Federated Data Directory Platform
3. Design Patterns
4. Intelligent System of Record Ranking
5. Architecture Patterns
▪ Australian digital health
infrastructure
▪ National directory of health
services and the practitioners
who provide them
▪ National, government-owned,
not-for-profit organization
▪ Trusted health information and
advice for all Australians
#1 Australian health
information website
4.8m community
connections each month
Problems with Centralised
Data Directories
Healthcare Directories - Critical Healthcare Infrastructure
▪ Enables Care Coordination
▪ Single Point-of-Failure
▪ Bad Data Quality = Clinical Risk to
Patients
Healthcare Directories - Problems
Data Updated via Content
Management Systems and Call
Centres
This Model Is Reactive and Inefficient!
Data volatility (High Frequency of change to data)
Basic Centrally Managed Databases
Applications
Solution: Federated Data
Directory Platform
Federating Data is a Powerful Concept
Federated Database:
▪ Maps multiple autonomous database systems into
a single federated datastore
Federated Data Platform:
▪ Controlled aggregation to create “gold-standard
data” by using multiple Autonomous Origin Data
Sources
▪ Data Aggregation via Event Sourcing pipelines
Building the Federated Data “Puzzle”
Federal, State, Public/Private Hospitals, EMR, and other Commercial Vendors
participate as Systems of Records
Design Patterns for
Federated Data Platforms
Source Classification
▪ System of Record (SoR):
▪ Identify your Authoritative SoRs
SoRs have Role/s:
▪ Source of Truth
▪ Authoritative owner of a subset of data
▪ Source of Validation:
▪ Improve Data Quality
▪ Source of Notification:
▪ Increase “data currency”
▪ Gold Entities
▪ Your final entity models (e.g. Healthcare Service,
Organisation, Practitioner)
▪ Raw Entities
▪ Raw (Source) entities in the pre-mapping stage that will
eventually be mapped to your gold entities
▪ Source Channels
▪ Pipeline channels that transition Raw Entities into new
versions of Gold Entities
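The classification above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names (`SorRole`, `SystemOfRecord`, `SourceChannel`) and the raw entity names are hypothetical, while the role names, gold entity names, and the Medicare and Vendor Software sources come from the slides.

```python
from dataclasses import dataclass
from enum import Enum


class SorRole(Enum):
    SOURCE_OF_TRUTH = "SoT"         # authoritative owner of a subset of data
    SOURCE_OF_VALIDATION = "SoV"    # used to improve data quality
    SOURCE_OF_NOTIFICATION = "SoN"  # used to increase data currency


@dataclass(frozen=True)
class SystemOfRecord:
    name: str
    roles: frozenset  # a SoR can hold one or more roles


@dataclass(frozen=True)
class SourceChannel:
    """Pipeline channel that turns one raw entity type into a gold entity type."""
    source: SystemOfRecord
    raw_entity: str   # source-specific model name (hypothetical)
    gold_entity: str  # e.g. "HealthcareService", "Organisation", "Practitioner"


vendor = SystemOfRecord("Vendor Software", frozenset({SorRole.SOURCE_OF_TRUTH}))
medicare = SystemOfRecord("Medicare", frozenset({SorRole.SOURCE_OF_VALIDATION}))

channels = [
    SourceChannel(vendor, "VendorClinicRecord", "HealthcareService"),
    SourceChannel(medicare, "MedicareProviderRecord", "Practitioner"),
]


def channels_for(gold_entity: str) -> list:
    """Return every channel that feeds the given gold entity."""
    return [c for c in channels if c.gold_entity == gold_entity]
```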
Entity / Channel Setup
Attribute Sourcing
Id: "561f10e4-0109-b99f-a2df-c059f9dc4a9b"
name: "Cottesloe Medical Centre"
bookingProviders: [
  { Id: hotdoc,
    providerIdentifier: cottesloe-medical-centre },
  { Id: healthengine,
    providerIdentifier: ctl-m-re }
]
practitionerRelations: [
  { pracId: c618860e-a69a,
    type: providerNumber,
    value: 2djfkdn3k34 },
  { pracId: hsjfk3e-53vd,
    type: providerNumber,
    value: dsfh4kslfls }
]
Calendar: {
  openRules: […],
  closedRules: […]
}
Contacts: {
  Email: sss@gmail.com,
  Website: www.tt.com,
  Phone: 3242343
}
[Diagram: each attribute group of the Healthcare Service entity, and of its
Practitioner Relation (details about Practitioners who work at a service), is
annotated with the source that supplies it: Medicare (SoV), Healthscope (SoN),
Vendor Software (SoT), or Internal]
Data Federation
Pre-Processing → Raw (Bronze) → Stage (Silver) → Gold → Publishing
Pre-Processing Layer
Automated Pre Processing via Notebooks (origin API or
offline data extracts via SFTP, S3 pickup folders)
▪ Generate “Source Data” Event Object
{
DataPayload: <type> Raw Entity Model
Provenance: <type> Provenance
}
▪ DataPayload holds the “Raw Entity” (Source Specific
Model)
▪ Provenance used for source / origin identification
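A minimal sketch of the "Source Data" event object, assuming Python dataclasses. The Provenance field names are borrowed from the Data Provenance Object shown later in the deck; the helper `to_source_event`, the file name `extract.json`, and the sample record ID are hypothetical.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class Provenance:
    source_file_name: str
    owner_agency: str
    arrival_timestamp: str
    primary_key: list
    data_in_load_strategy: str  # e.g. "DELTA"


@dataclass
class SourceDataEvent:
    data_payload: dict      # the Raw Entity, kept in its source-specific shape
    provenance: Provenance  # identifies the source / origin of the payload


def to_source_event(raw_record, file_name, owner_agency, primary_key):
    """Wrap one raw record from an API call or file extract in a Source Data event."""
    provenance = Provenance(
        source_file_name=file_name,
        owner_agency=owner_agency,
        arrival_timestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        primary_key=primary_key,
        data_in_load_strategy="DELTA",
    )
    return SourceDataEvent(data_payload=raw_record, provenance=provenance)


event = to_source_event(
    {"pLocSvcId": "hypothetical-id", "name": "Cottesloe Medical Centre"},
    file_name="extract.json",
    owner_agency="HDAInternal",
    primary_key=["pLocSvcId"],
)
print(asdict(event)["provenance"]["owner_agency"])
```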
Raw Processing Layer (Bronze)
Picks up Source Data Event from “Pre Processing Output”
▪ Performs routine, high level parsing / cleansing
▪ Generate “Core Data” Event Object
▪ The Core Data Event is carried through each downstream layer and
captures transition/operational changes at each layer
▪ Generate Event Trace ID - the end-to-end traceability
identification
▪ Data Lineage Object “begins” to capture operational
outcomes on events
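The Raw layer steps above might look like the following sketch. The dictionary layout, the `cleanse` rules (trim strings, drop empty values), and the operation name `RAW_PARSING` are assumptions made for illustration; only the field names `event_trace_id`, `application_name`, `operation_name`, `operation_result`, and `created_time` appear in the deck.

```python
import uuid
from datetime import datetime, timezone


def now_utc() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def cleanse(payload: dict) -> dict:
    """Routine, high-level cleansing: trim strings and drop empty values."""
    cleaned = {}
    for key, value in payload.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def to_core_event(source_event: dict) -> dict:
    """Turn a Source Data event into a Core Data event with a trace ID."""
    trace_id = str(uuid.uuid4())  # end-to-end traceability identifier
    return {
        "event_trace_id": trace_id,
        "data_payload": cleanse(source_event["data_payload"]),
        "provenance": {**source_event["provenance"], "event_trace_id": trace_id},
        # The lineage list starts here and grows at each downstream layer.
        "lineage": [{
            "event_trace_id": trace_id,
            "application_name": "RAW",
            "operation_name": "RAW_PARSING",  # hypothetical operation name
            "operation_result": "SUCCESS",
            "created_time": now_utc(),
        }],
    }


core = to_core_event({
    "data_payload": {"name": "  Cottesloe Medical Centre ", "phone": ""},
    "provenance": {"owner_agency": "HDAInternal"},
})
```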
Stage Processing Layer (Silver)
Picks up Core Data Event from “Raw Output”
▪ Mapping Operation - Convert from Raw (Source) Entity
model to Gold Entity Model
▪ Referencing Operation - Enrichment using Reference
Data lookups
▪ Merging Operation - New Gold Version created
▪ Validation Operation: Final validation against Gold
Model validation Rules
Stage Processing Layer
▪ Merging Operation
▪ Matching by “primary key”
▪ Merging (based on last version) / Delta Determination
▪ Version Incrementation
▪ Metadata Attribution generated and appended
▪ Logs Every Change to Every Attribute on Every Event
▪ Individual Data Lineage Objects store all operational
outcomes on the event (attribute exceptions, violations,
status changes etc.)
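The merging operation can be illustrated with a small in-memory sketch. The function name `merge_version`, the store layout, and the metadata shape are assumptions; the deck only states that events are matched by primary key, merged against the last version, versioned, and given per-attribute metadata.

```python
def merge_version(gold_store: dict, incoming: dict, primary_key: str, source: str) -> dict:
    """Merge an incoming mapped entity into the latest gold version."""
    key = incoming[primary_key]           # matching by primary key
    previous = gold_store.get(key)
    prev_attrs = previous["attributes"] if previous else {}

    # Delta determination: keep only attributes whose value actually changed.
    delta = {name: value for name, value in incoming.items()
             if prev_attrs.get(name) != value}
    if previous and not delta:
        return previous                   # no change, so no new version

    version = previous["version"] + 1 if previous else 1  # version incrementation
    metadata = dict(previous["metadata"]) if previous else {}
    for name in delta:
        # Metadata attribution: which source changed this attribute, and when.
        metadata[name] = {"source": source, "changed_in_version": version}

    merged = {
        "attributes": {**prev_attrs, **delta},
        "version": version,
        "metadata": metadata,
        "changes": sorted(delta),         # log of attributes changed by this event
    }
    gold_store[key] = merged
    return merged


store = {}
v1 = merge_version(store, {"id": "svc-1", "phone": "3242343"}, "id", "Vendor Software")
v2 = merge_version(store, {"id": "svc-1", "phone": "9999999"}, "id", "Healthscope")
v3 = merge_version(store, {"id": "svc-1", "phone": "9999999"}, "id", "Healthscope")
```

Here the second event changes only `phone`, so version 2 attributes that change to Healthscope while `id` stays attributed to the first source; the third event is a no-op.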
Gold Processing Layer
Picks up Core Data Event from “Stage Output”
▪ Entity Relationship Validation - Ensures entity
relationships are “intact” - Prevents Orphans
▪ Re-Processing & Replay: Replay latest versions (for
new reference data, business / validation logic to apply)
▪ Data Science Layer - Data Quality Benchmarks
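Entity relationship validation could be sketched as a check that every relation points at entities that exist. The function name `find_orphans` and the `serviceId` field are hypothetical; `pracId` is taken from the Attribute Sourcing example.

```python
def find_orphans(relations, service_ids, practitioner_ids):
    """Return relations whose service or practitioner is missing from gold."""
    orphans = []
    for relation in relations:
        missing = []
        if relation["serviceId"] not in service_ids:
            missing.append("service")
        if relation["pracId"] not in practitioner_ids:
            missing.append("practitioner")
        if missing:
            orphans.append((relation, missing))
    return orphans


relations = [
    {"serviceId": "svc-1", "pracId": "prac-1"},
    {"serviceId": "svc-1", "pracId": "prac-404"},  # practitioner not in gold
]
orphans = find_orphans(relations, {"svc-1"}, {"prac-1"})
```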
Data Provenance Object
event_trace_id: 79d77056-c773-4496-ac0d-5223c49e06f0
file_name: ext_provider_location_service_bf14feb8-538a-4f40-85eb-93b77d2c1704_2019-09-17T22:50:47Z.json
source_file_name: ext_provider_location_service_bf14feb8-538a-4f40-85eb-93b77d2c1704_2019-09-17T22:50:47Z.json
flow_name: nifi_flow_ext_provider_location_service_withstate_v1
owner_agency: HDAInternal
arrival_timestamp: 2019-09-17T22:50:46Z
primary_key: [“pLocSvcId”]
primary_key_temporal: TRUE
data_in_load_strategy: DELTA
unique_data_code: ext_provider_location_service
version: v0.0.1
source_identifier: TAL-2324
Trace an Event back to its Exact Origin
▪ Identify Upstream Source Identity & Raw Source File
▪ Inject Source (External) Identifier (e.g. Jira Ticket #)
Source Intention
▪ Target Entity (what this event intends to update)
Data Lineage Object
event_trace_id: 79d77056-c773-4496-ac0d-5223c49e06f0
application_id: STAGE-01
application_name: STAGE
application_description: Versioned Entity Data
application_version: 1.0.0
application_state: STAGE_REFERENCING
dms_event_id: 4000ae0b-6b08-4dce-a432-fff8e608e7ec
source_dms_event_id: 4de1802c-70e6-4552-b2b0-4349bfc3a073
operation: [{
  operation_name: ENTITY_REFERENCING,
  operation_rule_name: plsParsing,
  operation_result: SUCCESS,
  failure_severity: “”,
  attributes: [“”],
  created_time: 2019-09-30T22:39:47Z
}],
created_time: 2019-09-30T22:39:47Z
Encapsulate Operation Outcomes that occur to Entity
Events
▪ Capture deviation of Data Quality
▪ Exceptions / Warnings
▪ Exceptions - Fix Data
▪ Warning - Improve Data (Quality)
▪ Visibility of End to End Data Flow (via Operational
Outcomes Summary)
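An operational outcomes summary could be built by counting operation results across Data Lineage Objects, roughly as below. The field names follow the Data Lineage Object on the slide; the `FAILURE` result, the `WARNING` / `EXCEPTION` severity values, and the `ENTITY_VALIDATION` operation name are assumptions, since the slide only shows a successful operation.

```python
from collections import Counter


def summarise_outcomes(lineage_objects):
    """Count operation outcomes per (application, operation, outcome)."""
    summary = Counter()
    for obj in lineage_objects:
        for op in obj["operation"]:
            if op["operation_result"] == "SUCCESS":
                outcome = "SUCCESS"
            else:
                # Assumed severities: "WARNING" (improve data), "EXCEPTION" (fix data).
                outcome = op["failure_severity"] or "UNKNOWN"
            summary[(obj["application_name"], op["operation_name"], outcome)] += 1
    return summary


lineage_objects = [
    {"application_name": "STAGE", "operation": [
        {"operation_name": "ENTITY_REFERENCING",
         "operation_result": "SUCCESS", "failure_severity": ""}]},
    {"application_name": "STAGE", "operation": [
        {"operation_name": "ENTITY_VALIDATION",
         "operation_result": "FAILURE", "failure_severity": "WARNING"}]},
    {"application_name": "STAGE", "operation": [
        {"operation_name": "ENTITY_VALIDATION",
         "operation_result": "FAILURE", "failure_severity": "EXCEPTION"}]},
]
summary = summarise_outcomes(lineage_objects)
```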
Intelligent System of Record
Ranking
“Dedicated System of Record (SoR)”
Has full update authority over your
data attributes
1. Data Quality Regressions flow
into your System
2. Low Frequency of Change (Low
Data Currency)
Problem
“Candidate Systems of Record (CSoR)”
Alternate “SoRs” who compete to update
the same data
[Diagram: Healthcare Service {Opening Hours, Contact Details} receives
updates from SoR A, CSoR B and CSoR C]
Solution
Source  | Opening Hours | Contact Details
SoR A   | Priority 1    | Priority 1
CSoR B  | -             | Priority 2
CSoR C  | -             | Priority 3
[Diagram: in the same MicroBatch, SoR A wins over CSoR B and CSoR C]
▪ Ranking assigned based on “business priority”
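Manual ranking can be sketched as a per-attribute priority lookup applied within one micro-batch. The priorities mirror the table above; the attribute keys, the function name `resolve_micro_batch`, and the update tuple shape are hypothetical.

```python
# Lower number = higher business priority. A source missing from an
# attribute's map has no authority to update that attribute.
PRIORITY = {
    "openingHours": {"SoR A": 1},
    "contactDetails": {"SoR A": 1, "CSoR B": 2, "CSoR C": 3},
}


def resolve_micro_batch(updates):
    """updates: list of (source, attribute, value). Returns attribute -> (source, value)."""
    winners = {}
    for source, attribute, value in updates:
        rank = PRIORITY.get(attribute, {}).get(source)
        if rank is None:
            continue  # this source may not update this attribute
        current = winners.get(attribute)
        if current is None or rank < current[0]:
            winners[attribute] = (rank, source, value)
    return {attr: (src, val) for attr, (_, src, val) in winners.items()}


batch = [
    ("CSoR C", "contactDetails", {"Phone": "1111111"}),
    ("SoR A", "contactDetails", {"Phone": "3242343"}),
    ("CSoR B", "contactDetails", {"Phone": "2222222"}),
    ("CSoR B", "openingHours", {"openRules": []}),  # ignored: no priority assigned
]
resolved = resolve_micro_batch(batch)
```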
Healthcare Service Entity Attributes
Manual Ranking
Source  | Total Updates | Contact Details - Lineage Warnings | Contact Details - Lineage Errors
SoR A   | 10            | 4                                  | 2
CSoR B  | 8             | 1                                  | 0
CSoR C  | 2             | 1                                  | 1
▪ Data Lineage outcomes aggregated over last 30 days
▪ “Priority boosted” based on “Recent Performance” of Sources
Healthcare Service Entity Lineage Events
Automatic Ranking
Contact Details priority columns (values listed in table row order):
Original Priority: Priority 1, Priority 2, Priority 3
Updated Priority: Priority 2, Priority 1, Priority 3
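One way to "boost" priority from recent performance is to rank sources by the share of their recent updates that raised lineage warnings or errors. The scoring formula below is an assumption, not the presenters' stated method; the counts are the 30-day figures from the Automatic Ranking table, and the original priority is used only to break ties.

```python
LINEAGE_30_DAYS = {
    "SoR A": {"updates": 10, "warnings": 4, "errors": 2},
    "CSoR B": {"updates": 8, "warnings": 1, "errors": 0},
    "CSoR C": {"updates": 2, "warnings": 1, "errors": 1},
}
ORIGINAL_PRIORITY = {"SoR A": 1, "CSoR B": 2, "CSoR C": 3}


def issue_rate(stats):
    """Assumed score: fraction of updates with a lineage warning or error."""
    if stats["updates"] == 0:
        return 1.0  # no recent evidence, so treat as least trusted
    return (stats["warnings"] + stats["errors"]) / stats["updates"]


def rerank(stats_by_source, original_priority):
    """Return source -> updated priority (1 is best)."""
    ordered = sorted(
        stats_by_source,
        key=lambda s: (issue_rate(stats_by_source[s]), original_priority[s]),
    )
    return {source: rank for rank, source in enumerate(ordered, start=1)}


updated = rerank(LINEAGE_30_DAYS, ORIGINAL_PRIORITY)
```

With these figures the issue rates are 0.6 for SoR A, 0.125 for CSoR B and 1.0 for CSoR C, so CSoR B moves ahead of SoR A for Contact Details.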
Source       | Total Updates | Lineage Warnings | Lineage Errors | Public Complaints Count | Completeness Score | Consistency Score | Accuracy Score | Conformity Score | Integrity Score | … Nth
SoR A        | 10            | 4                | 2              | 2                       | 60                 | 20                | 99             | 56               | 21              | …
CSoR B       | 8             | 1                | 0              | 1                       | 45                 | 34                | 80             | 54               | 22              |
CSoR C       | 2             | 1                | 1              | 6                       | 78                 | 45                | 34             | 56               | 45              |
… Nth Source | …             | …                | …              | …                       | …                  | …                 | …              | …                | …               | …
▪ Sources and Features are growing
▪ “Seasonal” Data Regression
▪ Source Data Quality Model: “Confidence Score” based on “Past Performance” applied in “Real Time”
Healthcare Service Entity Features
Intelligent Ranking - Future State
Architecture Patterns for
Federated Data Platforms
Architecture Overview
Logical Data Zones using Databricks DELTA
▪ Data Control Plane: LANDING, RAW, STAGE, GOLD (i.e. Bronze, Silver, Gold)
▪ Used DELTA Cache for performance optimisation (stream and batch workloads)
▪ Runs on AWS Accounts under our Security Policy and Regulatory Compliance
▪ Operational Control Plane: Cluster Administration, Management functions like,
Access Control, Jobs and Schedules
Data Plane & Processing Pipeline
Continuous Streaming Applications
▪ Enables “True Event Sourcing” via Streaming
Input, Kinesis, S3, and DELTA
▪ Running Micro-batches lead to smaller and
more manageable data volumes
▪ Recoverability through Checkpoints and
Reliability through Streaming Sinks to DELTA
tables
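The recoverability idea can be shown with a toy, in-memory loop. This is not the Spark Structured Streaming or Delta API, just an illustration of committing a checkpoint after each micro-batch is written so that a restart resumes from the last committed offset. All names (`run_micro_batches`, `fail_at_batch`) are hypothetical, and a real sink would need the write and the checkpoint commit to be atomic or idempotent.

```python
def run_micro_batches(events, checkpoint, sink, batch_size=2, fail_at_batch=None):
    """Process events from checkpoint['offset'] onwards in small batches."""
    batch_no = 0
    while checkpoint["offset"] < len(events):
        start = checkpoint["offset"]
        batch = events[start:start + batch_size]
        if fail_at_batch == batch_no:
            raise RuntimeError("simulated failure before writing this batch")
        sink.extend(batch)                         # write the micro-batch to the sink
        checkpoint["offset"] = start + len(batch)  # commit progress after the write
        batch_no += 1


events = list(range(5))
checkpoint = {"offset": 0}
sink = []

try:
    run_micro_batches(events, checkpoint, sink, fail_at_batch=1)
except RuntimeError:
    pass  # first batch was committed; the second was never written

sink_after_failure = list(sink)
offset_after_failure = checkpoint["offset"]

# Restart: processing resumes from the checkpoint without duplicating events.
run_micro_batches(events, checkpoint, sink)
```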
Data Issue Problem Statement:
A downstream Health Integrator is complaining that
unanticipated special Unicode characters in the service
description are breaking their integration.
Restore & Recover Data Versions Seamlessly
▪ When data quality issues occur, we can rewind to
previous versions
▪ Using Metadata attribution, Provenance and Data
Lineage features, we can trace the root cause to a
specific origin source with up to millisecond precision
▪ Complete audit trail and ability to provide SoR Data
Quality Reporting
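For the Unicode problem statement above, tracing and rewinding might look like this sketch over a version history. The history layout, the helper names, and the rule "any non-ASCII character is unexpected" are all assumptions made for illustration; on the platform itself this would rely on the versioned tables and the provenance and lineage objects.

```python
def has_unexpected_chars(text: str) -> bool:
    """Assumed rule: the downstream integration only tolerates ASCII."""
    return any(ord(ch) > 127 for ch in text)


def first_bad_version(history, attribute, is_bad):
    """history is ordered oldest to newest; return the first offending version."""
    for entry in history:
        if is_bad(entry["attributes"].get(attribute, "")):
            return entry
    return None


def last_good_version(history, attribute, is_bad):
    """Return the newest version that precedes the first offending one."""
    good = None
    for entry in history:
        if is_bad(entry["attributes"].get(attribute, "")):
            break
        good = entry
    return good


history = [
    {"version": 1, "source": "Vendor Software", "event_trace_id": "trace-1",
     "attributes": {"description": "General practice"}},
    {"version": 2, "source": "CSoR B", "event_trace_id": "trace-2",
     "attributes": {"description": "General practice\u2028bulk billing"}},
    {"version": 3, "source": "CSoR B", "event_trace_id": "trace-3",
     "attributes": {"description": "General practice\u2028bulk billing, walk-ins"}},
]

culprit = first_bad_version(history, "description", has_unexpected_chars)
restore_to = last_good_version(history, "description", has_unexpected_chars)
```

The culprit entry carries the source and trace ID needed to follow the event back to its origin file, and `restore_to` is the version to rewind to.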
Questions & Feedback
Mark Paul - @ThisIsMarkPaul

More Related Content

What's hot

Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data GovernanceChristopher Bradley
 
Health Informatics- Module 1-Chapter 2.pptx
Health Informatics- Module 1-Chapter 2.pptxHealth Informatics- Module 1-Chapter 2.pptx
Health Informatics- Module 1-Chapter 2.pptxArti Parab Academics
 
Systems Analyst and Design - Data Dictionary
Systems Analyst and Design -  Data DictionarySystems Analyst and Design -  Data Dictionary
Systems Analyst and Design - Data DictionaryKimberly Coquilla
 
Reference master data management
Reference master data managementReference master data management
Reference master data managementDr. Hamdan Al-Sabri
 
Best Practices of Hospital IT from Ramathibodi Hospital
Best Practices of Hospital IT from Ramathibodi HospitalBest Practices of Hospital IT from Ramathibodi Hospital
Best Practices of Hospital IT from Ramathibodi HospitalNawanan Theera-Ampornpunt
 
3D Data Strategy Framework
3D Data Strategy Framework3D Data Strategy Framework
3D Data Strategy FrameworkDaniel Ren
 
Building an integrated data strategy
Building an integrated data strategyBuilding an integrated data strategy
Building an integrated data strategyLucas Modesto
 
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball ApproachMicrosoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball ApproachMark Ginnebaugh
 
Tableau Drive, A new methodology for scaling your analytic culture
Tableau Drive, A new methodology for scaling your analytic cultureTableau Drive, A new methodology for scaling your analytic culture
Tableau Drive, A new methodology for scaling your analytic cultureTableau Software
 
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyDATAVERSITY
 
IT Infrastructure Management Powerpoint Presentation Slides
IT Infrastructure Management Powerpoint Presentation SlidesIT Infrastructure Management Powerpoint Presentation Slides
IT Infrastructure Management Powerpoint Presentation SlidesSlideTeam
 
Informatica MDM Presentation
Informatica MDM PresentationInformatica MDM Presentation
Informatica MDM PresentationMaxHung
 
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDATAVERSITY
 
Information Architecture System Design (IA)
Information Architecture System Design (IA)Information Architecture System Design (IA)
Information Architecture System Design (IA)Billy Choi
 
Master Data Management - Aligning Data, Process, and Governance
Master Data Management - Aligning Data, Process, and GovernanceMaster Data Management - Aligning Data, Process, and Governance
Master Data Management - Aligning Data, Process, and GovernanceDATAVERSITY
 
The Importance of Master Data Management
The Importance of Master Data ManagementThe Importance of Master Data Management
The Importance of Master Data ManagementDATAVERSITY
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaRadhika Kotecha
 
Asegurando la calidad del dato en mi entorno de business intelligence
Asegurando la calidad del dato en mi entorno de business intelligenceAsegurando la calidad del dato en mi entorno de business intelligence
Asegurando la calidad del dato en mi entorno de business intelligenceMary Arcia
 

What's hot (20)

Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data Governance
 
Health Informatics- Module 1-Chapter 2.pptx
Health Informatics- Module 1-Chapter 2.pptxHealth Informatics- Module 1-Chapter 2.pptx
Health Informatics- Module 1-Chapter 2.pptx
 
Systems Analyst and Design - Data Dictionary
Systems Analyst and Design -  Data DictionarySystems Analyst and Design -  Data Dictionary
Systems Analyst and Design - Data Dictionary
 
Reference master data management
Reference master data managementReference master data management
Reference master data management
 
Best Practices of Hospital IT from Ramathibodi Hospital
Best Practices of Hospital IT from Ramathibodi HospitalBest Practices of Hospital IT from Ramathibodi Hospital
Best Practices of Hospital IT from Ramathibodi Hospital
 
3D Data Strategy Framework
3D Data Strategy Framework3D Data Strategy Framework
3D Data Strategy Framework
 
Building an integrated data strategy
Building an integrated data strategyBuilding an integrated data strategy
Building an integrated data strategy
 
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball ApproachMicrosoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
 
Tableau Drive, A new methodology for scaling your analytic culture
Tableau Drive, A new methodology for scaling your analytic cultureTableau Drive, A new methodology for scaling your analytic culture
Tableau Drive, A new methodology for scaling your analytic culture
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
 
IT Infrastructure Management Powerpoint Presentation Slides
IT Infrastructure Management Powerpoint Presentation SlidesIT Infrastructure Management Powerpoint Presentation Slides
IT Infrastructure Management Powerpoint Presentation Slides
 
Data Cleaning Techniques
Data Cleaning TechniquesData Cleaning Techniques
Data Cleaning Techniques
 
Informatica MDM Presentation
Informatica MDM PresentationInformatica MDM Presentation
Informatica MDM Presentation
 
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
 
Information Architecture System Design (IA)
Information Architecture System Design (IA)Information Architecture System Design (IA)
Information Architecture System Design (IA)
 
Master Data Management - Aligning Data, Process, and Governance
Master Data Management - Aligning Data, Process, and GovernanceMaster Data Management - Aligning Data, Process, and Governance
Master Data Management - Aligning Data, Process, and Governance
 
The Importance of Master Data Management
The Importance of Master Data ManagementThe Importance of Master Data Management
The Importance of Master Data Management
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika Kotecha
 
Asegurando la calidad del dato en mi entorno de business intelligence
Asegurando la calidad del dato en mi entorno de business intelligenceAsegurando la calidad del dato en mi entorno de business intelligence
Asegurando la calidad del dato en mi entorno de business intelligence
 

Similar to Building a Federated Data Directory Platform for Public Health

Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...confluent
 
The Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureThe Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureDataWorks Summit/Hadoop Summit
 
5 Shades of Analytics - Presentation Version - Distributable Version
5 Shades of Analytics - Presentation Version - Distributable Version5 Shades of Analytics - Presentation Version - Distributable Version
5 Shades of Analytics - Presentation Version - Distributable VersionMichael Josephs
 
BCS DMSG Healthcare Data Management : Transformation through Migration 26-1...
BCS DMSG Healthcare Data Management : Transformation through Migration   26-1...BCS DMSG Healthcare Data Management : Transformation through Migration   26-1...
BCS DMSG Healthcare Data Management : Transformation through Migration 26-1...BCS Data Management Specialist Group
 
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)Vishal Bamba
 
Real time data integration best practices and architecture
Real time data integration best practices and architectureReal time data integration best practices and architecture
Real time data integration best practices and architectureBui Kiet
 
Painting the Future of Big Data with Apache Spark and MongoDB
Painting the Future of Big Data with Apache Spark and MongoDBPainting the Future of Big Data with Apache Spark and MongoDB
Painting the Future of Big Data with Apache Spark and MongoDBMongoDB
 
Integrating SIS’s with Salesforce: An Accidental Integrator’s Guide
Integrating SIS’s with Salesforce: An Accidental Integrator’s GuideIntegrating SIS’s with Salesforce: An Accidental Integrator’s Guide
Integrating SIS’s with Salesforce: An Accidental Integrator’s GuideSalesforce.org
 
Data Management 2: Conquering Data Proliferation
Data Management 2: Conquering Data ProliferationData Management 2: Conquering Data Proliferation
Data Management 2: Conquering Data ProliferationMongoDB
 
Recording and Reasoning Over Data Provenance in Web and Grid Services
Recording and Reasoning Over Data Provenance in Web and Grid ServicesRecording and Reasoning Over Data Provenance in Web and Grid Services
Recording and Reasoning Over Data Provenance in Web and Grid ServicesMartin Szomszor
 
Partners in Technology 13 Sept 2013 HSIA CIO Ray Brown
Partners in Technology 13 Sept 2013 HSIA CIO Ray BrownPartners in Technology 13 Sept 2013 HSIA CIO Ray Brown
Partners in Technology 13 Sept 2013 HSIA CIO Ray BrownDigital Queensland
 
Privacy and Auditing in Clouds
Privacy and Auditing in CloudsPrivacy and Auditing in Clouds
Privacy and Auditing in CloudsTyrone Grandison
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data IntegrationsPat Patterson
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Chain Sys Corporation
 
online Blood Bank management system
online Blood Bank management system online Blood Bank management system
online Blood Bank management system amarsajid
 
How Cognizant's ZDLC solution is helping Data Lineage for compliance to Basel...
How Cognizant's ZDLC solution is helping Data Lineage for compliance to Basel...How Cognizant's ZDLC solution is helping Data Lineage for compliance to Basel...
How Cognizant's ZDLC solution is helping Data Lineage for compliance to Basel...Dr. Bippin Makoond
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 

Similar to Building a Federated Data Directory Platform for Public Health (20)

Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
 
End User Informatics
End User InformaticsEnd User Informatics
End User Informatics
 
The Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureThe Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data Architecture
 
5 Shades of Analytics - Presentation Version - Distributable Version
5 Shades of Analytics - Presentation Version - Distributable Version5 Shades of Analytics - Presentation Version - Distributable Version
5 Shades of Analytics - Presentation Version - Distributable Version
 
BCS DMSG Healthcare Data Management : Transformation through Migration 26-1...
BCS DMSG Healthcare Data Management : Transformation through Migration   26-1...BCS DMSG Healthcare Data Management : Transformation through Migration   26-1...
BCS DMSG Healthcare Data Management : Transformation through Migration 26-1...
 
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
 
Real time data integration best practices and architecture
Real time data integration best practices and architectureReal time data integration best practices and architecture
Real time data integration best practices and architecture
 
Painting the Future of Big Data with Apache Spark and MongoDB
Painting the Future of Big Data with Apache Spark and MongoDBPainting the Future of Big Data with Apache Spark and MongoDB
Painting the Future of Big Data with Apache Spark and MongoDB
 
Integrating SIS’s with Salesforce: An Accidental Integrator’s Guide
Integrating SIS’s with Salesforce: An Accidental Integrator’s GuideIntegrating SIS’s with Salesforce: An Accidental Integrator’s Guide
Integrating SIS’s with Salesforce: An Accidental Integrator’s Guide
 
Data Management 2: Conquering Data Proliferation
Data Management 2: Conquering Data ProliferationData Management 2: Conquering Data Proliferation
Data Management 2: Conquering Data Proliferation
 
Recording and Reasoning Over Data Provenance in Web and Grid Services
Recording and Reasoning Over Data Provenance in Web and Grid ServicesRecording and Reasoning Over Data Provenance in Web and Grid Services
Recording and Reasoning Over Data Provenance in Web and Grid Services
 
Nic solution strategy
Nic solution strategyNic solution strategy
Nic solution strategy
 
Partners in Technology 13 Sept 2013 HSIA CIO Ray Brown
Partners in Technology 13 Sept 2013 HSIA CIO Ray BrownPartners in Technology 13 Sept 2013 HSIA CIO Ray Brown
Partners in Technology 13 Sept 2013 HSIA CIO Ray Brown
 
Privacy and Auditing in Clouds
Privacy and Auditing in CloudsPrivacy and Auditing in Clouds
Privacy and Auditing in Clouds
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data Integrations
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
 
WebAction-Sami Abkay
WebAction-Sami AbkayWebAction-Sami Abkay
WebAction-Sami Abkay
 
online Blood Bank management system
online Blood Bank management system online Blood Bank management system
online Blood Bank management system
 
How Cognizant's ZDLC solution is helping Data Lineage for compliance to Basel...
How Cognizant's ZDLC solution is helping Data Lineage for compliance to Basel...How Cognizant's ZDLC solution is helping Data Lineage for compliance to Basel...
How Cognizant's ZDLC solution is helping Data Lineage for compliance to Basel...
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Building a Federated Data Directory Platform for Public Health

  • 2. Building a Federated Data Directory Platform for Public Health Mark Paul Engineering Manager Anshul Bajpai Data Engineering Lead
  • 3. Agenda 1. Problems with Centralised Data Directories 2. Solution: Federated Data Directory Platform 3. Design Patterns 4. Intelligent System of Record Ranking 5. Architecture Patterns
  • 4. ▪ Australian digital health infrastructure ▪ National directory of health services and the practitioners who provide them ▪ National, government-owned, not-for-profit organization ▪ Trusted health information and advice for all Australians #1 Australian health information website 4.8m community connections each month
  • 6. Healthcare Directories - Critical Healthcare Infrastructure ▪ Enables Care Coordination ▪ Single Point-of-Failure ▪ Bad Data Quality = Clinical Risk to Patients
  • 7. Healthcare Directories - Problems Data Updated via Content Management Systems and Call Centres This Model Is Reactive and Inefficient! Data volatility (High Frequency of change to data) Basic Centrally Managed Databases Applications
  • 9. Federating Data is a Powerful Concept Federated Database: ▪ Maps multiple autonomous database systems into a single federated datastore Federated Data Platform: ▪ Controlled aggregation to create “gold-standard data” by using multiple Autonomous Origin Data Sources ▪ Data Aggregation via Event Sourcing pipelines
  • 10. Building the Federated Data “Puzzle” Federal, State, Public/Private Hospitals, EMR, and other Commercial Vendors participate as Systems of Record
  • 12. Source Classification ▪ System of Record (SoR): ▪ Identify your Authoritative SoRs SoRs have Role/s: ▪ Source of Truth ▪ Authoritative owner of a subset of data ▪ Source of Validation: ▪ Improve Data Quality ▪ Source of Notification: ▪ Increase “data currency” Entity / Channel Setup ▪ Gold Entities ▪ Your final entity models (e.g. Healthcare Service, Organisation, Practitioner) ▪ Raw Entities ▪ Raw (source) entities in the pre-mapping stage that will eventually be mapped to your gold entities ▪ Source Channels ▪ Pipeline channels that transition Raw Entities into new versions of Gold Entities
  • 13. Attribute Sourcing (Data Federation): example Healthcare Service entity
      Id: "561f10e4-0109-b99f-a2df-c059f9dc4a9b"
      name: "Cottesloe Medical Centre"
      bookingProviders [
        { Id: hotdoc, providerIdentifier: cottesloe-medical-centre },
        { Id: healthengine, providerIdentifier: ctl-m-re }
      ]
      practitionerRelations [
        { pracId: c618860e-a69a, type: providerNumber, value: 2djfkdn3k34 },
        { pracId: hsjfk3e-53vd, type: providerNumber, value: dsfh4kslfls }
      ]
      Calendar: { openRules: […], closedRules: […] }
      Contacts: { Email: sss@gmail.com, Website: www.tt.com, Phone: 3242343 }
      Practitioner Relation: details about practitioners who work at a service
      Attribute sources shown: Medicare (SoV), Healthscope (SoN), Vendor Software (SoT), Internal
  • 14. Pre-Processing Raw (Bronze) Stage (Silver) Gold Publishing
  • 15. Pre-Processing Layer Automated pre-processing via notebooks (origin API or offline data extracts via SFTP, S3 pickup folders) ▪ Generate “Source Data” Event Object { DataPayload: <type> Raw Entity Model, Provenance: <type> Provenance } ▪ DataPayload holds the “Raw Entity” (Source Specific Model) ▪ Provenance used for source / origin identification
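The “Source Data” event described on this slide can be sketched as a small wrapper around the raw record. This is a minimal illustration, not the presenters' implementation: the class names, the `to_source_event` helper and the sample file name are assumptions, and the provenance field names are borrowed from the Data Provenance Object shown on slide 20.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Provenance:
    """Where a raw record came from (subset of the slide-20 fields)."""
    owner_agency: str
    source_file_name: str
    arrival_timestamp: str
    primary_key: Tuple[str, ...]


@dataclass(frozen=True)
class SourceDataEvent:
    """Hypothetical 'Source Data' event: raw payload plus provenance."""
    data_payload: Dict[str, Any]   # the source-specific Raw Entity
    provenance: Provenance


def to_source_event(raw_record: Dict[str, Any], owner_agency: str,
                    source_file_name: str, primary_key: List[str]) -> SourceDataEvent:
    """Wrap one raw record so downstream layers can identify its origin."""
    arrived = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return SourceDataEvent(
        data_payload=dict(raw_record),
        provenance=Provenance(owner_agency, source_file_name, arrived, tuple(primary_key)),
    )


# Example: file name and record values are made up for illustration.
event = to_source_event(
    {"pLocSvcId": "abc-123", "name": "Cottesloe Medical Centre"},
    owner_agency="HDAInternal",
    source_file_name="ext_provider_location_service.json",
    primary_key=["pLocSvcId"],
)
```

Keeping the payload untouched and attaching provenance alongside it means later layers can transform the entity without losing the link back to the origin file.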
  • 16. Raw Processing Layer (Bronze) Picks up Source Data Event from “Pre-Processing Output” ▪ Performs routine, high-level parsing / cleansing ▪ Generates “Core Data” Event Object ▪ Which is carried to each downstream layer and captures transition/operational changes at each layer ▪ Generates Event Trace ID: the end-to-end traceability identifier ▪ Data Lineage Object “begins” to capture operational outcomes of events
  • 17. Stage Processing Layer (Silver) Picks up Core Data Event from “Raw Output” ▪ Mapping Operation - Convert from Raw (Source) Entity model to Gold Entity Model ▪ Referencing Operation - Enrichment using Reference Data lookups ▪ Merging Operation - New Gold Version created ▪ Validation Operation: Final validation against Gold Model validation Rules
  • 18. Stage Processing Layer ▪ Merging Operation ▪ Matching by “primary key” ▪ Merging (based on last version) / Delta Determination ▪ Version Incrementation ▪ Metadata Attribution generated and appended ▪ Logs Every Change to Every Attribute on Every Event ▪ Individual Data Lineage Objects store all operational outcomes on the event (attribute exceptions, violations, status changes etc.)
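The merging steps listed above (match by primary key, determine the delta against the last version, increment the version, log each attribute change) could look roughly like the following. It is a sketch over an in-memory dictionary; the `merge_event` function, the store layout and the sample values are assumptions rather than the platform's actual code.

```python
from copy import deepcopy


def merge_event(store, entity, primary_key, source):
    """Merge an incoming gold-model entity into a versioned store.

    Returns the list of attribute-level changes (empty if nothing changed).
    """
    key = entity[primary_key]                      # matching by primary key
    versions = store.setdefault(key, [])
    previous = versions[-1]["data"] if versions else {}

    # Delta determination: only attributes that are new or differ from the last version.
    changes = [
        {"attribute": attr, "old": previous.get(attr), "new": value, "source": source}
        for attr, value in entity.items()
        if attr not in previous or previous[attr] != value
    ]
    if not changes:
        return []                                  # no new version for a no-op event

    merged = {**deepcopy(previous), **deepcopy(entity)}
    versions.append({
        "version": len(versions) + 1,              # version incrementation
        "data": merged,
        "changes": changes,                        # metadata attribution per attribute
    })
    return changes


store = {}
merge_event(store, {"id": "svc-1", "name": "Cottesloe Medical Centre", "phone": "3242343"},
            "id", "Vendor Software")
changes = merge_event(store, {"id": "svc-1", "phone": "5550000"}, "id", "Healthscope")
```

Storing the change list with each version is what makes it possible to answer "which source changed this attribute, and when" later on.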
  • 19. Gold Processing Layer Picks up Core Data Event from “Stage Output” ▪ Entity Relationship Validation - Ensures entity relationships are “intact” - Prevents Orphans ▪ Re-Processing & Replay : Replay latest versions (for new reference data, business / validation logic to apply) ▪ Data Science Layer - Data Quality Benchmarks
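As an illustration of the entity relationship validation mentioned on this slide, an orphan check can be as simple as confirming that every relation points at an entity that exists. The `find_orphans` name and the `serviceId` field are hypothetical.

```python
def find_orphans(practitioner_relations, known_service_ids):
    """Return relations whose serviceId does not match any known Healthcare Service."""
    known = set(known_service_ids)
    return [rel for rel in practitioner_relations if rel["serviceId"] not in known]


relations = [
    {"pracId": "c618860e-a69a", "serviceId": "svc-1"},
    {"pracId": "hsjfk3e-53vd", "serviceId": "svc-9"},   # points at a missing service
]
orphans = find_orphans(relations, ["svc-1", "svc-2"])
```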
  • 20. Data Provenance Object: Trace an Event back to its Exact Origin
      ▪ Identify Upstream Source Identity & Raw Source File
      ▪ Inject Source (External) Identifier (e.g. Jira Ticket #)
      ▪ Source Intention
        ▪ Target Entity (what this event intends to update)
      Example object:
        event_trace_id: 79d77056-c773-4496-ac0d-5223c49e06f0
        file_name: ext_provider_location_service_bf14feb8-538a-4f40-85eb-93b77d2c1704_2019-09-17T22:50:47Z.json
        source_file_name: ext_provider_location_service_bf14feb8-538a-4f40-85eb-93b77d2c1704_2019-09-17T22:50:47Z.json
        flow_name: nifi_flow_ext_provider_location_service_withstate_v1
        owner_agency: HDAInternal
        arrival_timestamp: 2019-09-17T22:50:46Z
        primary_key: ["pLocSvcId"]
        primary_key_temporal: TRUE
        data_in_load_strategy: DELTA
        unique_data_code: ext_provider_location_service
        version: v0.0.1
        source_identifier: TAL-2324
  • 21. Data Lineage Object: Encapsulate Operation Outcomes that occur to Entity Events
      ▪ Capture deviation of Data Quality
      ▪ Exceptions / Warnings
        ▪ Exceptions - Fix Data
        ▪ Warning - Improve Data (Quality)
      ▪ Visibility of End to End Data Flow (via Operational Outcomes Summary)
      Example object:
        event_trace_id: 79d77056-c773-4496-ac0d-5223c49e06f0
        application_id: STAGE-01
        application_name: STAGE
        application_description: Versioned Entity Data
        application_version: 1.0.0
        application_state: STAGE_REFERENCING
        dms_event_id: 4000ae0b-6b08-4dce-a432-fff8e608e7ec
        source_dms_event_id: 4de1802c-70e6-4552-b2b0-4349bfc3a073
        operation: [{
          operation_name: ENTITY_REFERENCING,
          operation_rule_name: plsParsing,
          operation_result: SUCCESS,
          failure_severity: "",
          attributes: [""],
          created_time: 2019-09-30T22:39:47Z
        }],
        created_time: 2019-09-30T22:39:47Z
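A rough sketch of how operation outcomes might be appended to a lineage object and then summarised. The operation field names follow the example object on this slide; the `record_operation` and `summarise` helpers, the `ENTITY_VALIDATION` / `phoneFormat` rule, and the `FAILURE` / `WARNING` values are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def record_operation(lineage, name, rule, result, severity="", attributes=()):
    """Append one operation outcome to the event's lineage object."""
    lineage["operation"].append({
        "operation_name": name,
        "operation_rule_name": rule,
        "operation_result": result,
        "failure_severity": severity,
        "attributes": list(attributes),
        "created_time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    return lineage


def summarise(lineage):
    """Count outcomes, using the severity when set and the result otherwise."""
    return Counter(op["failure_severity"] or op["operation_result"]
                   for op in lineage["operation"])


lineage = {
    "event_trace_id": "79d77056-c773-4496-ac0d-5223c49e06f0",
    "application_name": "STAGE",
    "operation": [],
}
record_operation(lineage, "ENTITY_REFERENCING", "plsParsing", "SUCCESS")
# Hypothetical validation rule that flags a warning on one attribute.
record_operation(lineage, "ENTITY_VALIDATION", "phoneFormat", "FAILURE",
                 severity="WARNING", attributes=["Contacts.Phone"])
summary = summarise(lineage)
```

A per-event summary like this is the kind of aggregate that source ranking can later be built on.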
  • 22. Intelligent System of Record Ranking
  • 23. “Dedicated System of Record (SoR)”: has full update authority over your data attributes. Problem: 1. Data Quality Regressions flow into your System 2. Low Frequency of Change (Low Data Currency). Solution: “Candidate Systems of Record (CSoR)”, alternate “SoRs” that compete to update the same data. Example: Healthcare Service { Opening Hours, Contact Details } updated by SoR A, CSoR B and CSoR C
  • 24. Manual Ranking: Healthcare Service Entity Attributes
      Source    Opening Hours   Contact Details
      SoR A     Priority 1      Priority 1
      CSoR B    -               Priority 2
      CSoR C    -               Priority 3
      ▪ Ranking assigned based on “business priority”
      ▪ In the same micro-batch, SoR A wins over CSoR B and CSoR C
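Manual ranking within one micro-batch can be sketched as "lowest priority number wins per attribute". The priority table mirrors the slide; treating "-" as "no update authority for that attribute", along with the `resolve_micro_batch` name and the update format, is an assumption.

```python
# Business priorities per attribute, as on the slide ("-" entries are simply absent).
PRIORITY = {
    "opening_hours": {"SoR A": 1},
    "contact_details": {"SoR A": 1, "CSoR B": 2, "CSoR C": 3},
}


def resolve_micro_batch(updates, priority=PRIORITY):
    """Pick, per attribute, the update from the highest-ranked source in the batch."""
    winners = {}
    for update in updates:
        rank = priority.get(update["attribute"], {}).get(update["source"])
        if rank is None:
            continue                      # source is not ranked for this attribute
        current = winners.get(update["attribute"])
        if current is None or rank < current[0]:
            winners[update["attribute"]] = (rank, update)
    return {attr: upd for attr, (_, upd) in winners.items()}


batch = [
    {"source": "CSoR B", "attribute": "contact_details", "value": {"Phone": "111"}},
    {"source": "SoR A", "attribute": "contact_details", "value": {"Phone": "222"}},
    {"source": "CSoR C", "attribute": "opening_hours", "value": "9-5"},
]
resolved = resolve_micro_batch(batch)
```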
  • 25. Automatic Ranking: Healthcare Service Entity Lineage Events
      Source    Total Updates   Contact Details Lineage Warnings   Contact Details Lineage Errors
      SoR A     10              4                                  2
      CSoR B    8               1                                  0
      CSoR C    2               1                                  1
      ▪ Data Lineage outcomes aggregated over last 30 days
      ▪ “Priority boosted” based on “Recent Performance” of Sources
      Source    Contact Details Original Priority   Contact Details Updated Priority
      SoR A     Priority 1                          Priority 2
      CSoR B    Priority 2                          Priority 1
      CSoR C    Priority 3                          Priority 3
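One way to turn the aggregated lineage counts into a boosted priority is to rank sources by a weighted issue rate. The slide does not give the actual formula, so the scoring below (errors weighted twice as heavily as warnings, divided by total updates) is purely an assumption; with the slide's figures it happens to produce the ordering CSoR B, SoR A, CSoR C.

```python
def rerank(stats, error_weight=2.0, warning_weight=1.0):
    """Assign priorities 1..n by ascending weighted issue rate over the window."""
    def penalty(s):
        if s["updates"] == 0:
            return float("inf")           # no recent evidence: rank last
        return (error_weight * s["errors"] + warning_weight * s["warnings"]) / s["updates"]

    ordered = sorted(stats, key=lambda name: penalty(stats[name]))
    return {name: position + 1 for position, name in enumerate(ordered)}


# Contact Details lineage outcomes over the last 30 days, from the slide.
last_30_days = {
    "SoR A":  {"updates": 10, "warnings": 4, "errors": 2},
    "CSoR B": {"updates": 8,  "warnings": 1, "errors": 0},
    "CSoR C": {"updates": 2,  "warnings": 1, "errors": 1},
}
updated_priority = rerank(last_30_days)
```

Dividing by total updates keeps a high-volume source from being penalised merely for sending more events.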
  • 26. Intelligent Ranking - Future State: Healthcare Service Entity Features
      Features per source: Total Updates, Lineage Warnings, Lineage Errors, Public Complaints Count, Completeness Score, Consistency Score, Accuracy Score, Conformity Score, Integrity Score, … Nth
      SoR A:   10, 4, 2, 2, 60, 20, 99, 56, 21, …
      CSoR B:  8, 1, 0, 1, 45, 34, 80, 54, 22
      CSoR C:  2, 1, 1, 6, 78, 45, 34, 56, 45
      … Nth Source
      ▪ Sources and Features are growing
      ▪ “Seasonal” Data Regression
      ▪ Source Data Quality Model: “Confidence Score” based on “Past Performance” applied in “Real Time”
  • 29. Logical Data Zones using Databricks DELTA ▪ Data Control Plane: LANDING, RAW, STAGE, GOLD (i.e. Bronze, Silver, Gold) ▪ Used DELTA Cache for performance optimisation (stream and batch workloads) ▪ Runs on AWS Accounts under our Security Policy and Regulatory Compliance ▪ Operational Control Plane: Cluster Administration, Management functions like Access Control, Jobs and Schedules
  • 30. Data Plane & Processing Pipeline
  • 31. Continuous Streaming Applications ▪ Enables “True Event Sourcing” via Streaming Input, Kinesis, S3, and DELTA ▪ Running micro-batches leads to smaller and more manageable data volumes ▪ Recoverability through Checkpoints and Reliability through Streaming Sinks to DELTA tables
  • 32. Data Issue Problem Statement: A downstream Health Integrator is complaining that unanticipated special Unicode characters in the service description are breaking their integration.
  • 33. Restore & Recover Data Versions Seamlessly ▪ During data quality issues, we can rewind to previous versions ▪ Using Metadata attribution, Provenance and Data Lineage features, we can trace the root cause to a specific origin source with up to millisecond precision ▪ Complete audit trail and ability to provide SoR Data Quality Reporting
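To make the rewind idea concrete, here is a small in-memory sketch: walk the version history backwards until a version passes a check, and read the offending source off the bad version's metadata. It is a stand-in for illustration only, not the Delta-based mechanism the talk uses. The version layout, the function names, and the definition of "unexpected" characters (Unicode general category "Other", apart from newline and tab) are all assumptions.

```python
import unicodedata


def has_unexpected_chars(text):
    """True if the text contains control, format or unassigned code points."""
    return any(
        unicodedata.category(ch).startswith("C") and ch not in "\n\t"
        for ch in text
    )


def last_good_version(versions):
    """Return the most recent version whose description passes the check."""
    for version in reversed(versions):
        if not has_unexpected_chars(version["description"]):
            return version
    return None


history = [
    {"version": 1, "source": "Vendor Software", "description": "General practice clinic"},
    # Version 2 carries a zero-width space (U+200B) introduced by another source.
    {"version": 2, "source": "CSoR B", "description": "General practice\u200b clinic"},
]
restored = last_good_version(history)
culprit = history[-1]["source"] if has_unexpected_chars(history[-1]["description"]) else None
```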
  • 34. Questions & Feedback Mark Paul - @ThisIsMarkPaul