© 2017 IDERA, Inc. All rights reserved.
Proprietary and confidential.
A MEDICAL DATA WAREHOUSE MODEL
Michael R Blaha, DSc.
blaha@computer.org
www.superdataguy.com
WAREHOUSE DATA FLOW
SOURCE DATA DETAILS
▪ Epic Clarity supplies 99+% of the source data
• All the data of a major hospital
• A small amount of data comes from other apps and external sources
• No scrubbing of source data
• 10,000+ Clarity tables
• Some tables have 200+ columns
• Much redundant data
• Clarity holds both transaction data and rollup data
• No referential integrity
• Odd PKs on many-to-many tables
• Table1 PK + line_number
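The last three points can be illustrated with a minimal sketch. It assumes one reading of the slide: that a many-to-many table is keyed on the parent table's PK plus a line number rather than on the pair of related keys, and that with no referential integrity nothing prevents child rows from pointing at a missing parent. The table and column names (`encounter`, `encounter_dx`, `dx_code`) are hypothetical, not real Clarity tables, and SQLite stands in for the source database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; the names are illustrative, not real Clarity tables.
    CREATE TABLE encounter (encounter_id INTEGER PRIMARY KEY);
    CREATE TABLE encounter_dx (
        encounter_id INTEGER NOT NULL,  -- the parent table's PK
        line_number  INTEGER NOT NULL,  -- 1, 2, 3, ... within one parent row
        dx_code      TEXT,              -- reference to the other related table
        PRIMARY KEY (encounter_id, line_number)
    );
    INSERT INTO encounter VALUES (1), (2);
    -- No foreign key is declared, so the row for encounter 3 loads without error.
    INSERT INTO encounter_dx VALUES (1, 1, 'A'), (1, 2, 'B'), (3, 1, 'C');
""")

# Find child rows whose parent row does not exist.
orphans = conn.execute("""
    SELECT c.encounter_id, c.line_number
    FROM encounter_dx AS c
    LEFT JOIN encounter AS p ON p.encounter_id = c.encounter_id
    WHERE p.encounter_id IS NULL
""").fetchall()
print(orphans)
```

A check of this kind only reports orphaned rows; what to do with them (reject, park, or load with an "unknown" parent) is a separate decision that the slides do not cover.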
OTHER DATA FLOW DETAILS
▪ Data warehouse platform is Netezza
▪ The staging tables maintain a history of operational data
• One staging table for each major source
• Staging schema = operational schema + surrogate key + effective date + expiration date
▪ Informatica for staging data + a new commercial tool
▪ Informatica for ETL + agile SQL queries
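The staging layout described above (operational columns plus a surrogate key, an effective date and an expiration date) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual design: the table `stg_patient`, its columns, the `load` helper and the `9999-12-31` "still current" convention are all hypothetical, and SQLite stands in for Netezza.

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # assumed convention marking the current version

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stg_patient (
        stg_key         INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        patient_id      INTEGER NOT NULL,                   -- operational column
        city            TEXT,                               -- operational column
        effective_date  TEXT NOT NULL,
        expiration_date TEXT NOT NULL
    )
""")

def load(patient_id, city, as_of):
    """Add a new version of the row only when the operational data changed."""
    current = conn.execute(
        "SELECT stg_key, city FROM stg_patient "
        "WHERE patient_id = ? AND expiration_date = ?",
        (patient_id, HIGH_DATE),
    ).fetchone()
    if current is not None and current[1] == city:
        return  # unchanged, so no new history row
    if current is not None:
        # Close the old version on the date the new one takes effect.
        conn.execute(
            "UPDATE stg_patient SET expiration_date = ? WHERE stg_key = ?",
            (as_of, current[0]),
        )
    conn.execute(
        "INSERT INTO stg_patient (patient_id, city, effective_date, expiration_date) "
        "VALUES (?, ?, ?, ?)",
        (patient_id, city, as_of, HIGH_DATE),
    )

load(7, "Chicago", "2016-01-01")
load(7, "Chicago", "2016-02-01")  # no change, no new row
load(7, "Boston", "2016-03-01")   # change, old row expires and a new row is added

history = conn.execute(
    "SELECT patient_id, city, effective_date, expiration_date "
    "FROM stg_patient ORDER BY stg_key"
).fetchall()
print(history)
```

Keeping the operational columns unchanged and only adding the three housekeeping columns means any past state of a source row can be recovered by filtering on the date range.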
BUS ARCHITECTURE
▪ Definition: A data warehouse with dimensions that are consistently defined across facts
▪ 200+ dimensions; 100+ facts; Kimball approach
▪ Fully commented data model with naming standards
▪ We created our own DW schema
• This project preceded Epic’s DW schema
▪ DW has strong correspondence to Clarity
• Little abstraction
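One practical consequence of consistently defined dimensions is that results from different facts can be combined along a shared dimension. The sketch below is a hypothetical illustration, not the project's schema: the tables `dim_department`, `fact_appointment` and `fact_billing`, their columns and their data are invented, and SQLite stands in for Netezza.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One dimension, defined once and shared by both facts.
    CREATE TABLE dim_department (department_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_appointment (department_key INTEGER, appt_count INTEGER);
    CREATE TABLE fact_billing (department_key INTEGER, amount REAL);
    INSERT INTO dim_department VALUES (1, 'Cardiology'), (2, 'Oncology');
    INSERT INTO fact_appointment VALUES (1, 10), (1, 5), (2, 7);
    INSERT INTO fact_billing VALUES (1, 100.0), (2, 40.0), (2, 60.0);
""")

# Aggregate each fact separately, then join the results on the shared key.
# Joining the two facts directly would multiply rows and inflate the sums.
rows = conn.execute("""
    SELECT d.name, a.appts, b.billed
    FROM dim_department AS d
    JOIN (SELECT department_key, SUM(appt_count) AS appts
          FROM fact_appointment GROUP BY department_key) AS a
      ON a.department_key = d.department_key
    JOIN (SELECT department_key, SUM(amount) AS billed
          FROM fact_billing GROUP BY department_key) AS b
      ON b.department_key = d.department_key
    ORDER BY d.name
""").fetchall()
print(rows)
```

The query works only because both facts carry the same `department_key` with the same meaning, which is the property the definition above describes.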
PARTIAL BUS ARCHITECTURE
Dimension columns (abbreviated): Acct | Appt | BenP | Carr | CosC | Covr | Date | Dept | Diag | DRG | Enc | Med | Pat | Payr | Phar | Prob | Proc | Prov | RevC | Unav | Vend
Account_Coverage X X X X X
Account_Diagnosis X X X X X
Account_Procedure X X X X X X
Account_Summary X X X X X X X X X X X X
Billing_Tx_Detail X X X X X X X X X X
Encounter X X X X X X
PARTIAL BUS ARCHITECTURE (CONT.)
Dimension columns (abbreviated): Acct | Appt | BenP | Carr | CosC | Covr | Date | Dept | Diag | DRG | Enc | Med | Pat | Payr | Phar | Prob | Proc | Prov | RevC | Unav | Vend
Order_Medication X X X X X X
Order_Procedure X X X X X X X
Patient_Appt X X X X X X
Provider_Avail X X X X X X
Readmission X X X X X X X X X X
Referral X X X X X X X X X
SAMPLE DIMENSIONS
▪ Account
▪ Appointment
▪ Benefit_Plan
▪ Carrier
▪ Cost_Center
▪ Coverage
▪ Date
▪ Department
▪ Diagnosis
▪ DRG
▪ Encounter
SAMPLE DIMENSIONS (CONT.)
▪ Medication
▪ Patient
▪ Payor
▪ Pharmacy
▪ Problem_List
▪ Procedure
▪ Provider
▪ Revenue_Code
▪ Unavailable_Reason
▪ Vendor
SAMPLE FACTS
▪ Account_Coverage – patient account and coverage data
▪ Account_Diagnosis – data about diagnoses
▪ Account_Procedure – procedures for an account
▪ Account_Summary – an accumulating snapshot fact
▪ Billing_Transaction_Detail – hospital billing transactions
▪ Encounter – the current state of a patient encounter
▪ Order_Medication – data for medications
▪ Order_Procedure – data for procedures and lab orders
▪ Patient_Appointment – data for appointments
▪ Provider_Availability – time slots for a provider’s schedule
▪ Readmission – data for hospital readmission per ACA
▪ Referral – a patient handoff from one provider to another
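To give a feel for what feeds a fact such as Readmission, here is a deliberately simplified sketch of finding readmissions with a self-join. It is not the project's SQL, which the author describes elsewhere as complex. The `admission` table, its columns and its data are hypothetical, and the plain 30-day window with no exclusions is an assumption made for illustration; real readmission rules carry additional conditions.

```python
import sqlite3

WINDOW_DAYS = 30  # assumed window; real rules add exclusions not modeled here

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and data, with dates stored as ISO-8601 text.
    CREATE TABLE admission (patient_id INTEGER, admit_date TEXT, discharge_date TEXT);
    INSERT INTO admission VALUES
        (1, '2016-01-01', '2016-01-05'),
        (1, '2016-01-20', '2016-01-22'),  -- 15 days after the first discharge
        (1, '2016-06-01', '2016-06-03'),  -- well outside the window
        (2, '2016-02-01', '2016-02-02');  -- a single stay, no readmission
""")

# Pair each stay with any later stay of the same patient that begins
# within WINDOW_DAYS of the earlier discharge.
readmissions = conn.execute("""
    SELECT a.patient_id, a.discharge_date, b.admit_date
    FROM admission AS a
    JOIN admission AS b
      ON b.patient_id = a.patient_id
     AND b.admit_date > a.discharge_date
     AND julianday(b.admit_date) - julianday(a.discharge_date) <= ?
    ORDER BY a.patient_id, a.discharge_date
""", (WINDOW_DAYS,)).fetchall()
print(readmissions)
```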
SAMPLE SUBJECT AREA
[Figures for slides 11-13 are not reproduced in this text]
STAFFING BREAKDOWN
▪ 4 managers
▪ 2 data modelers + advanced tools/techniques
▪ 6 ETL
▪ 5 business analysts
▪ 1 DBA
▪ 1 metadata
▪ 2 secretaries
▪ 2 legacy software
RETROSPECTIVE
▪ Too much of “build it and they will come”
▪ There were errors in staging data!
▪ There was code review for ETL scripts
• I’m not confident that the ETL scripts were correct
• We should have used SQL to validate ETL scripts
▪ Agile analytics
• We had complex SQL for calculating readmissions
• The SQL was validated as correct
• The legacy software was wrong!
▪ Issue of Clarity rollup vs DW rollup
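The idea of validating ETL output with SQL, and of comparing one rollup against another, can be sketched as a reconciliation query: recompute a total from the staged transactions and compare it with what the ETL loaded. This is one possible shape for such a check, not what the project ran. The tables `stg_billing_tx` and `fact_account_summary`, their columns and their data are hypothetical, and amounts are whole cents to avoid rounding noise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical staging and warehouse tables; amounts are in cents.
    CREATE TABLE stg_billing_tx (account_id INTEGER, amount INTEGER);
    CREATE TABLE fact_account_summary (account_id INTEGER, total_amount INTEGER);
    INSERT INTO stg_billing_tx VALUES (1, 100), (1, 50), (2, 75), (3, 20);
    -- Account 2 was loaded with the wrong total and account 3 is missing.
    INSERT INTO fact_account_summary VALUES (1, 150), (2, 70);
""")

# Recompute the rollup independently from staging, then list every account
# where the warehouse value is absent or differs.
mismatches = conn.execute("""
    SELECT s.account_id, s.total AS staged_total, f.total_amount AS dw_total
    FROM (SELECT account_id, SUM(amount) AS total
          FROM stg_billing_tx GROUP BY account_id) AS s
    LEFT JOIN fact_account_summary AS f ON f.account_id = s.account_id
    WHERE f.account_id IS NULL OR f.total_amount <> s.total
    ORDER BY s.account_id
""").fetchall()
print(mismatches)
```

An empty result means the two rollups agree for every staged account. The check does not catch rows that exist only in the warehouse, which would need the same comparison run in the other direction.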
THANKS!
Any questions?
You can find me at:
@michaelrblaha
blaha@computer.org
www.superdataguy.com
