Data Warehousing – Dimensions | Star and Snowflake Schemas




Eric Matthews - DataWithUs
Defining Some Key Terms
 Dimension
    • Data Element
    • Categorizes each item in a data set
    • Provides Structured Labeling/Tagging
    • Dimensions can consist of hierarchies. For example: Date rolls
      up to Month, then Quarter, then Year
    • Dimension tables contain the keys that fact tables reference;
      the matching foreign keys live in the fact table.
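One common way to hold a hierarchy such as Date → Month → Quarter → Year is to store every level as its own column, so queries can filter or group at any level without date arithmetic. The Python sketch below assumes one row per calendar day; the function name and the integer `date_key` format (YYYYMMDD) are hypothetical choices for illustration, not part of the original material.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one dict per calendar day, with each hierarchy level as a column."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # hypothetical key format, e.g. 20240330
            "full_date": day.isoformat(),
            "week": day.isocalendar()[1],             # ISO week number
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,      # months 1-3 -> Q1, 4-6 -> Q2, ...
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


rows = build_date_dimension(date(2024, 3, 30), date(2024, 4, 2))
for r in rows:
    print(r["full_date"], r["week"], r["month"], r["quarter"], r["year"])
# 2024-03-30 13 3 1 2024
# 2024-03-31 13 3 1 2024
# 2024-04-01 14 4 2 2024
# 2024-04-02 14 4 2 2024
```

Rows that cross the March/April boundary land in different months and quarters, which is exactly what lets a report group by either level.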
 Dimension – Primary Role
    • Data Filtering
    • Data Grouping
    • Data Labeling

 Fact
    • A measured, counted, or aggregated event. For example:
      Sales, Admissions, Blood Pressure, and Inventory can all be
      construed as “facts”
    • Fact tables contain the foreign keys that join to each dimension table
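To make the three roles of a dimension concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The department dimension, its columns, and the sample rows are hypothetical; only "admissions" comes from the examples above. The dimension labels the results, filters them (by campus), and groups them (by department), while the fact table supplies the counted and measured values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension: one row per department, keyed by dept_key
    CREATE TABLE dim_department (
        dept_key  INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL,
        campus    TEXT NOT NULL
    );
    -- Fact table: one row per admission; dept_key is the foreign key into the dimension
    CREATE TABLE fact_admission (
        admission_id INTEGER PRIMARY KEY,
        dept_key     INTEGER NOT NULL REFERENCES dim_department(dept_key),
        charges      REAL NOT NULL
    );
    INSERT INTO dim_department VALUES
        (1, 'Cardiology', 'North'), (2, 'Oncology', 'North'), (3, 'Pediatrics', 'South');
    INSERT INTO fact_admission VALUES
        (100, 1, 1200.0), (101, 1, 800.0), (102, 2, 500.0), (103, 3, 300.0);
""")

query = """
    SELECT d.dept_name,                      -- labeling: readable name instead of a key
           COUNT(*)       AS admissions,     -- counted fact
           SUM(f.charges) AS total_charges   -- measured fact
    FROM fact_admission AS f
    JOIN dim_department AS d ON d.dept_key = f.dept_key
    WHERE d.campus = 'North'                 -- filtering on a dimension attribute
    GROUP BY d.dept_name                     -- grouping by a dimension attribute
    ORDER BY d.dept_name
"""
for row in conn.execute(query):
    print(row)
# ('Cardiology', 2, 2000.0)
# ('Oncology', 1, 500.0)
```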
Defining Some Key Terms (continued)
 Conformed Dimension
    • Common set of data structures/attributes
    • Can be shared across many fact tables, but…
    • The row headers in each answer must match exactly, or…
    • One dimension must be an exact subset of the other



 These definitions will become clearer as we look at some
 examples.
Star Schema



   • Simplest form of dimensional modeling

   • Consists of one or more dimension tables arranged around a central fact table

   • Optimized for querying large data sets
Star Schema (Logical)
   • Fact Table: Keys, Facts
   • Dimension Table: Date/Time
   • Dimension Table: Patient Demographics
   • Dimension Table: Referring Physician
   • Dimension Table: Insurance Carrier
   Each dimension table joins directly to the central fact table.
Star Schema – Talking Points for Next Diagram
Note: Have original table schema as point of reference.


  • Discuss aggregating the source table into the fact table by rolling
    up totals (and how this needed to be done).
  • Discuss the notion of rolling up fact tables to create other
    fact tables (use the account type, financial class, and service
    code columns in the fact table as the basis for discussion).
  • Discuss some of the pitfalls of dimension tables, using
    the physician dimension as an example (physicians can change jobs).
  • Discuss the Date Dimension from the perspective of the
    data in the table… which transitions us to a key point…

  …a dimension table is the table form of the same concept one uses
  when resolving foreign keys in reporting.

  Additionally, if one has well-defined master data, then the
  dimension tables can be populated using a columnar subset of the
  source master data table.
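The three mechanics in these talking points (rolling source rows up into a fact table, rolling that fact table up into a coarser one, and populating a dimension from a column subset of master data) can be sketched in SQLite from Python. The source table `src_transaction`, the master table `master_physician`, their extra columns, and all sample values are hypothetical; the fact and dimension column names follow the Acct Fin Rollup diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table: one row per charge transaction
    CREATE TABLE src_transaction (
        ACCT_NUM TEXT, ACCT_TYPE TEXT, FC TEXT,
        HOSPITAL_SERVICE_CODE TEXT, AMOUNT REAL
    );
    INSERT INTO src_transaction VALUES
        ('A1', 'IP', 'MCR', 'CARD', 100.0),
        ('A1', 'IP', 'MCR', 'CARD',  50.0),
        ('A2', 'OP', 'COM', 'LAB',   20.0),
        ('A3', 'OP', 'COM', 'LAB',   30.0);

    -- Hypothetical physician master data, wider than the dimension needs
    CREATE TABLE master_physician (
        MD_ID TEXT PRIMARY KEY, PHYSICIAN_NAME TEXT, AFFILIATION TEXT,
        LICENSE_NUM TEXT, PAGER TEXT
    );
    INSERT INTO master_physician VALUES
        ('MD1', 'Dr. Lee', 'General Hospital', 'L-001', '555-0100');

    -- 1) Roll the source rows up into an account-level fact table
    CREATE TABLE fact_acct AS
        SELECT ACCT_NUM, ACCT_TYPE, FC, HOSPITAL_SERVICE_CODE,
               SUM(AMOUNT) AS TOT_TOTAL_CHARGES
        FROM src_transaction
        GROUP BY ACCT_NUM, ACCT_TYPE, FC, HOSPITAL_SERVICE_CODE;

    -- 2) Roll that fact table up again into a coarser fact table
    CREATE TABLE fact_by_type_fc AS
        SELECT ACCT_TYPE, FC, COUNT(*) AS ACCOUNTS,
               SUM(TOT_TOTAL_CHARGES) AS TOT_TOTAL_CHARGES
        FROM fact_acct
        GROUP BY ACCT_TYPE, FC;

    -- 3) Populate the dimension with a columnar subset of the master data
    CREATE TABLE dim_referring_physician AS
        SELECT MD_ID AS ACCT_REFERRING_MD, PHYSICIAN_NAME, AFFILIATION
        FROM master_physician;
""")

print(conn.execute("SELECT * FROM fact_by_type_fc ORDER BY ACCT_TYPE").fetchall())
# [('IP', 'MCR', 1, 150.0), ('OP', 'COM', 2, 50.0)]
```

Step 2 reads only from the fact table built in step 1, which is the sense in which one fact table can be derived from another without going back to the source.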
Fact Table: Acct Fin Rollup
    • Fact Table (Acct Fin Rollup): ACCT_NUM, ACCT_PTPTR, ACCT_GUARANTOR_ID,
      ACCT_REFERRING_MD, ACCT_START_DATE, ACCT_END_DATE, PLAN_SEQ1, ACCT_TYPE,
      FC, HOSPITAL_SERVICE_CODE, TOT_TOTAL_CHARGES, TOT_TOTAL_PAYMENTS,
      TOT_TOTAL_ADJUSTMENTS, TOT_BALANCE
    • Dimension Table (Date): WEEK, YEAR, QUARTER, MONTH
    • Dimension Table (Patient): ACCT_PTPTR, PATIENT_NAME, CITY, STATE, ZIP
    • Dimension Table (Insurance Plan/Carrier): PLAN_SEQ1, PLAN_NAME, CARRIER,
      CITY, STATE, ZIP
    • Dimension Table (Referring Physician): ACCT_REFERRING_MD, PHYSICIAN_NAME,
      AFFILIATION, AFFILIATION_CITY, AFFILIATION_STATE, AFFILIATION_ZIP
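The Acct Fin Rollup diagram can be written out as SQLite DDL with a typical star join on top. This is a sketch under stated assumptions: the diagram shows no key column for the Date dimension, so a hypothetical `FULL_DATE` column is added and joined to `ACCT_START_DATE`; the lowercase table names and every sample row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- FULL_DATE is an assumed join column; the diagram lists only WEEK/YEAR/QUARTER/MONTH
    CREATE TABLE dim_date (
        FULL_DATE TEXT PRIMARY KEY,
        WEEK INTEGER, YEAR INTEGER, QUARTER INTEGER, MONTH INTEGER
    );
    CREATE TABLE dim_patient (
        ACCT_PTPTR TEXT PRIMARY KEY,
        PATIENT_NAME TEXT, CITY TEXT, STATE TEXT, ZIP TEXT
    );
    CREATE TABLE dim_insurance_plan (
        PLAN_SEQ1 TEXT PRIMARY KEY,
        PLAN_NAME TEXT, CARRIER TEXT, CITY TEXT, STATE TEXT, ZIP TEXT
    );
    CREATE TABLE dim_referring_physician (
        ACCT_REFERRING_MD TEXT PRIMARY KEY,
        PHYSICIAN_NAME TEXT, AFFILIATION TEXT,
        AFFILIATION_CITY TEXT, AFFILIATION_STATE TEXT, AFFILIATION_ZIP TEXT
    );
    -- Fact table: foreign keys to each dimension, descriptive codes, and rolled-up totals
    CREATE TABLE fact_acct_fin_rollup (
        ACCT_NUM TEXT PRIMARY KEY,
        ACCT_PTPTR TEXT REFERENCES dim_patient(ACCT_PTPTR),
        ACCT_GUARANTOR_ID TEXT,
        ACCT_REFERRING_MD TEXT REFERENCES dim_referring_physician(ACCT_REFERRING_MD),
        ACCT_START_DATE TEXT REFERENCES dim_date(FULL_DATE),
        ACCT_END_DATE TEXT,
        PLAN_SEQ1 TEXT REFERENCES dim_insurance_plan(PLAN_SEQ1),
        ACCT_TYPE TEXT, FC TEXT, HOSPITAL_SERVICE_CODE TEXT,
        TOT_TOTAL_CHARGES REAL, TOT_TOTAL_PAYMENTS REAL,
        TOT_TOTAL_ADJUSTMENTS REAL, TOT_BALANCE REAL
    );

    -- Made-up sample rows, only for the columns the query below needs
    INSERT INTO dim_date VALUES
        ('2024-01-15', 3, 2024, 1, 1), ('2024-04-02', 14, 2024, 2, 4);
    INSERT INTO dim_insurance_plan (PLAN_SEQ1, PLAN_NAME, CARRIER) VALUES
        ('P1', 'Gold PPO', 'Acme Health'), ('P2', 'Basic HMO', 'Beta Care');
    INSERT INTO fact_acct_fin_rollup
        (ACCT_NUM, ACCT_START_DATE, PLAN_SEQ1, TOT_TOTAL_CHARGES, TOT_TOTAL_PAYMENTS)
    VALUES
        ('A1', '2024-01-15', 'P1', 1000.0, 600.0),
        ('A2', '2024-01-15', 'P2',  400.0, 400.0),
        ('A3', '2024-04-02', 'P1',  250.0, 100.0);
""")

# A typical star join: group on dimension attributes, sum the fact totals.
query = """
    SELECT d.YEAR, d.QUARTER, p.CARRIER,
           SUM(f.TOT_TOTAL_CHARGES)  AS charges,
           SUM(f.TOT_TOTAL_PAYMENTS) AS payments
    FROM fact_acct_fin_rollup AS f
    JOIN dim_date           AS d ON d.FULL_DATE = f.ACCT_START_DATE
    JOIN dim_insurance_plan AS p ON p.PLAN_SEQ1 = f.PLAN_SEQ1
    GROUP BY d.YEAR, d.QUARTER, p.CARRIER
    ORDER BY d.YEAR, d.QUARTER, p.CARRIER
"""
for row in conn.execute(query):
    print(row)
# (2024, 1, 'Acme Health', 1000.0, 600.0)
# (2024, 1, 'Beta Care', 400.0, 400.0)
# (2024, 2, 'Acme Health', 250.0, 100.0)
```

Every dimension is one join away from the fact table, which is what keeps star queries short and predictable.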
Snowflake Schema
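    • Think of a star schema in which the dimension tables are
      normalized into further related tables

    • Can be used to segregate dimension data that has a high
      percentage of NULLs into separate tables for faster lookup
      (some engines, such as Oracle with B-tree indexes, do not
      index entries whose key columns are all NULL)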
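Snowflake Schema
    • Fact Table: product_key, Units, Cost Per Unit
    • Dimension Table (Product Info): product_key, supplier_key
    • Dimension Table (Supplier Info): supplier_key
    The fact table joins to Product Info on product_key, and Product Info
    joins to Supplier Info on supplier_key.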
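A minimal SQLite sketch of the product/supplier snowflake above. The `product_name` and `supplier_name` columns and the sample rows are hypothetical additions so the query has something to label with. The point it shows is that supplier attributes are stored once in their own table, so reaching them from the fact table takes two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Outer dimension: supplier attributes are stored only here
    CREATE TABLE supplier_info (
        supplier_key  INTEGER PRIMARY KEY,
        supplier_name TEXT NOT NULL          -- hypothetical attribute
    );
    -- Inner dimension: references supplier_info instead of repeating supplier columns
    CREATE TABLE product_info (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,          -- hypothetical attribute
        supplier_key INTEGER REFERENCES supplier_info(supplier_key)
    );
    CREATE TABLE fact_sales (
        product_key   INTEGER REFERENCES product_info(product_key),
        units         INTEGER NOT NULL,
        cost_per_unit REAL NOT NULL
    );
    INSERT INTO supplier_info VALUES (10, 'Northwind'), (20, 'Contoso');
    INSERT INTO product_info VALUES (1, 'Gauze', 10), (2, 'Syringe', 10), (3, 'Gloves', 20);
    INSERT INTO fact_sales VALUES (1, 5, 2.0), (2, 10, 0.5), (3, 4, 3.0);
""")

# Reaching a supplier attribute takes two joins: fact -> product_info -> supplier_info.
query = """
    SELECT s.supplier_name, SUM(f.units * f.cost_per_unit) AS total_cost
    FROM fact_sales AS f
    JOIN product_info  AS p ON p.product_key  = f.product_key
    JOIN supplier_info AS s ON s.supplier_key = p.supplier_key
    GROUP BY s.supplier_name
    ORDER BY s.supplier_name
"""
print(conn.execute(query).fetchall())
# [('Contoso', 12.0), ('Northwind', 15.0)]
```

The trade-off versus a star schema: less repeated supplier data in the product dimension, at the cost of an extra join per query.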
Conformed Dimension
  A conformed dimension is a set of data attributes that have been
  physically implemented in multiple tables using the same structure. A
  conformed dimension can be applied to different fact tables. For
  example:

  Dimension Table: Patient Demographics (Gender, Age), shared by:
    • Fact Table: Hypertension Studies
    • Fact Table: Lab Results
    • Fact Table: Diabetes Assessment

  Note: The classic example of a conformed dimension is date. I wanted
  to offer a different example.
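The payoff of a conformed dimension is that answers from different fact tables line up on identical row headers, so they can be combined side by side (often called drilling across). The SQLite sketch below assumes a hypothetical `patient_key`, made-up `systolic` and `a1c` measures, and invented sample rows; only the dimension and fact table names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One conformed dimension, reused unchanged by every fact table below
    CREATE TABLE dim_patient_demographics (
        patient_key INTEGER PRIMARY KEY,     -- hypothetical key
        gender TEXT NOT NULL,
        age    INTEGER NOT NULL
    );
    CREATE TABLE fact_hypertension_study (
        patient_key INTEGER REFERENCES dim_patient_demographics(patient_key),
        systolic INTEGER                     -- hypothetical measure
    );
    CREATE TABLE fact_lab_result (
        patient_key INTEGER REFERENCES dim_patient_demographics(patient_key),
        a1c REAL                             -- hypothetical measure
    );
    INSERT INTO dim_patient_demographics VALUES (1, 'F', 54), (2, 'M', 61), (3, 'F', 47);
    INSERT INTO fact_hypertension_study VALUES (1, 150), (2, 140), (3, 130);
    INSERT INTO fact_lab_result VALUES (1, 6.0), (2, 7.0), (3, 5.0);
""")


def by_gender(fact_table, measure):
    # fact_table and measure are trusted literals in this sketch, never user input
    sql = f"""
        SELECT d.gender, AVG(f.{measure})
        FROM {fact_table} AS f
        JOIN dim_patient_demographics AS d ON d.patient_key = f.patient_key
        GROUP BY d.gender
    """
    return dict(conn.execute(sql).fetchall())


bp = by_gender("fact_hypertension_study", "systolic")
lab = by_gender("fact_lab_result", "a1c")
# Both facts share the same dimension, so their row headers ('F', 'M') match
# exactly and the two answers can be merged into one report.
for gender in sorted(bp):
    print(gender, bp[gender], lab[gender])
# F 140.0 5.5
# M 140.0 7.0
```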
Transition to Next Point of Discussion

  Star and Snowflake schemas are optimized for
  querying large data sets.

  They should support:
      • OLAP cubes
      • Business Intelligence and Analytic Applications
      • Ad hoc queries
The End
