Transcript of "Warehousing dimension star-snowflake_schemas"
Data Warehousing – Dimensions | Star and Snowflake Schemas
Eric Matthews - DataWithUs
Defining Some Key Terms

Dimension
• A data element that categorizes each item in a data set
• Provides structured labeling/tagging
• Dimensions can consist of hierarchies. For example: Date → Month → Quarter → Year
• Dimension tables hold the keys that fact tables join to; the fact table carries the matching foreign keys

Dimension – Primary Role
• Data filtering
• Data grouping
• Data labeling

Fact
• A measured, counted, or aggregated event. For example: sales, admissions, blood pressure, and inventory can all be construed as “facts”
• Fact tables contain the foreign keys that join them to the dimension tables
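A minimal sketch of the three roles, using Python's built-in `sqlite3` module. The tables `date_dim` and `sales_fact`, their columns, and the sample rows are hypothetical: the date dimension carries a Date → Month → Quarter → Year hierarchy, and the fact table holds only the foreign key and the measure.

```python
import sqlite3

# Hypothetical date dimension (with a month/quarter/year hierarchy) and a
# sales fact table whose date_key is a foreign key into that dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     INTEGER,
    quarter   INTEGER,
    year      INTEGER
);
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    sales_amount REAL
);
INSERT INTO date_dim VALUES
    (20230115, '2023-01-15', 1, 1, 2023),
    (20230410, '2023-04-10', 4, 2, 2023),
    (20240220, '2024-02-20', 2, 1, 2024);
INSERT INTO sales_fact VALUES
    (20230115, 100.0), (20230115, 50.0), (20230410, 75.0), (20240220, 40.0);
""")

# The dimension filters (year = 2023), groups (by quarter), and labels
# ('Q1', 'Q2') the measure stored in the fact table.
rows = conn.execute("""
SELECT 'Q' || d.quarter AS label, SUM(f.sales_amount) AS total
FROM sales_fact AS f
JOIN date_dim AS d ON d.date_key = f.date_key
WHERE d.year = 2023
GROUP BY d.quarter
ORDER BY d.quarter
""").fetchall()
print(rows)  # [('Q1', 150.0), ('Q2', 75.0)]
```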
Defining Some Key Terms (continued)

Conformed Dimension
• A common set of data structures/attributes
• Can cut across many facts, but…
• The row headers in an answer must match exactly, or…
• One set of row headers can be an exact subset of the other

These definitions will become clearer as we look at some examples.
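One way to picture the row-header rule is a hypothetical "drill across": two answers aggregated from different fact tables, both grouped by the same conformed attribute (month, here), are merged row by row. The function names `headers_conform` and `drill_across` and all figures below are illustrative assumptions, not part of any library.

```python
# Two answers, each aggregated from a different (hypothetical) fact table but
# grouped by the same conformed attribute, so their row headers can line up.
admissions_by_month = {"2024-01": 120, "2024-02": 98, "2024-03": 110}
charges_by_month = {"2024-01": 51000.0, "2024-02": 43500.0}


def headers_conform(a, b):
    """True if the row headers match exactly or one set is a subset of the other."""
    ha, hb = set(a), set(b)
    return ha <= hb or hb <= ha


def drill_across(a, b):
    """Merge two answers on their shared row headers; missing cells become None."""
    if not headers_conform(a, b):
        raise ValueError("row headers do not conform")
    return {h: (a.get(h), b.get(h)) for h in sorted(set(a) | set(b))}


merged = drill_across(admissions_by_month, charges_by_month)
print(merged)
```

Here the charges answer's headers are an exact subset of the admissions answer's headers, so the merge succeeds and March simply has no charges cell.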
Star Schema
• The simplest form of dimensional modeling
• Consists of one or more dimension tables arranged around a fact table
• Optimized for querying large data sets
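A small sketch of a star schema in SQLite (via Python's `sqlite3`), assuming hypothetical table names `charges_fact`, `patient_dim`, and `physician_dim` and made-up rows. Every dimension is one join away from the fact table, which is what keeps a "star join" query simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one primary key each, plus descriptive attributes.
CREATE TABLE patient_dim   (patient_key INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE physician_dim (physician_key INTEGER PRIMARY KEY, physician_name TEXT);
-- Fact table in the center: foreign keys to each dimension plus a measure.
CREATE TABLE charges_fact (
    patient_key   INTEGER REFERENCES patient_dim(patient_key),
    physician_key INTEGER REFERENCES physician_dim(physician_key),
    total_charges REAL
);
INSERT INTO patient_dim   VALUES (1, 'OH'), (2, 'PA');
INSERT INTO physician_dim VALUES (10, 'Dr. A'), (11, 'Dr. B');
INSERT INTO charges_fact  VALUES (1, 10, 200.0), (2, 10, 300.0), (2, 11, 50.0);
""")

# Star join: the fact table joins directly to every dimension it needs.
rows = conn.execute("""
SELECT p.state, ph.physician_name, SUM(f.total_charges)
FROM charges_fact AS f
JOIN patient_dim   AS p  ON p.patient_key    = f.patient_key
JOIN physician_dim AS ph ON ph.physician_key = f.physician_key
GROUP BY p.state, ph.physician_name
ORDER BY p.state, ph.physician_name
""").fetchall()
print(rows)  # [('OH', 'Dr. A', 200.0), ('PA', 'Dr. A', 300.0), ('PA', 'Dr. B', 50.0)]
```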
Star Schema – Talking Points for Next Diagram
Note: Have the original table schema on hand as a point of reference.
• Discuss the aggregation from the source table to the fact table that rolls up the totals (how this needed to be done).
• Discuss the notion of rolling up fact tables to create other fact tables (use the account type, financial class, and service code columns in the fact table as the basis for discussion).
• Discuss some of the pitfalls of dimension tables, using the physician dimension as an example (e.g., physicians can change jobs).
• Discuss the Date Dimension from the perspective of the data in the table, which transitions us to a key point: just as one needs to resolve foreign keys when reporting, the dimension table is the table form of that same lookup.
• Additionally, if one has well-defined master data, the dimension tables can be populated using a columnar subset of the source master data table.
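The last talking point can be sketched as follows, assuming a hypothetical master table `patient_master` whose extra columns (`EMPLOYER`, `PREFERRED_LANGUAGE`) are not needed for reporting. The patient dimension reuses the column names from the diagram that follows and is populated with a plain `INSERT ... SELECT` over a columnar subset of the master table; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source master data table with more columns than reporting needs.
CREATE TABLE patient_master (
    ACCT_PTPTR INTEGER PRIMARY KEY,
    PATIENT_NAME TEXT, CITY TEXT, STATE TEXT, ZIP TEXT,
    EMPLOYER TEXT, PREFERRED_LANGUAGE TEXT
);
INSERT INTO patient_master VALUES
    (1, 'Doe, Jane', 'Columbus', 'OH', '43215', 'Acme Co', 'English'),
    (2, 'Roe, Rick', 'Erie', 'PA', '16501', 'Widget Inc', 'Spanish');

-- The dimension keeps only the columns used for filtering, grouping, labeling.
CREATE TABLE patient_dim (
    ACCT_PTPTR INTEGER PRIMARY KEY,
    PATIENT_NAME TEXT, CITY TEXT, STATE TEXT, ZIP TEXT
);
INSERT INTO patient_dim (ACCT_PTPTR, PATIENT_NAME, CITY, STATE, ZIP)
SELECT ACCT_PTPTR, PATIENT_NAME, CITY, STATE, ZIP FROM patient_master;
""")

dim_rows = conn.execute("SELECT * FROM patient_dim ORDER BY ACCT_PTPTR").fetchall()
print(dim_rows)
```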
Fact Table: Acct Fin Rollup
ACCT_NUM, ACCT_PTPTR, ACCT_GUARANTOR_ID, ACCT_REFERRING_MD, ACCT_START_DATE, ACCT_END_DATE, PLAN_SEQ1, ACCT_TYPE, FC, HOSPITAL_SERVICE_CODE, TOT_TOTAL_CHARGES, TOT_TOTAL_PAYMENTS, TOT_TOTAL_ADJUSTMENTS, TOT_BALANCE

Dimension Table: Patient
ACCT_PTPTR, PATIENT_NAME, CITY, STATE, ZIP

Date Dimension Table
WEEK, YEAR, QUARTER, MONTH

Dimension Table: Insurance Plan/Carrier
PLAN_SEQ1, PLAN_NAME, CARRIER, CITY, STATE, ZIP

Dimension Table: Referring Physician
ACCT_REFERRING_MD, PHYSICIAN_NAME, AFFILIATION, AFFILIATION_CITY, AFFILIATION_STATE, AFFILIATION_ZIP
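Rolling one fact table up into another can be sketched against a trimmed-down copy of the Acct Fin Rollup fact table. Only the column names come from the diagram; the physical table names (`acct_fin_rollup`, `fin_by_type_fc_service`), the code values, and the amounts are hypothetical. Grouping by ACCT_TYPE, FC, and HOSPITAL_SERVICE_CODE drops the account-level grain and sums the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down Acct Fin Rollup fact table (sample rows are made up).
CREATE TABLE acct_fin_rollup (
    ACCT_NUM TEXT, ACCT_TYPE TEXT, FC TEXT, HOSPITAL_SERVICE_CODE TEXT,
    TOT_TOTAL_CHARGES REAL, TOT_TOTAL_PAYMENTS REAL,
    TOT_TOTAL_ADJUSTMENTS REAL, TOT_BALANCE REAL
);
INSERT INTO acct_fin_rollup VALUES
    ('A1', 'IP', 'MCR', 'CARD', 1000.0, 600.0, 100.0, 300.0),
    ('A2', 'IP', 'MCR', 'CARD',  500.0, 200.0,  50.0, 250.0),
    ('A3', 'OP', 'COM', 'LAB',    80.0,  80.0,   0.0,   0.0);

-- Derived, coarser fact table: one row per account type / financial class /
-- service code combination, with the additive measures summed.
CREATE TABLE fin_by_type_fc_service AS
SELECT ACCT_TYPE, FC, HOSPITAL_SERVICE_CODE,
       COUNT(*)                   AS ACCT_COUNT,
       SUM(TOT_TOTAL_CHARGES)     AS TOT_TOTAL_CHARGES,
       SUM(TOT_TOTAL_PAYMENTS)    AS TOT_TOTAL_PAYMENTS,
       SUM(TOT_TOTAL_ADJUSTMENTS) AS TOT_TOTAL_ADJUSTMENTS,
       SUM(TOT_BALANCE)           AS TOT_BALANCE
FROM acct_fin_rollup
GROUP BY ACCT_TYPE, FC, HOSPITAL_SERVICE_CODE;
""")

rows = conn.execute(
    "SELECT * FROM fin_by_type_fc_service ORDER BY ACCT_TYPE"
).fetchall()
print(rows)
```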
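Snowflake Schema
• Think of a star schema in which the dimension tables are normalized
• Can be used to segregate dimension data that has a high percentage of nulls into separate tables for faster lookup (in some database engines, such as Oracle, B-tree indexes do not store entries whose indexed columns are all null, so those rows cannot be located through the index)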
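One reading of the null-segregation bullet is sketched below: a sparsely populated attribute is moved out of the main dimension into its own table that stores only the non-null rows, and that smaller table is indexed. The tables, the `hazmat_code` attribute, and the rows are hypothetical, and SQLite is used only as a convenient, self-contained engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical "wide" product dimension: most rows have no hazmat_code.
CREATE TABLE product_wide (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    hazmat_code  TEXT
);
INSERT INTO product_wide VALUES
    (1, 'Gauze', NULL), (2, 'Saline', NULL), (3, 'Ethanol', 'HZ-1'), (4, 'Tape', NULL);

-- Snowflaked alternative: the dense attributes stay in the main dimension...
CREATE TABLE product_dim AS
    SELECT product_key, product_name FROM product_wide;
-- ...and the sparse attribute moves to its own table holding only non-null rows.
CREATE TABLE product_hazmat AS
    SELECT product_key, hazmat_code FROM product_wide WHERE hazmat_code IS NOT NULL;
CREATE INDEX idx_product_hazmat_code ON product_hazmat (hazmat_code);
""")

sparse_count = conn.execute("SELECT COUNT(*) FROM product_hazmat").fetchone()[0]
# A LEFT JOIN restores the original view; products without a code get None.
rows = conn.execute("""
SELECT d.product_name, h.hazmat_code
FROM product_dim AS d
LEFT JOIN product_hazmat AS h ON h.product_key = d.product_key
ORDER BY d.product_key
""").fetchall()
print(sparse_count, rows)
```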
Snowflake Schema
Fact Table: product_key, Units, Cost Per Unit
Dimension Table: product_key, supplier_key, Product Info
Dimension Table: supplier_key, Supplier Info
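The snowflake diagram can be sketched in SQLite with hypothetical lowercase table names (`sales_fact`, `product_dim`, `supplier_dim`) and made-up rows; the key and attribute columns mirror the diagram. Reaching supplier attributes from the fact table takes two joins (fact → product → supplier), which is the extra normalization step that distinguishes a snowflake from a star.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier attributes are normalized out of the product dimension.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_info TEXT);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key),
    product_info TEXT
);
CREATE TABLE sales_fact (
    product_key   INTEGER REFERENCES product_dim(product_key),
    units         INTEGER,
    cost_per_unit REAL
);
INSERT INTO supplier_dim VALUES (1, 'Supplier X'), (2, 'Supplier Y');
INSERT INTO product_dim  VALUES (100, 1, 'Widget'), (101, 1, 'Gadget'), (102, 2, 'Gizmo');
INSERT INTO sales_fact   VALUES (100, 10, 2.5), (101, 4, 10.0), (102, 3, 5.0);
""")

# Two hops: fact -> product dimension -> supplier dimension.
rows = conn.execute("""
SELECT s.supplier_info, SUM(f.units * f.cost_per_unit) AS total_cost
FROM sales_fact AS f
JOIN product_dim  AS p ON p.product_key  = f.product_key
JOIN supplier_dim AS s ON s.supplier_key = p.supplier_key
GROUP BY s.supplier_info
ORDER BY s.supplier_info
""").fetchall()
print(rows)  # [('Supplier X', 65.0), ('Supplier Y', 15.0)]
```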
Conformed Dimension
A conformed dimension is a set of data attributes that have been physically implemented in multiple tables using the same structure. A conformed dimension can be applied to different fact tables. For example:
Dimension Table: Patient Demographics (Gender, Age)
• Fact Table: Hypertension Studies
• Fact Table: Lab Results
• Fact Table: Diabetes Assessment
Note: The classic example of a conformed dimension is date. I wanted to offer a different example.
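A hypothetical sketch of this example: one physical `patient_demographics` table is joined to two different fact tables, and because both answers are grouped by the same dimension column, their row headers line up. The table names, the measures (`systolic_bp`, `a1c`), and the values are assumptions for illustration. Table and column names are interpolated into the SQL only because they are fixed strings in this sketch, never user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical conformed dimension (hypothetical columns and values).
CREATE TABLE patient_demographics (patient_key INTEGER PRIMARY KEY, gender TEXT, age INTEGER);
-- Two fact tables that both reference it through patient_key.
CREATE TABLE hypertension_study_fact (patient_key INTEGER, systolic_bp INTEGER);
CREATE TABLE lab_results_fact        (patient_key INTEGER, a1c REAL);
INSERT INTO patient_demographics    VALUES (1, 'F', 54), (2, 'M', 61), (3, 'F', 47);
INSERT INTO hypertension_study_fact VALUES (1, 150), (2, 140), (3, 130);
INSERT INTO lab_results_fact        VALUES (1, 7.0), (2, 6.0), (3, 8.0);
""")


def avg_by_gender(fact_table, measure):
    """Average a measure from any fact table, grouped by the shared dimension."""
    sql = f"""
        SELECT d.gender, AVG(f.{measure})
        FROM {fact_table} AS f
        JOIN patient_demographics AS d ON d.patient_key = f.patient_key
        GROUP BY d.gender
        ORDER BY d.gender
    """
    return dict(conn.execute(sql).fetchall())


bp = avg_by_gender("hypertension_study_fact", "systolic_bp")
a1c = avg_by_gender("lab_results_fact", "a1c")
# Same row headers (gender values) in both answers, so they combine directly.
combined = {g: (bp[g], a1c[g]) for g in bp}
print(combined)  # {'F': (140.0, 7.5), 'M': (140.0, 6.0)}
```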
Transition to Next Point of Discussion
Star and snowflake schemas are optimized for querying large data sets. They should support:
• OLAP cubes
• Business intelligence and analytic applications
• Ad hoc queries
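As a rough illustration of what "supporting OLAP cubes" involves, the sketch below computes the cells of a tiny cube in plain Python: a measure summed over every subset of the dimension attributes, from the grand total down to the finest grain. The fact rows and attribute names are hypothetical; real OLAP engines and SQL's `GROUP BY CUBE` do this far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: dimension attributes plus one additive measure.
facts = [
    {"year": 2023, "state": "OH", "charges": 100.0},
    {"year": 2023, "state": "PA", "charges": 50.0},
    {"year": 2024, "state": "OH", "charges": 25.0},
]
dims = ("year", "state")


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (every grouping in the cube)."""
    result = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)  # () for the grand total
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


c = cube(facts, dims, "charges")
print(c[()])          # {(): 175.0}
print(c[("year",)])   # {(2023,): 150.0, (2024,): 25.0}
print(c[("state",)])  # {('OH',): 125.0, ('PA',): 50.0}
```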