SlideShare a Scribd company logo
@tbunio
tbunio@protegra.com
agilevoyageur.wordpress.com
www.protegra.com
Agenda
• Data Modeling
• The Project
• Hot DB topics
• Relational vs Dimensional
• Dimensional concepts
– Facts
– Dimensions
• Complex Concept Introduction
What is Data Modeling?
Definition
• “A database model is a specification
describing how a database is
structured and used” – Wikipedia
Definition
• “A database model is a specification
describing how a database is
structured and used” – Wikipedia
• “A data model describes how the
data entities are related to each other
in the real world” - Terry
Data Model Characteristics
• Organize/Structure like Data Elements
• Define relationships between Data
Entities
• Highly Cohesive
• Loosely Coupled
The Project
• Major Health Service provider is
switching their claims system to SAP
• As part of this, they are totally
redeveloping their entire Data
Warehouse solution
The Project
• 3+ years duration
• 100+ Integration Projects
• 200+ Systems Personnel
Integration Project Streams
• Client Administration – Policy Systems
• Data Warehouse
• Legacy – Conversion from Legacy
• Queries – Queries for internal and
external
• Web – External Web Applications
Data Warehouse Team
• Terry – Data Architect/Modeler and
PM
• Hanaa – Lead Analyst and Data
Analyst
• Kevin – Lead Data Migration
Developer
• Lisa – Lead Report Analyst
• Les – Lead Report Developer
Current State
• Sybase Data Warehouse
– Combination of Normalized and
Dimensional design
• Data Migration
– Series of SQL Scripts that move data from
Legacy (Cobol) and Java Applications
• Impromptu
– 1000+ Reports
Target State
• SQL Server 2012
• SQL Server Integration Services for
Data Migration
• SQL Server Reporting Services for
Report Development
• Sharepoint for Report Portal
Target Solution
• Initial load moves 2.5 Terabytes of data
• Initial load runs once
• Incremental load runs every hour
Target Solution
• Operational Data Store
– Normalized
– 400+ tables
• Data Warehouse
– Dimensional
– 60+ tables
• Why both?
– ODS does not have history (Just
Transactions)
Our #1 Challenge
• We needed to be Agile like the other
projects!
– We are now on revision 3500
• We spent weeks planning of how to be
flexible
• Instead of spending time planning, we
spent time planning how we could
quickly change and adapt
• This also meant we created a new
automated test framework
Beef?
• Where are the hot topics like
– Big Data
– NoSQL
– MySQL
– Data Warehouse Appliances
– Cloud
– Open Source Databases
Big Data
• “Commercial Databases” have come
a long way to handle large data
volume
• Big Data is still important but probably
is not required for the vast majority of
databases
– But it applicable for the Facebooks and
Amazons out there
Big Data
• For example, many of the Big Data
solutions featured ColumnStore
Indexing
• Now almost all commercial databases
offer ColumnStore Indexes
NoSQL
• NoSQL was heralded a few years ago
as the death of structured databases
• Mainly promoted from the developer
community
• Seems to have found a niche for
supporting more mainly unstructured
and dynamic data
• Traditional databases still the most
efficient for structured data
MySQL
• MySQL was also promoted as a great
lightweight, high performance option’
• We actually investigated it as an
option for the project
• Great example of never trusting what
you hear
MySQL
• All of the great MySQL benchmarks use
the simplest database engine with no
ACID compliance
– MySQL has the option to use different
engines with different features
• Once you use the ACID compliant
engine, the performance is
equivalent(or worse) to SQL Server and
PostgreSQL
Data Warehouse Appliances
• “marketing term for an integrated set
of servers, storage, operating
system(s), DBMS and software
specifically pre-installed and pre-
optimized for data warehousing”
Data Warehouse Appliances
• Recently in the Data Warehouse
Industry, there has been the rise of a
the Data Warehouse appliances
• These appliances are a one-stop
solution that builds in Big Data
capabilities
Data Warehouse Appliances
• Cool Names like:
– Teradata
– GreenPlum
– Netezza
– InfoSphere
– EMC
• Like Big Data these solution are valuable
if you need to play in the Big Data/Big
Analysis arena
• Most solutions don’t require them
Cloud
• Great to store pictures and music – the
concept still makes businesses nervous
– Also regulatory requirements sometime
prevent it
• Business is starting to become more
comfortable
– Still a ways to go
• Very few business go to the Cloud unless
they have to
– Amazon/Microsoft is changing this with their
services
Open Source Databases
• We investigated Open Sources
databases for our solution. We looked
at:
– MySQL
– PostgreSQL
– others
Open Sources Databases
• We were surprised to learn that once
you factor in all the things you get from
SQL Server, it actually is cheaper over
10 years than Open Source!!
• So we select SQL Server
Foundational DB Practices
Two design methods
• Relational
– “Database normalization is the process of organizing
the fields and tables of a relational database to
minimize redundancy and dependency. Normalization
usually involves dividing large tables into smaller (and less
redundant) tables and defining relationships between
them. The objective is to isolate data so that additions,
deletions, and modifications of a field can be made in just
one table and then propagated through the rest of the
database via the defined relationships.”.”
Two design methods
• Dimensional
– “Dimensional modeling always uses the concepts of facts
(measures), and dimensions (context). Facts are typically
(but not always) numeric values that can be aggregated,
and dimensions are groups of hierarchies and descriptors
that define the facts
Relational
Relational
• Relational Analysis
– Database design is usually in Third Normal
Form
– Database is optimized for transaction
processing. (OLTP)
– Normalized tables are optimized for
modification rather than retrieval
Normal forms
• 1st - Under first normal form, all occurrences of a
record type must contain the same number of
fields.
• 2nd - Second normal form is violated when a non-
key field is a fact about a subset of a key. It is only
relevant when the key is composite
• 3rd - Third normal form is violated when a non-key
field is a fact about another non-key field
Source: William Kent - 1982
Dimensional
Dimensional
• Dimensional Analysis
– Star Schema/Snowflake
– Database is optimized for analytical
processing. (OLAP)
– Facts and Dimensions optimized for
retrieval
• Facts – Business events – Transactions
• Dimensions – context for Transactions
– People
– Accounts
– Products
– Date
Relational
• 3 Dimensions
• Spatial Model
– No historical components except for
transactional tables
• Relational – Models the one truth of
the data
– One account ‘11’
– One person ‘Terry Bunio’
– One transaction of ‘$100.00’ on April 10th
Dimensional
• 4 Dimensions
• Temporal Model
– All tables have a time component
• Dimensional – Models the one truth of
the data at a point in time
– Multiple versions of Accounts over time
– Multiple versions of people over time
– One transaction
• Transactions are already temporal
Fact Tables
• Contains the measurements or facts
about a business process
• Are thin and deep
• Usually is:
– Business transaction
– Business Event
• The grain of a Fact table is the level of
the data recorded
– Order, Invoice, Invoice Item
Special Fact Tables
• Degenerate Dimensions
– Degenerate Dimensions are Dimensions
that can typically provide additional
context about a Fact
• For example, flags that describe a transaction
• Degenerate Dimensions can either be
a separate Dimension table or be
collapsed onto the Fact table
– My preference is the latter
Dimension Tables
• Unlike fact tables, dimension tables
contain descriptive attributes that are
typically textual fields
• These attributes are designed to serve
two critical purposes:
– query constraining and/or filtering
– query result set labeling.
Source: Wikipedia
Dimension Tables
• Shallow and Wide
• Usually corresponds to entities that the
business interacts with
– People
– Locations
– Products
– Accounts
Time Dimension
• All Dimensional Models need a time
component
• This is either a:
– Separate Time Dimension
(recommended)
– Time attributes on each Fact Table
Mini-Dimensions
Mini-Dimensions
• Splitting a Dimension up due to the
activity of change for a set of
attributes
• Helps to reduce the growth of the
Dimension table
Slowly Changing Dimensions
• Type 1 – Overwrite the row with the
new values and update the effective
date
– Pre-existing Facts now refer to the
updated Dimension
– May cause inconsistent reports
Slowly Changing Dimensions
• Type 2 – Insert a new Dimension row with
the new data and new effective date
– Update the expiry date on the prior row
• Don’t update old Facts that refer to the old
row
– Only new Facts will refer to this new Dimension
row
• Type 2 Slowly Changing Dimension
maintains the historical context of the data
Slowly Changing Dimensions
• No longer to I have one row to
represent:
– Account 10123
– Terry Bunio
– Sales Representative 11092
• This changes the mindset and query
syntax to retrieve data
Slowly Changing Dimensions
• Type 3 – The Dimension stores multiple
versions for the attribute in question
• This usually involves a current and
previous value for the attribute
• When a change occurs, no rows are
added but both the current and
previous attributes are updated
• Like Type 1, Type 3 does not retain full
historical context
Complexity
• Most textbooks stop here only show
the simplest Dimensional Models
• Unfortunately, I’ve never run into a
Dimensional Model like that
Complex Concept Introduction
• Snowflake vs Star Schema
• Multi-Valued Dimensions and Bridges
• Recursive Hierarchies
Snowflake vs Star Schema
Snowflake vs Star Schema
Dimensional
Snowflake vs Star Schema
• These extra tables are termed
outriggers
• They are used to address real world
complexities with the data
– Excessive row length
– Repeating groups of data within the
Dimension
• I will use outriggers in a limited way for
repeating data
Multi-Valued Dimensions
• Multi-Valued Dimensions are when a
Fact needs to connect more than
once to a Dimension
– Primary Sales Representative
– Secondary Sales Representative
Multi-Valued Dimensions
• Two possible solutions
– Create copies of the Dimensions for each
role
– Create a Bridge table to resolve the many
to many relationship
Multi-Valued Dimensions
Bridge Tables
Bridge Tables
• Bridge Tables can be used to resolve any
many to many relationships
• This is frequently required with more
complex data areas
• These bridge tables need to be
considered a Dimension and they need
to use the same Slowly Changing
Dimension Design as the base Dimension
– My Recommendation
Hierarchies and Recursive
Hierarchies
Why?
• Why Dimensional Model?
• Allows for a concise representation of
data for reporting. This is especially
important for Self-Service Reporting
– We reduced from 400+ tables in our
Operational Data Store to 60+ tables in
our Data Warehouse
– Aligns with real world business concepts
Why?
• The most important reason –
– Requires detailed understanding of the
data
– Validates the solution
– Uncovers inconsistencies and errors in the
Normalized Model
• Easy for inconsistencies and errors to hide in
400+ tables
• No place to hide when those tables are
reduced down
Why?
• Ultimately there must be a business
requirement for a temporal data
model and not just a spatial one.
• Although you could go through the
exercise to validate your
understanding and not implement the
Dimensional Data Model
How?
• Start with your simplest Dimension and Fact
tables and define the Natural Keys for them
– i.e. People, Product, Transaction, Time
• De-Normalize Reference tables to Dimensions
(And possibly Facts based on how large the
Fact tables will be)
– I place both codes and descriptions on the
Dimension and Fact tables
• Look to De-normalize other tables with the
same Cardinality into one Dimension
– Validate the Natural Keys still define one row
How?
• Don’t force entities on the same
Dimension
– Tempting but you will find it doesn’t
represent the data and will cause issues
for loading or retrieval
– Bridge table or mini-snowflakes are not
bad
• I don’t like a deep snowflake, but shallow
snowflakes can be appropriate
• Don’t fall into the Star-Schema/Snowflake Holy
War – Let your data define the solution
How?
• Iterate, Iterate, Iterate
– Your initial solution will be wrong
– Create it and start to define the load
process and reports
– You will learn more by using the data than
months of analysis to try and get the
model right
Two things to Ponder
Two things to Ponder
• In the Information Age ahead,
databases will be used more for
analysis than operational
– More Dimensional Models and analytical
processes
Two things to Ponder
• Critical skills going forward will be:
– Data Modeling/Data Architecture
– Data Migration
• There is a whole subject area here for a
subsequent presentation. More of an art than
science
– Data Verbalization
• Again a real art form to take a huge amount
of data and present it in a readable form
Whew! Questions?

More Related Content

What's hot

MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStore
MariaDB plc
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
Vibrant Technologies & Computers
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
work
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
Eyad Manna
 
Architecting for the cloud storage misc topics
Architecting for the cloud storage misc topicsArchitecting for the cloud storage misc topics
Architecting for the cloud storage misc topics
Len Bass
 
Data warehousing
Data warehousingData warehousing
Data warehousing
Vigneshwaar Ponnuswamy
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)
Er Bansal
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
Murli Jha
 
Star schema
Star schemaStar schema
data warehousing
data warehousingdata warehousing
data warehousing
Jagnesh Chawla
 
BI Introduction
BI IntroductionBI Introduction
BI Introduction
Taras Panchenko
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data Warehousing
Alex Meadows
 
Seminar datawarehousing
Seminar datawarehousingSeminar datawarehousing
Seminar datawarehousing
Kavisha Uniyal
 
Data-ware Housing
Data-ware HousingData-ware Housing
Data-ware Housing
Prof.Nilesh Magar
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
pcherukumalla
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Soujanya V
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubey
Ankita Dubey
 
Data ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingData ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housing
Vibrant Technologies & Computers
 
Introduction Data warehouse
Introduction Data warehouseIntroduction Data warehouse
Introduction Data warehouse
Amin Choroomi
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data Warehousing
Aswathy S Nair
 

What's hot (20)

MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStore
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Architecting for the cloud storage misc topics
Architecting for the cloud storage misc topicsArchitecting for the cloud storage misc topics
Architecting for the cloud storage misc topics
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
Star schema
Star schemaStar schema
Star schema
 
data warehousing
data warehousingdata warehousing
data warehousing
 
BI Introduction
BI IntroductionBI Introduction
BI Introduction
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data Warehousing
 
Seminar datawarehousing
Seminar datawarehousingSeminar datawarehousing
Seminar datawarehousing
 
Data-ware Housing
Data-ware HousingData-ware Housing
Data-ware Housing
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubey
 
Data ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingData ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housing
 
Introduction Data warehouse
Introduction Data warehouseIntroduction Data warehouse
Introduction Data warehouse
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data Warehousing
 

Similar to Asper database presentation - Data Modeling Topics

Dimensional modeling primer
Dimensional modeling primerDimensional modeling primer
Dimensional modeling primer
Terry Bunio
 
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Terry Bunio
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
Rishikese MR
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
Quontra Solutions
 
Dataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesDataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClasses
InformaticaTrainingClasses
 
DW (1).ppt
DW (1).pptDW (1).ppt
DW (1).ppt
RahulSingh986955
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Caserta
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
AttaUrRahman78
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
ReyersonMax
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
AttaUrRahman78
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
GenrlUse1
 
An introduction to data warehousing
An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousing
Shahed Khalili
 
Various Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptVarious Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.ppt
RafiulHasan19
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptx
hqlm1
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
Gurpreet Singh Sachdeva
 
CS101- Introduction to Computing- Lecture 37
CS101- Introduction to Computing- Lecture 37CS101- Introduction to Computing- Lecture 37
CS101- Introduction to Computing- Lecture 37
Bilal Ahmed
 
4- DB Ch6 18-3-2020.pptx
4- DB Ch6 18-3-2020.pptx4- DB Ch6 18-3-2020.pptx
4- DB Ch6 18-3-2020.pptx
Shoaibmirza18
 
Chapter 5 data resource management
Chapter 5  data resource managementChapter 5  data resource management
Chapter 5 data resource management
Advance Saraswati Prakashan Pvt Ltd
 
Datawarehouse
DatawarehouseDatawarehouse
RowanDay4.pptx
RowanDay4.pptxRowanDay4.pptx
RowanDay4.pptx
MattMarino13
 

Similar to Asper database presentation - Data Modeling Topics (20)

Dimensional modeling primer
Dimensional modeling primerDimensional modeling primer
Dimensional modeling primer
 
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
 
Dataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesDataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClasses
 
DW (1).ppt
DW (1).pptDW (1).ppt
DW (1).ppt
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
An introduction to data warehousing
An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousing
 
Various Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptVarious Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.ppt
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptx
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
CS101- Introduction to Computing- Lecture 37
CS101- Introduction to Computing- Lecture 37CS101- Introduction to Computing- Lecture 37
CS101- Introduction to Computing- Lecture 37
 
4- DB Ch6 18-3-2020.pptx
4- DB Ch6 18-3-2020.pptx4- DB Ch6 18-3-2020.pptx
4- DB Ch6 18-3-2020.pptx
 
Chapter 5 data resource management
Chapter 5  data resource managementChapter 5  data resource management
Chapter 5 data resource management
 
Datawarehouse
DatawarehouseDatawarehouse
Datawarehouse
 
RowanDay4.pptx
RowanDay4.pptxRowanDay4.pptx
RowanDay4.pptx
 

More from Terry Bunio

Uof m empathys role
Uof m empathys roleUof m empathys role
Uof m empathys role
Terry Bunio
 
Ictam big data
Ictam big dataIctam big data
Ictam big data
Terry Bunio
 
Data modeling tips from the trenches
Data modeling tips from the trenchesData modeling tips from the trenches
Data modeling tips from the trenches
Terry Bunio
 
#YesEstimates
#YesEstimates#YesEstimates
#YesEstimates
Terry Bunio
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open source
Terry Bunio
 
Breaking data
Breaking dataBreaking data
Breaking data
Terry Bunio
 
Ssrs and sharepoint there and back again - SQL SAT Fargo
Ssrs and sharepoint   there and back again - SQL SAT FargoSsrs and sharepoint   there and back again - SQL SAT Fargo
Ssrs and sharepoint there and back again - SQL SAT Fargo
Terry Bunio
 
A data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonA data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madison
Terry Bunio
 
SSRS and Sharepoint there and back again
SSRS and Sharepoint   there and back againSSRS and Sharepoint   there and back again
SSRS and Sharepoint there and back again
Terry Bunio
 
Role of an agile pm
Role of an agile pmRole of an agile pm
Role of an agile pm
Terry Bunio
 
Estimating 101
Estimating 101Estimating 101
Estimating 101
Terry Bunio
 
Introduction to lean and agile
Introduction to lean and agileIntroduction to lean and agile
Introduction to lean and agile
Terry Bunio
 
Pmi june 5th 2007
Pmi june 5th 2007Pmi june 5th 2007
Pmi june 5th 2007
Terry Bunio
 
Pmi sac november 20
Pmi sac november 20Pmi sac november 20
Pmi sac november 20
Terry Bunio
 
Iiba.november.09
Iiba.november.09Iiba.november.09
Iiba.november.09
Terry Bunio
 
Sdec11 when user stories are not enough
Sdec11 when user stories are not enoughSdec11 when user stories are not enough
Sdec11 when user stories are not enough
Terry Bunio
 
Sdec10 lean AMS
Sdec10 lean AMSSdec10 lean AMS
Sdec10 lean AMS
Terry Bunio
 
Sdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92daysSdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92days
Terry Bunio
 
Sdec10 lean package implementation
Sdec10 lean package implementationSdec10 lean package implementation
Sdec10 lean package implementation
Terry Bunio
 
Role of an agile Project Manager
Role of an agile Project ManagerRole of an agile Project Manager
Role of an agile Project Manager
Terry Bunio
 

More from Terry Bunio (20)

Uof m empathys role
Uof m empathys roleUof m empathys role
Uof m empathys role
 
Ictam big data
Ictam big dataIctam big data
Ictam big data
 
Data modeling tips from the trenches
Data modeling tips from the trenchesData modeling tips from the trenches
Data modeling tips from the trenches
 
#YesEstimates
#YesEstimates#YesEstimates
#YesEstimates
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open source
 
Breaking data
Breaking dataBreaking data
Breaking data
 
Ssrs and sharepoint there and back again - SQL SAT Fargo
Ssrs and sharepoint   there and back again - SQL SAT FargoSsrs and sharepoint   there and back again - SQL SAT Fargo
Ssrs and sharepoint there and back again - SQL SAT Fargo
 
A data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonA data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madison
 
SSRS and Sharepoint there and back again
SSRS and Sharepoint   there and back againSSRS and Sharepoint   there and back again
SSRS and Sharepoint there and back again
 
Role of an agile pm
Role of an agile pmRole of an agile pm
Role of an agile pm
 
Estimating 101
Estimating 101Estimating 101
Estimating 101
 
Introduction to lean and agile
Introduction to lean and agileIntroduction to lean and agile
Introduction to lean and agile
 
Pmi june 5th 2007
Pmi june 5th 2007Pmi june 5th 2007
Pmi june 5th 2007
 
Pmi sac november 20
Pmi sac november 20Pmi sac november 20
Pmi sac november 20
 
Iiba.november.09
Iiba.november.09Iiba.november.09
Iiba.november.09
 
Sdec11 when user stories are not enough
Sdec11 when user stories are not enoughSdec11 when user stories are not enough
Sdec11 when user stories are not enough
 
Sdec10 lean AMS
Sdec10 lean AMSSdec10 lean AMS
Sdec10 lean AMS
 
Sdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92daysSdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92days
 
Sdec10 lean package implementation
Sdec10 lean package implementationSdec10 lean package implementation
Sdec10 lean package implementation
 
Role of an agile Project Manager
Role of an agile Project ManagerRole of an agile Project Manager
Role of an agile Project Manager
 

Recently uploaded

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 

Recently uploaded (20)

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 

Asper database presentation - Data Modeling Topics

  • 2. Agenda • Data Modeling • The Project • Hot DB topics • Relational vs Dimensional • Dimensional concepts – Facts – Dimensions • Complex Concept Introduction
  • 3. What is Data Modeling?
  • 4. Definition • “A database model is a specification describing how a database is structured and used” – Wikipedia
  • 5. Definition • “A database model is a specification describing how a database is structured and used” – Wikipedia • “A data model describes how the data entities are related to each other in the real world” - Terry
  • 6. Data Model Characteristics • Organize/Structure like Data Elements • Define relationships between Data Entities • Highly Cohesive • Loosely Coupled
  • 7. The Project • Major Health Service provider is switching their claims system to SAP • As part of this, they are totally redeveloping their entire Data Warehouse solution
  • 8. The Project • 3+ years duration • 100+ Integration Projects • 200+ Systems Personnel
  • 9. Integration Project Streams • Client Administration – Policy Systems • Data Warehouse • Legacy – Conversion from Legacy • Queries – Queries for internal and external • Web – External Web Applications
  • 10. Data Warehouse Team • Terry – Data Architect/Modeler and PM • Hanaa – Lead Analyst and Data Analyst • Kevin – Lead Data Migration Developer • Lisa – Lead Report Analyst • Les – Lead Report Developer
  • 11. Current State • Sybase Data Warehouse – Combination of Normalized and Dimensional design • Data Migration – Series of SQL Scripts that move data from Legacy (Cobol) and Java Applications • Impromptu – 1000+ Reports
  • 12. Target State • SQL Server 2012 • SQL Server Integration Services for Data Migration • SQL Server Reporting Services for Report Development • Sharepoint for Report Portal
  • 13. Target Solution • Initial load moves 2.5 Terabytes of data • Initial load runs once • Incremental load runs every hour
  • 14. Target Solution • Operational Data Store – Normalized – 400+ tables • Data Warehouse – Dimensional – 60+ tables • Why both? – ODS does not have history (Just Transactions)
  • 15. Our #1 Challenge • We needed to be Agile like the other projects! – We are now on revision 3500 • We spent weeks planning of how to be flexible • Instead of spending time planning, we spent time planning how we could quickly change and adapt • This also meant we created a new automated test framework
  • 16.
  • 17. Beef? • Where are the hot topics like – Big Data – NoSQL – MySQL – Data Warehouse Appliances – Cloud – Open Source Databases
  • 18. Big Data • “Commercial Databases” have come a long way to handle large data volume • Big Data is still important but probably is not required for the vast majority of databases – But it applicable for the Facebooks and Amazons out there
  • 19. Big Data • For example, many of the Big Data solutions featured ColumnStore Indexing • Now almost all commercial databases offer ColumnStore Indexes
  • 20. NoSQL • NoSQL was heralded a few years ago as the death of structured databases • Mainly promoted from the developer community • Seems to have found a niche for supporting more mainly unstructured and dynamic data • Traditional databases still the most efficient for structured data
  • 21. MySQL • MySQL was also promoted as a great lightweight, high performance option’ • We actually investigated it as an option for the project • Great example of never trusting what you hear
  • 22. MySQL • All of the great MySQL benchmarks use the simplest database engine with no ACID compliance – MySQL has the option to use different engines with different features • Once you use the ACID compliant engine, the performance is equivalent(or worse) to SQL Server and PostgreSQL
  • 23. Data Warehouse Appliances • “marketing term for an integrated set of servers, storage, operating system(s), DBMS and software specifically pre-installed and pre- optimized for data warehousing”
  • 24. Data Warehouse Appliances • Recently in the Data Warehouse Industry, there has been the rise of a the Data Warehouse appliances • These appliances are a one-stop solution that builds in Big Data capabilities
  • 25. Data Warehouse Appliances • Cool Names like: – Teradata – GreenPlum – Netezza – InfoSphere – EMC • Like Big Data these solution are valuable if you need to play in the Big Data/Big Analysis arena • Most solutions don’t require them
  • 26. Cloud • Great to store pictures and music – the concept still makes businesses nervous – Also regulatory requirements sometime prevent it • Business is starting to become more comfortable – Still a ways to go • Very few business go to the Cloud unless they have to – Amazon/Microsoft is changing this with their services
  • 27. Open Source Databases • We investigated Open Sources databases for our solution. We looked at: – MySQL – PostgreSQL – others
  • 28. Open Sources Databases • We were surprised to learn that once you factor in all the things you get from SQL Server, it actually is cheaper over 10 years than Open Source!! • So we select SQL Server
  • 30. Two design methods • Relational – “Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.”.”
  • 31. Two design methods • Dimensional – “Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts
  • 33. Relational • Relational Analysis – Database design is usually in Third Normal Form – Database is optimized for transaction processing. (OLTP) – Normalized tables are optimized for modification rather than retrieval
  • 34. Normal forms • 1st - Under first normal form, all occurrences of a record type must contain the same number of fields. • 2nd - Second normal form is violated when a non- key field is a fact about a subset of a key. It is only relevant when the key is composite • 3rd - Third normal form is violated when a non-key field is a fact about another non-key field Source: William Kent - 1982
  • 36. Dimensional • Dimensional Analysis – Star Schema/Snowflake – Database is optimized for analytical processing. (OLAP) – Facts and Dimensions optimized for retrieval • Facts – Business events – Transactions • Dimensions – context for Transactions – People – Accounts – Products – Date
  • 37. Relational • 3 Dimensions • Spatial Model – No historical components except for transactional tables • Relational – Models the one truth of the data – One account ‘11’ – One person ‘Terry Bunio’ – One transaction of ‘$100.00’ on April 10th
  • 38. Dimensional • 4 Dimensions • Temporal Model – All tables have a time component • Dimensional – Models the one truth of the data at a point in time – Multiple versions of Accounts over time – Multiple versions of people over time – One transaction • Transactions are already temporal
  • 39. Fact Tables • Contains the measurements or facts about a business process • Are thin and deep • Usually is: – Business transaction – Business Event • The grain of a Fact table is the level of the data recorded – Order, Invoice, Invoice Item
  • 40. Special Fact Tables • Degenerate Dimensions – Degenerate Dimensions are Dimensions that can typically provide additional context about a Fact • For example, flags that describe a transaction • Degenerate Dimensions can either be a separate Dimension table or be collapsed onto the Fact table – My preference is the latter
  • 41. Dimension Tables • Unlike fact tables, dimension tables contain descriptive attributes that are typically textual fields • These attributes are designed to serve two critical purposes: – query constraining and/or filtering – query result set labeling. Source: Wikipedia
  • 42. Dimension Tables • Shallow and Wide • Usually corresponds to entities that the business interacts with – People – Locations – Products – Accounts
  • 43. Time Dimension • All Dimensional Models need a time component • This is either a: – Separate Time Dimension (recommended) – Time attributes on each Fact Table
  • 45. Mini-Dimensions • Splitting a Dimension up due to the activity of change for a set of attributes • Helps to reduce the growth of the Dimension table
  • 46. Slowly Changing Dimensions • Type 1 – Overwrite the row with the new values and update the effective date – Pre-existing Facts now refer to the updated Dimension – May cause inconsistent reports
  • 47. Slowly Changing Dimensions • Type 2 – Insert a new Dimension row with the new data and new effective date – Update the expiry date on the prior row • Don’t update old Facts that refer to the old row – Only new Facts will refer to this new Dimension row • Type 2 Slowly Changing Dimension maintains the historical context of the data
  • 48. Slowly Changing Dimensions • No longer to I have one row to represent: – Account 10123 – Terry Bunio – Sales Representative 11092 • This changes the mindset and query syntax to retrieve data
  • 49. Slowly Changing Dimensions • Type 3 – The Dimension stores multiple versions for the attribute in question • This usually involves a current and previous value for the attribute • When a change occurs, no rows are added but both the current and previous attributes are updated • Like Type 1, Type 3 does not retain full historical context
  • 50. Complexity • Most textbooks stop here only show the simplest Dimensional Models • Unfortunately, I’ve never run into a Dimensional Model like that
  • 51. Complex Concept Introduction • Snowflake vs Star Schema • Multi-Valued Dimensions and Bridges • Recursive Hierarchies
  • 55. Snowflake vs Star Schema • These extra tables are termed outriggers • They are used to address real world complexities with the data – Excessive row length – Repeating groups of data within the Dimension • I will use outriggers in a limited way for repeating data
  • 56. Multi-Valued Dimensions • Multi-Valued Dimensions are when a Fact needs to connect more than once to a Dimension – Primary Sales Representative – Secondary Sales Representative
  • 57. Multi-Valued Dimensions • Two possible solutions – Create copies of the Dimensions for each role – Create a Bridge table to resolve the many to many relationship
  • 60. Bridge Tables • Bridge Tables can be used to resolve any many to many relationships • This is frequently required with more complex data areas • These bridge tables need to be considered a Dimension and they need to use the same Slowly Changing Dimension Design as the base Dimension – My Recommendation
  • 62. Why? • Why Dimensional Model? • Allows for a concise representation of data for reporting. This is especially important for Self-Service Reporting – We reduced from 400+ tables in our Operational Data Store to 60+ tables in our Data Warehouse – Aligns with real world business concepts
  • 63. Why? • The most important reason – – Requires detailed understanding of the data – Validates the solution – Uncovers inconsistencies and errors in the Normalized Model • Easy for inconsistencies and errors to hide in 400+ tables • No place to hide when those tables are reduced down
  • 64. Why? • Ultimately there must be a business requirement for a temporal data model and not just a spatial one. • Although you could go through the exercise to validate your understanding and not implement the Dimensional Data Model
  • 65. How? • Start with your simplest Dimension and Fact tables and define the Natural Keys for them – i.e. People, Product, Transaction, Time • De-Normalize Reference tables to Dimensions (And possibly Facts based on how large the Fact tables will be) – I place both codes and descriptions on the Dimension and Fact tables • Look to De-normalize other tables with the same Cardinality into one Dimension – Validate the Natural Keys still define one row
  • 66. How? • Don’t force entities on the same Dimension – Tempting but you will find it doesn’t represent the data and will cause issues for loading or retrieval – Bridge table or mini-snowflakes are not bad • I don’t like a deep snowflake, but shallow snowflakes can be appropriate • Don’t fall into the Star-Schema/Snowflake Holy War – Let your data define the solution
  • 67. How? • Iterate, Iterate, Iterate – Your initial solution will be wrong – Create it and start to define the load process and reports – You will learn more by using the data than months of analysis to try and get the model right
  • 68. Two things to Ponder
  • 69. Two things to Ponder • In the Information Age ahead, databases will be used more for analysis than operational – More Dimensional Models and analytical processes
  • 70. Two things to Ponder • Critical skills going forward will be: – Data Modeling/Data Architecture – Data Migration • There is a whole subject area here for a subsequent presentation. More of an art than science – Data Verbalization • Again a real art form to take a huge amount of data and present it in a readable form