Data Warehouse Design
& Dimensional Modeling
Aaron Lowe
Principal Consultant
@Vendoran
@SQLFriends
Who am I
» Aaron Lowe
  » Husband
  » Father of 5
  » Principal Consultant at Magenic
  » Working with SQL Server since 1998, version 6.5
  » MCITP 2005 and 2008
  » Co-organizer of SQLSaturday Chicago
  » Masters in Information Systems Management
  » www.aaronlowe.net / @Vendoran
  » sqlfriends.org / @SQLFriends
Data, Data everywhere, but not a drop of
Information




             http://www.flickr.com/photos/walkingsf/5993167874/
The Data Person




             http://www.flickr.com/photos/tantek/1360323838/
How can we get more out of our data?




         http://www.flickr.com/photos/danahlongley/4472897115/
Leverage data to provide business insight




           http://www.flickr.com/photos/juhansonin/4646203016/
Create a new Data Model in a Data Warehouse
Why a new Data Model?
What do we need?
Information – not just data
» Collecting data
  » Log Files
  » Clicks
  » How long?
  » How much?
» Prediction?
  » How Target figured out a teen was pregnant -
    http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
  » The Numerati - http://www.amazon.com/The-Numerati-Stephen-Baker/dp/B003TO6G20/ - published 2009!
Relate data from multiple systems
» The purpose of a data warehouse is to house standardized, structured, consistent,
  integrated, correct, cleansed and timely data, extracted from various operational
  systems in an organization

» True picture of the business process

» Source Systems
  » Financial – AR/AP
  » Sales
  » CRM
  » HR
  » Application
Fast
» It’s my information and I want it Now!
» Empower Users
» Exploratory
» Reads
» Large datasets
Why won’t existing models work?
What are they designed for?
» Operational
   » Preservation of data integrity
   » Speed of recording of business transactions
   » Often Many tables


» To free the collection of relations from undesirable insertion, update and deletion dependencies;
» To reduce the need for restructuring the collection of relations, as new types of data are introduced,
  and thus increase the life span of application programs;
» To make the relational model more informative to users;
» To make the collection of relations neutral to the query statistics, where these statistics are liable to
  change as time goes by.
» —E.F. Codd, "Further Normalization of the Data Base Relational Model"
Consistent
» Partial data across
  » Have the sale in the sales system
  » Represented in the inventory system
  » Don’t have the $ in the financial system yet

» Deleted on sources
  » Removed transactions
  » Archive
  » Legally destroying records can remove work product

» Incomplete data on source
  » Changes over time
Silo’d
» How do we get the entire picture?


» Example:
  » Cost of Sales?
     » Sales system – Sale Price
     » Marketing System – $$ spent on Marketing
     » Inventory System – $$ spent on inventory
     » HR System – $$ spent on Employee
     » IT Systems – $$ spent on Infrastructure
What will work?




         http://www.flickr.com/photos/d-y-f/2870942257/
Designed for Users
» De-normalized
  » Fast Reads
  » Fast Reports
  » Limited JOINs

» Information
  » Scheduled
  » On Demand
  » Exploratory

» Information
  » Cross Functional
  » The more the better!
Inter-related data
» Specifications for my Current Data Warehouse




            http://www.flickr.com/photos/ross_goodman/3276964270/
Independent from Operational
» Operational systems change
  » Data will outlive Application
  » Crashes
  » Upgrades
  » Breaking changes


» Single Source of truth
Logical Data Model




           http://www.flickr.com/photos/doctorlizardo/6812846803/
Terminology




       http://www.flickr.com/photos/doctorlizardo/6809564765/
Metadata Management
» Business metadata
  » What’s out there?
  » Identify/Define
      » Overloaded terms
      » What is a customer?
» Process metadata
  » DW process operations
  » Assess system status
  » Investigate problems
» Technical metadata
  » Tables
  » Fields
  » Datatypes
Dimensions and Facts
Dimensions                                     Facts

Things/Objects                                 Measurements/Events

Nouns                                          Verbs

Wide but short                                 Skinny but long

Rows can exist independently                   Rows cannot exist independently

Descriptive                                    Mostly Numeric and Additive

                               “By” words – FACT by Dimension

                       Quantity Ordered by Product by Customer by Date
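
A minimal sketch of the "by" words in practice, using Python's built-in sqlite3 module. The table names (fact_order, dim_product, dim_customer, dim_date), the columns and the sample rows are hypothetical and only illustrate "Quantity Ordered by Product by Customer by Date".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive "nouns"
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, calendar_date TEXT);

-- Fact: one numeric, additive measure plus a key to each dimension
CREATE TABLE fact_order (
    product_key      INTEGER REFERENCES dim_product(product_key),
    customer_key     INTEGER REFERENCES dim_customer(customer_key),
    date_key         INTEGER REFERENCES dim_date(date_key),
    quantity_ordered INTEGER
);

INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_date     VALUES (20120301, '2012-03-01'), (20120302, '2012-03-02');
INSERT INTO fact_order VALUES
    (1, 1, 20120301, 5),
    (1, 1, 20120301, 3),
    (2, 1, 20120302, 1),
    (1, 2, 20120302, 4);
""")

# FACT by Dimension: sum the measure, group by the dimension attributes
rows = conn.execute("""
    SELECT p.product_name, c.customer_name, d.calendar_date,
           SUM(f.quantity_ordered) AS quantity_ordered
    FROM fact_order f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date     d ON d.date_key     = f.date_key
    GROUP BY p.product_name, c.customer_name, d.calendar_date
    ORDER BY p.product_name, c.customer_name, d.calendar_date
""").fetchall()

for row in rows:
    print(row)
```

Each dimension is one JOIN away from the fact table, which is what keeps the query short.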
Grain
• Level of detail
• What is needed to meet business
  requirements?
• What is possible to collect?
• How do you describe it?

• One row per X where X is the business
  event
   • One row per customer call
   • One row per time sheet entry
   • One row per employee status
     change
   • One row per order line item
                http://www.flickr.com/photos/frederikvanroest/3842334310/
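
One way to make the grain testable is to state it as a set of columns and check that no two rows share the same values. The sketch below assumes a grain of "one row per order line item"; the function name grain_violations, the column names and the sample rows are hypothetical.

```python
from collections import Counter

def grain_violations(rows, grain_columns):
    """Return the grain values that appear on more than one row."""
    counts = Counter(tuple(row[col] for col in grain_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]

order_lines = [
    {"order_id": 100, "line_no": 1, "quantity": 2},
    {"order_id": 100, "line_no": 2, "quantity": 1},
    {"order_id": 101, "line_no": 1, "quantity": 7},
    {"order_id": 101, "line_no": 1, "quantity": 7},  # loaded twice: breaks the grain
]

violations = grain_violations(order_lines, ["order_id", "line_no"])
print(violations)
```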
Methodology




       http://www.flickr.com/photos/doctorlizardo/6812847973/
Requirements – business focused
» “Must embrace the goal of enhancing business value as the primary purpose.” –
  Kimball


» “If your job is BI and you speak mostly to technical people all day, you are doing
  it wrong. Focus on first word - BUSINESS.” – Whitney Weaver (former
  Magenicon)

» Never ask “What do you want in the data warehouse?” Only one right answer -
  “Everything.”

» Ask questions that help you learn what the end user does
Kimball v. Inmon
Ralph Kimball                              Bill Inmon
Kimballites                                Inmonites
Bottom Up                                  Top Down
Dimensional                                Normalized
Star Schema                                3rd Normal Form
Easier for the User                        More Difficult for the Users
Few JOINs                                  Many JOINs
Dimension/Facts                            Entities
Complicated ETL                            Not as complicated ETL
Difficult to modify structure              Easier to adapt
                                Not mutually Exclusive
Star vs. Snowflake
Star                      Snowflake
ER resembles Star         ER resembles Snowflake

Easier for the User       More Difficult for the Users

Few JOINs                 Many JOINs
Faster Aggregations       Slower Aggregations

                          Children with multiple
                          parent tables

                          Normalized Dimensions

           Snowflake is a variation on a Star,
                  not an alternative      http://www.flickr.com/photos/wandrus/6283157711/
History (ology?)




           http://www.flickr.com/photos/doctorlizardo/6809564335/
Dimension Types
» 0 – Inserts only, no updates or deletes
» 1 – Inserted and updated to reflect current state
» 2 – Slowly Changing Dimension (SCD) - multiple records to indicate different points in time
                        Source Key        Value              StartDate      EndDate
                        14                Blue               2012-01-01     2012-03-01
                        14                Green              2012-03-02
» 3 – multiple columns to indicate different points in time
                       Source Key       Value            OldValue         EffectiveDate
                       14               Green            Blue             2012-03-02

» 4 – current value table and a history table
» UNKNOWN values
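
A rough sketch of a Type 2 change, reproducing the Blue to Green example above. It assumes the open (current) row is marked by an empty end date, as in the slide's table; the function name apply_type2_change and the dictionary layout are hypothetical.

```python
from datetime import date, timedelta

def apply_type2_change(history, source_key, new_value, change_date):
    """Close the current row for source_key and append a new current row.

    Each row is a dict with source_key, value, start_date and end_date.
    An end_date of None marks the current row.
    """
    for row in history:
        if row["source_key"] == source_key and row["end_date"] is None:
            if row["value"] == new_value:
                return history  # nothing changed, keep the current row open
            # The old value stops being valid the day before the change
            row["end_date"] = change_date - timedelta(days=1)
    history.append({
        "source_key": source_key,
        "value": new_value,
        "start_date": change_date,
        "end_date": None,
    })
    return history

history = [
    {"source_key": 14, "value": "Blue",
     "start_date": date(2012, 1, 1), "end_date": None},
]
apply_type2_change(history, 14, "Green", date(2012, 3, 2))

for row in history:
    print(row)
```

A Type 1 change would overwrite the value in place instead, losing the fact that the value was ever Blue.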
Date and Time
» Date
  » Fundamental dimensions across all organizations and industries
  » Allows for trending across dates or periods
  » 1 row for every date in each year = 365 or 366 rows/year
  » Use your words
      »   WeekDay
      »   EndofMonth
      »   Quarter
      »   FiscalYear?
» Time
  » Not often needed, but becoming more popular
  » Allows for time based analysis for things like Status
  » 1 row for every time slice in a day – minutes? Seconds?
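
A minimal sketch of generating the Date dimension, one row per calendar date. The column names and the YYYYMMDD integer key are assumptions; FiscalYear is left out because it depends on each organization's fiscal calendar.

```python
import calendar
from datetime import date, timedelta

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def build_date_dimension(year):
    """Return one row per date in the given year."""
    rows = []
    day = date(year, 1, 1)
    while day.year == year:
        last_day_of_month = calendar.monthrange(day.year, day.month)[1]
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "calendar_date": day.isoformat(),
            "weekday": WEEKDAY_NAMES[day.weekday()],  # words, not numbers
            "end_of_month": day.day == last_day_of_month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows

dim_date_2012 = build_date_dimension(2012)
print(len(dim_date_2012), dim_date_2012[0])
```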
Surrogate Keys
» New set of keys in the DW


» Protects against
  » Source systems changes
  » Single key for multiple source systems
  » New rows that only exist in DW (UNKNOWN)
  » Tracking over time (SCD)
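
A small sketch of how a surrogate key lookup might work during a load. The class name SurrogateKeyMap, the choice of 0 for the UNKNOWN row and the source identifiers are all hypothetical.

```python
class SurrogateKeyMap:
    """Maps (source_system, source_key) pairs to warehouse-owned keys."""

    UNKNOWN_KEY = 0  # a row that exists only in the warehouse

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def lookup(self, source_system, source_key):
        """Return the surrogate key, assigning a new one on first sight."""
        if source_key is None:
            return self.UNKNOWN_KEY
        natural = (source_system, source_key)
        if natural not in self._keys:
            self._keys[natural] = self._next_key
            self._next_key += 1
        return self._keys[natural]

    def link(self, source_system, source_key, surrogate_key):
        """Point a second source system's identifier at an existing key."""
        self._keys[(source_system, source_key)] = surrogate_key

keys = SurrogateKeyMap()
crm_customer = keys.lookup("CRM", "C-14")       # first sight: new key
keys.link("Sales", 14, crm_customer)            # same customer, other system
sales_customer = keys.lookup("Sales", 14)
missing_customer = keys.lookup("HR", None)      # no source key: UNKNOWN row
print(crm_customer, sales_customer, missing_customer)
```

Because fact rows store only the surrogate key, a source system can renumber its own keys without touching the facts.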
Physical Data Model




           http://www.flickr.com/photos/flying_cloud/2667218708/
Approach




           http://www.flickr.com/photos/7506006@N07/7021456259/
Null – yay or nay
» Same discussion as OLTP with a twist
  » Purpose of DW is for reporting
  » Building on top of it with:
     » SSIS
     » SSAS
  » Purpose of the Dimension UNKNOWN values


» Best practice is to avoid if you can, otherwise document
  » Some have separate values for UNKNOWN and NOT POPULATED
  » Default value instead
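
A sketch of replacing NULLs with documented default values during a load. The distinction drawn here (UNKNOWN when the source has the attribute but no usable value, NOT POPULATED when the source never supplied it) is one possible convention and an assumption, as is the function name clean_attribute.

```python
UNKNOWN = "UNKNOWN"              # attribute present in the source, value missing
NOT_POPULATED = "NOT POPULATED"  # attribute never supplied by the source

def clean_attribute(record, column):
    """Return the column value, or a default label instead of NULL."""
    if column not in record:
        return NOT_POPULATED
    value = record[column]
    if value is None or str(value).strip() == "":
        return UNKNOWN
    return value

print(clean_attribute({"color": "Blue"}, "color"))
print(clean_attribute({"color": None}, "color"))
print(clean_attribute({}, "color"))
```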
Aggregates
» Minimize the number of aggregates while maximizing effectiveness

» Store or

» Can aggregate Facts

» Roll-up Dimension hierarchies?

» Can still be relational to other tables when necessary
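
As an illustration of aggregating facts, the sketch below rolls daily rows up to one stored row per month and product. The sample data and the function name aggregate_by_month are hypothetical.

```python
from collections import defaultdict

# Daily fact rows: (calendar_date, product, quantity)
daily_sales = [
    ("2011-12-30", "Widget", 10),
    ("2011-12-31", "Widget", 5),
    ("2012-01-01", "Widget", 7),
    ("2012-01-02", "Gadget", 2),
]

def aggregate_by_month(facts):
    """Sum quantity per (month, product)."""
    totals = defaultdict(int)
    for calendar_date, product, quantity in facts:
        month = calendar_date[:7]  # "YYYY-MM" prefix of an ISO date
        totals[(month, product)] += quantity
    return dict(totals)

monthly_sales = aggregate_by_month(daily_sales)
print(monthly_sales)
```

The aggregate still carries the product value, so it can be joined back to the product dimension when needed.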
Hierarchies
» Example:
»      Date - Roll up by Month, Quarter or Year
                           Key          Day           Month         Quarter   Year
                           364          30            12            4         2011
                           365          31            12            4         2011
                           366          1             1             1         2012
                           367          2             1             1         2012

» Variable depth – Self-referencing

» Variable depth with historical – changing surrogate keys – ouch
  » Track business process separately
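
A sketch of a variable-depth, self-referencing hierarchy: each row points at its parent, and the roll-up path is found by walking up until there is no parent. The employee rows and the function name path_to_root are hypothetical.

```python
# Each row references its parent row; the top of the hierarchy has no parent
employees = {
    1: {"name": "CEO", "parent_key": None},
    2: {"name": "VP Sales", "parent_key": 1},
    3: {"name": "Regional Manager", "parent_key": 2},
    4: {"name": "Sales Rep", "parent_key": 3},
}

def path_to_root(rows, key):
    """Return the names from the given row up to the top of the hierarchy."""
    path = []
    seen = set()
    while key is not None:
        if key in seen:
            raise ValueError(f"cycle detected at key {key}")
        seen.add(key)
        path.append(rows[key]["name"])
        key = rows[key]["parent_key"]
    return path

print(path_to_root(employees, 4))
print(path_to_root(employees, 2))
```

The depth differs per row, which is why this shape is harder to flatten into fixed columns than Day, Month, Quarter and Year.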
Size Matters




      http://starwars.wikia.com/wiki/Rancor?image=Rancor-jpg
Data amount and size
» Data Types?
  » BLOB data?
  » Identity columns (do you need bigint?)

» Data Profiling
  » Collect source system sizes for the data being brought over
  » Add sizes of new row

» Don’t forget index size!!!
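
A back-of-the-envelope sizing sketch. The row counts and byte widths are made-up inputs, not measurements; only the 4-byte int and 8-byte bigint sizes and the int upper limit are SQL Server facts.

```python
INT_BYTES = 4
BIGINT_BYTES = 8
INT_MAX = 2**31 - 1  # largest value an int identity column can hold

def estimate_table_bytes(row_count, row_bytes, index_bytes_per_row=0):
    """Rough size: data plus index bytes for every row."""
    return row_count * (row_bytes + index_bytes_per_row)

# Hypothetical fact table: 1 million rows per day, kept for 5 years
rows_per_day = 1_000_000
years = 5
row_count = rows_per_day * 365 * years

fits_in_int = row_count <= INT_MAX
key_bytes_saved = row_count * (BIGINT_BYTES - INT_BYTES)

# Assumed 40 bytes of data and 12 bytes of index per row
total_bytes = estimate_table_bytes(row_count, row_bytes=40, index_bytes_per_row=12)

print(f"rows: {row_count:,}  fits in int: {fits_in_int}")
print(f"bytes saved by int over bigint key: {key_bytes_saved:,}")
print(f"estimated size: {total_bytes / 1024**3:.1f} GiB")
```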
Partitioning
» Large Fact tables usually lend themselves naturally to partitioning by Date


» Larger Dimension tables can be partitioned as well


» Sometimes Old (SQL 2000) Partitioning is still better than SQL 2005+
  partitioning

» Take ETL process into consideration
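
A sketch of how date-based partitioning assigns rows, using monthly boundaries. It assumes each boundary date belongs to the partition on its right, similar in spirit to a RANGE RIGHT partition function in SQL Server; the function names are hypothetical.

```python
from bisect import bisect_right
from datetime import date

def monthly_boundaries(start_year, start_month, count):
    """Return `count` first-of-month boundary dates."""
    boundaries = []
    year, month = start_year, start_month
    for _ in range(count):
        boundaries.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return boundaries

def partition_number(boundaries, value):
    """1-based partition for a date; a boundary date goes to the right."""
    return bisect_right(boundaries, value) + 1

boundaries = monthly_boundaries(2012, 1, 3)  # Jan 1, Feb 1, Mar 1 of 2012

print(partition_number(boundaries, date(2011, 12, 31)))
print(partition_number(boundaries, date(2012, 1, 1)))
print(partition_number(boundaries, date(2012, 3, 15)))
```

With this layout an ETL load for the current month touches only the last partition, which is one reason to consider the ETL process when choosing boundaries.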
Archiving
» Question: When is big too big?


» Answer: When performance impact outweighs need for data availability


» Many options:
  » Backup to tape offline
  » keep “Archived” DW available
» Records Retention – this could be your work product
Performance




          http://www.flickr.com/photos/elfidomx/6026943114/
Hardware
» Remember when the user said “It’s my data and I want it now”?
» Buy
  » Reference Architecture (Fast Track)
  » Appliances
      » HP
        »    Enterprise Data Warehouse
        »    Business Decision
        »    Business Data Warehouse
        »    Enterprise Database Consolidation
      » Dell
        » PDW
» Build
  » Reference Architecture (Fast Track)
  » SQLIO
  » Benchmark
Throughput
»   Amounts of data
    » Not all of it will be in memory
    » Between ETL and reports, SP Cache might not be efficient
    » Need to tune those disks

»   Reference Architecture(Fast Track)
    » Accepts that Procedure cache will stink due to data sizes
    » Instead small amount of RAM
    » Requires bandwidth of 400 GB/s per LUN

»   Materialize data that makes reporting faster!!
    » More Denormalization
    » More Aggregations

»   ReadOnly while not processing ETLs? (switch)
Parallelism
» Multiple Data Files
  » SQL writes proportional fill

» Multiple Filegroups
  » Partitioning scheme
  » Facts/Dimensions
  » Tables that are often joined
  » Big tables
  » NCIX vs. data

» Multiple LUNs
  » I am not a SAN admin, nor do I play one on TV
» Normal SQL performance
Questions and Discussion time!

More Related Content

What's hot

Dimensional modeling primer
Dimensional modeling primerDimensional modeling primer
Dimensional modeling primerTerry Bunio
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6Prithwis Mukerjee
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecyclebartlowe
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report Tom Donoghue
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data modeljagdish_93
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemKiran kumar
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modelingvivekjv
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehousekiran14360
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleSajjad Zaheer
 
Dimensional modelling-mod-3
Dimensional modelling-mod-3Dimensional modelling-mod-3
Dimensional modelling-mod-3Malik Alig
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPDhiren Gala
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project reportsonalighai
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Er Bansal
 
Dimensional data model
Dimensional data modelDimensional data model
Dimensional data modelVnktp1
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional modelGersiton Pila Challco
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecturehasanshan
 
Introduction to Dimesional Modelling
Introduction to Dimesional ModellingIntroduction to Dimesional Modelling
Introduction to Dimesional ModellingAshish Chandwani
 

What's hot (20)

Dimensional modeling primer
Dimensional modeling primerDimensional modeling primer
Dimensional modeling primer
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecycle
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data model
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse System
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehouse
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with Example
 
Dimensional modelling-mod-3
Dimensional modelling-mod-3Dimensional modelling-mod-3
Dimensional modelling-mod-3
 
Star schema
Star schemaStar schema
Star schema
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAP
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project report
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)
 
Dimensional data model
Dimensional data modelDimensional data model
Dimensional data model
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional model
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecture
 
Introduction to Dimesional Modelling
Introduction to Dimesional ModellingIntroduction to Dimesional Modelling
Introduction to Dimesional Modelling
 

Viewers also liked

Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 
Data Modeling with Excel Power Pivots
Data Modeling with Excel Power PivotsData Modeling with Excel Power Pivots
Data Modeling with Excel Power PivotsClair Huntley
 
Big Data in Ecommerce
Big Data in EcommerceBig Data in Ecommerce
Big Data in EcommerceTeguh Nugraha
 
Warehouse Storage Policy Simulation
Warehouse Storage Policy SimulationWarehouse Storage Policy Simulation
Warehouse Storage Policy SimulationSv Jayakhanthan
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional ModelingSunita Sahu
 
Big Data Analytics and its Application in E-Commerce
Big Data Analytics and its Application in E-CommerceBig Data Analytics and its Application in E-Commerce
Big Data Analytics and its Application in E-CommerceUyoyo Edosio
 
Dimensional modeling in oracle sql developer
Dimensional modeling in oracle sql developerDimensional modeling in oracle sql developer
Dimensional modeling in oracle sql developerJeff Smith
 
Data warehouse design
Data warehouse designData warehouse design
Data warehouse designines beltaief
 
Compiler Design(NANTHU NOTES)
Compiler Design(NANTHU NOTES)Compiler Design(NANTHU NOTES)
Compiler Design(NANTHU NOTES)guest251d9a
 
Big Data in e-Commerce
Big Data in e-CommerceBig Data in e-Commerce
Big Data in e-CommerceDivante
 
RFID on Warehouse Management System
RFID on Warehouse Management SystemRFID on Warehouse Management System
RFID on Warehouse Management SystemCheri Amour Calicdan
 
Data Warehouse Programme Notes
Data Warehouse Programme NotesData Warehouse Programme Notes
Data Warehouse Programme NotesAlan McSweeney
 
Data Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileData Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileDaniel Upton
 

Viewers also liked (18)

Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 
Data Modeling with Excel Power Pivots
Data Modeling with Excel Power PivotsData Modeling with Excel Power Pivots
Data Modeling with Excel Power Pivots
 
Big Data in Ecommerce
Big Data in EcommerceBig Data in Ecommerce
Big Data in Ecommerce
 
Big Data and E-Commerce
Big Data and E-CommerceBig Data and E-Commerce
Big Data and E-Commerce
 
Warehouse Storage Policy Simulation
Warehouse Storage Policy SimulationWarehouse Storage Policy Simulation
Warehouse Storage Policy Simulation
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
warehouse
warehousewarehouse
warehouse
 
Design of warehouse
Design of warehouseDesign of warehouse
Design of warehouse
 
Big Data Analytics and its Application in E-Commerce
Big Data Analytics and its Application in E-CommerceBig Data Analytics and its Application in E-Commerce
Big Data Analytics and its Application in E-Commerce
 
Dimensional modeling in oracle sql developer
Dimensional modeling in oracle sql developerDimensional modeling in oracle sql developer
Dimensional modeling in oracle sql developer
 
Datacube
DatacubeDatacube
Datacube
 
Data warehouse design
Data warehouse designData warehouse design
Data warehouse design
 
Compiler Design(NANTHU NOTES)
Compiler Design(NANTHU NOTES)Compiler Design(NANTHU NOTES)
Compiler Design(NANTHU NOTES)
 
Big Data in e-Commerce
Big Data in e-CommerceBig Data in e-Commerce
Big Data in e-Commerce
 
RFID on Warehouse Management System
RFID on Warehouse Management SystemRFID on Warehouse Management System
RFID on Warehouse Management System
 
Data Warehouse Programme Notes
Data Warehouse Programme NotesData Warehouse Programme Notes
Data Warehouse Programme Notes
 
Warehouse automation
Warehouse automationWarehouse automation
Warehouse automation
 
Data Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileData Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes Agile
 

Similar to Data Warehouse Design & Dimensional Modeling

Data modeling trends for Analytics
Data modeling trends for AnalyticsData modeling trends for Analytics
Data modeling trends for AnalyticsIke Ellis
 
Industrial Data Science
Industrial Data ScienceIndustrial Data Science
Industrial Data ScienceNiko Vuokko
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachKent Graziano
 
The Death of the Star Schema
The Death of the Star SchemaThe Death of the Star Schema
The Death of the Star SchemaDATAVERSITY
 
The Art of Requesting Data from IT
The Art of Requesting Data from ITThe Art of Requesting Data from IT
The Art of Requesting Data from ITBrad Adams
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Build a modern data platform.pptx
Build a modern data platform.pptxBuild a modern data platform.pptx
Build a modern data platform.pptxIke Ellis
 
The final frontier v3
The final frontier v3The final frontier v3
The final frontier v3Terry Bunio
 
Query Tuning Azure SQL Databases
Query Tuning Azure SQL DatabasesQuery Tuning Azure SQL Databases
Query Tuning Azure SQL DatabasesGrant Fritchey
 
Clare Somerville Trish O’Kane Data in Databases
Clare Somerville Trish O’Kane Data in DatabasesClare Somerville Trish O’Kane Data in Databases
Clare Somerville Trish O’Kane Data in DatabasesFuture Perfect 2012
 
Analyzing Semi-Structured Data At Volume In The Cloud
Analyzing Semi-Structured Data At Volume In The CloudAnalyzing Semi-Structured Data At Volume In The Cloud
Analyzing Semi-Structured Data At Volume In The CloudRobert Dempsey
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analyticsIke Ellis
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
The final frontier
The final frontierThe final frontier
The final frontierTerry Bunio
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Managing Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceManaging Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceSense Corp
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerAntonios Chatzipavlis
 
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data EstateEnable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data EstateCCG
 
How to deliver a Single View in Financial Services
 How to deliver a Single View in Financial Services How to deliver a Single View in Financial Services
How to deliver a Single View in Financial ServicesMongoDB
 
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBig Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBigDataExpo
 

Similar to Data Warehouse Design & Dimensional Modeling (20)

Data modeling trends for Analytics
Data modeling trends for AnalyticsData modeling trends for Analytics
Data modeling trends for Analytics
 
Industrial Data Science
Industrial Data ScienceIndustrial Data Science
Industrial Data Science
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
 
The Death of the Star Schema
The Death of the Star SchemaThe Death of the Star Schema
The Death of the Star Schema
 
The Art of Requesting Data from IT
The Art of Requesting Data from ITThe Art of Requesting Data from IT
The Art of Requesting Data from IT
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Build a modern data platform.pptx
Build a modern data platform.pptxBuild a modern data platform.pptx
Build a modern data platform.pptx
 
The final frontier v3
The final frontier v3The final frontier v3
The final frontier v3
 
Query Tuning Azure SQL Databases
Query Tuning Azure SQL DatabasesQuery Tuning Azure SQL Databases
Query Tuning Azure SQL Databases
 
Clare Somerville Trish O’Kane Data in Databases
Clare Somerville Trish O’Kane Data in DatabasesClare Somerville Trish O’Kane Data in Databases
Clare Somerville Trish O’Kane Data in Databases
 
Analyzing Semi-Structured Data At Volume In The Cloud
Analyzing Semi-Structured Data At Volume In The CloudAnalyzing Semi-Structured Data At Volume In The Cloud
Analyzing Semi-Structured Data At Volume In The Cloud
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
The final frontier
The final frontierThe final frontier
The final frontier
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Managing Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceManaging Large Amounts of Data with Salesforce
Managing Large Amounts of Data with Salesforce
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL Server
 
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data EstateEnable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
 
How to deliver a Single View in Financial Services
 How to deliver a Single View in Financial Services How to deliver a Single View in Financial Services
How to deliver a Single View in Financial Services
 
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBig Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
 

More from Code Mastery

Using SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesUsing SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesCode Mastery
 
Query Tuning for Database Pros & Developers
Query Tuning for Database Pros & DevelopersQuery Tuning for Database Pros & Developers
Query Tuning for Database Pros & DevelopersCode Mastery
 
Exploring, Visualizing and Presenting Data with Power View
Exploring, Visualizing and Presenting Data with Power ViewExploring, Visualizing and Presenting Data with Power View
Exploring, Visualizing and Presenting Data with Power ViewCode Mastery
 
Building a SSAS Tabular Model Database
Building a SSAS Tabular Model DatabaseBuilding a SSAS Tabular Model Database
Building a SSAS Tabular Model DatabaseCode Mastery
 
Designer and Developer Collaboration with Visual Studio 2012 and Expression B...
Designer and Developer Collaboration with Visual Studio 2012 and Expression B...Designer and Developer Collaboration with Visual Studio 2012 and Expression B...
Designer and Developer Collaboration with Visual Studio 2012 and Expression B...Code Mastery
 
Build automation best practices
Build automation best practicesBuild automation best practices
Data Warehouse Design & Dimensional Modeling

  • 1. Data Warehouse Design & Dimensional Modeling Aaron Lowe Principal Consultant @Vendoran @SQLFriends
  • 2. Who am I » Aaron Lowe » Husband » Father of 5 » Principal Consultant at Magenic » Working with SQL Server since 1998, version 6.5 » MCITP 2005 and 2008 » Co-organizer of SQLSaturday Chicago » Masters in Information Systems Management » www.aaronlowe.net / @Vendoran » sqlfriends.org / @SQLFriends
  • 3. Data, Data everywhere, but not a drop of Information http://www.flickr.com/photos/walkingsf/5993167874/
  • 4. The Data Person http://www.flickr.com/photos/tantek/1360323838/
  • 5. How can we get more out of our data? http://www.flickr.com/photos/danahlongley/4472897115/
  • 6. Leverage data to provide business insight http://www.flickr.com/photos/juhansonin/4646203016/
  • 7. Create a new Data Model in a Data Warehouse
  • 8. Why a new Data Model?
  • 9. What do we need?
  • 10. Information – not just data » Collecting data » Log Files » Clicks » How long? » How much? » Prediction? » How Target Figured out Teen was pregnant - http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured- out-a-teen-girl-was-pregnant-before-her-father-did/ » The Numerati - http://www.amazon.com/The-Numerati-Stephen- Baker/dp/B003TO6G20/ - published 2009!
  • 11. Relate data from multiple systems » The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, cleansed and timely data, extracted from various operational systems in an organization » True picture of the business process » Source Systems » Financial – AR/AP » Sales » CRM » HR » Application
  • 12. Fast » It’s my information and I want it Now! » Empower Users » Exploratory » Reads » Large datasets
  • 13. Why won’t existing models work?
  • 14. What are they designed for? » Operational » Preservation of data integrity » Speed of recording of business transactions » Often Many tables » To free the collection of relations from undesirable insertion, update and deletion dependencies; » To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs; » To make the relational model more informative to users; » To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by. » —E.F. Codd, "Further Normalization of the Data Base Relational Model"
  • 15. Consistent » Partial data across systems » Have the sale in the sales system » Represented in the inventory system » Don’t have the $ in the financial system yet » Deleted on sources » Removed transactions » Archive » Legally destroying records can remove work product » Incomplete data on source » Changes over time
  • 16. Silo’d » How do we get the entire picture? » Example: » Cost of Sales? » Sales system – Sale Price » Marketing System – $$ spent on Marketing » Inventory System – $$ spent on inventory » HR System – $$ spent on Employee » IT Systems – $$ spent on Infrastructure
  • 17. What will work? http://www.flickr.com/photos/d-y-f/2870942257/
  • 18. Designed for Users » De-normalized » Fast Reads » Fast Reports » Limited JOINs » Information » Scheduled » On Demand » Exploratory » Information » Cross Functional » The more the better!
  • 19. Inter-related data » Specifications for my Current Data Warehouse http://www.flickr.com/photos/ross_goodman/3276964270/
  • 20. Independent from Operational » Operational systems change » Data will outlive Application » Crashes » Upgrades » Breaking changes » Single Source of truth
  • 21. Logical Data Model http://www.flickr.com/photos/doctorlizardo/6812846803/
  • 22. Terminology http://www.flickr.com/photos/doctorlizardo/6809564765/
  • 23. Metadata Management » Business metadata » What’s out there? » Identify/Define » Overloaded terms » What is a customer? » Process metadata » DW process operations » Assess system status » Investigate problems » Technical metadata » Tables » Fields » Datatypes
  • 24. Dimensions and Facts
      Dimensions                      Facts
      Thing/Objects                   Measurements/Events
      Nouns                           Verbs
      Wide but short                  Skinny but long
      Rows can exist independently    Rows cannot exist independently
      Descriptive                     Mostly Numeric and Additive
      “By” words – FACT by Dimension: Quantity Ordered by Product by Customer by Date
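The "FACT by Dimension" phrasing maps directly onto a join plus GROUP BY. Here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names, column names, and sample rows are all hypothetical and do not come from the deck.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct  (ProductKey  INTEGER PRIMARY KEY, ProductName  TEXT);
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE FactOrder (
    ProductKey      INTEGER REFERENCES DimProduct(ProductKey),
    CustomerKey     INTEGER REFERENCES DimCustomer(CustomerKey),
    OrderDate       TEXT,
    QuantityOrdered INTEGER
);
INSERT INTO DimProduct  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO DimCustomer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO FactOrder VALUES
    (1, 1, '2012-03-01', 5),
    (1, 1, '2012-03-02', 3),
    (2, 1, '2012-03-02', 1),
    (1, 2, '2012-03-02', 4);
""")

def quantity_by_product_and_customer(connection):
    """Return (product, customer, total quantity): the fact 'by' two dimensions."""
    return connection.execute("""
        SELECT p.ProductName, c.CustomerName, SUM(f.QuantityOrdered)
        FROM FactOrder AS f
        JOIN DimProduct  AS p ON p.ProductKey  = f.ProductKey
        JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
        GROUP BY p.ProductName, c.CustomerName
        ORDER BY p.ProductName, c.CustomerName
    """).fetchall()

rows = quantity_by_product_and_customer(conn)
```

The fact table stays skinny (keys plus one additive measure) while the descriptive text lives in the dimensions, so each extra "by" word adds one join and one grouping column.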
  • 25. Grain • Level of detail • What is needed to meet business requirements? • What is possible to collect? • How do you describe it? • One row per X where X is the business event • One row per customer call • One row per time sheet entry • One row per employee status change • One row per order line item http://www.flickr.com/photos/frederikvanroest/3842334310/
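A declared grain such as "one row per order line item" can be checked mechanically: no two rows may share the same grain columns. The sketch below assumes the grain is identified by the hypothetical columns `OrderId` and `LineNumber`. The sample rows are invented for illustration.

```python
from collections import Counter

def grain_violations(rows, grain_columns):
    """Return the grain values that appear on more than one row."""
    counts = Counter(tuple(row[col] for col in grain_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]

order_lines = [
    {"OrderId": 1, "LineNumber": 1, "Quantity": 5},
    {"OrderId": 1, "LineNumber": 2, "Quantity": 3},
    {"OrderId": 2, "LineNumber": 1, "Quantity": 1},
    {"OrderId": 2, "LineNumber": 1, "Quantity": 4},  # duplicate at the declared grain
]
violations = grain_violations(order_lines, ("OrderId", "LineNumber"))
```

An empty result means the data matches the stated grain. Any entry returned points at rows that would double count when aggregated.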
  • 26. Methodology http://www.flickr.com/photos/doctorlizardo/6812847973/
  • 27. Requirements – business focused » “Must embrace the goal of enhancing business value as the primary purpose.” – Kimball » “If your job is BI and you speak mostly to technical people all day, you are doing it wrong. Focus on first word - BUSINESS.” – Whitney Weaver (former Magenicon) » Never ask “What do you want in the data warehouse?” Only one right answer - “Everything.” » Ask questions that help you learn what the end user does
  • 28. Kimball v. Inmon
      Ralph Kimball                    Bill Inmon
      Kimballites                      Inmonites
      Bottom Up                        Top Down
      Dimensional                      Normalized
      Star Schema                      3rd Normal Form
      Easier for the User              More Difficult for the Users
      Few JOINs                        Many JOINs
      Dimension/Facts                  Entities
      Complicated ETL                  Not as complicated ETL
      Difficult to modify structure    Easier to adapt
      Not mutually Exclusive
  • 29. Star vs. Snowflake
      Star                                    Snowflake
      ER resembles Star                       ER resembles Snowflake
      Easier for the User                     More Difficult for the Users
      Few JOINs                               Many JOINs
      Faster Aggregations                     Slower Aggregations
      Children with multiple parent tables    Normalized Dimensions
      Snowflake is a variation on a Star, not an alternative
      http://www.flickr.com/photos/wandrus/6283157711/
  • 30. History (ology?) http://www.flickr.com/photos/doctorlizardo/6809564335/
  • 31. Dimension Types
      » 0 – Inserts only, no updates or deletes
      » 1 – Inserted and updated to reflect current state
      » 2 – Slowly Changing Dimension (SCD) – multiple records to indicate different points in time
          Source Key   Value   StartDate    EndDate
          14           Blue    2012-01-01   2012-03-01
          14           Green   2012-03-02
      » 3 – multiple columns to indicate different points in time
          Source Key   Value   OldValue   EffectiveDate
          14           Green   Blue       2012-03-02
      » 4 – current value table and a history table
      » UNKNOWN values
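The Type 2 example on this slide (source key 14 changing from Blue to Green) can be reproduced with a small helper. This is a sketch under stated assumptions: rows are plain dictionaries, an open row is marked with `EndDate = None`, and the old row is closed the day before the new value takes effect. The function name `apply_type2_change` is hypothetical.

```python
from datetime import date, timedelta

def apply_type2_change(history, source_key, new_value, effective):
    """Close the open row for source_key (if any) and append a new open row.

    history is a list of dicts with SourceKey, Value, StartDate, EndDate;
    EndDate is None on the current row. Returns the same list for convenience.
    """
    for row in history:
        if row["SourceKey"] == source_key and row["EndDate"] is None:
            if row["Value"] == new_value:
                return history  # nothing changed, keep the current row open
            row["EndDate"] = effective - timedelta(days=1)
    history.append({"SourceKey": source_key, "Value": new_value,
                    "StartDate": effective, "EndDate": None})
    return history

history = [{"SourceKey": 14, "Value": "Blue",
            "StartDate": date(2012, 1, 1), "EndDate": None}]
apply_type2_change(history, 14, "Green", date(2012, 3, 2))
```

After the call, the Blue row ends on 2012-03-01 and the Green row starts on 2012-03-02 with no end date, matching the two rows in the slide. A Type 1 dimension would instead overwrite the value in place and keep a single row.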
  • 32. Date and Time » Date » Fundamental dimensions across all organizations and industries » Allows for trending across dates or periods » 1 row for every date in the years = 365 or 366 row/year » Use your words » WeekDay » EndofMonth » Quarter » FiscalYear? » Time » Not often needed, but becoming more popular » Allows for time based analysis for things like Status » 1 row for every time slice in a day – minutes? Seconds?
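A date dimension with one row per calendar day can be generated rather than loaded from a source system. The sketch below covers the WeekDay, EndofMonth and Quarter attributes named on the slide. The integer `DateKey` in YYYYMMDD form is an assumed convention, and FiscalYear is left out because fiscal calendars differ by organization.

```python
import calendar
from datetime import date, timedelta

def build_date_dimension(year):
    """Return one dict per calendar day in `year` (365 or 366 rows)."""
    rows = []
    day = date(year, 1, 1)
    while day.year == year:
        last_day = calendar.monthrange(day.year, day.month)[1]
        rows.append({
            "DateKey": int(day.strftime("%Y%m%d")),  # assumed key format, e.g. 20120302
            "Date": day,
            "WeekDay": day.strftime("%A"),
            "EndOfMonth": day.day == last_day,
            "Quarter": (day.month - 1) // 3 + 1,
        })
        day += timedelta(days=1)
    return rows

dim_date_2012 = build_date_dimension(2012)
```

Because 2012 is a leap year the result has 366 rows, and a non-leap year yields 365, which is the row count the slide describes.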
  • 33. Surrogate Keys » New set of keys in the DW » Protects against » Source systems changes » Single key for multiple source systems » New rows that only exist in DW (UNKNOWN) » Tracking over time (SCD)
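One way to picture surrogate key assignment is a lookup keyed on the pair (source system, natural key), with a reserved key for the warehouse-only UNKNOWN row. The class name, the `-1` convention, and the sample keys below are illustrative assumptions, not something the deck prescribes.

```python
UNKNOWN_KEY = -1  # assumed convention for the DW-only UNKNOWN row

class SurrogateKeyMap:
    """Assigns warehouse keys to (source system, natural key) pairs."""

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def get_or_create(self, source_system, natural_key):
        """Return the existing surrogate key, or assign the next one."""
        if natural_key is None:
            return UNKNOWN_KEY
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = self._next_key
            self._next_key += 1
        return self._keys[pair]

    def lookup(self, source_system, natural_key):
        """Return the surrogate key for a fact row, or UNKNOWN_KEY if unmapped."""
        return self._keys.get((source_system, natural_key), UNKNOWN_KEY)

customer_keys = SurrogateKeyMap()
crm_key = customer_keys.get_or_create("CRM", "C-1001")
sales_key = customer_keys.get_or_create("Sales", 1001)
```

Because the warehouse key is independent of the source value, a source system can renumber its customers or reuse an identifier without disturbing existing fact rows. Only the mapping has to be maintained.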
  • 34. Physical Data Model http://www.flickr.com/photos/flying_cloud/2667218708/
  • 35. Approach http://www.flickr.com/photos/7506006@N07/7021456259/
  • 36. Null – yay or nay » Same discussion as OLTP with a twist » Purpose of DW is for reporting » Building on top of with : » SSIS » SSAS » Purpose of the Dimension UNKNOWN values » Best practice is to avoid if you can, otherwise document » Some have separate values for UNKNOWN and NOT POPULATED » Default value instead
  • 37. Aggregates » Minimize the number of aggregates while maximizing effectiveness » Store or » Can aggregate Facts » Roll-up Dimension hierarchies? » Can still be relational to other tables when necessary
  • 38. Hierarchies
      » Example:
        » Date - Roll up by Month, Quarter or Year
          Key   Day   Month   Quarter   Year
          364   30    12      4         2011
          365   31    12      4         2011
          366   1     1       1         2012
          367   2     1       1         2012
      » Variable depth – Self-referencing
      » Variable depth with historical – changing surrogate keys – ouch
      » Track business process separately
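Rolling a measure up the date hierarchy amounts to grouping fact rows by the dimension attributes at the chosen level. The sketch below reuses the four date keys from the slide. The `fact_sales` amounts are hypothetical.

```python
from collections import defaultdict

# Date dimension rows copied from the slide: Key -> (Day, Month, Quarter, Year).
dim_date = {
    364: (30, 12, 4, 2011),
    365: (31, 12, 4, 2011),
    366: (1, 1, 1, 2012),
    367: (2, 1, 1, 2012),
}

# Hypothetical fact rows: (DateKey, SalesAmount).
fact_sales = [(364, 100), (365, 50), (366, 25), (367, 75)]

# Tuple positions in dim_date to group on; Year always comes first.
LEVELS = {"Month": (3, 1), "Quarter": (3, 2), "Year": (3,)}

def roll_up(facts, level):
    """Sum SalesAmount at the requested hierarchy level."""
    totals = defaultdict(int)
    for date_key, amount in facts:
        attributes = dim_date[date_key]
        group = tuple(attributes[i] for i in LEVELS[level])
        totals[group] += amount
    return dict(totals)

by_month = roll_up(fact_sales, "Month")
by_year = roll_up(fact_sales, "Year")
```

Keeping Year in every grouping prevents December 2011 and a later December from being merged into one month bucket.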
  • 39. Size Matters http://starwars.wikia.com/wiki/Rancor?image=Rancor-jpg
  • 40. Data amount and size » Data Types? » BLOB data? » Identity columns (do you need bigint?) » Data Profiling » Collect source system sizes for data bringing over » Add sizes of new row » Don’t forget index size!!!
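A back-of-the-envelope estimate makes the sizing questions concrete. The row counts, row width, and index allowance below are made-up inputs. The calculation ignores per-row and per-page overhead and compression, so treat the result as a rough lower bound. The 4-byte and 8-byte figures are the storage sizes of SQL Server `int` and `bigint`.

```python
INT_BYTES = 4     # SQL Server int
BIGINT_BYTES = 8  # SQL Server bigint

def estimate_table_bytes(rows_per_day, bytes_per_row, days_retained, index_overhead=0.0):
    """Rough size of a fact table: data bytes plus a fractional allowance for indexes."""
    data_bytes = rows_per_day * bytes_per_row * days_retained
    return int(data_bytes * (1 + index_overhead))

# Hypothetical load: 1,000,000 rows/day at 40 bytes/row, kept for a year,
# with indexes assumed to add 50% on top of the data.
yearly_bytes = estimate_table_bytes(1_000_000, 40, 365, index_overhead=0.5)

# Extra bytes per year if one key column is bigint where int would have sufficed.
bigint_penalty = 1_000_000 * 365 * (BIGINT_BYTES - INT_BYTES)
```

With these inputs the table comes to roughly 21.9 GB per year, and a single unnecessary `bigint` column adds about 1.46 GB before its indexes are counted.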
  • 41. Partitioning » Usually lends naturally to partitioning large Fact tables by Date » Larger Dimension tables can be partitioned as well » Sometimes Old (SQL 2000) Partitioning is still better than SQL 2005+ partitioning » Take ETL process into consideration
  • 42. Archiving » Question: When is big too big? » Answer: When performance impact outweighs need for data availability » Many options: » Backup to tape offline » keep “Archived” DW available » Records Retention – this could be your work product
  • 43. Performance http://www.flickr.com/photos/elfidomx/6026943114/
  • 44. Hardware » Remember when the user said “It’s my data and I want it now”? » Buy » Reference Architecture (Fast Track) » Appliances » HP » Enterprise Data Warehouse » Business Decision » Business Data Warehouse » Enterprise Database Consolidation » Dell » PDW » Build » Reference Architecture (Fast Track) » SQLIO » Benchmark
  • 45. Throughput » Amounts of data » Not all of it will be in memory » Between ETL and reports, SP Cache might not be efficient » Need to tune those disks » Reference Architecture(Fast Track) » Accepts that Procedure cache will stink due to data sizes » Instead small amount of RAM » Requires bandwidth of 400 GB/s per LUN » Materialize data that makes reporting faster!! » More Denormalization » More Aggregations » ReadOnly while not processing ETLs? (switch)
  • 46. Parallelism » Multiple Data Files » SQL writes proportional fill » Multiple Filegroups » Partitioning scheme » Facts/Dimensions » Tables that are often joined » Big tables » NCIX vs. data » Multiple LUNs » I am not a SAN admin nor play one on TV » Normal SQL performance