SlideShare a Scribd company logo
1 of 66
Download to read offline
Starring Sakila
A Data Warehousing Primer
Roland Bouman (Strukton Rail)
http://rpbouman.blogspot.com/
Starring Sakila
Topics
Data Warehousing Terminology
● Terminology
– Business Intelligence
– Data Warehouse
– Dimensional Model
– Star Schema
– OLAP
– Cube
What is Business Intelligence?
● Business Intelligence (BI)
– Skills, technologies, applications and
practices to acquire a better understanding
of the commercial context of your business.
● Data Warehouse
● Dimensional Model
● Star Schema
● OLAP
● Cube
What is a Data Warehouse?
● Business Intelligence
● Data Warehouse
– A database designed to support Business
Intelligence
● Dimensional Model
● Star Schema
● OLAP
● Cube
What is the Dimensional Model?
● Business Intelligence
● Data Warehouse
● Dimensional Model
– A logical data model that divides data in two
kinds: Facts and Dimensions
● Star Schema
● OLAP
● Cube
What is a Star Schema?
● Business Intelligence
● Data Warehouse
● Dimensional Model
● Star Schema
– Physical implementation of the Dimensional
Model on a RDBMS which maps a dimension
to a single table
● OLAP
● Cube
What is OLAP?
● Business Intelligence
● Data Warehouse
● Dimensional Model
● Star Schema
● OLAP
– On-Line Analytical Processing: querying
muli-dimensional data, cornerstone of most
BI applications
● Cube
What is a Cube
● Business Intelligence
● Data Warehouse
● Dimensional Model
● Star Schema
● OLAP
● Cube
– Multi-dimensional data structure suitable for
OLAP queries
Understanding your Business
Business
Intelligence
Business Intelligence
● Front end Applications:
– Reports
– Charts and Graphs
– OLAP Pivot tables
– Data Mining
– Dashboards
● Back end, Infrastructure
– ETL
● Extract
● Transformation
● Load
– Data Warehouse
– Data Mart
– Metadata
– ROLAP Cube
High Level
BI Architecture
Source Systems,
External Data
Staging Area
Data Warehouse
Meta Data
Business Intelligence
Applications
Extract Transform Load Present
Back-end Front-end
Business Intelligence Database
Data
Warehouse
Data Warehouse
● Ultimately, it's just a Relational Database
– Tables, Columns, Keys...
● ...But designed for BI applications
– Ease of use
– Performance
● Data from various source systems
– Integration, Standardization, Data cleaning
– Add and maintain history
OLTP vs OLAP: Application
Characterization
● OLTP
– Operational
– 'Always' on
– All kinds of users
– Many users
– Directly supports
business process
– Keep a Record of
Current status
● OLAP
– Tactical, Strategic
– Periodically Available
– Managers, Directors
– Few(er) users
– Decision support,
long-term planning
– Maintain history
OLTP vs OLAP: data processing
● OLTP
– Subject Oriented
– Add, Modify, Remove
single rows
– Human data entry
– Queries for small sets
of rows with all their
details
– Standard queries
● OLAP
– Aspect Oriented
– Bulk load, rarely
modify, never remove
– Automated ETL jobs
– Scan large sets to
return aggregates
over arbitrary groups
– Ad-hoc queries
OLTP vs OLAP:
database schema organization
● OLTP
– Entity-Relationship
model
– Entities, Attributes,
Relationships
– Foreign key
constraints
– Indexes to increase
performance
– Normalized to 3NF or
BCNF
● OLAP
– Dimensional
model
– Facts, Dimensions,
Hierarchies
– Ref. integrity ensured
in loading process
– Scans on Fact table
obliterates indexes
– Denormalized
Dimensions (<= 1NF)
Organizing data
to suit Business Intelligence
Dimensional
Model
The Dimensional Model
● Two kinds of data
– Facts
– Dimensions
The Dimensional Model:
Facts
● Facts
– Measures/Metrics of a Business Process
– Typical Metrics
● Cost, Units Sold, Profit
The Dimensional Model:
Dimensions
● Dimensions
– Describe aspects of Business Process
– Dimensions typically not inter-dependent
– Who? What? Where? When? Why?
– Typical Dimensions:
● Customer (who?), Product (what?),
Date/Time (when?)
The Dimensional Model:
Navigating Facts with Dimensions
● Dimension Attributes organized in Hierarchies
– Date dimension examples:
● Year, Quarter, Month, Day
● Year, Week, Day
● Metrics typically numeric and additive
● Navigate fact data
– Choose particular values for dimension
– Aggregate fact data at chosen level of
hierarchy
Dimensional Example: Crosstab
Date Dimension 2008 Q4
Location Dimension All Months October November December
All locations $ 3850 $ 1000 $ 1350 $ 1500
America All America $ 2050 $500 $ 750 $ 800
North $ 1275 $ 300 $ 500 $ 475
South $ 775 $ 200 $ 250 $ 325
Europe All Europe $ 1800 $ 500 $ 600 $ 700
East $ 800 $ 250 $ 250 $ 300
West $ 1000 $ 250 $ 350 $ 400
Dimensional Model
Implementation
Star Schema
Stars Schema
Characteristics
● Central Fact Table
– Columns for storing Metrics
– 'Foreign Key' columns point to Dimension
– Typically normalized and not pre-aggregated
● Dimension maps to a Dimension table
– Surrogate key
– Descriptive attributes organized in hierarchies
– No Foreign Keys to other tables
– Typically heavily denormalized
Rentals
Star Schema example: Sakila
Rentals
Store
Date
Time
Film
CustomerStaff
Stars Schema Characteristics
● Star schema is 'just' an implementation
– Optimized for simplicity
– Optimized for performance (?)
– Heavily denormalized dimensions
● Snowflake: Star Schema Alternative
– Still a dimensional model
– Still a central fact table
– Normalized dimensions
– Easier maintenance of dimensions
Snow Flake example: Sakila
Rentals
Store
Date
Minute
Film
Customer
Staff
Month
Hour
Quarter City
Country
City
Country
Language
Rating
Year
Week Rentals
Starring Sakila
Desinging
Star Schemas
Dimensional Design
● Select Business Process
– Sales, Purchase, Storage, Transport, ...
● Define Facts and Key Metrics
– Facts: Key Event in Business Process
– Metrics (Fact Attributes): Count or Amount
● Choose Dimensions and Hierarchies
– What? When? Where?
– Who? Why?
Dimensional Model example
● MySQL Sample Database
– http://dev.mysql.com/doc/sakila/en/sakila.html
● DVD rental business
– Overly simplified database schema
● Typical OLTP database
Dimensional Model example
● Rental Business Process
– Customer visits store, picks DVD
– DVD taken out of store inventory by staff member
– Customer returns home and enjoys DVD
– Customer returns to store with DVD
– DVD returned to staff member
– Staff member collects payment made by customer
3NF Source schema: Sakila
Rentals
Rental Customer
Film
Store Address
Category Actor
StaffInventory
City
CountryLanguage
Example Business Process:
Rentals
● Select Business Process
– Rentals
● Identify Facts
– Count (number of rentals)
– Rental Duration
● Choose Dimensions
– What: Films
– When: Rental, Return
– Who: Customer, Staff
– Where: Store
Target Star Schema
Fact: Rentals
Store
Date
Time
Film
When?
Where?
What?
CustomerStaff
Who?
Rental Star Schema
A star is born: Rentals 3NF
Rental
CustomerStaffInventory
A star is born: Rentals 3NF
Rental
CustomerStaffInventory
Store
Film
Category
Film Category
A star is born: Denormalize
Rental
CustomerStaffInventory
Store
Film
Category
Film Category
A star is born: Denormalize
Rental
CustomerStaff
Store
Film
StoreCategory
A star is born
Rental
CustomerStaff
Store
Film
Store
Address
Category
A star is born: Denormalize
Rental
CustomerStaff
Store
Film
Store
Address
Category
A star is born: Denormalize
Rental
CustomerStaff
Store
Film
Store
AddressAddress
Language
Category
A star is born: Denormalize
Rental
CustomerStaff
Store
Film
Store
AddressAddress
Language
CityCity
Category
A star is born: Rental Snowflake
Rental
CustomerStaff
Store
Film
Store
AddressAddress
Language
CityCity
CountryCountry
Category
A star is born: Rental Star
Schema
Rental
StoreLanguage
Film
Country
City
What: Film Who: CustomerWhere: Store Who: Staff
Address
Store
Staff
Country
City
Address
Customer
Category
Dimensional Design
● Something is missing....
– Who ? (Customer, Staff)
– What ? (Film)
– Where ? (Store)
– .... ?
A star is born:
Rental Date and Time
Rental
What: Film Who: CustomerWhere: Store Who: Staff
When: Date When: Time
Role Playing: Date/Time
for both Rentals and Returns
Rental
What: Film Who: CustomerWhere: Store Who: Staff
When:
Rental Date
When:
Rental Time
When:
Return Date
When:
Return Time
Denormalization through Joins
Denormalization through
Flattening (Repeating Group)
ETL with
Pentaho Data Integration
Loading a
Data Warehouse
Dimensional Design
● Pentaho Data Integration
– sourceforge.net/projects/pentaho/
● ETL and much more
● Transformations:
– Extract, Load and Transform
● Jobs:
– Organize multiple transformations to a complete
ETL process
● > 30 RDBMS-es, > 130 Transformation Steps
Job: Rental ETL Process
● First load dimensions, finally load fact
● Mail notification in case of success / failure
Job: Rental ETL Process
● First load dimensions, finally load fact
● Mail notification in case of success / failure
Job: Rental ETL Process
● Get store, lookup address (subtransformation)
and manager
● Load store dimension table
Job: Rental ETL Process
● Get store, lookup address (subtransformation)
and manager
● Load store dimension table
Job: Rental ETL Process
● Get address, lookup city and country
● Concatenate address if necessary
Job: Rental ETL Process
● This was just a simple example
● More complex example: importing XML
<?xml version="1.0" encoding="UTF-8"?>
<result>
<actors>
<actor id="00000015">Anderson, Jeff</actor>
<actor id="00000015">Anderson, Jeff</actor>
..
</actors>
<videos>
<video>
<title>The Fugitive</title>
<genre>action</genre>
....
</video>
...
</videos>
</result>
Job: Rental ETL Process
● This was just a simple example
● More complex example: importing XML
OLAP Pivot Table with
Pentaho Analysis Services
OLAP
Dimensional Design
● Pentaho Analysis Services
– Part of Pentaho BI Server
– sourceforge.net/projects/pentaho/
– Based on Mondrian ROLAP server
– sourceforge.net/projects/mondrian/
Dimensional Design
● Pentaho Schema Workbench
– Map data warehouse tables to a logical Cube
Dimensional Design
● Pentaho Analysis View:
Upcoming Book:
Pentaho Solutions
● Pentaho Solutions
– Wiley
– ISBN 978-0-470-48432-6
– September 2009
– 630+ page paperback
– Amazon pre-order $31.50
– Regular: $50.00

More Related Content

What's hot

Star ,Snow and Fact-Constullation Schemas??
Star ,Snow and  Fact-Constullation Schemas??Star ,Snow and  Fact-Constullation Schemas??
Star ,Snow and Fact-Constullation Schemas??
Abdul Aslam
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecycle
bartlowe
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
pcherukumalla
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
koolkampus
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
Shanthi Mukkavilli
 

What's hot (20)

data warehouse , data mart, etl
data warehouse , data mart, etldata warehouse , data mart, etl
data warehouse , data mart, etl
 
Star ,Snow and Fact-Constullation Schemas??
Star ,Snow and  Fact-Constullation Schemas??Star ,Snow and  Fact-Constullation Schemas??
Star ,Snow and Fact-Constullation Schemas??
 
An Introduction To Oracle Database
An Introduction To Oracle DatabaseAn Introduction To Oracle Database
An Introduction To Oracle Database
 
Tutorial etl
Tutorial etlTutorial etl
Tutorial etl
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Data Warehouse 101
Data Warehouse 101Data Warehouse 101
Data Warehouse 101
 
Database backup and recovery basics
Database backup and recovery basicsDatabase backup and recovery basics
Database backup and recovery basics
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecycle
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series data
 
Type constructor
Type constructorType constructor
Type constructor
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Data Warehouse Fundamentals
Data Warehouse FundamentalsData Warehouse Fundamentals
Data Warehouse Fundamentals
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaion
 
Data dictionary
Data dictionaryData dictionary
Data dictionary
 
SQL Queries Information
SQL Queries InformationSQL Queries Information
SQL Queries Information
 
Chapter15
Chapter15Chapter15
Chapter15
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 

Similar to Starring sakila my sql university 2009

The final frontier v3
The final frontier v3The final frontier v3
The final frontier v3
Terry Bunio
 
Bw training 1 intro dw
Bw training   1 intro dwBw training   1 intro dw
Bw training 1 intro dw
Joseph Tham
 
1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)
tafosepsdfasg
 

Similar to Starring sakila my sql university 2009 (20)

Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
The final frontier v3
The final frontier v3The final frontier v3
The final frontier v3
 
Data Warehouse approaches with Dynamics AX
Data Warehouse  approaches with Dynamics AXData Warehouse  approaches with Dynamics AX
Data Warehouse approaches with Dynamics AX
 
FinTech Data Challenges @ Nerdwallet
FinTech Data Challenges @ Nerdwallet FinTech Data Challenges @ Nerdwallet
FinTech Data Challenges @ Nerdwallet
 
Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]
 
BI Introduction
BI IntroductionBI Introduction
BI Introduction
 
Complete unit ii notes
Complete unit ii notesComplete unit ii notes
Complete unit ii notes
 
Bw training 1 intro dw
Bw training   1 intro dwBw training   1 intro dw
Bw training 1 intro dw
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAP
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its concepts
 
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
 
Business analysis
Business analysisBusiness analysis
Business analysis
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDBMongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQL
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 

Recently uploaded

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
meharikiros2
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 

Recently uploaded (20)

Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptx
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata Model
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdf
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
Ground Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementGround Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth Reinforcement
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Introduction to Geographic Information Systems
Introduction to Geographic Information SystemsIntroduction to Geographic Information Systems
Introduction to Geographic Information Systems
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 

Starring sakila my sql university 2009

  • 1. Starring Sakila A Data Warehousing Primer Roland Bouman (Strukton Rail) http://rpbouman.blogspot.com/
  • 3. Data Warehousing Terminology ● Terminology – Business Intelligence – Data Warehouse – Dimensional Model – Star Schema – OLAP – Cube
  • 4. What is Business Intelligence? ● Business Intelligence (BI) – Skills, technologies, applications and practices to acquire a better understanding of the commercial context of your business. ● Data Warehouse ● Dimensional Model ● Star Schema ● OLAP ● Cube
  • 5. What is a Data Warehouse? ● Business Intelligence ● Data Warehouse – A database designed to support Business Intelligence ● Dimensional Model ● Star Schema ● OLAP ● Cube
  • 6. What is the Dimensional Model? ● Business Intelligence ● Data Warehouse ● Dimensional Model – A logical data model that divides data in two kinds: Facts and Dimensions ● Star Schema ● OLAP ● Cube
  • 7. What is a Star Schema? ● Business Intelligence ● Data Warehouse ● Dimensional Model ● Star Schema – Physical implementation of the Dimensional Model on a RDBMS which maps a dimension to a single table ● OLAP ● Cube
  • 8. What is OLAP? ● Business Intelligence ● Data Warehouse ● Dimensional Model ● Star Schema ● OLAP – On-Line Analytical Processing: querying muli-dimensional data, cornerstone of most BI applications ● Cube
  • 9. What is a Cube ● Business Intelligence ● Data Warehouse ● Dimensional Model ● Star Schema ● OLAP ● Cube – Multi-dimensional data structure suitable for OLAP queries
  • 11. Business Intelligence ● Front end Applications: – Reports – Charts and Graphs – OLAP Pivot tables – Data Mining – Dashboards ● Back end, Infrastructure – ETL ● Extract ● Transformation ● Load – Data Warehouse – Data Mart – Metadata – ROLAP Cube
  • 12. High Level BI Architecture Source Systems, External Data Staging Area Data Warehouse Meta Data Business Intelligence Applications Extract Transform Load Present Back-end Front-end
  • 14. Data Warehouse ● Ultimately, it's just a Relational Database – Tables, Columns, Keys... ● ...But designed for BI applications – Ease of use – Performance ● Data from various source systems – Integration, Standardization, Data cleaning – Add and maintain history
  • 15. OLTP vs OLAP: Application Characterization ● OLTP – Operational – 'Always' on – All kinds of users – Many users – Directly supports business process – Keep a Record of Current status ● OLAP – Tactical, Strategic – Periodically Available – Managers, Directors – Few(er) users – Decision support, long-term planning – Maintain history
  • 16. OLTP vs OLAP: data processing ● OLTP – Subject Oriented – Add, Modify, Remove single rows – Human data entry – Queries for small sets of rows with all their details – Standard queries ● OLAP – Aspect Oriented – Bulk load, rarely modify, never remove – Automated ETL jobs – Scan large sets to return aggregates over arbitrary groups – Ad-hoc queries
  • 17. OLTP vs OLAP: database schema organization ● OLTP – Entity-Relationship model – Entities, Attributes, Relationships – Foreign key constraints – Indexes to increase performance – Normalized to 3NF or BCNF ● OLAP – Dimensional model – Facts, Dimensions, Hierarchies – Ref. integrity ensured in loading process – Scans on Fact table obliterates indexes – Denormalized Dimensions (<= 1NF)
  • 18. Organizing data to suit Business Intelligence Dimensional Model
  • 19. The Dimensional Model ● Two kinds of data – Facts – Dimensions
  • 20. The Dimensional Model: Facts ● Facts – Measures/Metrics of a Business Process – Typical Metrics ● Cost, Units Sold, Profit
  • 21. The Dimensional Model: Dimensions ● Dimensions – Describe aspects of Business Process – Dimensions typically not inter-dependent – Who? What? Where? When? Why? – Typical Dimensions: ● Customer (who?), Product (what?), Date/Time (when?)
  • 22. The Dimensional Model: Navigating Facts with Dimensions ● Dimension Attributes organized in Hierarchies – Date dimension examples: ● Year, Quarter, Month, Day ● Year, Week, Day ● Metrics typically numeric and additive ● Navigate fact data – Choose particular values for dimension – Aggregate fact data at chosen level of hierarchy
  • 23. Dimensional Example: Crosstab Date Dimension 2008 Q4 Location Dimension All Months October November December All locations $ 3850 $ 1000 $ 1350 $ 1500 America All America $ 2050 $500 $ 750 $ 800 North $ 1275 $ 300 $ 500 $ 475 South $ 775 $ 200 $ 250 $ 325 Europe All Europe $ 1800 $ 500 $ 600 $ 700 East $ 800 $ 250 $ 250 $ 300 West $ 1000 $ 250 $ 350 $ 400
  • 25. Stars Schema Characteristics ● Central Fact Table – Columns for storing Metrics – 'Foreign Key' columns point to Dimension – Typically normalized and not pre-aggregated ● Dimension maps to a Dimension table – Surrogate key – Descriptive attributes organized in hierarchies – No Foreign Keys to other tables – Typically heavily denormalized
  • 26. Rentals Star Schema example: Sakila Rentals Store Date Time Film CustomerStaff
  • 27. Stars Schema Characteristics ● Star schema is 'just' an implementation – Optimized for simplicity – Optimized for performance (?) – Heavily denormalized dimensions ● Snowflake: Star Schema Alternative – Still a dimensional model – Still a central fact table – Normalized dimensions – Easier maintenance of dimensions
  • 28. Snow Flake example: Sakila Rentals Store Date Minute Film Customer Staff Month Hour Quarter City Country City Country Language Rating Year Week Rentals
  • 30. Dimensional Design ● Select Business Process – Sales, Purchase, Storage, Transport, ... ● Define Facts and Key Metrics – Facts: Key Event in Business Process – Metrics (Fact Attributes): Count or Amount ● Choose Dimensions and Hierarchies – What? When? Where? – Who? Why?
  • 31. Dimensional Model example ● MySQL Sample Database – http://dev.mysql.com/doc/sakila/en/sakila.html ● DVD rental business – Overly simplified database schema ● Typical OLTP database
  • 32. Dimensional Model example ● Rental Business Process – Customer visits store, picks DVD – DVD taken out of store inventory by staff member – Customer returns home and enjoys DVD – Customer returns to store with DVD – DVD returned to staff member – Staff member collects payment made by customer
  • 33.
  • 34. 3NF Source schema: Sakila Rentals Rental Customer Film Store Address Category Actor StaffInventory City CountryLanguage
  • 35. Example Business Process: Rentals ● Select Business Process – Rentals ● Identify Facts – Count (number of rentals) – Rental Duration ● Choose Dimensions – What: Films – When: Rental, Return – Who: Customer, Staff – Where: Store
  • 36. Target Star Schema Fact: Rentals Store Date Time Film When? Where? What? CustomerStaff Who?
  • 38. A star is born: Rentals 3NF Rental CustomerStaffInventory
  • 39. A star is born: Rentals 3NF Rental CustomerStaffInventory Store Film Category Film Category
  • 40. A star is born: Denormalize Rental CustomerStaffInventory Store Film Category Film Category
  • 41. A star is born: Denormalize Rental CustomerStaff Store Film StoreCategory
  • 42. A star is born Rental CustomerStaff Store Film Store Address Category
  • 43. A star is born: Denormalize Rental CustomerStaff Store Film Store Address Category
  • 44. A star is born: Denormalize Rental CustomerStaff Store Film Store AddressAddress Language Category
  • 45. A star is born: Denormalize Rental CustomerStaff Store Film Store AddressAddress Language CityCity Category
  • 46. A star is born: Rental Snowflake Rental CustomerStaff Store Film Store AddressAddress Language CityCity CountryCountry Category
  • 47. A star is born: Rental Star Schema Rental StoreLanguage Film Country City What: Film Who: CustomerWhere: Store Who: Staff Address Store Staff Country City Address Customer Category
  • 48. Dimensional Design ● Something is missing.... – Who ? (Customer, Staff) – What ? (Film) – Where ? (Store) – .... ?
  • 49. A star is born: Rental Date and Time Rental What: Film Who: CustomerWhere: Store Who: Staff When: Date When: Time
  • 50. Role Playing: Date/Time for both Rentals and Returns Rental What: Film Who: CustomerWhere: Store Who: Staff When: Rental Date When: Rental Time When: Return Date When: Return Time
  • 53. ETL with Pentaho Data Integration Loading a Data Warehouse
  • 54. Dimensional Design ● Pentaho Data Integration – sourceforge.net/projects/pentaho/ ● ETL and much more ● Transformations: – Extract, Load and Transform ● Jobs: – Organize multiple transformations to a complete ETL process ● > 30 RDBMS-es, > 130 Transformation Steps
  • 55. Job: Rental ETL Process ● First load dimensions, finally load fact ● Mail notification in case of success / failure
  • 56. Job: Rental ETL Process ● First load dimensions, finally load fact ● Mail notification in case of success / failure
  • 57. Job: Rental ETL Process ● Get store, lookup address (subtransformation) and manager ● Load store dimension table
  • 58. Job: Rental ETL Process ● Get store, lookup address (subtransformation) and manager ● Load store dimension table
  • 59. Job: Rental ETL Process ● Get address, lookup city and country ● Concatenate address if necessary
  • 60. Job: Rental ETL Process ● This was just a simple example ● More complex example: importing XML <?xml version="1.0" encoding="UTF-8"?> <result> <actors> <actor id="00000015">Anderson, Jeff</actor> <actor id="00000015">Anderson, Jeff</actor> .. </actors> <videos> <video> <title>The Fugitive</title> <genre>action</genre> .... </video> ... </videos> </result>
  • 61. Job: Rental ETL Process ● This was just a simple example ● More complex example: importing XML
  • 62. OLAP Pivot Table with Pentaho Analysis Services OLAP
  • 63. Dimensional Design ● Pentaho Analysis Services – Part of Pentaho BI Server – sourceforge.net/projects/pentaho/ – Based on Mondrian ROLAP server – sourceforge.net/projects/mondrian/
  • 64. Dimensional Design ● Pentaho Schema Workbench – Map data warehouse tables to a logical Cube
  • 66. Upcoming Book: Pentaho Solutions ● Pentaho Solutions – Wiley – ISBN 978-0-470-48432-6 – September 2009 – 630+ page paperback – Amazon pre-order $31.50 – Regular: $50.00