Introduction to
Data Warehouse
Amin Choroomi
choroomi@vdashonline.com
What is a Data
Warehouse?
 A Simple Relational Database
 Different Architecture
 Less Normalized
 Analytical Design
 Facts and Dimensions
 Non-Operational
Why Data
Warehouse?
Operational Systems
Transactional
Systems
Legacy
Applications
Internal /
External Feeds
Management
Reports
Dashboards
Analytics
Why Data
Warehouse?
Management
Reports
Dashboards
Analytics
Data Warehouse
Operational Systems
Transactional
Systems
Legacy
Applications
Internal /
External Feeds
Benefits of a
Data
Warehouse
 Centralized Data Source
 Enhanced Business Intelligence
 Increased Query and System Performance
 Business Intelligence from Multiple Sources
 Timely Access to Data
 Enhanced Data Quality and Consistency
 Historical Intelligence
 High Return on Investment
OLTP vs. OLAP
Online Transaction Processing
 Optimized for Transactions
 Concurrent Operations
 Consistent and Accurate
 Real-Time Data
 Short life span
 Too Many Small Tables
 Normalized
Online Analytical Processing
 Optimized For Analysis
 Large Amounts of Historical
Data
 Fed From OLTP Databases
 Less Normalized (2NF)
 Facts and Dimensions
 Not Real-Time
 Extract, Transform and Load
(ETL)
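A rough sketch of the ETL idea, using Python's built-in sqlite3 module: rows are extracted from a small normalized OLTP-style database, transformed into a flatter shape, and loaded into a reporting table. The table and column names (customers, orders, sales_report) and the sample row are invented for this illustration and do not come from the slides.

```python
import sqlite3

def run_etl(oltp, dw):
    """Copy orders from the OLTP database into a flattened reporting table."""
    # Extract: read the normalized rows, joining orders to customers.
    rows = oltp.execute(
        "SELECT o.order_id, c.name, c.city, o.quantity, o.unit_price "
        "FROM orders o JOIN customers c ON c.customer_id = o.customer_id"
    ).fetchall()
    # Transform: derive the sales amount so reports do not recompute it.
    transformed = [
        (order_id, name, city, qty, qty * price)
        for order_id, name, city, qty, price in rows
    ]
    # Load: write the denormalized rows into the warehouse table.
    dw.executemany("INSERT INTO sales_report VALUES (?, ?, ?, ?, ?)", transformed)
    dw.commit()
    return len(transformed)

# Hypothetical OLTP source with two small normalized tables.
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    quantity INTEGER,
    unit_price REAL
);
INSERT INTO customers VALUES (1, 'John Smith', 'New York');
INSERT INTO orders VALUES (100, 1, 2, 20000.0);
""")

# Hypothetical warehouse target: one wide, less normalized table.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE sales_report "
    "(order_id INTEGER, customer_name TEXT, city TEXT, quantity INTEGER, sales_amount REAL)"
)

loaded = run_etl(oltp, dw)
```

In practice the load usually runs on a schedule, which is one reason the warehouse is not real-time.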
Relational vs.
Multidimensional
DWs
Relational DWs
 Similar to OLTP
 Simpler Structure
 Query using SQL
 Less Processing Cost
 Easier Maintenance
 Best for Real-Time Ad-hoc
Reporting
Multidimensional DWs
 Different Structure (Cubes)
 Different Query Language
(MDX)
 Much Faster for Extra-Large
Data Sets
 Pre-Calculated Measures,
KPIs
 Optimized to write and
answer complicated requests
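The idea behind pre-calculated measures can be sketched in plain Python: totals are computed once for every combination of dimension values, so later questions become simple lookups. This is only an illustration of the concept, not how a real cube engine or MDX works; the sample rows and the "ALL" marker are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, category, sales amount).
sales = [
    ("2023", "Cars", 20000.0),
    ("2023", "Cars", 18000.0),
    ("2023", "Parts", 500.0),
    ("2024", "Cars", 21000.0),
]

def build_cube(rows):
    """Pre-aggregate the sales amount for every (year, category) combination.

    "ALL" stands for "any value" of that dimension.
    """
    cube = defaultdict(float)
    for year, category, amount in rows:
        cube[(year, category)] += amount
        cube[(year, "ALL")] += amount
        cube[("ALL", category)] += amount
        cube[("ALL", "ALL")] += amount
    return dict(cube)

cube = build_cube(sales)

# Answering a question is now a dictionary lookup instead of a scan.
cars_2023 = cube[("2023", "Cars")]
all_cars = cube[("ALL", "Cars")]
grand_total = cube[("ALL", "ALL")]
```

The trade-off is extra processing and storage when the cube is built, in exchange for faster answers afterwards.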
What’s inside a Data
Warehouse?
Inside a Data
Warehouse
 Dimensions
 Measures
 Facts
Dimensions
 Describing Information
 Slicing and Dicing
 Comparing
 Make the data meaningful
 Example:
 Customer
 Time
 Product
 Project
 Types
 Categories
 Colors, Sizes, etc…
Sample
Dimension
Table
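A minimal sketch of what a dimension table might look like, again using Python's sqlite3 module. The table name dim_product, its columns, and its rows are made up for illustration; they only mirror the kinds of attributes listed earlier (categories, colors, sizes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical product dimension: one key plus descriptive attributes.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT,
    color        TEXT,
    size         TEXT
);
INSERT INTO dim_product VALUES
    (1, 'Corolla',   'Cars',        'Red',   'Compact'),
    (2, 'Camry',     'Cars',        'Blue',  'Midsize'),
    (3, 'Floor Mat', 'Accessories', 'Black', 'Small');
""")

# Slicing: keep only the rows that match one attribute value.
cars = conn.execute(
    "SELECT product_name FROM dim_product "
    "WHERE category = 'Cars' ORDER BY product_key"
).fetchall()
```

The descriptive columns are what make slicing, dicing, and comparing possible once facts point at these keys.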
Metrics /
Measures
 Measurable Columns
 Things we’re actually looking for
 They are usually aggregated (Sum, Avg, Min, Max…)
 Examples:
 Sales Amount
 Order Quantity
 Customer Count
 Tax Paid
 Etc.
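As a small illustration of aggregating measures, the sketch below computes Sum, Avg, Min, and Max of a sales amount, plus an order quantity total and a customer count. The rows are invented sample data, not figures from the slides.

```python
# Hypothetical rows: (customer name, order quantity, sales amount).
sales_rows = [
    ("John Smith", 2, 38000.0),
    ("Jane Doe", 1, 21000.0),
    ("John Smith", 1, 1000.0),
]

amounts = [amount for _, _, amount in sales_rows]

measures = {
    "sales_amount_sum": sum(amounts),
    "sales_amount_avg": sum(amounts) / len(amounts),
    "sales_amount_min": min(amounts),
    "sales_amount_max": max(amounts),
    "order_quantity_sum": sum(qty for _, qty, _ in sales_rows),
    # Customer Count is a distinct count, so repeat buyers count once.
    "customer_count": len({name for name, _, _ in sales_rows}),
}
```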
Facts
 Describing Measures by Dimensions
 Tables Containing Multiple Dimension Keys and Measure Values
 The Primary Key is usually the combination of all Dimension Keys, or the Event Key
 Dimension Keys are also Foreign Keys to Dimension Tables
 Facts usually express real events that happened at a specific time
 Example:
 We sold 2 Toyotas to John Smith in New York yesterday for
$20,000.00 each and gave him a $2,000.00 overall discount, so the
total was $38,000.00
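The Toyota example can be written down as a single fact row: a set of dimension keys plus the measures. This is a sketch under assumptions; the class name SalesFact, the field names, and the key values (including the date key) are placeholders invented here, while the quantity, price, and discount come from the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SalesFact:
    # Dimension keys (foreign keys to hypothetical dimension tables).
    product_key: int    # Toyota
    customer_key: int   # John Smith
    location_key: int   # New York
    date_key: int       # the day of the sale
    # Measures.
    order_quantity: int
    unit_price: float
    discount_amount: float

    @property
    def sales_amount(self) -> float:
        """Quantity times unit price, minus the overall discount."""
        return self.order_quantity * self.unit_price - self.discount_amount

# Key values are arbitrary placeholders; the measures match the example.
fact = SalesFact(
    product_key=1,
    customer_key=7,
    location_key=3,
    date_key=20240115,
    order_quantity=2,
    unit_price=20000.0,
    discount_amount=2000.0,
)
total = fact.sales_amount  # 2 * 20,000.00 - 2,000.00
```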
Sample Fact
Table
FKs to Dimension Tables
Describing the
Fact (Used for
Drill-Down)
Measures
Fact /
Dimension
Relationship
Star Schema
 Facts connect DIRECTLY to
each Dimension with a single
relation.
 Simple Structure
 Easier to Query
 Not the best approach for
complicated Dimensions
 No Built-in Drill-Down
Snowflake Schema /
Dimensions
 Dimensions are
HIERARCHICALLY connected
to each other.
 Facts connect to one of the
Dimensions and reach the
other ones through the
connected dimension.
 More Complicated
 Built-in Drill-Down
Star Schema
Fact
Dimension 1
Dimension 2
Dimension 3
Dimension 4
Dimension 5
Star Schema
Sales
Customer
Location
Promotion
Product
Time
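A minimal star schema sketch with Python's sqlite3 module: the Sales fact joins directly to each dimension with one relation. Only two of the dimensions are modeled to keep it short, and all table names, column names, and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT          -- kept inside the dimension: no separate table
);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE fact_sales (
    customer_key   INTEGER REFERENCES dim_customer(customer_key),
    product_key    INTEGER REFERENCES dim_product(product_key),
    order_quantity INTEGER,
    sales_amount   REAL
);
INSERT INTO dim_customer VALUES (1, 'John Smith'), (2, 'Jane Doe');
INSERT INTO dim_product VALUES
    (1, 'Toyota', 'Cars'),
    (2, 'Floor Mat', 'Accessories');
INSERT INTO fact_sales VALUES
    (1, 1, 2, 38000.0),
    (2, 1, 1, 20000.0),
    (2, 2, 4, 200.0);
""")

# One join per dimension is enough to slice the measure.
by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```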
Snowflake Schema
Sales
Customer
Location
Product
Product Subcategory
Product Category
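A snowflake schema can be sketched the same way: the fact connects to the product dimension, which chains on to subcategory and category tables. That chain is what provides drill-down from category to subcategory to product. Table names, column names, and rows are invented for illustration, and sales_by is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_subcategory (
    subcategory_key INTEGER PRIMARY KEY,
    name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT,
    subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_category VALUES (1, 'Vehicles');
INSERT INTO dim_subcategory VALUES (1, 'Sedans', 1), (2, 'Trucks', 1);
INSERT INTO dim_product VALUES (1, 'Camry', 1), (2, 'Corolla', 1), (3, 'Tundra', 2);
INSERT INTO fact_sales VALUES (1, 25000.0), (2, 20000.0), (3, 40000.0), (1, 25000.0);
""")

# Fixed mapping from drill-down level to the column that names it.
LEVEL_COLUMNS = {"category": "c.name", "subcategory": "s.name", "product": "p.name"}

def sales_by(level):
    """Total sales at one level of the product hierarchy."""
    column = LEVEL_COLUMNS[level]  # raises KeyError for an unknown level
    return conn.execute(f"""
        SELECT {column}, SUM(f.sales_amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_subcategory s ON s.subcategory_key = p.subcategory_key
        JOIN dim_category c ON c.category_key = s.category_key
        GROUP BY {column}
        ORDER BY {column}
    """).fetchall()

top = sales_by("category")
middle = sales_by("subcategory")
bottom = sales_by("product")
```

Compared with the star layout, every query pays for the extra joins, which is the "more complicated" side of the trade-off.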
Designing
Data
Warehouse
 Fact Oriented
 You have and know the business facts
 Design the Facts, fill them with Measures, then add Dimensions
 Might need a couple of iterations
 You will end up with Facts with real Primary Keys
 Ex. Internet Sales (we talked about it before)
 Measure Group Oriented
 You only know/want your Measures
 Write down all business measures you need
 Connect them to Dimensions
 Group them by their meaning and common Dimensions
 You will end up with Facts whose Primary Keys combine the Dimension Keys
 Ex. Employee Offdays (TimeKey, EmployeeKey, ReasonKey,
OffDayCount)
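The Employee Offdays example can be sketched as a table whose primary key is the combination of its dimension keys. The column names come from the example above; the table name fact_employee_offdays and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_employee_offdays (
        TimeKey     INTEGER NOT NULL,
        EmployeeKey INTEGER NOT NULL,
        ReasonKey   INTEGER NOT NULL,
        OffDayCount INTEGER NOT NULL,
        -- No event key: the dimension keys together identify a row.
        PRIMARY KEY (TimeKey, EmployeeKey, ReasonKey)
    )
""")
conn.executemany(
    "INSERT INTO fact_employee_offdays VALUES (?, ?, ?, ?)",
    [(20240101, 7, 1, 2), (20240201, 7, 2, 1)],
)

# A second row for the same time, employee, and reason violates the key.
try:
    conn.execute("INSERT INTO fact_employee_offdays VALUES (20240101, 7, 1, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

total_offdays = conn.execute(
    "SELECT SUM(OffDayCount) FROM fact_employee_offdays WHERE EmployeeKey = 7"
).fetchone()[0]
```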
Real-World
Fact Table
Real-World
Data
Warehouse
(Some of It)
About Me
Amin Choroomi
CTO & Co-Founder at vdash
Software Developer, Teacher and Consultant
Data Visualization, Analytics, Dashboards
Data Warehousing, Integration, Business Intelligence
http://www.vdash.ir
choroomi@live.com
choroomi@vdashonline.com
https://linkedin.com/in/choroomi
@aminchoroomi
Thank You
