Module 1
Data Warehousing
Fundamentals
By
Ms. Rashmi Bhat
Strategic Information
DATA WAREHOUSING AND MINING BY RASHMI BHAT 2
An interesting post on Facebook
What is Strategic Information?
• Information that is collected and transformed to support or shape competitive strategy.
Who needs Strategic Information?
• Executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions.
• They need information to formulate business strategies, establish goals, set objectives, and monitor results.
Operational Systems: get the data in
• Making the wheels of business turn
• Take an order
• Process a claim
• Make a shipment
• Generate an invoice
• Receive cash
• Reserve an airline seat
Decision Support Systems: get the information out
• Watching the wheels of business turn
• Show me the top-selling products
• Show me the problem regions
• Tell me why (drill down)
• Let me see other data (drill across)
• Show the highest margins
• Alert me when a district sells below target
How are operational and informational systems different?
Parameter        | Operational Systems        | Informational Systems
Data contents    | Current values             | Archived, derived, summarized
Data structure   | Optimized for transactions | Optimized for complex queries
Access frequency | High                       | Medium to low
Access type      | Read, update, delete       | Read
Usage            | Predictable, repetitive    | Ad hoc, random, heuristic
Response time    | Sub-seconds                | Several seconds to minutes
Users            | Large number               | Relatively small number
Data Warehouse
Desired Features of the New Type of System
• Database designed for analytical tasks
• Data from multiple applications
• Easy to use and conducive to long interactive sessions by users
• Read-intensive data usage
• Direct interaction with the system by the users without IT assistance
• Content that is stable and updated periodically
• Content that includes current and historical data
• Ability for users to run queries and get results online
• Ability for users to initiate reports
Processing Requirements in the New System
• Running of simple queries and reports against current and historical data.
• Ability to perform “what if” analysis in many different ways.
• Ability to query, step back, analyze, and then continue the process to any desired length.
• Ability to spot historical trends and apply them in future interactive processes.
This new environment is known as the data warehouse environment.
A data warehouse is an informational environment that:
• Provides an integrated and total view of the enterprise.
• Makes the enterprise’s current and historical information easily available for strategic decision making.
• Makes decision-support transactions possible without hindering operational systems.
• Renders the organization’s information consistent.
• Presents a flexible and interactive source of strategic information.
What is a Data Warehouse?
“A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management’s decisions.”
- Bill Inmon
Features of a Data Warehouse
• Subject Oriented Data
• Integrated Data
• Time Variant Data
• Non-Volatile Data
• Data Granularity
Data Warehouse Architecture
ETL
Three-Tier Architecture
• Bottom Tier: Warehouse Database Server
  • Usually a relational database system.
  • Data is fed from operational databases or other external sources.
  • Data is extracted using application programs called gateways.
  • Contains the metadata repository.
• Middle Tier: OLAP Servers
  • Implemented using relational OLAP (ROLAP) or multidimensional OLAP (MOLAP).
• Top Tier: Front-end Client Layer
  • Contains query and reporting tools, analysis tools, and/or data mining tools.
Data Warehouse and Data Marts
What is a Data Mart?
• A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area.
• It is designed for use by a specific department, unit, or set of users in an organization, e.g. Marketing, Sales, HR, or Finance.
• It is often controlled by a single department in an organization.
• A data mart usually draws data from only a few sources compared to a data warehouse.
• Data marts are smaller and more flexible than a data warehouse.
[Figure: a Data Warehouse and three Data Marts]
Data Warehouse vs. Data Mart
Aspect  | Data Warehouse | Data Mart
Scope   | Application independent; centralized or enterprise-wide; planned | Specific application; decentralized by group; organic but may be planned
Data    | Historical, detailed, summary; some denormalization | Some history, detailed, summary; highly denormalized
Sources | Many internal and external sources | Few internal and external sources
Other   | Flexible; data oriented; long life; single complex structure | Restrictive; project oriented; short life; multiple simple structures
Metadata
Metadata in Data Warehouse
• Metadata is the data about the data in the data warehouse.
• Just as the Yellow Pages is a directory with data about the institutions in your town, the metadata component serves as a directory of the contents of your data warehouse.
• Metadata is a key architectural component of the data warehouse.
• Types of Metadata
1. Operational metadata
2. Extraction and transformation metadata
3. End-user metadata
Types of Metadata
• Operational metadata
  • Contains information about the operational data sources.
• Extraction and transformation metadata
  • Contains data about the extraction of data from the source systems, namely the extraction frequencies, extraction methods, and business rules for the data extraction.
  • Contains information about all the data transformations that take place in the data staging area.
• End-user metadata
  • Is the navigational map of the data warehouse.
  • Enables end-users to find information in the data warehouse.
Why is Metadata important in a data warehouse?
• It acts as the glue that connects all parts of the data warehouse.
• It provides information about the contents and structures to the developers.
• It opens the door to the end-users and makes the contents recognizable in their own terms.
Dimensional Modelling
What is Dimensional Modelling?
• A data structure technique optimized for data storage in a data warehouse.
• Optimizes the database for faster retrieval of data.
• A dimensional model in a data warehouse is designed to read, summarize, and analyze numeric information.
• Developed by Ralph Kimball; consists of “fact” and “dimension” tables.
Dimensions and Facts
• Dimensions and facts are the basic building blocks of a data warehouse.
• Everything in a data warehouse revolves around dimensions and facts.
• When we collect data, we should be able to identify which entities are dimensions and which are facts.
[Figure: a Data Warehouse built from Dimensions and Facts]
• Consider a retail business process.
• A business process contains two types of data entities:
1. Descriptive entities
2. Business numbers
• Business numbers provide quantitative information about business processes and are called facts (or metrics/measurements).
  • e.g. quantity, sales amount, total earnings.
• Descriptive entities describe these quantitative numbers. Data that is level based and descriptive, and that explains the business numbers, is called dimensions.
  • e.g. people, product, place, time.
Retail Shop
Transaction # | Date     | Store | Product | Quantity | Unit Price | Sales Amount
854751        | 11/04/20 | Pune  | Chair   | 15       | 550        | 8250
875486        | 21/07/16 | Thane | Desktop | 5        | 8500       | 42500
Text entities/attributes (dimensions): Date, Store, Product
Numeric entities/attributes (facts): Quantity, Unit Price, Sales Amount
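The split between dimensions and facts in the Retail Shop example can be sketched in Python. This is only an illustrative sketch: the field names and the `split_row` helper are assumptions introduced here, not part of the original slides.

```python
# Rows from the Retail Shop example (dates kept as the original strings).
rows = [
    {"transaction": 854751, "date": "11/04/20", "store": "Pune",
     "product": "Chair", "quantity": 15, "unit_price": 550},
    {"transaction": 875486, "date": "21/07/16", "store": "Thane",
     "product": "Desktop", "quantity": 5, "unit_price": 8500},
]

DIMENSIONS = ("date", "store", "product")   # descriptive attributes
FACTS = ("quantity", "unit_price")          # numeric measurements


def split_row(row):
    """Separate one transaction into dimension values and fact values."""
    dims = {name: row[name] for name in DIMENSIONS}
    facts = {name: row[name] for name in FACTS}
    # Sales amount is a derived fact: quantity multiplied by unit price.
    facts["sales_amount"] = row["quantity"] * row["unit_price"]
    return dims, facts


split_rows = [split_row(r) for r in rows]
for dims, facts in split_rows:
    print(dims, facts)
```

The derived sales amounts match the Sales Amount column of the example (8250 and 42500).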
What is Dimensional Modelling?
• Dimensional Modelling is a set of guidelines for designing a database table structure for easier and faster data retrieval.
• Not restricted to relational databases. Once the logical design is ready, it can be used in relational and multidimensional models.
• Easier to understand.
• Made from dimension tables and fact tables.
• Separate model for each business process.
• Built to enhance query performance.
ER Modelling | Dimensional Modelling
Data is stored in an RDBMS | Data is stored in an RDBMS or multidimensional databases
Data is stored in tables | Data is stored in data cubes
Data is normalized and used in OLTP systems | Data is denormalized and used in OLAP systems
Several tables and chains of relationships among them | Few tables; fact tables are connected to dimension tables
Volatile data | Non-volatile data
Designed for reducing data redundancy | Designed for easy queries and navigation
Information Package Diagram (IPD)
• Dimensional modelling is a logical design technique to structure the business dimensions and the metrics that are analyzed along these dimensions.
• The multidimensional information package diagram is the foundation for the dimensional model.
• The dimensional model consists of the specific data structures needed to represent the business dimensions.
• These data structures also contain the metrics or facts.
Information Subject: Sales Analysis
Dimensions:  Time  | Locations | Products | Age Groups
Hierarchies: Year  | Country   | Class    | Group 1
             Month | State     | Subclass | Subgroup
Measured Facts: Target Sales, Actual Sales, Forecast Sales
Fig. Information Package Diagram
Fact Table and Dimension Table
• The fact table gets its name from the subject for analysis.
• Each fact item or measurement goes into the fact table as an attribute; in this example, the subject is automaker sales.
• The product business dimension is used when we want to analyze the facts by product:
  • by individual models
  • by product lines
  • by product categories
Fig: Formation of the automaker sales fact table.
Fig: Formation of the automaker dimension tables
Dimension Schema types:
• Star Schema
• Snowflake Schema
• Fact Constellation Schema (Galaxy Schema)
STAR Schema
• The arrangement of the fact table and dimension tables looks like a star formation.
• The fact table is at the core of the star and the dimension tables are along the spikes of the star.
• This dimensional model is called the STAR schema.
Fig.: Star schema for Automaker Sales
Fact table: Auto Sales
Dimension tables: Product, Dealer, Time, Payment Method, Customer Demographics
Fig.: Star schema for Order Analysis
Fact table Order: Order Rs, Cost, Margin, Quantity Sold
Dimension Customer: Cust_Name, Cust_Code, Billing Address, Shipping Address, Phone Number
Dimension Salesperson: SalesPerson Name, Territory Name, Region Name
Dimension Product: Prod_Name, Brand Name, Batch no
Dimension Order Date: Date, Month, Quarter, Year
• It is a structure that can be easily understood by the users.
• The structure mirrors how the users normally view their critical measures along their business dimensions.
• When a query is made against the data warehouse, the results of the query are produced by combining or joining one or more dimension tables with the fact table.
• The joins are between the fact table and individual dimension tables.
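The way a star schema query joins the fact table to individual dimension tables can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`) and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (prod_key INTEGER PRIMARY KEY, prod_name TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    prod_key     INTEGER REFERENCES dim_product(prod_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    quantity     INTEGER,
    sales_amount INTEGER
);
INSERT INTO dim_product VALUES (1, 'Chair'), (2, 'Desktop');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Thane');
INSERT INTO fact_sales  VALUES (1, 1, 15, 8250), (2, 2, 5, 42500), (1, 2, 2, 1100);
""")

# Each join is between the fact table and one dimension table;
# the dimension tables are never joined to each other.
star_query = """
SELECT p.prod_name, s.city, SUM(f.sales_amount) AS total_sales
FROM fact_sales AS f
JOIN dim_product AS p ON p.prod_key = f.prod_key
JOIN dim_store   AS s ON s.store_key = f.store_key
GROUP BY p.prod_name, s.city
ORDER BY p.prod_name, s.city
"""
star_result = conn.execute(star_query).fetchall()
print(star_result)
```

Users filter and group by the descriptive dimension attributes (product name, city), while the numbers being summed come from the fact table.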
Inside a Dimension Table
• The dimension tables represent the business dimensions along which the metrics are analyzed.
• The primary key of the dimension table uniquely identifies each row in the table.
• The attributes in a dimension table are typically in textual format.
• Attributes represent the textual descriptions of the components within the business dimensions.
• Users will compose their queries using these descriptors.
Fig. Inside the dimension table: Dimension_Pkey followed by a set of attributes.
Inside a Fact Table
• The fact table is where we keep the measurements.
• Some fact tables may contain only summary data. These are called aggregate fact tables.
• The fact table contains a set of foreign keys, one for each dimension table.
• Each row in the fact table is identified by the primary keys of these dimension tables.
• The primary key of the fact table is typically the concatenation of the primary keys of all the dimension tables.
• Fact tables store the facts (metrics) at a chosen data grain, that is, the level of detail of the measurements or metrics.
Fig. Inside the fact table: a set of foreign keys followed by a set of facts.
Fig.: Star schema for Attendance
Fact table Attendance: Date_key, Classroom_key, Subj_key, Stud_key, Teacher_key, Present
Dimension Student: Stud_key, Roll_No, Student_Name, Department, Class, Division
Dimension Teacher: Teacher_key, Teacher Name, Department, Emp_id
Dimension Date: Date_key, Day, Month, Year
Dimension Subject: Subj_key, Sub_code, Sub_name, Semester, Is_elective
Dimension Classroom: Classroom_key, Wing, Floor
A manufacturing company has a huge sales network. To control sales, the network is divided into regions. Each region has multiple zones, and each zone has different cities. Each salesperson is allocated different cities. The objective is to track sales figures at different granularity levels of region and to count the number of products sold.
Design a star schema that takes into consideration the above granularity levels for region, salesperson, and time.
Fact table Fact_sales: Prod_key, Time_key, Location_key, SP_key, No_of_prod_sold, Sales_amount
Dimension Dim_Product: Prod_key, Prod_name, Prod_subcategory, Prod_category
Dimension Dim_Location: Location_key, City, Zone, Region
Dimension Dim_Time: Time_key, Day, Week, Month, Quarter, Year
Dimension Dim_Salesperson: SP_key, SP_name, SP_type, SP_Dept
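A rough sketch of how this schema supports the different granularity levels of region, using Python's `sqlite3` module. Only `Fact_sales` and `Dim_Location` are modelled; the other dimensions are omitted for brevity, and the sample rows and the `sales_by` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Location (
    Location_key INTEGER PRIMARY KEY, City TEXT, Zone TEXT, Region TEXT
);
CREATE TABLE Fact_sales (
    Location_key    INTEGER REFERENCES Dim_Location(Location_key),
    No_of_prod_sold INTEGER,
    Sales_amount    INTEGER
);
INSERT INTO Dim_Location VALUES
    (1, 'Pune', 'West-1', 'West'),
    (2, 'Mumbai', 'West-2', 'West'),
    (3, 'Chennai', 'South-1', 'South');
INSERT INTO Fact_sales VALUES (1, 10, 1000), (1, 5, 500), (2, 20, 3000), (3, 7, 700);
""")

LEVELS = ("City", "Zone", "Region")  # hierarchy levels stored in Dim_Location


def sales_by(level):
    """Total products sold and sales amount, grouped at one hierarchy level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    # The level name is checked against LEVELS above before being interpolated.
    query = f"""
        SELECT l.{level}, SUM(f.No_of_prod_sold), SUM(f.Sales_amount)
        FROM Fact_sales AS f
        JOIN Dim_Location AS l ON l.Location_key = f.Location_key
        GROUP BY l.{level}
        ORDER BY l.{level}
    """
    return conn.execute(query).fetchall()


print(sales_by("Region"))
print(sales_by("City"))
```

Because City, Zone, and Region sit together in one denormalized dimension table, moving between granularity levels only changes the GROUP BY column, not the joins.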
Factless Fact Table
• Suppose we are building a fact table to track the attendance of students.
• For analyzing student attendance, the dimensions are student, course, date, room, and professor.
• The attendance may be affected by any of these dimensions.
• If an attendance fact were stored, every fact table row would contain the number one, so the fact column carries no information.
• Fact tables that do not need to contain facts are called factless fact tables.
Fig.: Star schema for Attendance
Factless fact table Attendance: Date_key, Classroom_key, Subj_key, Stud_key, Teacher_key
Dimension Student: Stud_key, Roll_No, Student_Name, Department, Class, Division
Dimension Teacher: Teacher_key, Teacher Name, Department, Emp_id
Dimension Date: Date_key, Day, Month, Year
Dimension Subject: Subj_key, Sub_code, Sub_name, Semester, Is_elective
Dimension Classroom: Classroom_key, Wing, Floor
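A factless fact table is still queryable: the measurement is simply the number of rows. The sketch below uses Python's `sqlite3` module with a reduced version of the Attendance schema; the sample students and key values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (Stud_key INTEGER PRIMARY KEY, Student_Name TEXT);
-- Factless fact table: only foreign keys, no numeric fact column.
CREATE TABLE Attendance (Date_key INTEGER, Subj_key INTEGER, Stud_key INTEGER);
INSERT INTO Student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO Attendance VALUES
    (20240101, 10, 1),
    (20240101, 10, 2),
    (20240102, 10, 1);
""")

# Each row records one attendance event, so counting rows gives the attendance.
attendance_counts = conn.execute("""
    SELECT s.Student_Name, COUNT(*) AS classes_attended
    FROM Attendance AS a
    JOIN Student AS s ON s.Stud_key = a.Stud_key
    GROUP BY s.Student_Name
    ORDER BY s.Student_Name
""").fetchall()
print(attendance_counts)
```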
Snowflake Schema
• “Snowflaking” is a method of normalizing the dimension tables in a STAR schema.
• When you completely normalize all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle.
• “Snowflaking” or normalization of the dimension tables can be achieved in a few different ways:
  • Partially normalize only a few dimension tables, leaving the others intact.
  • Partially or fully normalize only a few dimension tables, leaving the rest intact.
  • Partially normalize every dimension table.
  • Fully normalize every dimension table.
Fig: Star Schema for Sales
Fact table SALES FACT: Prod_key, Time_key, Cust_key, SalesRep_key, Quantity_sold, Sales_amount, Sales_margin
Dimension CUSTOMER: Cust_key, Cust_name, Cust_code, Marital_status, Address, State, Pin, Country
Dimension SALESREP: SalesRep_key, SalesPerson_name, Territory_name, Region_name
Dimension TIME: Time_key, Date, Month, Quarter, Year
Dimension Product: Prod_key, Prod_name, Prod_code, Package_type, Brand_name, Product Category
Fig: Snowflake Schema: Partially normalized Product Dimension
Fact table SALES FACT: Prod_key, Time_key, Cust_key, SalesRep_key, Quantity_sold, Sales_amount, Sales_margin
Dimension CUSTOMER: Cust_key, Cust_name, Cust_code, Marital_status, Address, State, Pin, Country
Dimension SALESREP: SalesRep_key, SalesPerson_name, Territory_name, Region_name
Dimension TIME: Time_key, Date, Month, Quarter, Year
Dimension Product: Prod_key, Prod_name, Prod_code, Package_type, Brand_key
  Brand: Brand_key, Brand_name, Category_key
  Category: Category_key, Product category
Fig: Snowflake Schema for Sales
Fact table SALES FACT: Prod_key, Time_key, Cust_key, SalesRep_key, Quantity_sold, Sales_amount, Sales_margin
Dimension CUSTOMER: Cust_key, Cust_name, Cust_code, Marital_status, Address, State, Pin, Country_key
  COUNTRY: Country_key, Country_Name
Dimension SALESREP: SalesRep_key, SalesPerson_name, Territory_key
  TERRITORY: Territory_key, Territory_name, Region_key
  REGION: Region_key, Region_Name
Dimension TIME: Time_key, Date, Month, Quarter, Year
Dimension Product: Prod_key, Prod_name, Prod_code, Package_key, Brand_key
  PACKAGE: Package_key, Package_type
  Brand: Brand_key, Brand_name, Category_key
  Category: Category_key, Product category
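The cost of snowflaking shows up in queries: reaching the product category from the fact table now takes a chain of joins through Product, Brand, and Category. A minimal sketch with Python's `sqlite3` module follows; the column `Product_category` is written with an underscore for SQL, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (Category_key INTEGER PRIMARY KEY, Product_category TEXT);
CREATE TABLE Brand (
    Brand_key INTEGER PRIMARY KEY, Brand_name TEXT,
    Category_key INTEGER REFERENCES Category(Category_key)
);
CREATE TABLE Product (
    Prod_key INTEGER PRIMARY KEY, Prod_name TEXT,
    Brand_key INTEGER REFERENCES Brand(Brand_key)
);
CREATE TABLE Sales_fact (
    Prod_key INTEGER REFERENCES Product(Prod_key), Sales_amount INTEGER
);
INSERT INTO Category VALUES (1, 'Furniture'), (2, 'Electronics');
INSERT INTO Brand    VALUES (1, 'BrandA', 1), (2, 'BrandB', 2);
INSERT INTO Product  VALUES (1, 'Chair', 1), (2, 'Desk', 1), (3, 'Desktop', 2);
INSERT INTO Sales_fact VALUES (1, 8250), (2, 3000), (3, 42500);
""")

# Three joins are needed to reach the category; in a star schema the
# category would be a column of the Product dimension (one join).
sales_by_category = conn.execute("""
    SELECT c.Product_category, SUM(f.Sales_amount)
    FROM Sales_fact AS f
    JOIN Product  AS p ON p.Prod_key = f.Prod_key
    JOIN Brand    AS b ON b.Brand_key = p.Brand_key
    JOIN Category AS c ON c.Category_key = b.Category_key
    GROUP BY c.Product_category
    ORDER BY c.Product_category
""").fetchall()
print(sales_by_category)
```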
• Normalizing removes redundancies, which may save storage space when the dimension tables are large.
• Advantages of Snowflaking
  • Normalized structures are easier to update and maintain.
  • Small savings in storage space.
• Limitations of Snowflaking
  • The schema is less intuitive, and end-users are put off by the complexity.
  • Browsing through the contents becomes difficult.
  • Query performance degrades because of the additional joins.
Fact Constellation Schema
• It is a group of fact tables that share some dimension tables.
• It can be viewed as a combined group of star schemas, and is therefore also called a Galaxy Schema.
• Used to design a data warehouse.
• It is somewhat more complicated than the star and snowflake schemas.
Fig.: Basic structure of a Fact Constellation Schema (two fact tables, six dimension tables)
Fig: Fact Constellation Schema
Fact table Placement: Roll_no, Company_id, Tpd_id, no._of_students, Attended_students, Selected_students
Fact table Training Class: Roll_no, Training_code, Tpd_id, Attended_students, Passed_students
Dimension Student: Roll_no, Stud_name, Marks
Dimension TPD: Tpd_id, Tpo_name, Department
Dimension Company: Company_id, Company_name, Package offered
Dimension Workshop: Training_code, Training_name, Subject, Workshop_fees
Updates to Dimension Tables
Updates to Fact and Dimension Tables
• Over time, the number of rows in the fact table continues to grow.
• Rows in a fact table are very rarely updated with changes.
• Adjustments to prior numbers are processed as additional adjustment rows and added to the fact table.
• Dimension tables are more stable and less volatile.
• Slowly changing dimensions, for example:
  • Customer’s address
  • Customer’s marital status
  • Salesperson’s region
Slowly Changing Dimensions
• For changes to the dimension tables, we can derive the following principles:
  • Most dimensions are generally constant over time.
  • Many dimensions, though not constant over time, change slowly.
  • The product key of the source record does not change.
  • The description and other attributes change slowly over time.
  • In the source OLTP systems, the new values overwrite the old ones.
  • Overwriting of dimension table attributes is not always the appropriate option in a data warehouse.
  • The way changes are made to the dimension tables depends on the types of changes and on what information must be preserved in the data warehouse.
Types of updates to dimension tables:
• Type 1 Changes: Correction of Errors
• Type 2 Changes: Preservation of History
• Type 3 Changes: Tentative Soft Revisions
Type 1 Change: Correction of Errors
• General principles for type 1 changes:
  • Usually, the changes relate to the correction of errors in source systems.
  • Sometimes the change in the source system has no significance.
  • The old value in the source system needs to be discarded.
  • The change in the source system need not be preserved in the data warehouse.
• Applying Type 1 Changes to the Data Warehouse
  • Overwrite the attribute value in the dimension table row with the new value.
  • The old value of the attribute is not preserved.
  • No other changes are made in the dimension table row.
  • The key of this dimension table and any other key values are not affected.
  • This type of change is the easiest to implement.
Fig. The method for applying type 1 changes
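A minimal Python sketch of a type 1 change, assuming the dimension is held as a dictionary keyed by surrogate key. The `apply_type1` helper and the sample customer row are hypothetical.

```python
# Dimension rows keyed by surrogate key (hypothetical sample data;
# the name is deliberately misspelled to represent a source-system error).
customer_dim = {
    1: {"customer_code": "C001", "name": "Jhon Smith", "city": "Pune"},
}


def apply_type1(dim, surrogate_key, attribute, new_value):
    """Overwrite the attribute in place; the old value is discarded."""
    dim[surrogate_key][attribute] = new_value


# Correct the misspelled name. No row is added and no key changes.
apply_type1(customer_dim, 1, "name", "John Smith")
print(customer_dim)
```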
Type 2 Change: Preservation of History
• Sometimes we need to preserve the historical values of attributes.
• The general principles for this type of change:
  • They usually relate to true changes in source systems.
  • There is a need to preserve history in the data warehouse.
  • This type of change partitions the history in the data warehouse.
  • Every change for the same attribute must be preserved.
• Applying Type 2 Changes to the Data Warehouse
  • Add a new dimension table row with the new value of the changed attribute.
  • An effective date field may be included in the dimension table.
  • There are no changes to the original row in the dimension table.
  • The key of the original row is not affected.
  • The new row is inserted with a new surrogate key.
Fig. Applying type 2 changes
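A minimal Python sketch of a type 2 change, assuming the dimension is a list of rows with an integer surrogate key and an effective date. The `apply_type2` helper, the column names, and the sample data are hypothetical.

```python
from datetime import date

# One customer with a single version so far (hypothetical sample data).
customer_dim = [
    {"cust_key": 1, "cust_code": "C001", "marital_status": "Single",
     "effective_date": date(2020, 1, 1)},
]


def apply_type2(dim, cust_code, attribute, new_value, effective_date):
    """Insert a new row with a new surrogate key; existing rows stay untouched."""
    # The latest version of this customer is the row with the newest effective date.
    current = max((r for r in dim if r["cust_code"] == cust_code),
                  key=lambda r: r["effective_date"])
    new_row = dict(current)                      # copy the unchanged attributes
    new_row[attribute] = new_value
    new_row["effective_date"] = effective_date
    new_row["cust_key"] = max(r["cust_key"] for r in dim) + 1  # new surrogate key
    dim.append(new_row)
    return new_row


new_row = apply_type2(customer_dim, "C001", "marital_status", "Married",
                      date(2023, 6, 15))
print(customer_dim)
```

Fact rows loaded after the change would reference the new surrogate key, which is how the history is partitioned between the old and new versions.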
Type 3 Change: Tentative Soft Revisions
• Assume your marketing department is planning a realignment of the territorial assignments for salespersons.
• Before making a permanent realignment, they want to count the orders in two ways:
  • according to the current territorial alignment
  • according to the proposed realignment
• The general principles for type 3 changes:
  • They usually relate to “soft” or tentative changes in the source systems.
  • There is a need to keep track of history with old and new values of the changed attribute.
  • They are used to compare performances across the transition.
  • They provide the ability to track forward and backward.
• Applying Type 3 Changes to the Data Warehouse
  • Add an “old” field in the dimension table for the affected attribute.
  • Push down the existing value of the attribute from the “current” field to the “old” field.
  • Keep the new value of the attribute in the “current” field.
  • Also, you may add a “current” effective date field for the attribute.
  • The key of the row is not affected.
• Effects of applying Type 3 changes:
  • No new dimension row is needed.
  • The existing queries will seamlessly switch to the “current” value.
  • Any queries that need to use the “old” value must be revised accordingly.
  • The technique works best for one “soft” change at a time.
  • If there is a succession of changes, more sophisticated techniques must be devised.
Fig. Applying type 3 changes
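A minimal Python sketch of a type 3 change for the territory realignment example, assuming one “old” and one “current” field per tracked attribute. The `apply_type3` helper, the field names, and the sample salesperson are hypothetical.

```python
from datetime import date

# Salesperson dimension keyed by surrogate key (hypothetical sample data).
salesperson_dim = {
    1: {"name": "R. Kumar", "current_territory": "North",
        "old_territory": None, "effective_date": date(2020, 1, 1)},
}


def apply_type3(dim, key, new_territory, effective_date):
    """Push the current value down to the 'old' field, then store the new value."""
    row = dim[key]
    row["old_territory"] = row["current_territory"]
    row["current_territory"] = new_territory
    row["effective_date"] = effective_date      # "current" effective date


apply_type3(salesperson_dim, 1, "West", date(2024, 4, 1))
print(salesperson_dim)
```

Only one previous value is kept, which is why the technique suits a single “soft” change at a time: a second call would overwrite "North" in the “old” field.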
Rapidly Changing Dimensions
• Rapidly changing large dimensions can be too problematic for the type 2 approach.
• The dimension table could be littered with a very large number of additional rows created every time there is an incremental load.
• What if the dimension table is too large and is changing too rapidly?
  • Break the large dimension table into one or more simpler dimension tables.
  • Break off the rapidly changing attributes into another dimension table, leaving the slowly changing attributes behind in the original table.
Fig: Dividing a large, rapidly changing dimension table.

Data Warehouse Fundamentals

  • 1.
  • 2.
    Strategic Information DATA WAREHOUSINGAND MINING BY RASHMI BHAT 2 An interesting post on Facebook
  • 3.
    Strategic Information DATA WAREHOUSINGAND MINING BY RASHMI BHAT 3 • The information collected and transformed to support or shape competitive strategy. What is Strategic Information? • By the executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions • They need information to formulate the business strategies, establish goals, set objectives, and monitor results. Who needs the Strategic Information?
  • 4.
    Strategic Information DATA WAREHOUSINGAND MINING BY RASHMI BHAT 4 Get the data in • Making the wheels of business turn • Take an order • Process a claim • Make a shipment • Generate an invoice • Receive cash • Reserve an airline seat Operational Systems Get the information out • Watching the wheels of business turn • Show me the top-selling products • Show me the problem regions • Tell me why (drill down) • Let me see other data (drill across) • Show the highest margins • Alert me when a district sells below target Decision Support Systems
  • 5.
    Strategic Information DATA WAREHOUSINGAND MINING BY RASHMI BHAT 5 How are Operational and informational systems different? Parameters Operational Systems Informational Systems Data Contents Current values Archived, derived, summarized Data Structure Optimized for transactions Optimized for complex queries Access Frequency High Medium to low Access Type Read, update, delete Read Usage Predictable, repetitive Ad hoc, random, heuristic Response Time Sub-seconds Several seconds to minutes Users Large number Relatively small number
  • 6.
    Data Warehouse DATA WAREHOUSINGAND MINING BY RASHMI BHAT 6 • Database designed for analytical tasks • Data from multiple applications • Easy to use and beneficial to long interactive sessions by users • Read-intensive data usage • Direct interaction with the system by the users without IT assistance • Content updated periodically and stable • Content to include current and historical data • Ability for users to run queries and get results online • Ability for users to initiate reports Desired Features of New Type of System
  • 7.
    Data Warehouse DATA WAREHOUSINGAND MINING BY RASHMI BHAT 7 • Running of simple queries and reports against current and historical data. • Ability to perform “what if” analysis in many different ways. • Ability to query, step back, analyze, and then continue the process to any desired length. • Ability to spot historical trends and apply them in future interactive processes Processing Requirements in New System This new environment is known as the Data Warehouse environment
  • 8.
    Data Warehouse DATA WAREHOUSINGAND MINING BY RASHMI BHAT 8 • Provides an integrated and total view of the enterprise. • Makes the enterprise’s current and historical information easily available for strategic decision making. • Makes decision-support transactions possible without hindering operational systems. • Renders the organization’s information consistent. • Presents a flexible and interactive source of strategic information. Data Warehouse is an informational environment that:
  • 9.
    Data Warehouse DATA WAREHOUSINGAND MINING BY RASHMI BHAT 9 “A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management’s decisions.” -Bill Inmon What is Data Warehouse?
  • 10.
    DATA WAREHOUSING ANDMINING BY RASHMI BHAT 10 Features of Data Warehouse Subject Oriented Data Integrated Data Time Variant Data Non-Volatile Data Data Granularity
  • 11.
    Data Warehouse Architecture DATAWAREHOUSING AND MINING BY RASHMI BHAT 11 ETL
  • 12.
    DATA WAREHOUSING ANDMINING BY RASHMI BHAT 12 Data Warehouse Architecture
  • 13.
    Data Warehouse Architecture DATAWAREHOUSING AND MINING BY RASHMI BHAT 13 • Bottom Tier: Warehouse Database Server • A relational database system. • Data is fed from operational databases or other external sources • Data are extracted using application program called as Gateways. • Contains metadata repository. • OLAP Servers • Implemented using relational OLAP or multi-dimensional OLAP • Front-end Client Layer • Contains query and reporting tools, analysis tools and/or data mining tools Three Tier Architecture
  • 14.
    Data Warehouse andData Marts DATA WAREHOUSING AND MINING BY RASHMI BHAT 14 • A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area. • It is designed for use by a specific department, unit or set of users in an organization. E.g., Marketing, Sales, HR or finance. • It is often controlled by a single department in an organization. • Data Mart usually draws data from only a few sources compared to a Data warehouse. • Data marts are small in size and are more flexible compared to a Datawarehouse. What is Data Mart?
  • 15.
    Data Warehouse andData Marts DATA WAREHOUSING AND MINING BY RASHMI BHAT 15 What is Data Mart? Data Warehouse Data Mart Data Mart Data Mart
  • 16.
    Data Warehouse andData Marts DATA WAREHOUSING AND MINING BY RASHMI BHAT 16 Data Warehouse Data Mart Scope • Application independent • Centralized or enterprise wide • Planned • Specific application • Decentralized by group • Organic but may be planned Data • Historical, detailed, summary • Some denormalization • Some history, detailed summary • Highly denormalized Sources • Many internal and external sources • Few internal and external sources Other • Flexible • Data oriented • Long life • Single complex structure • Restrictive • Project oriented • Short life • Multiple simple structure
  • 17.
    Metadata: Metadata in Data Warehouse
    • The metadata component is the data about the data in the data warehouse.
    • Analogy: the Yellow Pages is a directory with data about the institutions in your town.
    • The metadata component serves as a directory of the contents of your data warehouse.
    • Metadata is a key architectural component of the data warehouse.
    • Types of Metadata
      1. Operational metadata
      2. Extraction and transformation metadata
      3. End-user metadata
  • 18.
    Metadata: Types of Metadata
    • Operational metadata
      • Contains information about the operational data sources.
    • Extraction and transformation metadata
      • Contains data about the extraction of data from the source systems, namely the extraction frequencies, extraction methods, and business rules for the data extraction.
      • Contains information about all the data transformations that take place in the data staging area.
    • End-user metadata
      • The navigational map of the data warehouse.
      • Enables end-users to find information in the data warehouse.
  • 19.
    Metadata: Why is Metadata important in a data warehouse?
    • It acts as the glue that connects all parts of the data warehouse.
    • It provides information about the contents and structures to the developers.
    • It opens the door to the end-users and makes the contents recognizable in their own terms.
  • 20.
    Dimensional Modelling
  • 21.
    Dimensional Modelling: What is Dimensional Modelling?
    • A data structure technique optimized for data storage in a data warehouse.
    • Optimizes the database for faster retrieval of data.
    • A dimensional model in a data warehouse is designed to read, summarize, and analyze numeric information.
    • Developed by Ralph Kimball; consists of “fact” and “dimension” tables.
  • 22.
    Dimensional Modelling: Dimensions and Facts
    • Dimensions and facts are the basic building blocks of a data warehouse.
    • Everything in a data warehouse revolves around dimensions and facts.
    • When we collect data, we should be able to identify which entities are dimensions and which are facts.
    [Figure: Data Warehouse, made up of Dimensions and Facts]
  • 23.
    Dimensional Modelling: Dimensions and Facts
    • Consider a retail business process.
    • A business process contains two types of data entities:
      1. Descriptive entities
      2. Business numbers
    • Business numbers provide quantitative information about the business process and are called facts (or metrics/measurements).
      • e.g. quantity, sales amount, total earnings.
    • Descriptive entities describe these quantitative numbers. Data that is level based, descriptive, and explains the business numbers is called a dimension.
      • e.g. people, product, place, time.
  • 24.
    Dimensional Modelling: Dimensions and Facts
    Retail Shop
    Transaction #  Date      Store  Product  Quantity  Unit Price  Sales Amount
    854751         11/04/20  Pune   Chair    15        550         8250
    875486         21/07/16  Thane  Desktop  5         8500        42500
    Text entities/attributes: Date, Store, Product
    Numeric entities/attributes: Quantity, Unit Price, Sales Amount
  • 25.
    Dimensional Modelling: Dimensions and Facts
    [Figure: the Retail Shop table again (transactions 854751 and 875486), with the columns Date, Store, and Product and the sample values Chair and 15 highlighted]
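    The split between descriptive attributes and business numbers in the Retail Shop table can be sketched in a few lines of Python. This is only an illustration: the field names and the helper functions split_row and total_by are assumptions made here, not part of the slides.

```python
# Rows copied from the Retail Shop table; the field names are illustrative.
transactions = [
    {"transaction_no": 854751, "date": "11/04/20", "store": "Pune",
     "product": "Chair", "quantity": 15, "unit_price": 550, "sales_amount": 8250},
    {"transaction_no": 875486, "date": "21/07/16", "store": "Thane",
     "product": "Desktop", "quantity": 5, "unit_price": 8500, "sales_amount": 42500},
]

# Descriptive attributes (dimensions) versus business numbers (facts).
DIMENSIONS = ("date", "store", "product")
FACTS = ("quantity", "unit_price", "sales_amount")


def split_row(row):
    """Separate one transaction into its dimension values and its fact values."""
    dims = {name: row[name] for name in DIMENSIONS}
    facts = {name: row[name] for name in FACTS}
    return dims, facts


def total_by(rows, dimension, fact):
    """Sum one fact for each distinct value of one dimension."""
    totals = {}
    for row in rows:
        totals[row[dimension]] = totals.get(row[dimension], 0) + row[fact]
    return totals


dims, facts = split_row(transactions[0])
print(dims)
print(facts)
print(total_by(transactions, "store", "sales_amount"))
```

    The facts are the values that get summed; the dimensions are the values the sums are grouped by.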
  • 26.
    Dimensional Modelling: What is Dimensional Modelling?
    • Dimensional modelling is a set of guidelines for designing a database table structure for easier and faster data retrieval.
    • Not restricted to relational databases: once the logical design is ready, it can be used in a relational or a multidimensional model.
    • Easier to understand.
    • Made from dimension tables and fact tables.
    • A separate model for each business process.
    • Built to enhance query performance.
  • 27.
    Dimensional Modelling
    ER Modelling                                           Dimensional Modelling
    Data is stored in RDBMS                                Data is stored in RDBMS or multidimensional databases
    Data is stored in tables                               Data is stored in tables or data cubes
    Data is normalized and used in OLTP systems            Data is de-normalized and used in OLAP systems
    Several tables and chains of relationships among them  Few tables; fact tables are connected to dimension tables
    Volatile data                                          Non-volatile data
    Designed for reducing data redundancy                  Designed for easy queries and navigation
  • 28.
    Dimensional Modelling: Information Package Diagram (IPD)
    • Dimensional modelling is a logical design technique to structure the business dimensions and the metrics that are analyzed along these dimensions.
    • The multidimensional information package diagram is the foundation for the dimensional model.
    • The dimensional model consists of the specific data structures needed to represent the business dimensions.
    • These data structures also contain the metrics or facts.
  • 29.
    Dimensional Modelling
    Fig. Information Package Diagram
    Information Subject: Sales Analysis
    Dimensions:   Time    Locations  Products  Age Groups
    Hierarchies:  Year    Country    Class     Group 1
                  Month   State      Subclass  Subgroup
    Measured Facts: Target Sales, Actual Sales, Forecast Sales
  • 30.
    Dimensional Modelling: Fact Table and Dimension Table
    • The fact table gets its name from the subject for analysis, here automaker sales.
    • Each fact item or measurement goes into the fact table as an attribute of automaker sales.
    • The product business dimension is used when we want to analyze the facts by product:
      • by individual models
      • by product lines
      • by product categories
  • 31.
    Fig: Formation of the automaker sales fact table.
  • 32.
    Fig: Formation of the automaker dimension tables.
  • 33.
    Dimension Schema
    • Star Schema
    • Snowflake Schema
    • Fact Constellation Schema (Galaxy Schema)
  • 34.
    STAR Schema
    • The arrangement of the fact table and the dimension tables looks like a star formation.
    • The fact table is at the core of the star and the dimension tables are along the spikes of the star.
    • This dimensional model is called a STAR schema.
  • 35.
    STAR Schema
    Fig.: Star schema for Automaker Sales
    Fact table: Auto Sales
    Dimension tables: Product, Dealer, Time, Payment Method, Customer Demographics
  • 36.
    STAR Schema
    Fig.: Star schema for Order Analysis
    Fact table
      Order: Order Rs, Cost, Margin, Quantity Sold
    Dimension tables
      Customer: Cust_Name, Cust_Code, Billing Address, Shipping Address, Phone Number
      Salesperson: SalesPerson Name, Territory Name, Region Name
      Product: Prod_Name, Brand Name, Batch no
      Order Date: Date, Month, Quarter, Year
  • 37.
    STAR Schema
    • It is a structure that can be easily understood by the users.
    • The structure mirrors how the users normally view their critical measures along their business dimensions.
    • When a query is made against the data warehouse, the results are produced by joining one or more dimension tables with the fact table.
    • The joins are between the fact table and individual dimension tables.
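    A minimal sketch of such a query, using Python's built-in sqlite3 module and a cut-down version of the Order Analysis star. The table names, the surrogate key columns (prod_key, date_key), and all sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    prod_key INTEGER PRIMARY KEY, prod_name TEXT, brand_name TEXT);
CREATE TABLE dim_order_date (
    date_key INTEGER PRIMARY KEY, order_date TEXT, month TEXT,
    quarter TEXT, year INTEGER);
CREATE TABLE fact_order (
    prod_key INTEGER REFERENCES dim_product (prod_key),
    date_key INTEGER REFERENCES dim_order_date (date_key),
    order_rs INTEGER, cost INTEGER, quantity_sold INTEGER);

-- Made-up sample rows.
INSERT INTO dim_product VALUES
    (1, 'Chair', 'BrandA'), (2, 'Desk', 'BrandA'), (3, 'Lamp', 'BrandB');
INSERT INTO dim_order_date VALUES
    (1, '2020-01-15', 'Jan', 'Q1', 2020), (2, '2020-04-10', 'Apr', 'Q2', 2020);
INSERT INTO fact_order VALUES
    (1, 1, 1000, 700, 2), (2, 1, 500, 300, 1),
    (3, 1, 200, 120, 4), (1, 2, 300, 210, 1);
""")

# The fact table is joined to each dimension table it needs; the dimension
# attributes (brand, quarter) supply the grouping, the fact supplies the sum.
order_rs_by_brand_and_quarter = conn.execute("""
    SELECT p.brand_name, d.quarter, SUM(f.order_rs)
    FROM fact_order AS f
    JOIN dim_product AS p ON p.prod_key = f.prod_key
    JOIN dim_order_date AS d ON d.date_key = f.date_key
    GROUP BY p.brand_name, d.quarter
    ORDER BY p.brand_name, d.quarter
""").fetchall()

for row in order_rs_by_brand_and_quarter:
    print(row)
```

    Every join goes from the fact table to one dimension table; the dimension tables are never joined to each other.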
  • 38.
    STAR Schema: Inside a Dimension Table
    • The dimension tables represent the business dimensions along which the metrics are analyzed.
    • The primary key of the dimension table uniquely identifies each row in the table.
    • The attributes in a dimension table are typically textual.
    • Attributes represent the textual descriptions of the components within the business dimensions.
    • Users compose their queries using these descriptors.
    Fig. Inside the dimension table: Dimension (Dimension_Pkey, set of attributes ...)
  • 39.
    STAR Schema: Inside a Fact Table
    • The fact table is where we keep the measurements.
    • Some fact tables may contain only summary data. These are called aggregate fact tables.
    • The fact table contains a set of foreign keys, one for each dimension table.
    • Each row in the fact table is identified by the primary keys of these dimension tables.
    • In this model, the primary key of the fact table is the concatenation of the primary keys of all the dimension tables.
    • Fact tables store facts (metrics) at a chosen data grain, i.e. the level of detail of the measurements or metrics.
    Fig. Inside the fact table: Fact (set of foreign keys ..., set of facts ...)
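    The key structure described above can be sketched with sqlite3 as follows. The two-dimension schema, the column names, and the helper try_insert are assumptions chosen to keep the example short; a real fact table would reference every dimension in the star.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_product (prod_key INTEGER PRIMARY KEY, prod_name TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE fact_sales (
    prod_key INTEGER NOT NULL REFERENCES dim_product (prod_key),
    time_key INTEGER NOT NULL REFERENCES dim_time (time_key),
    quantity_sold INTEGER,
    sales_amount INTEGER,
    -- Concatenation of the dimension keys identifies a fact row.
    PRIMARY KEY (prod_key, time_key)
);
INSERT INTO dim_product VALUES (1, 'Chair');
INSERT INTO dim_time VALUES (1, '2020-04-11'), (2, '2020-04-12');
INSERT INTO fact_sales VALUES (1, 1, 15, 8250);
""")


def try_insert(row):
    """Insert one fact row; return False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((1, 1, 3, 1650)))   # same (prod_key, time_key) as an existing row
print(try_insert((9, 1, 1, 550)))    # prod_key 9 has no row in dim_product
print(try_insert((1, 2, 2, 1100)))   # new key combination, both keys exist
```

    The first insert is rejected by the composite primary key, the second by the foreign key to dim_product, and the third is accepted.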
  • 40.
    Fig.: Star schema for Attendance
    Fact table
      Attendance: Date_key, Classroom_key, Subj_key, Stud_key, Teacher_key, Present
    Dimension tables
      Student: Stud_key, Roll_No, Student_Name, Department, Class, Division
      Teacher: Teacher_key, Teacher Name, Department, Emp_id
      Date: Date_key, Day, Month, Year
      Subject: Subj_key, Sub_code, Sub_name, Semester, Is_elective
      Classroom: Classroom_key, Wing, Floor
  • 41.
    A manufacturing company has a huge sales network. To control sales, the network is divided into regions. Each region has multiple zones, and each zone has different cities. Each salesperson is allocated different cities. The objective is to track sales figures at different granularity levels of region, and also to count the number of products sold. Design a star schema that takes into consideration the above granularity levels for region, salesperson, and time.
  • 42.
    Fact table
      Fact_sales: Prod_key, Time_key, Location_key, SP_key, No_of_prod_sold, Sales_amount
    Dimension tables
      Dim_Product: Prod_key, Prod_name, Prod_subcategory, Prod_category
      Dim_Location: Location_key, City, Zone, Region
      Dim_Time: Time_key, Day, Week, Month, Quarter, Year
      Dim_Salesperson: SP_key, SP_name, SP_type, SP_Dept
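    As a rough check that this design supports the required granularity levels, the sketch below rolls the same facts up by city, zone, or region. Only Dim_Location and Fact_sales are modelled (the other keys are omitted for brevity), and the sample rows and the helper sales_by are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY, city TEXT, zone TEXT, region TEXT);
CREATE TABLE fact_sales (
    location_key INTEGER REFERENCES dim_location (location_key),
    no_of_prod_sold INTEGER, sales_amount INTEGER);

-- Made-up sample rows.
INSERT INTO dim_location VALUES
    (1, 'Pune', 'West Zone 1', 'West'),
    (2, 'Thane', 'West Zone 2', 'West'),
    (3, 'Delhi', 'North Zone 1', 'North');
INSERT INTO fact_sales VALUES (1, 10, 1000), (2, 5, 700), (3, 8, 900), (1, 2, 300);
""")

# Levels of the location hierarchy, from finest to coarsest.
LEVELS = ("city", "zone", "region")


def sales_by(level):
    """Total products sold and sales amount at one level of the hierarchy."""
    if level not in LEVELS:  # column names cannot be bound as SQL parameters
        raise ValueError(f"unknown level: {level}")
    query = f"""
        SELECT l.{level}, SUM(f.no_of_prod_sold), SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_location AS l ON l.location_key = f.location_key
        GROUP BY l.{level}
        ORDER BY l.{level}
    """
    return conn.execute(query).fetchall()


for level in LEVELS:
    print(level, sales_by(level))
```

    Because city, zone, and region all sit in the one denormalized Dim_Location table, changing the granularity only changes the GROUP BY column, not the joins.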
  • 43.
    STAR Schema: Factless Fact Table
    • Suppose we are building a fact table to track the attendance of students.
    • For analyzing student attendance, the dimensions are student, course, date, room, and professor.
    • The attendance may be affected by any of these dimensions.
    • Every fact table row would contain the number one as attendance.
    • Since that value is always one, it carries no information and can be dropped: the presence of a row alone records the event.
    • Fact tables that do not need to contain facts are called factless fact tables.
  • 44.
    Fig.: Star schema for Attendance (factless fact table)
    Fact table
      Attendance: Date_key, Classroom_key, Subj_key, Stud_key, Teacher_key
    Dimension tables
      Student: Stud_key, Roll_No, Student_Name, Department, Class, Division
      Teacher: Teacher_key, Teacher Name, Department, Emp_id
      Date: Date_key, Day, Month, Year
      Subject: Subj_key, Sub_code, Sub_name, Semester, Is_elective
      Classroom: Classroom_key, Wing, Floor
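    A minimal sketch of how a factless fact table is queried: since there is no measure to sum, attendance is obtained by counting rows. Only three of the five keys and one dimension are modelled here, and the student names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (stud_key INTEGER PRIMARY KEY, student_name TEXT);
-- Factless fact table: only dimension keys, no measure column.
CREATE TABLE attendance (
    date_key INTEGER,
    subj_key INTEGER,
    stud_key INTEGER REFERENCES student (stud_key),
    PRIMARY KEY (date_key, subj_key, stud_key));

-- Made-up sample rows: one row per student per class attended.
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO attendance VALUES (1, 1, 1), (1, 1, 2), (2, 1, 1);
""")

# COUNT(*) plays the role that SUM(measure) plays in an ordinary fact table.
classes_attended = conn.execute("""
    SELECT s.student_name, COUNT(*)
    FROM attendance AS a
    JOIN student AS s ON s.stud_key = a.stud_key
    GROUP BY s.student_name
    ORDER BY s.student_name
""").fetchall()

print(classes_attended)
```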
  • 45.
    Snowflake Schema
    • “Snowflaking” is a method of normalizing the dimension tables in a STAR schema.
    • When you completely normalize all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle.
  • 46.
    Snowflake Schema
    • “Snowflaking”, or normalization of the dimension tables, can be achieved in a few different ways:
      • Partially normalize only a few dimension tables, leaving the others intact.
      • Partially or fully normalize only a few dimension tables, leaving the rest intact.
      • Partially normalize every dimension table.
      • Fully normalize every dimension table.
  • 47.
    Fig: Star Schema for Sales
    Fact table
      SALES FACT: Prod_key, Time_key, Cust_key, SalesRep_key, Quantity_sold, Sales_amount, Sales_margin
    Dimension tables
      CUSTOMER: Cust_key, Cust_name, Cust_code, Marital_status, Address, State, Pin, Country
      SALESREP: SalesRep_key, SalesPerson_name, Territory_name, Region_name
      TIME: Time_key, Date, Month, Quarter, Year
      PRODUCT: Prod_key, Prod_name, Prod_code, Package_type, Brand_name, Product category
  • 48.
    Fig: Snowflake Schema: Partially normalized Product Dimension
    Fact table
      SALES FACT: Prod_key, Time_key, Cust_key, SalesRep_key, Quantity_sold, Sales_amount, Sales_margin
    Dimension tables
      CUSTOMER: Cust_key, Cust_name, Cust_code, Marital_status, Address, State, Pin, Country
      SALESREP: SalesRep_key, SalesPerson_name, Territory_name, Region_name
      TIME: Time_key, Date, Month, Quarter, Year
      PRODUCT: Prod_key, Prod_name, Prod_code, Package_type, Brand_key
        BRAND: Brand_key, Brand_name, Category_key
          CATEGORY: Category_key, Product category
  • 49.
    Fig: Snowflake Schema for Sales
    Fact table
      SALES FACT: Prod_key, Time_key, Cust_key, SalesRep_key, Quantity_sold, Sales_amount, Sales_margin
    Dimension tables
      CUSTOMER: Cust_key, Cust_name, Cust_code, Marital_status, Address, State, Pin, Country_key
        COUNTRY: Country_key, Country_Name
      SALESREP: SalesRep_key, SalesPerson_name, Territory_key
        TERRITORY: Territory_key, Territory_name, Region_key
          REGION: Region_key, Region_Name
      TIME: Time_key, Date, Month, Quarter, Year
      PRODUCT: Prod_key, Prod_name, Prod_code, Package_key, Brand_key
        PACKAGE: Package_key, Package_type
        BRAND: Brand_key, Brand_name, Category_key
          CATEGORY: Category_key, Product category
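    To make the cost of snowflaking concrete, the sketch below answers the same question (sales amount per product category) against a star-style and a snowflaked product dimension. The table names star_product, product, brand, and category, and all rows, are assumptions for this example; only the product branch of the schema is modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star version: one denormalized product dimension.
CREATE TABLE star_product (
    prod_key INTEGER PRIMARY KEY, prod_name TEXT,
    brand_name TEXT, product_category TEXT);

-- Snowflake version: brand and category split into their own tables.
CREATE TABLE category (category_key INTEGER PRIMARY KEY, product_category TEXT);
CREATE TABLE brand (
    brand_key INTEGER PRIMARY KEY, brand_name TEXT,
    category_key INTEGER REFERENCES category (category_key));
CREATE TABLE product (
    prod_key INTEGER PRIMARY KEY, prod_name TEXT,
    brand_key INTEGER REFERENCES brand (brand_key));

CREATE TABLE sales_fact (prod_key INTEGER, sales_amount INTEGER);

-- Made-up sample rows, identical content in both designs.
INSERT INTO star_product VALUES
    (1, 'Chair', 'BrandA', 'Furniture'),
    (2, 'Desktop', 'BrandB', 'Electronics'),
    (3, 'Table', 'BrandA', 'Furniture');
INSERT INTO category VALUES (1, 'Furniture'), (2, 'Electronics');
INSERT INTO brand VALUES (1, 'BrandA', 1), (2, 'BrandB', 2);
INSERT INTO product VALUES (1, 'Chair', 1), (2, 'Desktop', 2), (3, 'Table', 1);
INSERT INTO sales_fact VALUES (1, 8250), (2, 42500), (3, 1000);
""")

# Star: a single join reaches the category attribute.
star_totals = conn.execute("""
    SELECT p.product_category, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN star_product AS p ON p.prod_key = f.prod_key
    GROUP BY p.product_category
    ORDER BY p.product_category
""").fetchall()

# Snowflake: two additional joins are needed to reach the same attribute.
snowflake_totals = conn.execute("""
    SELECT c.product_category, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product AS p ON p.prod_key = f.prod_key
    JOIN brand AS b ON b.brand_key = p.brand_key
    JOIN category AS c ON c.category_key = b.category_key
    GROUP BY c.product_category
    ORDER BY c.product_category
""").fetchall()

print(star_totals)
print(snowflake_totals)
```

    Both queries return the same totals; the snowflaked version stores each brand and category name once but pays for it with the longer join chain.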
  • 50.
    Snowflake Schema
    • Normalizing removes redundancy, which might appear to save significant storage space when the dimensions are large.
    • Advantages of Snowflaking
      • Normalized structures are easier to update and maintain.
      • Small savings in storage space.
    • Limitations of Snowflaking
      • The schema is less intuitive, and end-users are put off by the complexity.
      • Browsing through the contents becomes difficult.
      • Degraded query performance because of additional joins.
  • 51.
    Fact Constellation Schema
    • It is a group of different fact tables that share a few dimension tables.
    • It can be represented as a group of multiple star schemas and is therefore also called a Galaxy Schema.
    • A combined group of star schemas is called a fact constellation schema.
    • Used to design a data warehouse.
    • It is slightly more complex than the star and snowflake schemas.
  • 52.
    Fact Constellation Schema
    [Figure: basic structure of a fact constellation schema, showing two fact tables and six dimension tables]
  • 53.
    Fact Constellation Schema
    Fig: Fact Constellation Schema
    Fact tables
      Placement: Roll_no, Company_id, Tpd_id, no._of_students, Attended_students, Selected_students
      Training Class: Roll_no, Training_code, Tpd_id, Attended_students, Passed_students
    Dimension tables
      Student: Roll_no, Stud_name, Marks
      TPD: Tpd_id, Tpo_name, Department
      Company: Company_id, Company_name, Package offered
      Workshop: Training_code, Training_name, Subject, Workshop_fees
  • 54.
    Updates to Dimension Tables: Updates to Fact and Dimension Tables
    • Over time, the number of rows in the fact table continues to grow.
    • Rows in a fact table are very rarely updated with changes.
    • Adjustments to prior numbers are processed as additional adjustment rows and added to the fact table.
    • Dimension tables are more stable and less volatile.
    • Slowly Changing Dimensions
      • Customer's address
      • Customer's marital status
      • Salesperson's region, etc.
  • 55.
    Updates to Dimension Tables: Slowly Changing Dimensions
    • For changes to the dimension tables, we can derive the following principles:
      • Most dimensions are generally constant over time.
      • Many dimensions, though not constant over time, change slowly.
      • The key of the source record (for example, the product key) does not change.
      • The description and other attributes change slowly over time.
      • In the source OLTP systems, the new values overwrite the old ones.
      • Overwriting dimension table attributes is not always the appropriate option in a data warehouse.
      • The way changes are made to the dimension tables depends on the types of changes and on what information must be preserved in the data warehouse.
  • 56.
    Updates to Dimension Tables: Updates to table
    • Type 1 Changes: Correction of Errors
    • Type 2 Changes: Preservation of History
    • Type 3 Changes: Tentative Soft Revisions
  • 57.
    Updates to Dimension Tables: Type 1 Change: Correction of Errors
    • General principles for type 1 changes:
      • Usually, the changes relate to the correction of errors in the source systems.
      • Sometimes the change in the source system has no significance.
      • The old value in the source system needs to be discarded.
      • The change in the source system need not be preserved in the data warehouse.
  • 58.
    Updates to Dimension Tables: Type 1 Change: Correction of Errors
    • Applying Type 1 Changes to the data warehouse
      • Overwrite the attribute value in the dimension table row with the new value.
      • The old value of the attribute is not preserved.
      • No other changes are made in the dimension table row.
      • The key of this dimension table and any other key values are not affected.
      • This type of change is the easiest to implement.
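    A type 1 change amounts to a single in-place overwrite. The sketch below shows one possible way to apply it with sqlite3; the dim_customer layout, the misspelled sample name, and the helper apply_type1 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    cust_key INTEGER PRIMARY KEY, cust_code TEXT, cust_name TEXT, state TEXT);
-- Made-up row with a spelling error carried over from the source system.
INSERT INTO dim_customer VALUES (101, 'C-001', 'Asha Kulkrni', 'Maharashtra');
""")

# Attributes that may be corrected; column names cannot be SQL parameters.
CORRECTABLE = ("cust_name", "state")


def apply_type1(cust_code, attribute, new_value):
    """Overwrite one attribute in place; the old value is not kept."""
    if attribute not in CORRECTABLE:
        raise ValueError(f"cannot correct attribute: {attribute}")
    with conn:
        conn.execute(
            f"UPDATE dim_customer SET {attribute} = ? WHERE cust_code = ?",
            (new_value, cust_code),
        )


apply_type1("C-001", "cust_name", "Asha Kulkarni")
rows = conn.execute("SELECT * FROM dim_customer").fetchall()
print(rows)
```

    The row count and the key stay the same, so existing fact rows keep pointing at the corrected dimension row.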
  • 59.
    Fig. The method for applying type 1 changes
  • 60.
    Updates to Dimension Tables: Type 2 Change: Preservation of History
    • Sometimes we need to preserve the historical values of attributes.
    • The general principles for this type of change:
      • They usually relate to true changes in the source systems.
      • There is a need to preserve history in the data warehouse.
      • This type of change partitions the history in the data warehouse.
      • Every change to the same attribute must be preserved.
  • 61.
    Updates to Dimension Tables: Type 2 Change: Preservation of History
    • Applying Type 2 Changes to the Data Warehouse
      • Add a new dimension table row with the new value of the changed attribute.
      • An effective date field may be included in the dimension table.
      • There are no changes to the original row in the dimension table.
      • The key of the original row is not affected.
      • The new row is inserted with a new surrogate key.
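    The steps above can be sketched as follows. This is one possible implementation, assuming an auto-incrementing integer as the surrogate key and a cust_code column carrying the source-system key; the table layout, sample row, and helper apply_type2 are illustrative, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    cust_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    cust_code TEXT,                              -- key from the source system
    cust_name TEXT,
    marital_status TEXT,
    effective_date TEXT);
-- Made-up starting row.
INSERT INTO dim_customer (cust_code, cust_name, marital_status, effective_date)
VALUES ('C-001', 'Asha Kulkarni', 'Single', '2015-01-01');
""")


def apply_type2(cust_code, new_status, effective_date):
    """Add a new row for the changed attribute; leave the original row as is."""
    with conn:
        # Copy the unchanged attributes from the most recent row.
        (cust_name,) = conn.execute(
            "SELECT cust_name FROM dim_customer "
            "WHERE cust_code = ? ORDER BY cust_key DESC LIMIT 1",
            (cust_code,),
        ).fetchone()
        cursor = conn.execute(
            "INSERT INTO dim_customer "
            "(cust_code, cust_name, marital_status, effective_date) "
            "VALUES (?, ?, ?, ?)",
            (cust_code, cust_name, new_status, effective_date),
        )
    return cursor.lastrowid  # surrogate key of the new row


new_key = apply_type2("C-001", "Married", "2020-06-15")
history = conn.execute(
    "SELECT * FROM dim_customer WHERE cust_code = 'C-001' ORDER BY cust_key"
).fetchall()
print(new_key)
for row in history:
    print(row)
```

    Fact rows loaded before the change keep the old surrogate key and fact rows loaded afterwards use the new one, which is how the change partitions the history.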
  • 62.
    Fig. Applying type 2 changes
  • 63.
    Updates to Dimension Tables: Type 3 Change: Tentative Soft Revisions
    • Assume your marketing department is planning a realignment of the territorial assignments for salespersons.
    • Before making a permanent realignment, they want to count the orders in two ways:
      • according to the current territorial alignment
      • according to the proposed realignment
  • 64.
    Updates to Dimension Tables: Type 3 Change: Tentative Soft Revisions
    • The general principles for type 3 changes:
      • They usually relate to “soft” or tentative changes in the source systems.
      • There is a need to keep track of history with old and new values of the changed attribute.
      • They are used to compare performances across the transition.
      • They provide the ability to track forward and backward.
  • 65.
    Updates to Dimension Tables: Type 3 Change: Tentative Soft Revisions
    • Applying Type 3 Changes to the Data Warehouse
      • Add an “old” field in the dimension table for the affected attribute.
      • Push down the existing value of the attribute from the “current” field to the “old” field.
      • Keep the new value of the attribute in the “current” field.
      • You may also add a “current” effective date field for the attribute.
      • The key of the row is not affected.
  • 66.
    Updates to Dimension Tables: Type 3 Change: Tentative Soft Revisions
    • Applying Type 3 Changes to the Data Warehouse
      • No new dimension row is needed.
      • The existing queries will seamlessly switch to the “current” value.
      • Any queries that need to use the “old” value must be revised accordingly.
      • The technique works best for one “soft” change at a time.
      • If there is a succession of changes, more sophisticated techniques must be devised.
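    A minimal sketch of a type 3 change for the territory realignment scenario, assuming the "current" and "old" fields are stored as the columns territory_current and territory_old. The column names, sample rows, and helpers apply_type3 and count_orders are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_salesperson (
    sp_key INTEGER PRIMARY KEY,
    sp_name TEXT,
    territory_current TEXT,
    territory_old TEXT,              -- the added "old" field
    territory_effective_date TEXT);  -- optional "current" effective date
CREATE TABLE fact_order (
    sp_key INTEGER REFERENCES dim_salesperson (sp_key), order_rs INTEGER);

-- Made-up sample rows.
INSERT INTO dim_salesperson VALUES
    (1, 'Asha', 'East', NULL, NULL), (2, 'Ravi', 'West', NULL, NULL);
INSERT INTO fact_order VALUES (1, 100), (1, 250), (2, 400);
""")


def apply_type3(sp_key, new_territory, effective_date):
    """Push the current value down to the old field, then store the new one."""
    with conn:
        # The right-hand side territory_current is read before it is updated.
        conn.execute(
            "UPDATE dim_salesperson "
            "SET territory_old = territory_current, "
            "    territory_current = ?, "
            "    territory_effective_date = ? "
            "WHERE sp_key = ?",
            (new_territory, effective_date, sp_key),
        )


def count_orders(use_old_alignment):
    """Count orders per territory under the old or the current alignment."""
    # Salespersons who were never realigned have no old value, so fall back.
    column = (
        "COALESCE(s.territory_old, s.territory_current)"
        if use_old_alignment
        else "s.territory_current"
    )
    return conn.execute(
        f"SELECT {column} AS territory, COUNT(*) "
        "FROM fact_order AS f "
        "JOIN dim_salesperson AS s ON s.sp_key = f.sp_key "
        "GROUP BY territory ORDER BY territory"
    ).fetchall()


apply_type3(1, "Central", "2020-06-15")
print(count_orders(use_old_alignment=False))
print(count_orders(use_old_alignment=True))
```

    No row is added and no key changes; the same orders can be counted both ways, which is the comparison across the transition that the marketing scenario asks for.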
  • 67.
    Fig. Applying type 3 changes
  • 68.
    Updates to Dimension Tables: Rapidly Changing Dimensions
    • Large, rapidly changing dimensions can be too problematic for the type 2 approach.
    • The dimension table could be littered with a very large number of additional rows created every time there is an incremental load.
    • What if the dimension table is too large and is changing too rapidly?
      • Break the large dimension table into one or more simpler dimension tables.
      • Break off the rapidly changing attributes into another dimension table, leaving the slowly changing attributes behind in the original table.
  • 69.
    Fig: Dividing a large, rapidly changing dimension table.