2. Level of Refinement
● The information packaging methodology focuses on
several different levels or cuts of the information
models that are derived during the process of building
a data warehousing system
● Each level is essentially a refinement or more detailed
version of the previously developed data model
4. Level of Refinement
● Working through multiple levels of detail during the
design of a data warehousing system helps your project team:
● build in quality
● deliver subject-oriented data warehouses that more closely
align with what the users have requested
6. Design Decision
● Some of the design decisions are:
1. Choosing the Process
▪ Selecting the subjects from the information packages for the first
set of logical structures to be designed.
2. Choosing the Grain.
▪ Determining the level of detail for the data in the data structures
3. Identifying and Conforming the Dimensions
▪ Choosing the business dimensions (such as product, market,
time, etc.) and ensuring that the data elements in every
business dimension conform to one another
7. Design Decision
4. Choosing the Facts.
▪ Selecting the metrics or units of measurements (such as product
sale units, dollar sales, dollar revenue, etc.) to be included in the
first set of structures.
5. Choosing the Duration of the Database.
▪ Determining how far back in time you should go for historical
data.
8. Dimensional modeling -
basic concept
● Consists of special data structures
● Groups related data items into one structure, the dimension
table
● The metrics inside the fact table are analyzed across
one or more dimensions using the dimension table
attributes.
9. Criteria to form
Dimensional Model
● The model shall provide the best data access
● The whole model must be query-centric
● It must be optimized for queries and analyses
● The model must show that the dimension tables
interact with the fact table
● It shall be structured in such a way that every
dimension can interact equally with the fact table
● The model shall allow drilling down or rolling up along
dimension hierarchies.
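A minimal sketch of rolling up and drilling down along a dimension hierarchy. The year > quarter > month hierarchy, the sample rows, and the `aggregate` helper are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sales rows: (year, quarter, month, units_sold)
sales = [
    (2007, "Q2", "June", 10),
    (2007, "Q3", "July", 20),
    (2007, "Q3", "August", 5),
]

def aggregate(rows, depth):
    """Sum units at a given depth of the year > quarter > month hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        key = row[:depth]      # the first `depth` hierarchy levels form the group key
        totals[key] += row[3]  # units_sold is the measure being summed
    return dict(totals)

by_year = aggregate(sales, 1)     # rolled up to the coarsest level
by_quarter = aggregate(sales, 2)  # drilled down one level
by_month = aggregate(sales, 3)    # drilled down to the finest level
```

Moving from `by_year` to `by_month` is a drill-down; moving the other way is a roll-up. The same measure is summed each time, only the grouping level changes.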
12. Dimensional modeling
basic concept
● How much sales proceeds did the Jeep Cherokee, Year
2007 Model with standard options, generate in July
2007 at Big Sam Auto dealership for buyers who own
their homes and who took 3-year leases, financed by
Daimler-Chrysler Financing?
● Analyzing actual sale price, MSRP, and full price
● Analyzing the facts along attributes in the various dimension
tables.
● The attributes in the dimension tables act as constraints and
filters in queries
13. Dimensional modeling
basic concept
● Any or all of the attributes of each dimension table can
participate in a query
● Each dimension table has an equal chance to be part
of a query.
14. Conceptual Modeling of
Data Warehouses
● A data warehouse conceptual data model shows only the
highest-level relationships between the different
entities (in other words, the different tables) in the data
model.
● It is the initial, high-level view of the data model.
The conceptual model includes the important entities
and the relationships among them.
15. Logical Modeling of
Data Warehouses
Star schema:
The star schema is a popular data warehouse design and
dimensional model that divides business data into facts and
dimensions. In this model, a centralized fact table references
many dimension tables, and the primary key of each dimension
table appears in the fact table as a foreign key. The resulting
entity-relationship diagram looks like a star, hence the name
star schema.
A fact table in the middle connected to a set of dimension tables
16. Logical Modeling of
Data Warehouses
● Most data warehouses use a star schema to represent
the multi-dimensional model.
● Each dimension is represented by a dimension table
that describes it.
● A fact table connects to all dimension tables with a
multiple join. Each tuple in the fact table consists of a
pointer to each of the dimension tables that provide its
multi-dimensional coordinates and stores measures for
those coordinates.
● The links between the fact table in the center and the
dimension tables in the extremities form a shape like a
star
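The star layout described above can be sketched with Python's built-in sqlite3 module. All table and column names here (product_dim, time_dim, customer_dim, sales_fact and their columns) are hypothetical and chosen only to illustrate the shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per business dimension, each with its own primary key
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name TEXT, brand TEXT);
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT, state TEXT);

-- Fact table: one foreign key per dimension plus the numeric measures
CREATE TABLE sales_fact (
    product_key   INTEGER REFERENCES product_dim(product_key),
    time_key      INTEGER REFERENCES time_dim(time_key),
    customer_key  INTEGER REFERENCES customer_dim(customer_key),
    quantity_sold INTEGER,
    order_dollars REAL,
    PRIMARY KEY (product_key, time_key, customer_key)
);
""")

# List the tables the fact table points to: these are the points of the star.
fact_foreign_keys = conn.execute("PRAGMA foreign_key_list(sales_fact)").fetchall()
referenced_tables = sorted(row[2] for row in fact_foreign_keys)
print(referenced_tables)
```

Each row of sales_fact carries one key per dimension (its multi-dimensional coordinates) and the measures stored for those coordinates.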
17. Logical Modeling of
Data Warehouses
● Snowflake schema: A refinement of the star schema where some
dimensional hierarchy is normalized into a set of smaller
dimension tables, forming a shape similar to a snowflake
● Fact constellations: Multiple fact tables share dimension
tables, viewed as a collection of stars, therefore called
galaxy schema or fact constellation
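A minimal sketch of snowflaking, assuming a hypothetical product dimension whose category level is normalized into its own table. The table names, column names, and sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category level of the product hierarchy moves into its own table
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES category_dim(category_key)
);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    units       INTEGER
);
INSERT INTO category_dim VALUES (1, 'Parts'), (2, 'Tools');
INSERT INTO product_dim  VALUES (1, 'bigpart-1', 1), (2, 'bigpart-2', 1), (3, 'wrench', 2);
INSERT INTO sales_fact   VALUES (1, 10), (2, 5), (3, 2);
""")

# Reaching the category now takes an extra join through product_dim.
units_by_category = dict(conn.execute("""
    SELECT c.category_name, SUM(f.units)
    FROM sales_fact f
    JOIN product_dim  p ON p.product_key  = f.product_key
    JOIN category_dim c ON c.category_key = p.category_key
    GROUP BY c.category_name
""").fetchall())
```

In a plain star schema the category name would sit directly in product_dim, so the second join would not be needed.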
19. Difference between ER and
dimensional Modeling
OLTP/ER Modeling
● Captures details of events or
transactions
● Focus on individual events
● An OLTP system is a window
into micro-level transactions
● Picture at detail level
necessary to run the business
● Suitable only for questions at
transaction level
● Data consistency,
non-redundancy, and efficient
data storage critical
Dimensional Modeling
● DW meant to answer
questions on overall process
● Focus is on how managers
view the business
● DW reveals business trends
● Information is centered
around a business process
● Answers show how the
business measures the
process
● The measures to be studied
in many ways along several
business dimensions
23. Query evaluation
● When a query is made against the data warehouse,
the results of the query are produced by combining or
joining one or more dimension tables with the fact
table.
● The joins are between the fact table and individual
dimension tables
24. Star schema for sale
Let us say that the marketing department wants the quantity sold
and order dollars for product bigpart-1, relating to customers in
the state of Maine, obtained by salesperson Jane Doe, during the
month of June.
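A minimal sketch of how this request could be answered against a star schema, using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for illustration; only the filter values (bigpart-1, Maine, Jane Doe, June) come from the request above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product     (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE customer    (customer_key INTEGER PRIMARY KEY, customer_name TEXT, state TEXT);
CREATE TABLE salesperson (salesperson_key INTEGER PRIMARY KEY, salesperson_name TEXT);
CREATE TABLE order_date  (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE order_fact (
    product_key INTEGER, customer_key INTEGER,
    salesperson_key INTEGER, date_key INTEGER,
    quantity_sold INTEGER, order_dollars REAL
);
INSERT INTO product     VALUES (1, 'bigpart-1'), (2, 'bigpart-2');
INSERT INTO customer    VALUES (1, 'Acme', 'Maine'), (2, 'Globex', 'Ohio');
INSERT INTO salesperson VALUES (1, 'Jane Doe'), (2, 'John Roe');
INSERT INTO order_date  VALUES (1, 'June'), (2, 'July');
INSERT INTO order_fact VALUES
    (1, 1, 1, 1, 10, 500.0),  -- matches every filter
    (1, 1, 1, 1,  5, 250.0),  -- matches every filter
    (1, 2, 1, 1,  7, 350.0),  -- customer in Ohio
    (2, 1, 1, 1,  3,  90.0),  -- different product
    (1, 1, 2, 1,  4, 200.0),  -- different salesperson
    (1, 1, 1, 2,  6, 300.0);  -- ordered in July
""")

# The fact table joins to each dimension table; dimension attributes act as filters.
quantity_sold, order_dollars = conn.execute("""
    SELECT SUM(f.quantity_sold), SUM(f.order_dollars)
    FROM order_fact f
    JOIN product     p ON p.product_key     = f.product_key
    JOIN customer    c ON c.customer_key    = f.customer_key
    JOIN salesperson s ON s.salesperson_key = f.salesperson_key
    JOIN order_date  d ON d.date_key        = f.date_key
    WHERE p.product_name = 'bigpart-1'
      AND c.state = 'Maine'
      AND s.salesperson_name = 'Jane Doe'
      AND d.month = 'June'
""").fetchone()
print(quantity_sold, order_dollars)
```

Every join is between the fact table and a single dimension table; the dimension tables are never joined to each other.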
25. Inside dimensional
Table
● Dimensional table characteristics are:
1. Dimension Table Key
2. Table is Wide.
3. Textual Attributes.
4. Attributes not Directly Related.
▪ For example, package size is not directly related to product brand.
5. Not Normalized
6. Drilling Down, Rolling Up.
7. Multiple Hierarchies.
8. Fewer Records: fewer records than the fact table
28. Fact Table
● Contains two or more foreign keys
● Tend to have huge numbers of records
● Useful facts tend to be numeric and additive
● Two classes of fact-table attributes:
1. Dimension attributes: the key of a
dimension table.
2. Dependent attributes: a value determined
by the dimension attributes of the tuple.
29. Inside fact table
● Fact table characteristics are:
1. Concatenated fact table key
2. Grain or level of data identified
3. Fully additive measures
4. Semi-additive measures
5. Large number of records
6. Only a few attributes
7. Sparsity of data
8. Degenerate dimensions
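A minimal sketch of the difference between fully additive and semi-additive measures, assuming hypothetical daily account snapshots. Deposits can be summed along every dimension, while balances can be summed across accounts but not across days.

```python
# Hypothetical snapshot rows: (day, account, balance, deposits)
snapshots = [
    ("Mon", "A", 100, 100),
    ("Tue", "A", 150, 50),
    ("Mon", "B", 200, 200),
    ("Tue", "B", 180, 0),
]

# Fully additive: deposits may be summed over accounts and over days.
total_deposits = sum(deposits for _, _, _, deposits in snapshots)

# Semi-additive: balances may be summed across accounts for a single day...
balance_by_day = {}
for day, _, balance, _ in snapshots:
    balance_by_day[day] = balance_by_day.get(day, 0) + balance

# ...but summing them across days has no business meaning,
# so a period figure uses an average (or the closing value) instead.
average_daily_balance = sum(balance_by_day.values()) / len(balance_by_day)
closing_balance = balance_by_day["Tue"]
```

Here the naive sum of all four balances (630) describes nothing real, whereas the per-day totals and their average do.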