Data Warehousing and Business Intelligence is one of the hottest skills today, and is the cornerstone for reporting, data science, and analytics. This course teaches the fundamentals with examples plus a project to fully illustrate the concepts.
1. Database Keys for Fact and Dimension Tables
Fact Tables and Fact Types
Aggregate and Factless Fact Tables
Ch Anwar ul Hassan (Lecturer)
Department of Computer Science and Software Engineering
Capital University of Sciences & Technology, Islamabad
Pakistan
anwarchaudary@gmail.com
Intro to Data Warehousing
2. A DBMS key is an attribute, or a set of attributes, that identifies a row (tuple) in a relation (table).
Keys uniquely identify a row through a combination of one or more columns of that table.
Keys also let you find the relationship between two tables.
What Are Keys?
3. Keys let you identify any row of data in a table. In a real-world application a table could contain thousands of records, and the data in them could be duplicated. Keys ensure that you can still uniquely identify each record.
Keys let you establish and identify relationships between tables.
Keys help you enforce identity and integrity in those relationships.
Why Do We Need a Key?
4. A DBMS has several types of keys, each with a different purpose:
Super Key
Primary Key
Candidate Key
Alternate Key
Foreign Key
Compound Key
Composite Key
Surrogate Key
Various Keys in a Database Management System
5. A super key is a set of one or more columns that uniquely identifies every row in a table. A super key may contain additional attributes that are not needed for unique identification.
A candidate key is a minimal super key: a super key from which no attribute can be removed without losing uniqueness. All candidate keys are super keys, but not every super key is a candidate key.
Example: EmpSSN and EmpNum are super keys, and so is either of them combined with the employee name.
What is the Super key?
6. A column or group of columns that uniquely identifies every row in a table is called a primary key. A primary key value cannot be duplicated: the same value cannot appear more than once in the table.
Example: StudID
What is a Primary Key?
7. A candidate key that is not currently the primary key is called an alternate key. A table may have one or several candidates for the primary key, but only one of them is chosen.
Example: StudID, Roll No, and Email all qualify to become the primary key. Since StudID is the primary key, Roll No and Email become alternate keys.
What is the Alternate key?
8. A super key with no redundant attributes is called a candidate key. The primary key should be selected from the candidate keys. Every table must have at least one candidate key.
Example: in the given table, StudID, Roll No, and Email are candidate keys, each of which uniquely identifies a student record.
What is a Candidate Key?
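The super key and candidate key definitions can be checked mechanically on a small table. The sketch below is only an illustration: the sample rows and the helper names `is_superkey` and `candidate_keys` are assumptions made for this example, and uniqueness is tested against the sample data only, not against every state the table could reach.

```python
from itertools import combinations

# Hypothetical sample rows; column names follow the student example.
ROWS = [
    {"StudID": 1, "RollNo": 11, "Email": "a@example.com", "Name": "Tom"},
    {"StudID": 2, "RollNo": 12, "Email": "b@example.com", "Name": "Nick"},
    {"StudID": 3, "RollNo": 13, "Email": "c@example.com", "Name": "Tom"},
]

def is_superkey(rows, columns):
    """True if the columns take a distinct value combination in every row."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Return the minimal super keys: those with no super key as a proper subset."""
    all_columns = sorted(rows[0])
    keys = []
    # Visit column sets from smallest to largest so minimal ones are found first.
    for size in range(1, len(all_columns) + 1):
        for combo in combinations(all_columns, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if not contains_smaller_key and is_superkey(rows, combo):
                keys.append(combo)
    return keys
```

With this data, `("StudID", "Name")` is a super key but not a candidate key, because `StudID` alone is already unique.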
9. A foreign key is a column added to a table to create a relationship with another table; it typically refers to that table's primary key. Foreign keys help maintain data integrity and allow navigation between two different instances of an entity. Every relationship in the model needs to be supported by a foreign key.
What is the Foreign key?
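How a database enforces primary and foreign keys can be seen with Python's built-in `sqlite3` module. This is a minimal sketch: the `department` and `teacher` tables, their columns, and the sample values are hypothetical, and SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE teacher ("
    " teacher_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_code TEXT REFERENCES department(dept_code))"
)
conn.execute("INSERT INTO department VALUES ('001', 'Science')")
conn.execute("INSERT INTO teacher VALUES (1, 'David', '001')")

def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Same teacher_id as an existing row: violates the primary key.
duplicate_pk_ok = try_insert("INSERT INTO teacher VALUES (1, 'Sara', '001')")
# dept_code '999' does not exist in department: violates the foreign key.
orphan_fk_ok = try_insert("INSERT INTO teacher VALUES (2, 'Mike', '999')")
```

Both inserts are rejected, so the `teacher` table still holds only the original row.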
10. A compound key has two or more fields that together uniquely identify a specific record. Each column may not be unique by itself within the table, but the combination of columns is unique.
Example: neither OrderNo nor ProductID can be a primary key on its own, because neither uniquely identifies a record. A compound key of OrderNo and ProductID can be used, because the pair uniquely identifies each record.
What is the Compound key?
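The OrderNo and ProductID example can be sketched with a multi-column primary key in SQLite. The table name `order_line`, the `Quantity` column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_line ("
    " OrderNo INTEGER,"
    " ProductID TEXT,"
    " Quantity INTEGER,"
    " PRIMARY KEY (OrderNo, ProductID))"
)
# OrderNo 1001 and ProductID 'P1' each repeat, but every pair is distinct.
rows = [(1001, "P1", 2), (1001, "P2", 5), (1002, "P1", 1)]
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)", rows)

try:
    # This pair already exists, so the key constraint rejects the row.
    conn.execute("INSERT INTO order_line VALUES (1001, 'P1', 9)")
    repeated_pair_accepted = True
except sqlite3.IntegrityError:
    repeated_pair_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
```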
11. A key with multiple attributes that together uniquely identify rows in a table is called a composite key.
The difference between a compound key and a composite key is that any part of the compound key can be a foreign key, whereas the parts of a composite key may or may not be foreign keys.
A super key uniquely identifies a row and can be made up of one column or many. A composite key is a key made of more than one column, so a super key made of more than one column is also a composite key.
What is the Composite key?
12. An artificial key whose only purpose is to uniquely identify each record is called a surrogate key. Such keys are created when no natural primary key is available. They carry no meaning about the data in the table. A surrogate key is usually an integer.
Example: a table shows the shift timings of different employees. Attributes such as start time or end time do not uniquely identify an employee, so a surrogate key is needed.
What is a Surrogate Key?
13. Surrogate keys are a widely used and accepted design standard in data warehouses. A surrogate key is a sequentially generated unique number attached to every record in a dimension table. It joins the fact and dimension tables and is necessary for handling changes in dimension table attributes.
Surrogate Key in DWH
14. Dimension tables are referenced by fact tables using keys.
When creating a dimension table in a data warehouse, a
system-generated key is used to uniquely identify a row in
the dimension. This key is also known as a surrogate key.
The surrogate key is used as the primary key in the
dimension table.
How Are Keys of Dimension Tables Generated?
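A system-generated surrogate key can be sketched with SQLite, where an `INTEGER PRIMARY KEY` column is assigned automatically on insert. The `dim_customer` table, the helper `add_version`, and the sample values are hypothetical. Keeping the old row and adding a new one when an attribute changes is one common way to handle dimension changes, shown here as an assumption rather than as the only approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    " customer_key INTEGER PRIMARY KEY,"  # surrogate key, assigned by SQLite
    " customer_id TEXT,"                  # business key from the source system
    " city TEXT)"
)

def add_version(customer_id, city):
    """Insert a dimension row and return its generated surrogate key."""
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, city) VALUES (?, ?)",
        (customer_id, city),
    )
    return cur.lastrowid

first_key = add_version("C-17", "Lahore")
other_key = add_version("C-42", "Karachi")
# The customer moves: keep the old row and add a new version with a new key.
moved_key = add_version("C-17", "Islamabad")

versions = conn.execute(
    "SELECT customer_key, city FROM dim_customer"
    " WHERE customer_id = 'C-17' ORDER BY customer_key"
).fetchall()
```

The business key `C-17` now appears twice, which is why it could not serve as the primary key of the dimension; the surrogate key stays unique.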
15. Fact Table Structure:
Primary Key
Set of Foreign Keys
Set of Business Metrics
How Is the Key of a Fact Table Generated?
16. A fact table is the central table in the star schema of a data warehouse.
It consists of two types of columns:
Foreign key columns allow it to be joined with dimension tables.
Measure columns contain the data that is being analyzed.
FACT TABLE
17. Foreign Key Column Type
These columns come from the dimension tables.
They are the primary keys of the dimension tables.
FACT TABLE COLUMNS – FOREIGN KEY
18. Fact Tables: Measures
A collection of measurements on a specific aspect of the business.
Examples: sales amount, order quantity, profit, balance, and discount amount.
Measures Or Facts
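The two column types of a fact table can be sketched as a tiny star schema in SQLite. All table names, column names, and sample rows below are assumptions for illustration; the query joins the fact table to a dimension through its foreign key and sums the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (
    product_key    INTEGER REFERENCES dim_product(product_key),  -- foreign key
    store_key      INTEGER REFERENCES dim_store(store_key),      -- foreign key
    sales_amount   REAL,                                         -- measure
    order_quantity INTEGER                                       -- measure
);
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Notebook');
INSERT INTO dim_store   VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales  VALUES (1, 1, 10.0, 5), (2, 1, 30.0, 3), (1, 2, 4.0, 2);
""")

# Join on the foreign key, then aggregate the measures per region.
sales_by_region = conn.execute(
    "SELECT s.region, SUM(f.sales_amount), SUM(f.order_quantity)"
    " FROM fact_sales f JOIN dim_store s ON s.store_key = f.store_key"
    " GROUP BY s.region ORDER BY s.region"
).fetchall()
```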
19. A fact table may contain detail-level facts or facts that have been aggregated.
The primary purpose of a data warehouse is reporting and forecasting.
Many reports are aggregations, such as sums.
Example: sales by quarter, sales by region.
Additivity of Measures
20. Types of Additivity of Measures
Additive measures
Semi-additive measures
Non-additive measures
Types Of Additivity Of Measures
21. Additive
A measure is additive if it can be summed across all dimensions.
Example: the purpose of this table is to record the sales amount for each product in each store on a daily basis.
Table columns: Date, Store, Product, Sales_Amount
Additive Measures
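Additivity can be illustrated by summing Sales_Amount along each dimension in turn. The sample rows and the helper `total_by` are hypothetical; the point is that every grouping adds up to the same grand total.

```python
from collections import defaultdict

# Hypothetical rows of (date, store, product, sales_amount).
SALES = [
    ("2024-01-01", "S1", "Pen", 10.0),
    ("2024-01-01", "S2", "Pen", 4.0),
    ("2024-01-02", "S1", "Notebook", 30.0),
    ("2024-01-02", "S1", "Pen", 6.0),
]
DIMENSIONS = {"date": 0, "store": 1, "product": 2}

def total_by(dimension):
    """Sum Sales_Amount grouped by one dimension."""
    totals = defaultdict(float)
    index = DIMENSIONS[dimension]
    for row in SALES:
        totals[row[index]] += row[3]
    return dict(totals)

grand_total = sum(row[3] for row in SALES)
```

For example, `total_by("store")` gives the sales per store, and those per-store totals sum to `grand_total`, as do the per-date and per-product totals.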
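22. Semi-additive
A measure is semi-additive if it can be summed across some dimensions but not all; typically it can be summed across every dimension except time. An account balance is an example: we cannot sum the account balance across the time dimension.
Table columns: Date, Account, Current_Balance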
Semi-Additive Measures
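The account balance case can be sketched as follows. The sample rows and helper names are assumptions; taking the latest balance is one common alternative to summing across time (an average is another).

```python
# Hypothetical rows of (date, account, current_balance).
BALANCES = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0),
    ("2024-01-02", "B", 50.0),
]

def total_on(date):
    """Summing across accounts for a single date is meaningful."""
    return sum(bal for d, _, bal in BALANCES if d == date)

def closing_balance(account):
    """Across time, take the latest balance instead of summing."""
    # ISO-formatted dates sort correctly as strings.
    _, balance = max((d, bal) for d, acct, bal in BALANCES if acct == account)
    return balance

# Adding account A's balances over two days double counts the same money.
misleading_sum = sum(bal for _, acct, bal in BALANCES if acct == "A")
```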
23. Non-additive
Some measures cannot be summed across any dimension; these are called non-additive measures. Examples are discount percentages and prices.
Profit_Margin is a non-additive fact: it does not make sense to add margins up at the account level or at the day level.
Table columns: Date, Account, Profit_Margin
Non-Additive Measures
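A short sketch shows why a ratio such as profit margin should not be summed. The revenue and profit columns are assumptions introduced for this example (the table above lists only the margin); a usual workaround is to store the additive components and compute the ratio after aggregating them.

```python
# Hypothetical rows of (date, account, revenue, profit).
ROWS = [
    ("2024-01-01", "A", 100.0, 10.0),
    ("2024-01-01", "B", 300.0, 90.0),
]

def margin(profit, revenue):
    """Profit margin as a fraction of revenue."""
    return profit / revenue

row_margins = [margin(p, r) for _, _, r, p in ROWS]  # 0.10 and 0.30
summed_margins = sum(row_margins)                    # about 0.40, not a real margin

# Aggregate the additive components first, then take the ratio.
total_revenue = sum(r for _, _, r, _ in ROWS)
total_profit = sum(p for _, _, _, p in ROWS)
overall_margin = margin(total_profit, total_revenue)
```

Here the overall margin is 100 / 400 = 0.25, which differs from the sum of the row margins.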
24. In the real world, it is possible to have a fact table that contains no measures or facts. Such tables are called factless fact tables.
A factless fact table is a fact table that has only dimension keys.
It is essentially an intersection of dimensions.
Factless Fact Table
25. The fact table would consist of keys to three dimensions: the student dimension, the time dimension, and the class dimension.
Example
26. A factless fact table does not contain any facts.
Factless fact tables are generally used to record events, such as student attendance or the attendance of participants at a meeting.
In some applications, fact tables have no non-key data but do have foreign keys for the associated dimensions.
Factless Fact Table
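The attendance example can be sketched as a table that holds nothing but dimension keys. The table name, key values, and rows are hypothetical. Because there is no measure column, analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Only keys to the student, time, and class dimensions; no measures.
CREATE TABLE fact_attendance (
    student_key INTEGER,
    date_key    INTEGER,
    class_key   INTEGER
);
INSERT INTO fact_attendance VALUES
    (1, 20240101, 10),
    (2, 20240101, 10),
    (1, 20240102, 10),
    (1, 20240102, 20);
""")

# Each row records one attendance event, so COUNT(*) answers "how many attended".
attendance_per_class = conn.execute(
    "SELECT class_key, COUNT(*) FROM fact_attendance"
    " GROUP BY class_key ORDER BY class_key"
).fetchall()
```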
29. Aggregates are precalculated summaries derived from the most granular fact table (granularity is the level of detail in the fact table).
When the fact table is at the lowest grain, the facts or metrics are at the lowest possible level at which they could be captured from the operational system.
What are the advantages of the lowest grain?
Aggregate Fact Tables
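An aggregate fact table can be sketched by precomputing a summary from the granular fact table. The table names, the monthly-per-store grain, and the sample rows are assumptions chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Most granular fact table: one row per date, store, and product.
CREATE TABLE fact_sales (
    date_key TEXT, store_key INTEGER, product_key INTEGER, sales_amount REAL
);
INSERT INTO fact_sales VALUES
    ('2024-01-01', 1, 1, 10.0),
    ('2024-01-01', 1, 2, 30.0),
    ('2024-01-02', 1, 1, 6.0),
    ('2024-02-01', 2, 1, 4.0);
-- Precompute a monthly summary per store from the granular rows.
CREATE TABLE agg_sales_month_store AS
    SELECT substr(date_key, 1, 7) AS month, store_key,
           SUM(sales_amount) AS sales_amount
    FROM fact_sales
    GROUP BY month, store_key;
""")

monthly = conn.execute(
    "SELECT month, store_key, sales_amount FROM agg_sales_month_store"
    " ORDER BY month, store_key"
).fetchall()
```

Reports at the monthly level can read the smaller summary table, while the granular table remains available for detail queries.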
30. Choose a simple star schema with the fact table at the lowest possible level of granularity.
Assume four dimension tables with the most granular fact table.
Operational system: single order, invoice, product, and so on.
Data warehouse: produces large result sets (these retrieve hundreds and thousands
Aggregate Fact Tables