Database keys for fact & Dimension Tables
Fact Table and Fact Types
Aggregate and Fact less
-
1
Ch Anwar ul Hassan (Lecturer)
Department of Computer Science and Software Engineering
Capital University of Sciences & Technology, Islamabad
Pakistan
anwarchaudary@gmail.com
Intro to Data warehousing
 A DBMS key is an attribute or set of an
attribute which helps you to identify a
row(tuple) in a relation(table). They allow
you to find the relation between two tables.
Keys help you uniquely identify a row in a
table by a combination of one or more
columns in that table
What are keys
 Keys help you to identify any row of data in a
table. In a real-world application, a table could
contain thousands of records. Moreover, the
records could be duplicated. Keys ensure that
you can uniquely identify a table record despite
these challenges.
 Allows you to establish a relationship between
and identify the relation between tables
 Help you to enforce identity and integrity in the
relationship.
Why we need a Key?
 DBMS has several types of Keys each have
their different functionality:
 Super Key
 Primary Key
 Candidate Key
 Alternate Key
 Foreign Key
 Compound Key
 Composite Key
 Surrogate Key
Various Keys in Database Management
System
 A superkey is a combination of columns that uniquely
identifies any rows in a table. A Super key may have
additional attributes that are not needed for unique
identification.
 A single attribute or a set of attributes that can uniquely
identify all attributes of a particular relation is called Super
key. On the other hands, a super key that is a proper subset
of another super key is called candidate key. All candidate
keys are super keys but the inverse is not true
 Example, EmpSSN and EmpNum name are superkeys.
What is the Super key?
 A column or group of columns in a table which helps us
to uniquely identifies every row in that table is called a
primary key. This DBMS can't be a duplicate. The same
value can't appear more than once in the table.
 Example, StudID
What is a Primary Key?
 All the keys which are not primary key are called an
alternate key. It is a candidate key which is currently not
the primary key. However, A table may have single or
multiple choices for the primary key.
 StudID, Roll No, Email are qualified to become a primary
key. But since StudID is the primary key, Roll No, Email
becomes the alternative key.
What is the Alternate key?
 A super key with no repeated attribute is called candidate key.
 The Primary key should be selected from the candidate keys. Every
table must have at least a single candidate key
 Example: In the given table Stud ID, Roll No, and email are
candidate keys which help us to uniquely identify the student record
in the table.
What is a Candidate Key?
 A foreign key is a column which is added to create a
relationship with another table. Foreign keys help us to
maintain data integrity and also allows navigation
between two different instances of an entity. Every
relationship in the model needs to be supported by a
foreign key
What is the Foreign key?
 Compound key has many fields which allow you to uniquely recognize
a specific record. It is possible that each column may be not unique by
itself within the database. However, when combined with the other
column or columns the combination of composite keys become
unique.
 In this example, OrderNo and ProductID can't be a primary key as it
does not uniquely identify a record. However, a compound key of
Order ID and Product ID could be used as it uniquely identified each
record.
What is the Compound key?
 A key which has multiple attributes to uniquely
identify rows in a table is called a composite key.
The difference between compound and the
composite key is that any part of the compound
key can be a foreign key, but the composite key
may or maybe not a part of the foreign key.
 A super key uniquely identifies a row. It could be
made up of one column or many. A composite
key is a key made of more than one column. If a
Super Key is made of more than one column it is
also a composite
What is the Composite key?
 An artificial key which aims to uniquely identify each record
is called a surrogate key. These kind of key are unique
because they are created when you don't have any natural
primary key. They do not lend any meaning to the data in
the table. Surrogate key is usually an integer.
 Given example, shown shift timings of the different
employee. In this example, a surrogate key is needed to
uniquely identify each employee. E.g. start or end time
What is a Surrogate Key?
Surrogate keys are widely used and accepted
design standard in data warehouses. It is
sequentially generated unique number
attached with each and every record in
a Dimension table in any Data Warehouse. It
join between the fact and dimension
tables and is necessary to handle changes
in dimension table attributes.
-
13
Surrogate Key in DWH
Dimension tables are referenced by fact tables using keys.
When creating a dimension table in a data warehouse, a
system-generated key is used to uniquely identify a row in
the dimension. This key is also known as a surrogate key.
The surrogate key is used as the primary key in the
dimension table.
-
14
How keys of dimension tables are generated?
Fact Table Structure:
 Primary Key
 Set of Foreign Keys
 Set of Business Metrics
-
15
How key of fact table is generated?
 A Fact Table is the central table in a star
schema of a data warehouse.
 It consists of two types of columns.
 The foreign keys column allows to join with
dimension tables .
 The measure columns contain the data that
is being analyzed.
FACT TABLE
 Foreign Key – Column Type
 These are the columns as coming from
Dimension Tables
 Are the primary keys of Dimension Tables
FACT TABLE COLUMNS – FOREIGN KEY
 Fact tables - Measures
 Collection of measurements on a specific aspects
of business
 sales amount, order quantity, profit, balance and
discount amount.
Measures Or Facts
 A fact table might contain either detail level
facts or facts that have been aggregated.
 The primary purpose of Data warehouse is
reporting, and forecasting
 Many times reports are aggregations such as
sum
 Example: sales by quarter, by region,
 Many reports are usually aggregations
Additivity of Measures
 Types of Additivity of Measures
 Additive measures
 Semi-additive measures
 Non-additive measures
Types Of Additivity Of Measures
 Additive
 If a measure can be summed across all
dimensions
 The purpose of this table is to record the
sales amount for each product in each
store on a daily basis.
Date
Store
Product
Sales_Amount
Additive Measures
Semi-additive
 Sometimes, we can sum a
measure across all
dimensions except for time
 such as account balance
 We can’t sum the account balance
across the time dimension
Date
Account
Current_Balance
Semi-Additive Measures
Non-additive
 Some measures can’t ever
be summed these are called
non-additive measures
 Such as discount percentages
and prices
 Profit_Margin is a non-additive
fact, for it does not make sense to
add them up for the account level
or the day level
Date
Account
Profit_margin
Non-Additive Measures
 In the real world, it is possible to have a fact
table that contains no measures or facts.
These tables are called "Factless Fact tables".
 A factless fact table is a fact table that does
have only dimensions (keys).
 It is essentially an intersection of dimensions.
Fact less Fact Table
 The fact table would
consist of 3 dimensions:
the student dimension, the
time dimension, and the
class dimension.
Example
 Fact less Fact table does not contain any facts.
 Generally Fact less fact tables are used to record
the events such as students attendance,
attendance of participants for an Event like a
Meeting.
 There are applications in which fact tables do not
have non key data but that do have foreign keys for
the associated dimensions
Fact less Fact table
Fact less Fact table
Fact less Fact table Example
 Aggregates are pre calculated summaries derived
from the most granular fact table(level of detail in
the fact table).
 If the fact table is at the lowest grain.
 The facts or metrics are at the lowest possible level
at which they could be captured from the
operational system.
 what is the Advantages of lowest grain?
Aggregate Fact Tables
 Choose a simple STAR schema with the fact table
at the lowest possible level of granularity.
 Assume four dimension table with most granular
fact table.
 Operational system-single order, invoice, product
and so on.
 Data warehouse- produce large result set(these
retrieve hundreds and thousands
Aggregate Fact Tables
Star schema with most granular fact table
Fact base record
Fact table size
Fact table size
Need for aggregates
Aggregating fact table
One Way Aggregates
Two Way Aggregates
Three Way Aggregates

Intro to Data warehousing lecture 12

  • 1.
    Database keys forfact & Dimension Tables Fact Table and Fact Types Aggregate and Fact less - 1 Ch Anwar ul Hassan (Lecturer) Department of Computer Science and Software Engineering Capital University of Sciences & Technology, Islamabad Pakistan anwarchaudary@gmail.com Intro to Data warehousing
  • 2.
     A DBMSkey is an attribute or set of an attribute which helps you to identify a row(tuple) in a relation(table). They allow you to find the relation between two tables. Keys help you uniquely identify a row in a table by a combination of one or more columns in that table What are keys
  • 3.
     Keys helpyou to identify any row of data in a table. In a real-world application, a table could contain thousands of records. Moreover, the records could be duplicated. Keys ensure that you can uniquely identify a table record despite these challenges.  Allows you to establish a relationship between and identify the relation between tables  Help you to enforce identity and integrity in the relationship. Why we need a Key?
  • 4.
     DBMS hasseveral types of Keys each have their different functionality:  Super Key  Primary Key  Candidate Key  Alternate Key  Foreign Key  Compound Key  Composite Key  Surrogate Key Various Keys in Database Management System
  • 5.
     A superkeyis a combination of columns that uniquely identifies any rows in a table. A Super key may have additional attributes that are not needed for unique identification.  A single attribute or a set of attributes that can uniquely identify all attributes of a particular relation is called Super key. On the other hands, a super key that is a proper subset of another super key is called candidate key. All candidate keys are super keys but the inverse is not true  Example, EmpSSN and EmpNum name are superkeys. What is the Super key?
  • 6.
     A columnor group of columns in a table which helps us to uniquely identifies every row in that table is called a primary key. This DBMS can't be a duplicate. The same value can't appear more than once in the table.  Example, StudID What is a Primary Key?
  • 7.
     All thekeys which are not primary key are called an alternate key. It is a candidate key which is currently not the primary key. However, A table may have single or multiple choices for the primary key.  StudID, Roll No, Email are qualified to become a primary key. But since StudID is the primary key, Roll No, Email becomes the alternative key. What is the Alternate key?
  • 8.
     A superkey with no repeated attribute is called candidate key.  The Primary key should be selected from the candidate keys. Every table must have at least a single candidate key  Example: In the given table Stud ID, Roll No, and email are candidate keys which help us to uniquely identify the student record in the table. What is a Candidate Key?
  • 9.
     A foreignkey is a column which is added to create a relationship with another table. Foreign keys help us to maintain data integrity and also allows navigation between two different instances of an entity. Every relationship in the model needs to be supported by a foreign key What is the Foreign key?
  • 10.
     Compound keyhas many fields which allow you to uniquely recognize a specific record. It is possible that each column may be not unique by itself within the database. However, when combined with the other column or columns the combination of composite keys become unique.  In this example, OrderNo and ProductID can't be a primary key as it does not uniquely identify a record. However, a compound key of Order ID and Product ID could be used as it uniquely identified each record. What is the Compound key?
  • 11.
     A keywhich has multiple attributes to uniquely identify rows in a table is called a composite key. The difference between compound and the composite key is that any part of the compound key can be a foreign key, but the composite key may or maybe not a part of the foreign key.  A super key uniquely identifies a row. It could be made up of one column or many. A composite key is a key made of more than one column. If a Super Key is made of more than one column it is also a composite What is the Composite key?
  • 12.
     An artificialkey which aims to uniquely identify each record is called a surrogate key. These kind of key are unique because they are created when you don't have any natural primary key. They do not lend any meaning to the data in the table. Surrogate key is usually an integer.  Given example, shown shift timings of the different employee. In this example, a surrogate key is needed to uniquely identify each employee. E.g. start or end time What is a Surrogate Key?
  • 13.
    Surrogate keys arewidely used and accepted design standard in data warehouses. It is sequentially generated unique number attached with each and every record in a Dimension table in any Data Warehouse. It join between the fact and dimension tables and is necessary to handle changes in dimension table attributes. - 13 Surrogate Key in DWH
  • 14.
    Dimension tables arereferenced by fact tables using keys. When creating a dimension table in a data warehouse, a system-generated key is used to uniquely identify a row in the dimension. This key is also known as a surrogate key. The surrogate key is used as the primary key in the dimension table. - 14 How keys of dimension tables are generated?
  • 15.
    Fact Table Structure: Primary Key  Set of Foreign Keys  Set of Business Metrics - 15 How key of fact table is generated?
  • 16.
     A FactTable is the central table in a star schema of a data warehouse.  It consists of two types of columns.  The foreign keys column allows to join with dimension tables .  The measure columns contain the data that is being analyzed. FACT TABLE
  • 17.
     Foreign Key– Column Type  These are the columns as coming from Dimension Tables  Are the primary keys of Dimension Tables FACT TABLE COLUMNS – FOREIGN KEY
  • 18.
     Fact tables- Measures  Collection of measurements on a specific aspects of business  sales amount, order quantity, profit, balance and discount amount. Measures Or Facts
  • 19.
     A facttable might contain either detail level facts or facts that have been aggregated.  The primary purpose of Data warehouse is reporting, and forecasting  Many times reports are aggregations such as sum  Example: sales by quarter, by region,  Many reports are usually aggregations Additivity of Measures
  • 20.
     Types ofAdditivity of Measures  Additive measures  Semi-additive measures  Non-additive measures Types Of Additivity Of Measures
  • 21.
     Additive  Ifa measure can be summed across all dimensions  The purpose of this table is to record the sales amount for each product in each store on a daily basis. Date Store Product Sales_Amount Additive Measures
  • 22.
    Semi-additive  Sometimes, wecan sum a measure across all dimensions except for time  such as account balance  We can’t sum the account balance across the time dimension Date Account Current_Balance Semi-Additive Measures
  • 23.
    Non-additive  Some measurescan’t ever be summed these are called non-additive measures  Such as discount percentages and prices  Profit_Margin is a non-additive fact, for it does not make sense to add them up for the account level or the day level Date Account Profit_margin Non-Additive Measures
  • 24.
     In thereal world, it is possible to have a fact table that contains no measures or facts. These tables are called "Factless Fact tables".  A factless fact table is a fact table that does have only dimensions (keys).  It is essentially an intersection of dimensions. Fact less Fact Table
  • 25.
     The facttable would consist of 3 dimensions: the student dimension, the time dimension, and the class dimension. Example
  • 26.
     Fact lessFact table does not contain any facts.  Generally Fact less fact tables are used to record the events such as students attendance, attendance of participants for an Event like a Meeting.  There are applications in which fact tables do not have non key data but that do have foreign keys for the associated dimensions Fact less Fact table
  • 27.
  • 28.
    Fact less Facttable Example
  • 29.
     Aggregates arepre calculated summaries derived from the most granular fact table(level of detail in the fact table).  If the fact table is at the lowest grain.  The facts or metrics are at the lowest possible level at which they could be captured from the operational system.  what is the Advantages of lowest grain? Aggregate Fact Tables
  • 30.
     Choose asimple STAR schema with the fact table at the lowest possible level of granularity.  Assume four dimension table with most granular fact table.  Operational system-single order, invoice, product and so on.  Data warehouse- produce large result set(these retrieve hundreds and thousands Aggregate Fact Tables
  • 31.
    Star schema withmost granular fact table
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.