Data Warehouse Architecture
Dr. G.Jasmine Beulah
Dept. Computer Science,
Kristu Jayanti College, Bengaluru.
Functions of Data Warehouse Tools and Utilities
The following are the functions of data warehouse tools and utilities −
• Data Extraction − Involves gathering data from multiple
heterogeneous sources.
• Data Cleaning − Involves finding and correcting the errors in data.
• Data Transformation − Involves converting the data from legacy
format to warehouse format.
• Data Loading − Involves sorting, summarizing, consolidating,
checking integrity, and building indices and partitions.
• Refreshing − Involves updating from data sources to warehouse.
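The five functions above can be sketched as a small pipeline. This is only an illustrative sketch: the two source lists, their field names, and the dictionary used as the "warehouse" are hypothetical and stand in for real source systems and a real warehouse.

```python
# Hypothetical records from two heterogeneous sources with different field names.
crm_rows = [{"cust": " Alice ", "amount": "120.50"}, {"cust": "Bob", "amount": "n/a"}]
pos_rows = [{"customer_name": "alice", "total": 80.0}]

def extract():
    """Data extraction: gather rows from both sources, tagging their origin."""
    rows = [{"source": "crm", **r} for r in crm_rows]
    rows += [{"source": "pos", **r} for r in pos_rows]
    return rows

def clean(rows):
    """Data cleaning: drop rows whose amount cannot be parsed as a number."""
    cleaned = []
    for r in rows:
        raw = r.get("amount", r.get("total"))
        try:
            amount = float(raw)
        except (TypeError, ValueError):
            continue  # unparseable amount, e.g. "n/a"
        cleaned.append({**r, "_amount": amount})
    return cleaned

def transform(rows):
    """Data transformation: convert every row to one warehouse format."""
    out = []
    for r in rows:
        name = r.get("cust", r.get("customer_name")).strip().title()
        out.append({"customer": name, "amount": r["_amount"]})
    return out

def load(rows, warehouse):
    """Data loading: sort, then summarize the amount per customer."""
    for r in sorted(rows, key=lambda r: r["customer"]):
        warehouse[r["customer"]] = warehouse.get(r["customer"], 0.0) + r["amount"]
    return warehouse

warehouse = load(transform(clean(extract())), {})
print(warehouse)
```

Refreshing would correspond to running the same steps again on newly arrived source rows and loading them into the existing warehouse dictionary.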
Terminologies
Metadata
Metadata is simply defined as data about data.
Data that is used to represent other data is known as metadata.
For example, the index of a book serves as metadata for the contents
of the book.
In terms of a data warehouse, we can define metadata as follows −
• Metadata is a road map to the data warehouse.
• Metadata in a data warehouse defines the warehouse objects.
• Metadata acts as a directory. This directory helps the decision support
system to locate the contents of a data warehouse.
Metadata Repository
A metadata repository is an integral part of a data warehouse system. It contains the following
metadata −
• Business metadata − It contains the data ownership information, business definitions, and
changing policies.
• Operational metadata − It includes currency of data and data lineage. Currency of data
refers to the data being active, archived, or purged. Lineage of data means the history of the
data migrated and the transformations applied to it.
• Data for mapping from the operational environment to the data warehouse − It
includes source databases and their contents, data extraction, data partitioning, cleaning,
transformation rules, and data refresh and purging rules.
• The algorithms for summarization − It includes dimension algorithms, data on granularity,
aggregation, summarizing, etc.
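As a rough sketch, one repository entry could carry all four kinds of metadata for a single warehouse object. The class name, field names, and sample values below are hypothetical; real metadata repositories are far richer.

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    """One hypothetical repository entry describing a warehouse table."""
    name: str
    owner: str                                    # business metadata: data ownership
    definition: str                               # business metadata: business definition
    status: str = "active"                        # operational: active, archived, or purged
    lineage: list = field(default_factory=list)   # operational: transformations applied
    source: str = ""                              # mapping: where the data comes from
    granularity: str = ""                         # summarization: level of detail

repository = {}

def register(meta):
    """Add an entry so that tools can later locate the object by name."""
    repository[meta.name] = meta

register(TableMetadata(
    name="sales_fact",
    owner="Sales department",
    definition="One row per item sold per day",
    source="pos_db.orders",
    granularity="daily",
))
repository["sales_fact"].lineage.append("cleaned: dropped rows with a missing amount")

print(repository["sales_fact"])
```

Looking up `repository["sales_fact"]` plays the role of the directory described earlier: it tells a decision support tool what the object is, where it came from, and what has been done to it.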
Data Cube
A data cube helps us represent data in multiple dimensions. It is
defined by dimensions and facts.
The dimensions are the entities with respect to which an enterprise
preserves the records.
Example
Suppose a company wants to keep track of sales records with the help of a
sales data warehouse with respect to time, item, branch, and location.
These dimensions allow the company to keep track of monthly sales and of
the branch at which the items were sold. There is a table associated with
each dimension. This table is known as a dimension table. For example, the
"item" dimension table may have attributes such as item_name, item_type, and
item_brand.
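The example can be sketched in a few lines of Python. The item rows, the cell values, and the measure (units sold) are made up for illustration; only the dimension names and the item attributes come from the text above.

```python
from collections import defaultdict

# Hypothetical "item" dimension table, keyed by item_key.
item_dim = {
    1: {"item_name": "Laptop", "item_type": "Electronics", "item_brand": "BrandA"},
    2: {"item_name": "Desk", "item_type": "Furniture", "item_brand": "BrandB"},
}

# Hypothetical facts: (time, item_key, branch, location) -> units sold.
DIMS = ("time", "item", "branch", "location")
cube = {
    ("2024-01", 1, "B1", "Bengaluru"): 10,
    ("2024-01", 2, "B1", "Bengaluru"): 4,
    ("2024-02", 1, "B2", "Delhi"): 7,
    ("2024-02", 1, "B1", "Bengaluru"): 3,
}

def roll_up(cube, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for cell, value in cube.items():
        totals[tuple(cell[i] for i in idx)] += value
    return dict(totals)

monthly = roll_up(cube, ["time"])              # monthly sales
by_branch = roll_up(cube, ["time", "branch"])  # monthly sales per branch

print(monthly)
print(by_branch)
```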
Figure: 2-D view of the sales data for a company with respect to time and item
Figure: 3-D view of the sales data with respect to time, item, and location
Data Mart
• Data marts contain a subset of organization-wide data that is valuable
to specific groups of people in an organization.
• In other words, a data mart contains only the data that is specific to a
particular group.
• For example, the marketing data mart may contain only data related to
items, customers, and sales.
• Data marts are confined to subjects.
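A minimal sketch of the marketing example, assuming the warehouse is modeled as a dictionary of subject areas (the subject names beyond items, customers, and sales, and all rows, are hypothetical):

```python
# Hypothetical organization-wide warehouse, grouped by subject area.
warehouse = {
    "items": [{"item_key": 1, "item_name": "Laptop"}],
    "customers": [{"customer_key": 1, "name": "Alice"}],
    "sales": [{"item_key": 1, "customer_key": 1, "amount": 1200.0}],
    "payroll": [{"employee": "E1", "salary": 50000}],
    "inventory": [{"item_key": 1, "in_stock": 12}],
}

def build_data_mart(warehouse, subjects):
    """Return only the subject areas that one group of users needs."""
    return {subject: warehouse[subject] for subject in subjects}

marketing_mart = build_data_mart(warehouse, ["items", "customers", "sales"])
print(sorted(marketing_mart))
```

The marketing mart never sees payroll or inventory data, which is what being confined to subjects means here.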
Graphical Representation of Data Mart
Multi-dimensional Data Model(MDM)
• A multidimensional model views data in the form of a data-cube.
• A data cube enables data to be modeled and viewed in multiple
dimensions.
• It is defined by dimensions and facts.
The dimensions are the perspectives or entities concerning which an
organization keeps records. For example, a shop may create a sales data
warehouse to keep records of the store's sales for the dimensions time,
item, and location. These dimensions allow the shop to keep track of
things such as monthly sales of items and the locations at which
the items were sold. Each dimension has a table related to it, called a
dimension table, which describes the dimension further. For example,
a dimension table for an item may contain the attributes item_name,
brand, and type.
• A multidimensional data model is organized around a central theme,
for example, sales. This theme is represented by a fact table. Facts are
numerical measures. The fact table contains the names of the facts, or
measures, as well as keys to each of the related dimension tables.
• In this 2D representation, the sales for Delhi are shown for the time
dimension (organized in quarters) and the item dimension (classified
according to the types of an item sold).
• The fact or measure displayed is rupee_sold (in thousands).
View the sales data with a third dimension
Same data in the form of a 3D data cube
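The relationship between the 3-D cube and a 2-D view can be sketched as a slice: fixing one dimension (here, location) leaves a table over the remaining two. All cell values below are invented for illustration and are not the figures from the slides.

```python
# Hypothetical cells: (location, quarter, item_type) -> rupee_sold in thousands.
cube3d = {
    ("Delhi", "Q1", "Keyboard"): 100,
    ("Delhi", "Q1", "Mobile"): 200,
    ("Delhi", "Q2", "Keyboard"): 150,
    ("Delhi", "Q2", "Mobile"): 250,
    ("Mumbai", "Q1", "Keyboard"): 90,
    ("Mumbai", "Q2", "Mobile"): 300,
}

def slice_location(cube, location):
    """Fix the location dimension; return quarter -> item_type -> rupee_sold."""
    table = {}
    for (loc, quarter, item_type), value in cube.items():
        if loc == location:
            table.setdefault(quarter, {})[item_type] = value
    return table

delhi = slice_location(cube3d, "Delhi")
for quarter, row in delhi.items():
    print(quarter, row)
```

Stacking one such 2-D table per location gives back the 3-D cube.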
The schemas for the multidimensional data model are:
Star Schema
Snowflake Schema
Fact Constellation Schema
Star Schema
• A star schema is the elementary form of a dimensional model, in
which data are organized into facts and dimensions.
• A fact is an event that is counted or measured, such as a sale or a login.
A dimension includes reference data about the fact, such as date, item,
or customer.
• A star schema is a relational schema whose design represents a
multidimensional data model.
• The star schema is the explicit data warehouse schema.
• It is known as a star schema because its entity-relationship diagram
resembles a star, with points diverging from a central table.
The center of the schema consists of a large fact table, and the points
of the star are the dimension tables.
Star Schema
Fact Tables
A fact table has two types of columns: foreign keys (pointing to the dimension tables) and
numeric values.
Dimension Tables
A dimension table is generally small in size compared to a fact table. The primary key of a dimension
table is a foreign key in the fact table.
Examples of dimension tables are:
Time dimension table
Product dimension table
Employee dimension table
Geography dimension table
The main characteristics of the star schema are that it is easy to understand and that queries need
only a small number of table joins.
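A star schema can be sketched with Python's built-in sqlite3 module. The table names, column names, and rows below are hypothetical; the point is the shape: one central fact table whose foreign keys point to small dimension tables, and a query that joins each dimension once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);

-- Central fact table: foreign keys to the dimensions plus one numeric measure.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    rupee_sold REAL,
    PRIMARY KEY (time_key, item_key, location_key)
);

INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO item_dim VALUES (1, 'Phone X', 'BrandA', 'Mobile'),
                            (2, 'Key K', 'BrandB', 'Keyboard');
INSERT INTO location_dim VALUES (1, 'Delhi'), (2, 'Bengaluru');
INSERT INTO sales_fact VALUES (1, 1, 1, 200), (1, 2, 1, 100),
                              (2, 1, 1, 250), (2, 1, 2, 300);
""")

# Star join: the fact table joined once to each dimension it needs.
rows = con.execute("""
    SELECT t.quarter, i.type, SUM(f.rupee_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN item_dim i ON i.item_key = f.item_key
    JOIN location_dim l ON l.location_key = f.location_key
    WHERE l.city = 'Delhi'
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()

for row in rows:
    print(row)
```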
Fact Tables
• A fact table is a table in a star schema that contains facts and is
connected to dimensions.
• A fact table has two types of columns: those that contain facts and those
that are foreign keys to the dimension tables.
• The primary key of a fact table is generally a composite key that is
made up of all of its foreign keys.
• A fact table might contain either detail-level facts or facts that have been
aggregated (fact tables that contain aggregated facts are often instead
called summary tables).
• A fact table generally contains facts with the same level of
aggregation.
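Two of these points, the composite primary key and the summary table, can be sketched with sqlite3. The table and column names and the rows are hypothetical, and the day column stands in for a key into a time dimension.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Detail-level fact table; the composite key is made up of its dimension keys.
CREATE TABLE sales_fact (
    day TEXT,
    item_key INTEGER,
    branch_key INTEGER,
    units_sold INTEGER,
    PRIMARY KEY (day, item_key, branch_key)
);
INSERT INTO sales_fact VALUES
    ('2024-01-05', 1, 1, 3),
    ('2024-01-20', 1, 1, 2),
    ('2024-02-02', 1, 1, 4);

-- Summary table: the same facts aggregated to one level (month).
CREATE TABLE monthly_sales_summary AS
    SELECT substr(day, 1, 7) AS month, item_key, branch_key,
           SUM(units_sold) AS units_sold
    FROM sales_fact
    GROUP BY substr(day, 1, 7), item_key, branch_key;
""")

# A second row with the same combination of keys violates the composite key.
try:
    con.execute("INSERT INTO sales_fact VALUES ('2024-01-05', 1, 1, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

summary = con.execute(
    "SELECT month, units_sold FROM monthly_sales_summary ORDER BY month"
).fetchall()
print(duplicate_rejected, summary)
```

Keeping daily rows in one table and monthly totals in another follows the last bullet: each fact table holds facts at a single level of aggregation.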
Dimension Tables
• A dimension is a structure usually composed of one or more
hierarchies that categorize data.
• If a dimension has no hierarchies or levels, it is called a flat
dimension or list.
• The primary key of each dimension table is part of the
composite primary key of the fact table.
• Dimensional attributes help to define the dimensional value. They are
generally descriptive, textual values.
• Dimension tables are usually smaller in size than fact tables.
• Fact tables store data about sales, while dimension tables store data about the
geographic region (markets, cities), clients, products, times, and channels.
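A hierarchy inside a dimension can be sketched as follows, assuming a hypothetical location dimension with the levels city, state, and country, and invented sales totals per city. Aggregating along the hierarchy is a roll-up.

```python
# Hypothetical location dimension: each city row carries its parent levels.
location_dim = {
    "Bengaluru": {"state": "Karnataka", "country": "India"},
    "Mysuru": {"state": "Karnataka", "country": "India"},
    "Delhi": {"state": "Delhi", "country": "India"},
}

# Hypothetical facts at the lowest level of the hierarchy (city).
sales_by_city = {"Bengaluru": 120, "Mysuru": 30, "Delhi": 200}

def roll_up(sales, dimension, level):
    """Aggregate city-level sales to a higher level of the hierarchy."""
    totals = {}
    for city, amount in sales.items():
        key = dimension[city][level]
        totals[key] = totals.get(key, 0) + amount
    return totals

by_state = roll_up(sales_by_city, location_dim, "state")
by_country = roll_up(sales_by_city, location_dim, "country")
print(by_state)
print(by_country)
```

A flat dimension would have only the city names, with no higher level to roll up to.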
Snowflake Schemas for Multidimensional Model
The snowflake schema is more complex than the star schema because the dimension tables of the
snowflake are normalized.
The snowflake schema is represented by a centralized fact table that is connected to
multiple dimension tables, and these dimension tables can be normalized into additional
dimension tables.
The major difference between the snowflake and star schema models is that the
dimension tables of the snowflake model are normalized to reduce redundancies.
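The normalization step can be sketched with sqlite3. The tables, the brand attributes, and the rows are hypothetical: a star-style item dimension repeats the brand details on every row, while the snowflake form moves them into their own table and keeps only a key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Star form: brand details are repeated for every item of that brand.
CREATE TABLE item_dim_star (
    item_key INTEGER PRIMARY KEY, item_name TEXT,
    brand_name TEXT, brand_country TEXT
);
INSERT INTO item_dim_star VALUES
    (1, 'Phone X', 'BrandA', 'India'),
    (2, 'Phone Y', 'BrandA', 'India'),
    (3, 'Key K', 'BrandB', 'Japan');

-- Snowflake form: the brand attributes become an additional dimension table.
CREATE TABLE brand_dim (
    brand_key INTEGER PRIMARY KEY, brand_name TEXT UNIQUE, brand_country TEXT
);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT,
    brand_key INTEGER REFERENCES brand_dim(brand_key)
);
INSERT INTO brand_dim (brand_name, brand_country)
    SELECT DISTINCT brand_name, brand_country FROM item_dim_star;
INSERT INTO item_dim
    SELECT s.item_key, s.item_name, b.brand_key
    FROM item_dim_star s JOIN brand_dim b ON b.brand_name = s.brand_name;
""")

brand_rows = con.execute("SELECT COUNT(*) FROM brand_dim").fetchone()[0]

# Reading the full item description now needs one extra join.
rebuilt = con.execute("""
    SELECT i.item_key, i.item_name, b.brand_name, b.brand_country
    FROM item_dim i JOIN brand_dim b ON b.brand_key = i.brand_key
    ORDER BY i.item_key
""").fetchall()
original = con.execute("SELECT * FROM item_dim_star ORDER BY item_key").fetchall()
print(brand_rows, rebuilt == original)
```

The redundancy is gone (each brand is stored once), at the cost of the additional join, which is the trade-off between the two schemas.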
Fact Constellation Schemas for Multidimensional Model
A fact constellation can have multiple fact tables that share many dimension tables. This type
of schema can be viewed as a collection of stars and hence is called a galaxy
schema or a fact constellation.
The main disadvantage of fact constellation schemas is their more complicated design.
The example schema defines two fact tables, sales and shipping. Sales are treated along four
dimensions, namely, time, item, branch, and location.
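A small sqlite3 sketch of two fact tables sharing dimensions follows. Everything in it is an assumption made for illustration: the columns of the shipping fact table are not given in the text, branch and location are simplified to plain text columns instead of dimension tables, and all rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Shared dimension tables.
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);

-- Two fact tables that both reference the shared dimensions.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    branch TEXT, location TEXT, units_sold INTEGER
);
CREATE TABLE shipping_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    shipper TEXT, units_shipped INTEGER
);

INSERT INTO time_dim VALUES (1, '2024-01');
INSERT INTO item_dim VALUES (1, 'Laptop'), (2, 'Desk');
INSERT INTO sales_fact VALUES (1, 1, 'B1', 'Delhi', 10),
                              (1, 1, 'B2', 'Mumbai', 5),
                              (1, 2, 'B1', 'Delhi', 2);
INSERT INTO shipping_fact VALUES (1, 1, 'ShipCo', 12), (1, 2, 'ShipCo', 2);
""")

# Each fact table is aggregated separately, then lined up on the shared keys.
comparison = con.execute("""
    SELECT t.month, i.item_name,
           (SELECT COALESCE(SUM(s.units_sold), 0) FROM sales_fact s
             WHERE s.time_key = t.time_key AND s.item_key = i.item_key),
           (SELECT COALESCE(SUM(h.units_shipped), 0) FROM shipping_fact h
             WHERE h.time_key = t.time_key AND h.item_key = i.item_key)
    FROM time_dim t CROSS JOIN item_dim i
    ORDER BY t.month, i.item_name
""").fetchall()

for row in comparison:
    print(row)
```

Because both fact tables use the same time and item keys, sales and shipments can be compared per month and per item without any extra mapping.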
Data Warehouse Architecture
• A data warehouse architecture is a method of defining the overall
architecture of data communication, processing, and presentation that
exists for end-client computing within the enterprise.
• Production applications such as payroll, accounts payable, product
purchasing, and inventory control are designed for online transaction
processing (OLTP).
• Such applications gather detailed data from day-to-day operations.
