Assignment # 3
Information System & Data Processing II
Submitted By
Abdul-rehman Aslam
Roll # (9998)
Submitted To
Madam: Nargis Fatima
NATIONAL UNIVERSITY OF MODERN LANGUAGES H-9
ISLAMABAD
Question:
Differentiate between the following:
1. Star schema and snowflake schema
2. Snowflake schema and fact constellation schema
3. Star schema and fact constellation schema
1. Star schema and snowflake schema
Star schema:
· The star schema is the simplest data warehouse schema.
· Each dimension is represented by a single table, and there are no hierarchies
between dimension tables.
· It contains a fact table surrounded by dimension tables. If the dimensions are
denormalized, the design is a star schema.
· Only one join is needed to relate the fact table to any one of the dimension tables.
· A star schema optimizes performance by keeping queries simple and providing fast
response times. All the information about each level is stored in one row.
· It is called a star schema because its diagram resembles a star.
Snowflake schema:
· The snowflake schema is a more complex data warehouse model than the star schema.
· At least one hierarchy exists between dimension tables.
· It contains a fact table surrounded by dimension tables. If a dimension is
normalized, the design is a snowflake schema.
· Because the dimension tables are related to one another, a query may need many
joins to fetch the data.
· Snowflake schemas normalize dimensions to eliminate redundancy. The result is more
complex queries and reduced query performance.
· It is called a snowflake schema because its diagram resembles a snowflake.
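The one-join property of the star schema can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names (fact_sales, dim_item), the columns, and the sample rows are assumptions made up for the example, not part of the assignment.

```python
import sqlite3

# Hypothetical star schema: one fact table and one denormalized dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item (
    item_key      INTEGER PRIMARY KEY,
    item_name     TEXT,
    supplier_name TEXT   -- supplier detail kept in the same row (denormalized)
);
CREATE TABLE fact_sales (
    item_key     INTEGER REFERENCES dim_item(item_key),
    dollars_sold REAL
);
INSERT INTO dim_item VALUES (1, 'pen', 'Acme'), (2, 'ink', 'Acme');
INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# A single join links the fact table to the dimension table.
star_rows = conn.execute("""
    SELECT d.supplier_name, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_item AS d ON d.item_key = f.item_key
    GROUP BY d.supplier_name
""").fetchall()
print(star_rows)
```

Because the supplier name sits in the item dimension row itself, no second join is needed to group sales by supplier.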
2. Snowflake and fact constellation schema
Snowflake schema:
The snowflake schema is a variant of the star schema model, where some dimension
tables are normalized, thereby further splitting the data into additional tables. The
resulting schema graph forms a shape similar to a snowflake.
· The major difference between the snowflake and star schema models is that the
dimension tables of the snowflake model may be kept in normalized form to reduce
redundancies.
· Such normalized tables are easy to maintain and save storage space. However, this
saving of space is negligible in comparison to the typical magnitude of the fact table.
· The snowflake structure can reduce the effectiveness of browsing, since more joins
will be needed to execute a query.
· The system performance may be adversely impacted. Hence, although the snowflake
schema reduces redundancy, it is not as popular as the star schema in data warehouse
design.
Example
Here, the sales fact table is identical to that of the star schema. The main difference
between the two schemas is in the definition of dimension tables.
The single dimension table for item in the star schema is normalized in the snowflake
schema, resulting in new item and supplier tables.
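The normalization step described above can be sketched as follows, again with sqlite3. The column names (supplier_key, supplier_type) and the sample rows are assumptions for illustration; the assignment only states that the item dimension is split into item and supplier tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style item dimension: the supplier type is repeated on every item row.
CREATE TABLE item_star (
    item_key      INTEGER PRIMARY KEY,
    item_name     TEXT,
    supplier_key  INTEGER,
    supplier_type TEXT
);
INSERT INTO item_star VALUES
    (1, 'pen', 10, 'wholesale'),
    (2, 'ink', 10, 'wholesale'),
    (3, 'pad', 20, 'retail');

-- Snowflake-style: supplier attributes move into their own table.
CREATE TABLE supplier AS
    SELECT DISTINCT supplier_key, supplier_type FROM item_star;
CREATE TABLE item AS
    SELECT item_key, item_name, supplier_key FROM item_star;
""")

# Each supplier is now stored once instead of once per item.
supplier_count = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]

# Recovering the supplier type now costs an extra join.
rebuilt = conn.execute("""
    SELECT i.item_key, i.item_name, i.supplier_key, s.supplier_type
    FROM item AS i
    JOIN supplier AS s ON s.supplier_key = i.supplier_key
    ORDER BY i.item_key
""").fetchall()
original = conn.execute("SELECT * FROM item_star ORDER BY item_key").fetchall()
print(supplier_count, rebuilt == original)
```

The rebuilt rows match the original denormalized table, which shows the trade-off: less redundancy in storage, one more join at query time.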
Fact constellation: Sophisticated applications may require multiple fact tables to share
dimension tables. This kind of schema can be viewed as a collection of stars, and
hence is called a galaxy schema or a fact constellation.
Fact constellation example: This schema specifies two fact tables, sales and shipping.
The sales table definition is identical to that of the star schema (Figure 3.4).
The shipping table has five dimensions, or keys (item key, time key, shipper key, from
location, and to location), and two measures (dollars cost and units shipped). A fact
constellation schema allows dimension tables to be shared between fact tables. For
example, the dimension tables for time, item, and location are shared between both
the sales and shipping fact tables.
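A reduced sketch of this fact constellation, using sqlite3, is shown here. The shipping columns follow the keys and measures listed above; the sales columns, the dimension attributes, and all sample rows are assumptions, since the sales definition is only given by reference to the star schema figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension tables (attributes are illustrative).
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER);

-- First fact table (columns assumed for the sketch).
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    dollars_sold REAL
);
-- Second fact table: five keys and two measures, as listed in the text.
CREATE TABLE fact_shipping (
    item_key      INTEGER REFERENCES dim_item(item_key),
    time_key      INTEGER REFERENCES dim_time(time_key),
    shipper_key   INTEGER,
    from_location INTEGER,
    to_location   INTEGER,
    dollars_cost  REAL,
    units_shipped INTEGER
);
INSERT INTO dim_item VALUES (1, 'pen'), (2, 'ink');
INSERT INTO dim_time VALUES (1, 2023);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 1, 5.0), (1, 2, 7.5);
INSERT INTO fact_shipping VALUES
    (1, 1, 100, 1, 2, 2.0, 3),
    (2, 1, 100, 1, 2, 1.5, 4);
""")

# Both fact tables are summarized through the same shared item dimension.
# Each fact table is aggregated separately so rows are not multiplied.
per_item = conn.execute("""
    SELECT d.item_name,
           (SELECT SUM(s.dollars_sold) FROM fact_sales AS s
             WHERE s.item_key = d.item_key),
           (SELECT SUM(h.units_shipped) FROM fact_shipping AS h
             WHERE h.item_key = d.item_key)
    FROM dim_item AS d
    ORDER BY d.item_key
""").fetchall()
print(per_item)
```

The point of the sketch is that dim_item and dim_time are defined once and referenced by both fact tables, which is what distinguishes a constellation from two independent stars.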
The fact constellation schema is commonly used, since it can model multiple,
interrelated subjects. A data mart, on the other hand, is a department subset of the data
warehouse that focuses on selected subjects, and thus its scope is departmentwide.
For data marts, the star or snowflake schemas are commonly used, since both are
geared toward modeling single subjects, although the star schema is more popular and
efficient.
In a star schema, a dimension table has no parent table, whereas in a snowflake
schema a dimension table may have one or more parent tables.
In terms of performance, the star schema is better. If storage space is a major
concern, however, the snowflake schema is preferable to the star schema.
3. Star schema and fact constellation schema
Star schema:
The most common modeling paradigm is the star schema, in which the data warehouse
contains (1) a large central table (fact table) containing the bulk of the data with no
redundancy, and (2) a set of smaller attendant tables (dimension tables), one for
each dimension.
It is the basic structure for a dimensional model. It has one fact table and a set of
smaller dimension tables arranged around the fact table. The fact data will not
change over time. The most useful facts are numeric and additive because data
warehouse applications almost never access a single record. They access hundreds,
thousands, or millions of records at a time and aggregate them.
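Why additive numeric facts are convenient can be shown with a small pure-Python sketch. The fact rows and the helper name total_by are hypothetical; the idea is only that an additive measure can be summed along any dimension and still agree with the grand total.

```python
from collections import defaultdict

# Hypothetical fact rows: (item_key, time_key, units_sold)
fact_rows = [(1, 1, 3), (1, 2, 4), (2, 1, 5), (2, 2, 6)]

def total_by(rows, key_index):
    """Sum the units_sold measure grouped by the key at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)

by_item = total_by(fact_rows, 0)   # totals per item
by_time = total_by(fact_rows, 1)   # totals per time period
grand_total = sum(row[2] for row in fact_rows)

print(by_item, by_time, grand_total)
```

Whichever dimension is used for grouping, the group totals add up to the same grand total, which is what makes such measures safe to aggregate over many records.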
Fact constellation:
Sophisticated applications may require multiple fact tables to share dimension tables.
This kind of schema can be viewed as a collection of stars, and hence is called a
galaxy schema or a fact constellation.
Example for defining Star, Snowflake and Fact Constellation Schema
Just as relational query languages like SQL are used to specify relational queries, a
data mining query language (DMQL) can be used to specify data mining tasks. DMQL
contains language primitives for defining data warehouses and data marts. Both can be
defined using two language primitives, one for cube definition and another for
dimension definition.
The cube definition has the following syntax:
Define cube <cube_name> [<dimension_list>]: <measure_list>
The dimension definition has the following syntax:
Define dimension <dimension_name> as (<attribute_or_sub-dimension_list>)
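To make the two templates concrete, the following Python sketch fills them in with sample values. The helper functions and every name passed to them (sales_star, time, item, branch, location, dollars_sold, units_sold, and the time attributes) are illustrative assumptions; the sketch only formats text according to the templates and is not a DMQL interpreter.

```python
def define_cube(cube_name, dimensions, measures):
    """Render a cube definition following the cube template above."""
    return f"Define cube {cube_name} [{', '.join(dimensions)}]: {', '.join(measures)}"

def define_dimension(dimension_name, attributes):
    """Render a dimension definition following the dimension template above."""
    return f"Define dimension {dimension_name} as ({', '.join(attributes)})"

# Hypothetical star schema: one cube plus one of its dimensions.
cube_stmt = define_cube(
    "sales_star",
    ["time", "item", "branch", "location"],
    ["dollars_sold", "units_sold"],
)
time_stmt = define_dimension("time", ["time_key", "day", "month", "quarter", "year"])

print(cube_stmt)
print(time_stmt)
```

In a snowflake definition, an entry in the dimension's attribute list would itself be a sub-dimension, and in a fact constellation more than one cube would be defined over the same dimensions.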