BI_LECTURE_4-2021.pptx

DATA WAREHOUSING -
BUILDING THE BLOCKS:
DESIGN AND ARCHITECTURE
WITH AARONE ATUHE
2020

 80% of this lecture is based on Ponniah’s “Data Warehousing Fundamentals: A
Comprehensive Guide for IT Professionals”
 and
 “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling” by
Ralph Kimball and Margy Ross

Information Systems: PROFILE AND ROLE
 Information systems are rooted in the relationship between information,
decision and control.
 An IS should collect and classify information by means of integrated and
suitable procedures, so that it produces, in time and at the right levels, the
summaries needed to support the decision process and to administer and
globally control the enterprise's activity.

Information as a resource
 Information is an increasingly valuable resource that managers
need in order to schedule and monitor enterprise activities
effectively.
 Information is the raw material that information systems
transform, just as manufacturing systems transform
unfinished products.

Value of information
 Information is an enterprise resource like capital,
raw materials, plants and people; thus, it has a
cost.
 Hence, understanding the value of information is
important.

DW and DSS
 How do DWs serve as decision support systems (DSS)?

DW DESIGN COMPONENTS: Granularity
 Granularity is the extent to which a system, or its description or
observation, is broken down into small parts. It is the
extent to which a larger entity is subdivided.
 For example, a yard broken into inches has finer granularity than a
yard broken into feet.

Data granularity
 The granularity of data refers to the size of the units into which data fields are subdivided. For example, a
postal address can be recorded, with coarse granularity, as a single field:
 address = 200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA
or with fine granularity, as multiple fields:
 street address = 200 2nd Ave. South #358
 city = St. Petersburg
 postal code = FL 33701-4313
 country = USA

or with even finer granularity:
 street = 2nd Ave. South
 address number = 200
 suite/apartment number = #358
 city = St. Petersburg
 state = FL
 postal-code = 33701
 postal-code-add-on = 4313
 country = USA
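
The address example can be sketched as plain Python records. This is only an illustration: the field names and the `to_coarse` helper are hypothetical, not part of any standard. It shows that the coarse form can always be rebuilt from the fine-grained fields, while going the other way would require parsing.

```python
# Coarse granularity: the whole address is a single field.
coarse = {"address": "200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA"}

# Finest granularity from the slide: every component is its own field.
# (Field names are illustrative only.)
fine = {
    "address_number": "200",
    "street": "2nd Ave. South",
    "suite": "#358",
    "city": "St. Petersburg",
    "state": "FL",
    "postal_code": "33701",
    "postal_code_add_on": "4313",
    "country": "USA",
}


def to_coarse(a):
    """Rebuild the single-field address from the fine-grained fields."""
    return (
        f"{a['address_number']} {a['street']} {a['suite']}, {a['city']}, "
        f"{a['state']} {a['postal_code']}-{a['postal_code_add_on']} {a['country']}"
    )
```

With fine-grained fields, a query such as "all addresses in state FL" becomes a simple comparison on one field.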

Data Granularity in DW
 In an operational system, data is usually kept at the lowest level of detail.
 In a point-of-sale system for a grocery store, sales are captured and stored at the
level of units of a product per transaction at the check-out counter.
 In an order entry system, the quantity ordered is captured and stored at the level of units of
a product per order received from the customer. Whenever you need summary data, you add
up the individual transactions.
 If you are looking for units of a product ordered this month, you read all the orders entered
for the entire month for that product and add them up.

 Data granularity in a data warehouse refers to the level of detail. The
lower the level of detail, the finer the data granularity. Of course, if you
want to keep data at the lowest level of detail, you have to store a lot of
data in the data warehouse.
 In a data warehouse, therefore, it is efficient to keep data
summarized at different levels. Depending on the query, you can then go to
the particular level of detail and satisfy the query.
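
A minimal sketch of keeping the same data at several levels of detail follows. The sample transactions and the `summarize` helper are made up for illustration; a real warehouse would hold these summaries as tables rather than in-memory dictionaries.

```python
from collections import defaultdict
from datetime import date

# Lowest level of detail: one row per product per sales transaction.
# (Sample rows are invented for illustration.)
transactions = [
    {"day": date(2021, 3, 1), "product": "milk", "units": 2},
    {"day": date(2021, 3, 1), "product": "milk", "units": 1},
    {"day": date(2021, 3, 2), "product": "milk", "units": 4},
    {"day": date(2021, 3, 2), "product": "bread", "units": 3},
    {"day": date(2021, 4, 5), "product": "milk", "units": 5},
]


def summarize(rows, key):
    """Add up units for each group produced by the key function."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row["units"]
    return dict(totals)


# Daily summary: units per product per day.
daily = summarize(transactions, lambda r: (r["product"], r["day"]))

# Monthly summary: units per product per (year, month), a coarser level.
monthly = summarize(
    transactions, lambda r: (r["product"], r["day"].year, r["day"].month)
)
```

A query for monthly units can read `monthly` directly instead of re-adding every transaction, which is the efficiency the slide refers to.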

[Figure: examples of data granularity in a typical data warehouse]

WHAT ARE OUR CONCERNS IN DW DESIGN?
 Before deciding to build a data warehouse for your organization, you need to
ask the following basic and fundamental questions and address the relevant
issues:
 Top-down or bottom-up approach?
 Enterprise-wide or departmental?
 Which first—data warehouse or data mart?
 Build pilot or go with a full-fledged implementation?
 Dependent or independent data marts?

 Should you build a large data warehouse and then let that
repository feed data into local, departmental data marts?
 On the other hand, should you build individual local data marts,
and combine them to form your overall data warehouse?
 Should these local data marts be independent of one another?
Or should they be dependent on the overall data warehouse for
data feed?
THIS MEANS WE NEED TO KNOW MORE ABOUT DW AND DATA MARTS

DW and DM: How Are They Different?
 Inmon stated, “The single most important issue facing the IT manager this
year is whether to build the data warehouse first or the data mart first.”
 Here are the two different basic approaches:
 (1) overall data warehouse feeding dependent data marts, and
 (2) several departmental or local data marts combining into a data
warehouse.
 So, which approach is best in your case, the top-down or the bottom-up
approach? Let us examine these two approaches carefully.
 In the first approach, you extract data from the operational systems; you
then transform, clean, integrate, and keep the data in the data warehouse.

Top-Down Approach
 In this approach the data in the data warehouse is stored at the
lowest level of granularity based on a normalized data model.

 This is the big-picture approach in which you build the overall, big, enterprise-wide data
warehouse. Here you do not have a collection of fragmented islands of information. The data
warehouse is large and integrated.
 This approach, however, takes longer to build and has a high risk of failure.
 If you do not have experienced professionals on your team, this approach could be
hazardous.

Bottom-Up Approach
 Ralph Kimball, another leading author and expert practitioner in data warehousing, is a
proponent of the approach that has come to be known as the bottom-up approach.
 Kimball (1996) envisions the corporate data warehouse as a collection of conformed data
marts. The key consideration is the conforming of the dimensions among the separate data
marts.
 In this approach data marts are created first to provide analytical and reporting capabilities
for specific business subjects based on the dimensional data model.

 Data marts contain data at the lowest level of granularity and also as summaries depending on
the needs for analysis. These data marts are joined or “unioned” together by conforming the
dimensions.
 In this bottom-up approach, you build your departmental data marts one by one. You would set
a priority scheme to determine which data marts you must build first. The most severe
drawback of this approach is data fragmentation. Each independent data mart will be blind to
the overall requirements of the entire organization.
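
The idea of conformed dimensions can be sketched as follows. This is a simplified illustration with invented data: two data marts are built separately but use the same product dimension keys, so their results can be lined up afterwards. The mart names and the `total_by_product` helper are hypothetical.

```python
# Conformed product dimension shared by both data marts
# (keys and names are hypothetical sample data).
product_dim = {1: "milk", 2: "bread"}

# Two separately built data marts, each using the same product keys.
sales_mart = [
    {"product_key": 1, "units_sold": 10},
    {"product_key": 2, "units_sold": 4},
    {"product_key": 1, "units_sold": 5},
]
returns_mart = [
    {"product_key": 1, "units_returned": 2},
    {"product_key": 2, "units_returned": 1},
]


def total_by_product(rows, measure):
    """Sum one measure per product key."""
    totals = {}
    for row in rows:
        key = row["product_key"]
        totals[key] = totals.get(key, 0) + row[measure]
    return totals


sold = total_by_product(sales_mart, "units_sold")
returned = total_by_product(returns_mart, "units_returned")

# Because both marts share the same dimension keys, their results can be
# combined per product without any translation step.
combined = {
    name: {"units_sold": sold.get(key, 0), "units_returned": returned.get(key, 0)}
    for key, name in product_dim.items()
}
```

If each mart had invented its own product codes, this last step would need a mapping table, which is the fragmentation risk noted above.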

METADATA IN THE DATA WAREHOUSE
 Think of metadata as the Yellow Pages of your town. Do you need information about the
stores in your town, where they are, what their names are, and what products they
specialize in? Go to the Yellow Pages.
 The Yellow Pages is a directory with data about the institutions in your town. Almost in
the same manner, the metadata component serves as a directory of the contents of your
data warehouse.

Types of Metadata
 Metadata in a data warehouse fall into three major
categories:
 Operational metadata
 Extraction and transformation metadata
 End-user metadata

Operational Metadata
 Operational metadata explains how the data was created or transformed, for example:
•Whether the job run failed or had warnings
•Which database tables or files were read from, written to, or referenced
•How many rows were read, written to, or referenced
•When the job started and finished
•Which stages and links were used
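
One way to picture these items is as a record written for every load job run. The sketch below is an assumption for illustration only: the `JobRunMetadata` class and its field names are hypothetical, not the format of any particular ETL tool, and stages and links are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class JobRunMetadata:
    """Operational metadata for one load job run (hypothetical structure)."""

    job_name: str
    started: datetime
    finished: datetime
    status: str  # e.g. "ok", "warning" or "failed"
    tables_read: list = field(default_factory=list)
    tables_written: list = field(default_factory=list)
    rows_read: int = 0
    rows_written: int = 0

    def duration_seconds(self):
        """Elapsed run time, derived from the start and finish timestamps."""
        return (self.finished - self.started).total_seconds()


# Sample values are invented for illustration.
run = JobRunMetadata(
    job_name="load_sales",
    started=datetime(2021, 3, 1, 2, 0, 0),
    finished=datetime(2021, 3, 1, 2, 5, 30),
    status="warning",
    tables_read=["pos_transactions"],
    tables_written=["sales_fact"],
    rows_read=1000,
    rows_written=998,
)
```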

Extraction and Transformation Metadata
 Extraction and transformation metadata contain data about the
extraction of data from the source systems, namely, the extraction
frequencies, extraction methods, and business rules for the data
extraction.
 Also, this category of metadata contains information about all the data
transformations that take place in the data staging area.

End-User Metadata
 The end-user metadata is the navigational map of the data warehouse.
 It enables the end-users to find information from the data warehouse.
 The end-user metadata allows the end-users to use their own business terminology and
look for information in those ways in which they normally think of the business.
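
A very small sketch of end-user metadata is a glossary that maps business terms to physical table and column names. The glossary entries and the `locate` helper below are hypothetical examples, not names from any real warehouse.

```python
# Hypothetical end-user metadata: business terms mapped to physical names.
business_glossary = {
    "revenue": ("sales_fact", "dollar_sales_amount"),
    "customer region": ("customer_dim", "region_name"),
}


def locate(term):
    """Return 'table.column' for a business term, ignoring case and extra spaces."""
    normalized = " ".join(term.lower().split())
    table, column = business_glossary[normalized]
    return f"{table}.{column}"
```

A user can then ask for "Revenue" without knowing that it is stored as `sales_fact.dollar_sales_amount`.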

Why is metadata especially important in
a data warehouse?
 First, it acts as the glue that connects all parts of the data warehouse.
 Next, it provides information about the contents and structures to the
developers.
 Finally, it opens the door to the end-users and makes the contents
recognizable in their own terms.

FACT TABLES. WHAT ARE THEY?
 In data warehousing, a Fact table consists of the measurements, metrics or facts of a
business process.
 It is located at the center of a star schema or a snowflake schema surrounded by
dimension tables.
 The primary key of a fact table is usually a composite key that is made up of all of its
foreign keys.
 Fact tables contain the content of the data warehouse and store different types of
measures, such as additive, non-additive, and semi-additive measures.
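
A minimal sketch of a fact table with a composite key and additive measures follows. The data is invented and a dictionary stands in for the table: the dictionary key, made of the day, product and store keys, plays the role of the composite primary key. The `total` helper is hypothetical.

```python
# Each fact row is identified by the combination of its dimension keys
# (day, product, store). Sample values are made up.
sales_fact = {
    ("2021-03-01", "milk", "store_A"): {"quantity": 3, "dollar_sales": 6.0},
    ("2021-03-01", "milk", "store_B"): {"quantity": 1, "dollar_sales": 2.0},
    ("2021-03-02", "milk", "store_A"): {"quantity": 4, "dollar_sales": 8.0},
    ("2021-03-02", "bread", "store_A"): {"quantity": 2, "dollar_sales": 5.0},
}


def total(measure, day=None, product=None, store=None):
    """Sum an additive measure over every row matching the given filters."""
    result = 0
    for (d, p, s), facts in sales_fact.items():
        if day in (None, d) and product in (None, p) and store in (None, s):
            result += facts[measure]
    return result
```

Additive measures such as quantity and dollar sales can be summed across any dimension. A semi-additive measure, such as an account balance, can be summed across some dimensions but not across time.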

Fact tables CONTINUED…
 A fact table is the primary table in a dimensional model where the numerical
performance measurements of the business are stored. We use the term fact to
represent a business measure.
 We can imagine standing in the marketplace watching products being sold and writing
down the quantity sold and dollar sales amount each day for each product in each store.

 A measurement is taken at the intersection of all the dimensions (day,
product, and store). This list of dimensions defines the grain of the fact table
and tells us what the scope of the measurement is.
 The most useful facts are numeric and additive, such as dollar sales amount.

Dimension Tables
 Dimension tables are integral companions to a fact table. The dimension tables contain the
textual descriptors of the business.
 In a well-designed dimensional model, dimension tables have many columns or attributes.
These attributes describe the rows in the dimension table.
 Each dimension is defined by its single primary key, designated by the PK notation, which
serves as the basis for referential integrity with any given fact table to which it is joined.

 Dimension attributes serve as the primary source of query constraints, groupings, and report
labels. In a query or report request, attributes are identified as the “by” words.
 For example, when a user states that he or she wants to see dollar sales by week by brand,
week and brand must be available as dimension attributes.
 Dimension table attributes play a vital role in the data warehouse. Since they are the source
of virtually all interesting constraints and report labels, they are key to making the data
warehouse usable and understandable.
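
The "dollar sales by week by brand" request can be sketched with Python's built-in `sqlite3` module. The table names, column names and sample rows below are assumptions made for this illustration; the point is only that the "by" words become the GROUP BY columns, and that they come from the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, week TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
-- The fact table's primary key is made up of its foreign keys.
CREATE TABLE sales_fact (
    date_key INTEGER REFERENCES date_dim(date_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    dollar_sales REAL,
    PRIMARY KEY (date_key, product_key)
);
INSERT INTO date_dim VALUES (1, '2021-W09'), (2, '2021-W09'), (3, '2021-W10');
INSERT INTO product_dim VALUES (10, 'Acme'), (11, 'Zenith');
INSERT INTO sales_fact VALUES
    (1, 10, 100.0), (2, 10, 50.0), (3, 10, 70.0),
    (1, 11, 20.0), (3, 11, 30.0);
""")

# "Dollar sales by week by brand": join the fact table to its dimensions
# and group by the two dimension attributes.
rows = conn.execute("""
    SELECT d.week, p.brand, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN date_dim AS d ON d.date_key = f.date_key
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY d.week, p.brand
    ORDER BY d.week, p.brand
""").fetchall()
```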

Bringing Together Facts and
Dimensions: Fact and dimension tables in a
dimensional model

The Process of Data Warehouse Design
 A data warehouse can be built using a top-down approach (starts with overall design
and planning), a bottom-up approach (starts with experiments and rapid prototypes),
or a combination of both.
 From the software engineering point of view, the design and construction of a data
warehouse may consist of the following steps:

 planning,
 requirements study,
 problem analysis,
 warehouse design,
 data integration and testing, and
 finally deployment of the data warehouse.
 Large software systems can be developed using two methodologies: the waterfall method or
the spiral method.

Data warehouse architectures:
Three-Tier Data Warehouse Architecture
 Data warehouses generally adopt a three-tier architecture. The
three tiers are as follows.
 Bottom Tier - The bottom tier of the architecture is the data warehouse database
server, typically a relational database system. Back-end tools and utilities
feed data into the bottom tier.

 Middle Tier - The middle tier holds the OLAP server, which can be implemented in either of the
following ways:
 By relational OLAP (ROLAP), which is an extended relational database management system. ROLAP
maps the operations on multidimensional data to standard relational operations.
 By multidimensional OLAP (MOLAP), which directly implements multidimensional data and
operations.
 Top Tier - This tier is the front-end client layer. It holds the query tools, reporting
tools, analysis tools and data mining tools.
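
To give a feel for what "mapping multidimensional operations to relational operations" means in ROLAP, the sketch below translates a roll-up into a SQL GROUP BY query and runs it with Python's built-in `sqlite3` module. The `rollup_sql` helper, the `sales` table and its rows are hypothetical; real ROLAP servers are far more elaborate.

```python
import sqlite3


def rollup_sql(levels, measure="amount", table="sales"):
    """Translate a roll-up to the given dimension levels into a GROUP BY query.

    The query is built from trusted, hard-coded names only; building SQL
    from user input this way would not be safe.
    """
    cols = ", ".join(levels)
    return (
        f"SELECT {cols}, SUM({measure}) FROM {table} "
        f"GROUP BY {cols} ORDER BY {cols}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(2020, "east", 10.0), (2020, "west", 5.0), (2021, "east", 7.0), (2020, "east", 3.0)],
)

# Detailed view: totals by year and region.
by_year_region = conn.execute(rollup_sql(["year", "region"])).fetchall()

# Roll-up: drop the region level and keep only year.
by_year = conn.execute(rollup_sql(["year"])).fetchall()
```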

7/20/2022
MIS7206-1
Simple Data warehouse Architecture
[Figure: source systems (Registration System, Exam results System, Fees payment System) pass through Extract, Transform, Load (ETL) and a staging area into the Data Warehouse (facts) and Data Marts. Other labels on the diagram: Information, Knowledge, Knowledge management, Statistical analysis, Data mining, Unstructured information, Performance management.]

Data warehouse Architecture explained
 Source systems: these are the different operational systems from which data
is extracted.
 ETL: this refers to the software tool used for extracting, transforming and
loading data. Data can go from the source directly to the data warehouse or to a
staging area database where data cleaning can be done.
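
The extract, stage, clean and load flow can be sketched as three small functions. Everything here is a made-up illustration: the source rows, the field names, and the cleaning rules (including treating a missing fee amount as 0) are assumptions, not rules from any real system.

```python
# Hypothetical rows as they might arrive from a registration source system.
source_rows = [
    {"student_id": " S001 ", "programme": "msc is", "fees_paid": "1500"},
    {"student_id": "S002", "programme": "MSC IS", "fees_paid": ""},
    {"student_id": "S001", "programme": "MSc IS", "fees_paid": "1500"},  # duplicate
]


def extract(rows):
    """Copy source rows into a staging list without changing them."""
    return [dict(row) for row in rows]


def transform(staged):
    """Clean staged rows: trim ids, standardise text, convert amounts, drop duplicates."""
    cleaned, seen = [], set()
    for row in staged:
        student_id = row["student_id"].strip()
        if student_id in seen:
            continue  # keep only the first row per student
        seen.add(student_id)
        cleaned.append({
            "student_id": student_id,
            "programme": row["programme"].upper(),
            # Assumed rule for this sketch: a missing amount becomes 0.
            "fees_paid": float(row["fees_paid"] or 0),
        })
    return cleaned


def load(rows, warehouse):
    """Append cleaned rows to the warehouse table (a plain list here)."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
staging_area = extract(source_rows)
loaded = load(transform(staging_area), warehouse)
```

Cleaning in the staging area leaves the source rows untouched, so a faulty transformation can be rerun without going back to the operational system.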

 In the data warehouse the data is arranged in a
dimensional way with facts and dimensions. Depending on
the model used, the data warehouse could then be split into
data marts to handle specific user needs.
 A data mart is a subsection of a data warehouse
customized for one business area.

Three Data Warehouse models
 Enterprise warehouse
 Data Mart
 Virtual warehouse (make very brief notes)

Up next:
 DW SDLC
 DW OLAP
 Dimensional modelling and schemas
 NEXT-GENERATION DW
