2. Outline
❖ Data Warehouse
❖ Features
❖ Components
❖ Architectures
❖ Schemas
❖ OLAP Concepts
❖ Benefits of Data Warehouse
3. Data Warehouse
❖ Centralized Storage System that allows storing, analysing and interpreting
data
❖ Subject-oriented, integrated, and time-variant store of information
❖ Collects data from various heterogeneous databases to support business
analysis and decision-making tasks
❖ Primary purpose
➢ To aggregate information from across an organization into a single repository for
decision-making purposes
4. Key Features of Data Warehouse
❖ Subject Oriented: Analyze data about a particular subject or functional area
(such as sales, product, customers etc.)
❖ Integrated: Integrates various heterogeneous data sources such as relational
databases, flat files, and online transactional records.
❖ Time-Variant: Historical information is kept.
❖ Non-Volatile: Once data is entered into the data warehouse, it is not
changed or removed.
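The time-variant and non-volatile properties can be illustrated with an append-only load. This is only a minimal sketch: the `warehouse` list and the `load_snapshot` helper are hypothetical names standing in for a real warehouse table and its load job.

```python
from datetime import date

warehouse = []  # append-only list standing in for a warehouse table (assumption)

def load_snapshot(as_of, customer, city):
    """Append a dated row; earlier rows are never updated or deleted."""
    warehouse.append({"as_of": as_of, "customer": customer, "city": city})

load_snapshot(date(2023, 1, 1), "Alice", "Pune")
load_snapshot(date(2024, 1, 1), "Alice", "Delhi")  # Alice moved; the old row stays

# Time-variant: the full history is still available for analysis.
history = [row["city"] for row in warehouse if row["customer"] == "Alice"]
print(history)
```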
6. Components of Data Warehouse
❖ Database
❖ Staging Area
❖ Metadata
❖ Data Mart
❖ Access Tools
7. Components contd..
❖ Database: Where the data is stored and accessed
❖ Staging Area: The staging layer uses ETL tools to extract data from various
formats and check its quality before loading it into the data warehouse
❖ Metadata: Information that defines the data. Helps to classify, locate and
direct queries to the required data.
❖ Data Marts: Allow multiple groups to work within the system by segmenting
the data in the warehouse into categories.
❖ Access Tools: Tools that users interact with, e.g. OLAP, data mining, reporting tools
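The staging step described above (extract, check quality, then load) might be sketched as follows. The source rows, the cleaning rules and the `sales` table are all assumptions made for illustration; a real ETL tool would be configured rather than hand-coded like this.

```python
import sqlite3

# Hypothetical raw rows from two differently formatted sources.
crm_rows = [{"customer": " Alice ", "amount": "120.50"},
            {"customer": "Bob", "amount": "n/a"}]   # unparsable amount
csv_rows = [("carol", 75), ("", 10)]                 # missing customer name

def extract():
    """Bring both sources into one common (name, amount) shape."""
    for r in crm_rows:
        yield r["customer"], r["amount"]
    for name, amount in csv_rows:
        yield name, amount

def transform(rows):
    """Clean each row and drop the ones that fail the quality check."""
    for name, amount in rows:
        name = str(name).strip().title()
        try:
            amount = float(amount)
        except ValueError:
            continue  # reject rows whose amount is not numeric
        if name:      # reject rows without a customer name
            yield name, amount

def load(rows, conn):
    """Write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer").fetchall()
print(loaded)
```

Only the two rows that pass both checks reach the `sales` table; the rejected rows never leave the staging step.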
11. Single-Tier Architecture
❖ The only layer physically available is the source layer
❖ The data warehouse is virtual
❖ The data warehouse is implemented as a multidimensional view of operational
data, created by specific middleware or an intermediate processing layer
❖ Rarely used in practice
❖ Reduces the amount of data stored by avoiding redundancy
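As a rough analogy for a virtual warehouse, a SQL view can play the role of the intermediate processing layer: only the source table is stored, and the analytical result is computed on demand. The `orders` table and `sales_by_region` view are hypothetical, and a plain SQL view is a simplification of real middleware.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source layer: the only physically stored table (hypothetical operational data).
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("East", "Pen", 10), ("East", "Ink", 4), ("West", "Pen", 7)],
)

# "Virtual warehouse": a view evaluated at query time, so no data is copied.
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(qty) AS total_qty
    FROM orders
    GROUP BY region
""")

view_rows = conn.execute(
    "SELECT region, total_qty FROM sales_by_region ORDER BY region").fetchall()
print(view_rows)
```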
13. Two-Tier Architecture
❖ Source Layer: Data is initially stored in relational databases or legacy
databases
❖ Data Staging: Extraction, Transformation, and Loading (ETL) tools load
source data into the data warehouse
❖ Data Warehouse Layer: Information is saved to a single, logically centralized
repository
❖ Analysis: Integrated data is efficiently accessed to issue reports,
dynamically analyze information, and simulate hypothetical business
scenarios
15. Three-Tier Architecture
The most widely used architecture in data warehouse systems
❖ Bottom Tier: Database of the warehouse, where the cleansed and
transformed data is loaded
❖ Middle Tier: Application layer that gives an abstracted view of the database.
The OLAP server is implemented using either a relational OLAP (ROLAP) model
or a multidimensional OLAP (MOLAP) model
❖ Top Tier: Represents the front-end client layer, where reporting, query,
analysis and data mining tools reside.
16. Dimensions
❖ The tables that describe the dimensions are called dimension tables.
❖ Dividing a data warehouse project into dimensions provides structured
information for analysis and reporting.
17. Facts and Measures
❖ A fact holds measures: numeric values that can be summed, averaged or otherwise manipulated
❖ A fact table contains two kinds of data: dimension keys and measures
❖ Every dimension table is linked to a fact table
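The relationship between dimension keys and measures can be shown with plain Python structures. The `product_dim` and `sales_fact` names and their contents are made up for this sketch.

```python
# Dimension table: key -> descriptive attributes (hypothetical data).
product_dim = {
    1: {"name": "Pen", "category": "Stationery"},
    2: {"name": "Lamp", "category": "Furniture"},
}

# Fact table: each row holds a dimension key plus a numeric measure.
sales_fact = [
    {"product_key": 1, "amount": 20.0},
    {"product_key": 1, "amount": 30.0},
    {"product_key": 2, "amount": 100.0},
]

# Measures can be summed and averaged directly.
total = sum(row["amount"] for row in sales_fact)
average = total / len(sales_fact)

# Dimension keys link each fact to its descriptive attributes.
by_category = {}
for row in sales_fact:
    category = product_dim[row["product_key"]]["category"]
    by_category[category] = by_category.get(category, 0.0) + row["amount"]

print(total, average, by_category)
```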
18. Schemas
A schema gives the logical description of the entire database.
It gives details about the constraints placed on tables, the key values present and
how the key values are linked between the various tables.
A database uses the relational model, while a data warehouse uses the Star, Snowflake
and Fact Constellation schemas.
19. Star Schema
❖ Form of dimensional model where
data are organized into fact and
dimension tables
❖ The center of the star has one fact table,
surrounded by a number of associated
dimension tables
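A star schema of this shape could be sketched with SQLite as below: one fact table in the center, joined to each dimension table through its key. The table and column names (`fact_sales`, `dim_product`, `dim_store`) are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- Fact table: one key per dimension plus the measure.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER
);
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 5), (1, 2, 3), (2, 1, 2);
""")

# A typical star query: join the fact table to its dimensions, then aggregate.
star_rows = conn.execute("""
    SELECT p.name, s.city, SUM(f.units)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY p.name, s.city
    ORDER BY p.name, s.city
""").fetchall()
print(star_rows)
```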
21. Fact Constellation Schema
❖ Multiple fact tables share
dimension tables; viewed as a
collection of stars, and therefore
called a galaxy schema or fact
constellation.
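In a fact constellation, the shared dimension is what lets measures from different fact tables be compared side by side. The sketch below assumes two hypothetical fact tables, `fact_sales` and `fact_shipments`, that both reference `dim_product`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product    (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales     (product_id INTEGER, units_sold INTEGER);
CREATE TABLE fact_shipments (product_id INTEGER, units_shipped INTEGER);
INSERT INTO dim_product    VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO fact_sales     VALUES (1, 5), (1, 3), (2, 2);
INSERT INTO fact_shipments VALUES (1, 10), (2, 4);
""")

# Aggregate each fact table separately, aligned on the shared dimension.
galaxy_rows = conn.execute("""
    SELECT p.name,
           (SELECT SUM(units_sold)    FROM fact_sales
             WHERE product_id = p.product_id),
           (SELECT SUM(units_shipped) FROM fact_shipments
             WHERE product_id = p.product_id)
    FROM dim_product p
    ORDER BY p.name
""").fetchall()
print(galaxy_rows)
```

Each fact table is summed in its own subquery because joining both fact tables in one query would multiply rows and inflate the totals.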
22. Tools of Data Warehouse
❖ Data Extraction: SAS, Web Scraper, Import.io
❖ Data Cleaning: Apertus, Trillium
❖ Data Storage: ORACLE, SYBASE
❖ OLAP Tool
❖ Some examples:
➢ MarkLogic
➢ Oracle
➢ Amazon Redshift
23. OLAP(Online Analytical Processing)
❖ OLAP is an approach that allows users to
analyze information from multiple
database systems at the same time
❖ Technology that organizes large
business databases and supports
complex analysis
❖ OLAP databases are divided into one
or more cubes
❖ Each cube holds numerical facts; a cube with
more than three dimensions is called a hypercube
24. OLAP Operations: Roll-up
Roll-up performs aggregation on a data cube
by either:
1. Climbing up a concept hierarchy for a
dimension
2. Dimension reduction
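Both forms of roll-up can be sketched on a tiny cube stored as a dictionary. The cube contents and the `city_to_country` hierarchy are invented for illustration.

```python
from collections import defaultdict

# Tiny hypothetical cube: (city, quarter, product) -> sales.
cube = {
    ("Pune", "Q1", "Pen"): 10,
    ("Pune", "Q2", "Pen"): 20,
    ("Delhi", "Q1", "Pen"): 30,
    ("Toronto", "Q1", "Ink"): 40,
}

# Concept hierarchy for the location dimension: city -> country.
city_to_country = {"Pune": "India", "Delhi": "India", "Toronto": "Canada"}

# 1. Climb the hierarchy: replace city with country and re-aggregate.
by_country = defaultdict(int)
for (city, quarter, product), sales in cube.items():
    by_country[(city_to_country[city], quarter, product)] += sales

# 2. Dimension reduction: drop the quarter dimension entirely.
without_quarter = defaultdict(int)
for (city, quarter, product), sales in cube.items():
    without_quarter[(city, product)] += sales

print(dict(by_country))
print(dict(without_quarter))
```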
25. OLAP Operations: Drill-down
Drill-down is the reverse operation of roll-up.
It is performed by either:
1. Stepping down a concept hierarchy for a dimension
2. Introducing a new dimension
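Drill-down needs the detailed records to still be available, since a summary alone cannot be split back into finer values. A minimal sketch, with made-up records and a hypothetical `aggregate` helper:

```python
from collections import defaultdict

# Detailed hypothetical records, kept so that finer views can be rebuilt.
records = [
    {"quarter": "Q1", "month": "Jan", "product": "Pen", "sales": 10},
    {"quarter": "Q1", "month": "Feb", "product": "Pen", "sales": 15},
    {"quarter": "Q1", "month": "Feb", "product": "Ink", "sales": 5},
    {"quarter": "Q2", "month": "Apr", "product": "Pen", "sales": 20},
]

def aggregate(*dims):
    """Sum sales grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in records:
        totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)

summary = aggregate("quarter")                # starting point: quarterly totals
by_month = aggregate("quarter", "month")      # 1. step down: quarter -> month
by_product = aggregate("quarter", "product")  # 2. introduce the product dimension

print(summary, by_month, by_product)
```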
26. OLAP Operations: Slice
The slice operation selects a single
value on one dimension of a given cube,
producing a new sub-cube.
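A slice can be sketched as a filter that fixes one dimension to a single value, leaving a sub-cube with one dimension fewer. The cube below is hypothetical example data.

```python
# Tiny hypothetical cube: (city, quarter, product) -> sales.
cube = {
    ("Pune", "Q1", "Pen"): 10,
    ("Pune", "Q2", "Pen"): 20,
    ("Delhi", "Q1", "Pen"): 30,
    ("Toronto", "Q1", "Ink"): 40,
}

# Slice: fix quarter == "Q1"; the quarter dimension drops out of the key.
q1_slice = {
    (city, product): sales
    for (city, quarter, product), sales in cube.items()
    if quarter == "Q1"
}
print(q1_slice)
```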
27. OLAP Operations: Dice
The dice operation selects values on
two or more dimensions of a given cube,
producing a new sub-cube.
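A dice can be sketched as a filter on several dimensions at once; unlike a slice, the result keeps all of its dimensions. The cube and the chosen value sets are assumptions for illustration.

```python
# Tiny hypothetical cube: (city, quarter, product) -> sales.
cube = {
    ("Pune", "Q1", "Pen"): 10,
    ("Pune", "Q2", "Pen"): 20,
    ("Delhi", "Q1", "Pen"): 30,
    ("Toronto", "Q1", "Ink"): 40,
}

# Dice: restrict two dimensions (city and quarter) to chosen value sets.
wanted_cities = {"Pune", "Toronto"}
wanted_quarters = {"Q1"}

diced = {
    (city, quarter, product): sales
    for (city, quarter, product), sales in cube.items()
    if city in wanted_cities and quarter in wanted_quarters
}
print(diced)
```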
28. OLAP Operations: Pivot
The pivot operation is also known as the rotation
operation.
It rotates the axes of the cube to provide an
alternative presentation of the data.
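For a two-dimensional view, pivoting amounts to swapping rows and columns. A small sketch with made-up numbers:

```python
# 2-D view of hypothetical sales: rows are cities, columns are quarters.
table = {
    "Pune": {"Q1": 10, "Q2": 20},
    "Delhi": {"Q1": 30, "Q2": 5},
}

# Pivot: rotate the axes so that quarters become rows and cities become columns.
pivoted = {}
for city, by_quarter in table.items():
    for quarter, sales in by_quarter.items():
        pivoted.setdefault(quarter, {})[city] = sales

print(pivoted)
```

The values themselves are unchanged; only the orientation of the presentation differs.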
30. Advantages and Disadvantages of OLAP
❖ Advantages
➢ Information and calculations are held in the OLAP cube
➢ Beneficial for all businesses for planning, budgeting, reporting and analysis
➢ Allows users to slice and dice cube data by various dimensions, filters and measures
➢ Good for analyzing time series
➢ Finding clusters and outliers is easier with OLAP
❖ Disadvantages
➢ Requires organizing data into a star or snowflake schema, which is complicated to implement
➢ Cannot have a large number of dimensions in a single OLAP cube
➢ Transactional data cannot be accessed with an OLAP system
➢ Any modification to an OLAP cube needs a full update of the cube, which is time-consuming
31. Benefits of Data Warehouse
❖ Enables Historical Insight
❖ Enhances Quality of Data
❖ Increases the power and speed of data analytics
❖ Drives Revenue
❖ Data Security
❖ Much higher query performance
❖ Boosts efficiency