2. Overview
● Introduction
● Need of Data warehousing
● Data warehousing Building Blocks
● OLAP in Data warehouse
● OLAP models
● Operational DB vs Data warehouse
3. Introduction
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management's decision
making process. -Bill Inmon
Data warehousing is the process whereby organizations extract value from
their informational assets through the use of special stores called data
warehouses.
4. Need of Data Warehouse
Computer applications that support day-to-day business operations are
effective at what they are designed to do. They gather, store, and process all the
data needed to successfully perform the daily operations.
As businesses grew more complex, business executives became
desperate for information to stay competitive. What the executives needed
were different kinds of information that could be readily used to make strategic
decisions. The operational systems, important as they were, could not provide
strategic information. Businesses, therefore, were compelled to turn to new
ways of getting strategic information. Data warehousing is a new paradigm
specifically intended to provide vital strategic information.
5. Need of Data Warehouse
ESCALATING NEED FOR STRATEGIC INFORMATION
The executives and managers who are responsible for keeping the enterprise
competitive need information to make proper decisions. They need information
to formulate business strategies, establish goals, set objectives, and
monitor results.
Critical business decisions depend on the availability of proper strategic
information in an enterprise.
6. The desired characteristics of strategic information are:
1. INTEGRATED - Must have a single, enterprise-wide view.
2. DATA INTEGRITY - Information must be accurate and must conform to business rules.
3. ACCESSIBLE - Easily accessible with intuitive access paths, and responsive for analysis.
4. CREDIBLE - Every business factor must have one and only one value.
5. TIMELY - Information must be available within the stipulated time frame.
7. Need of Data Warehouse
FAILURES OF PAST DECISION-SUPPORT SYSTEMS
The user must be able to query online, get results, and query some more. The
information must be in a format suitable for analysis.
PAST DECISION-SUPPORT SYSTEMS
Ad Hoc Reports. Users would send requests to IT for special reports. IT would write special programs, typically one
for each request, and produce the ad hoc reports.
Special Extract Programs. For the types of reports that would be requested from time to time, IT would write a
suite of programs and run them periodically to extract data from the various applications. For any reports
that could not be run off the extracted files, IT would write individual special programs.
8. Need of Data Warehouse
PAST DECISION-SUPPORT SYSTEMS
Small Applications. IT would create simple applications based on the extracted files. The users could stipulate the
parameters for each special report.
Information Centers. The information center typically was a place where users could view special information on
screens.
Decision-Support Systems. These systems were supported by extracted files. The systems were menu-driven
and provided online information and also the ability to print special reports.
Executive Information Systems. The main criteria were simplicity and ease of use. The system would display
key information every day and provide the ability to request simple, straightforward reports. However, only
preprogrammed screens and reports were available.
9. Data Warehouse- the only viable solution
DATA WAREHOUSE IS DEFINED AS:
The data warehouse is an informational environment that:
● Provides an integrated and total view of the enterprise
● Makes the enterprise’s current and historical information easily available
for decision making
● Makes decision-support transactions possible without hindering
operational systems
● Renders the organization’s information consistent
● Presents a flexible and interactive source of strategic information
11. Data Warehouse: The building block
A data warehouse is typically a dedicated database system for decision
making that is separate from the production database(s) used
operationally. It differs from a production system in that:
● it covers a much longer time horizon than transaction systems
● it includes multiple databases that have been processed so that the
warehouse’s data are defined uniformly (i.e., ‘clean’ data)
● it is optimized for answering complex queries from managers and
analysts
15. Subject Oriented
● Data is organized around major subjects of the enterprise.
● Data warehouses are designed to help you analyze data.
● For example, to learn more about your company's sales data, you
can build a warehouse that concentrates on sales.
● Using this warehouse, you can answer questions like "Who was
our best customer for this item last year?" This ability to define a
data warehouse by subject matter, sales in this case, makes the
data warehouse subject oriented.
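The "best customer" question above can be sketched as a query over a single sales subject area. This is a minimal illustration using Python's built-in sqlite3 module; the table name, column names, and rows are hypothetical and not part of the original material.

```python
import sqlite3

# Hypothetical sales-subject table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, item TEXT, sale_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Acme", "widget", 2023, 500.0),
        ("Acme", "widget", 2023, 250.0),
        ("Globex", "widget", 2023, 600.0),
        ("Globex", "gadget", 2023, 900.0),
        ("Acme", "widget", 2022, 1000.0),
    ],
)


def best_customer(conn, item, year):
    """Return (customer, total) with the highest total sales of item in year."""
    return conn.execute(
        "SELECT customer, SUM(amount) AS total FROM sales "
        "WHERE item = ? AND sale_year = ? "
        "GROUP BY customer ORDER BY total DESC LIMIT 1",
        (item, year),
    ).fetchone()


print(best_customer(conn, "widget", 2023))
```

Because everything about sales lives in one place, the question reduces to a single filter, group, and sort.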
16. Integrated
● Integration is closely related to subject orientation.
● Data warehouses must put data from disparate sources into a
consistent format.
● They must resolve such problems as naming conflicts and
inconsistencies among units of measure.
● When they achieve this, they are said to be integrated.
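As a rough sketch of what integration involves, the snippet below maps two hypothetical source systems onto one consistent format, resolving a naming conflict (cust_name vs. customer) and a unit-of-measure inconsistency (pounds vs. kilograms). The source layouts and field names are assumptions made for illustration.

```python
# Two hypothetical source systems describing the same kind of fact differently.
crm_rows = [{"cust_name": "Acme", "weight_lb": 10.0}]
erp_rows = [{"customer": "Globex", "weight_kg": 2.0}]

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


def from_crm(row):
    """Rename cust_name -> customer and convert pounds to kilograms."""
    return {
        "customer": row["cust_name"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
    }


def from_erp(row):
    """The ERP layout already matches the warehouse format; copy it through."""
    return {"customer": row["customer"], "weight_kg": row["weight_kg"]}


# Every row in the integrated set uses the same names and the same units.
integrated = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
print(integrated)
```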
17. Non Volatile
● Non-volatile means that, once entered into the warehouse, data are
not changed or updated.
● This is logical because the purpose of a warehouse is to enable you
to analyze what has occurred.
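One way to picture non-volatility is to contrast an operational store that overwrites values in place with a warehouse store that only appends dated rows. The sketch below is an assumption-laden illustration; the account identifier, dates, and the record_balance helper are all hypothetical.

```python
from datetime import date

# Operational store: keeps only the current value, overwritten in place.
operational = {"acct-1": 100.0}

# Warehouse store: append-only list of dated snapshots; rows are never updated.
warehouse = [("acct-1", date(2024, 1, 31), 100.0)]


def record_balance(account, as_of, balance):
    """Apply a new balance to both stores."""
    operational[account] = balance               # in-place update, old value lost
    warehouse.append((account, as_of, balance))  # new row, earlier rows untouched


record_balance("acct-1", date(2024, 2, 29), 140.0)

print(operational)  # only the latest balance remains
print(warehouse)    # both the January and February balances remain
```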
18. Time Variant
● In order to discover trends in business, analysts need large
amounts of data.
● This is very much in contrast to online transaction processing
(OLTP) systems, where performance requirements demand that
historical data be moved to an archive.
● The data are kept for many years so they can be used for trends,
forecasting, and comparisons over time.
● A data warehouse's focus on change over time is what is meant
by the term time variant.
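Keeping several years of data is what makes comparisons over time possible. The following sketch computes year-over-year growth from a small, hypothetical revenue history; the figures and the year_over_year helper are illustrative assumptions.

```python
# Hypothetical yearly revenue history retained in the warehouse.
history = {2020: 100.0, 2021: 120.0, 2022: 150.0}


def year_over_year(history):
    """Return the fractional change of each year relative to the previous one."""
    years = sorted(history)
    return {
        year: (history[year] - history[prev]) / history[prev]
        for prev, year in zip(years, years[1:])
    }


growth = year_over_year(history)
for year, change in growth.items():
    print(year, f"{change:.0%}")
```

An operational system that archives everything but the current year could not answer this question without reloading the archive.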
20. Data Marts
● Data Mart: A scaled-down version of the data warehouse
● A data mart is a small warehouse designed for the Small
Business Unit (SBU) or department level.
● It is often a way to get started with data warehousing and provides
an opportunity to learn
● Major problem: if they differ from department to department,
they can be difficult to integrate enterprise-wide
23. Data Warehouse vs Operational Database Management System
● OLTP (on-line transaction processing)
○ Major task of traditional relational DBMS
○ Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.
● OLAP (on-line analytical processing)
○ Major task of data warehouse system
○ Data analysis and decision making
● Distinct features (OLTP vs. OLAP):
○ User and system orientation: customer vs. market
○ Data contents: current, detailed vs. historical, consolidated
○ Database design: ER + application vs. star + subject
○ View: current, local vs. evolutionary, integrated
○ Access patterns: update vs. read-only but complex queries
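To make the "star + subject" design and the read-only, complex-query access pattern more concrete, here is a minimal star schema sketch in Python's built-in sqlite3 module: one fact table joined to two dimension tables. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, calendar_year INTEGER);
    -- The fact table holds the measures plus keys into each dimension.
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'toys'), (2, 'books');
    INSERT INTO dim_date VALUES (1, 2022), (2, 2023);
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 2, 5.0), (2, 2, 7.0);
    """
)

# A typical OLAP-style query: read-only, joins the fact table to its
# dimensions, and consolidates detail rows into totals.
rows = conn.execute(
    """
    SELECT p.category, d.calendar_year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.calendar_year
    ORDER BY p.category, d.calendar_year
    """
).fetchall()

for row in rows:
    print(row)
```

An OLTP workload against the same data would instead consist of many short inserts and updates touching one row at a time.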
25. Why separate Data Warehouse?
● High performance for both systems
○ DBMS: tuned for OLTP (access methods, indexing, concurrency control, recovery)
○ Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view, and
consolidation)
● Different functions and different data:
○ Missing data: Decision support requires historical data which operational DBs do not
typically maintain
○ Data consolidation: Decision support requires consolidation (aggregation, summarization)
of data from heterogeneous sources
○ Data quality: different sources typically use inconsistent data representations, codes,
and formats, which have to be reconciled.
26. OLAP
OLAP (online analytical processing) is software for performing
multidimensional analysis at high speed on large volumes of data from a
data warehouse, data mart, or some other unified, centralized data store.
In a data warehouse, data sets are stored in tables, each of which can
organize data along just two dimensions (rows and columns) at a time. OLAP
extracts data from multiple relational data sets and reorganizes it into a
multidimensional format that enables very fast processing and very
insightful analysis.
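The reorganization from flat rows into a multidimensional format can be sketched in a few lines. In this illustration a dictionary keyed by (product, region, quarter) stands in for the cube, and a hypothetical roll_up helper aggregates it down to any subset of dimensions. The data and names are assumptions, not taken from any particular OLAP product.

```python
from collections import defaultdict

# Flat relational rows: (product, region, quarter, sales). Illustrative data.
rows = [
    ("toys", "east", "Q1", 10),
    ("toys", "west", "Q1", 20),
    ("books", "east", "Q1", 5),
    ("toys", "east", "Q2", 15),
]

# Cube keyed by all three dimensions at once.
cube = defaultdict(int)
for product, region, quarter, sales in rows:
    cube[(product, region, quarter)] += sales


def roll_up(cube, keep):
    """Aggregate the cube down to the dimension positions listed in keep."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)


by_product = roll_up(cube, keep=(0,))            # total sales per product
by_region_quarter = roll_up(cube, keep=(1, 2))   # sales per region and quarter

print(by_product)
print(by_region_quarter)
```

Real OLAP engines add precomputation, indexing, and sparse storage on top of this idea; the sketch only shows the change of shape.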
27. Types of OLAP models
There are four types of OLAP models:
● Relational OLAP (ROLAP)
○ Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP
middleware to support missing pieces
○ Includes optimization of the DBMS backend, implementation of aggregation navigation logic, and
additional tools and services
○ Greater scalability
● Multidimensional OLAP (MOLAP)
○ Array-based multidimensional storage engine (sparse matrix techniques)
○ Fast indexing to pre-computed summarized data
● Hybrid OLAP (HOLAP)
○ User flexibility, e.g., low level: relational, high level: array
● Specialized SQL servers
○ Specialized support for SQL queries over star/snowflake schemas
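The difference between the ROLAP and MOLAP storage approaches can be illustrated with a toy comparison. In this sketch, the ROLAP-style path keeps facts in a relational table and aggregates with SQL at query time, while the MOLAP-style path loads the same facts into a dense array and precomputes totals. It is a simplified assumption-based illustration (hypothetical data and function names), not how any specific OLAP server is implemented.

```python
import sqlite3

products = ["toys", "books"]
regions = ["east", "west"]
facts = [("toys", "east", 10), ("toys", "west", 20), ("books", "east", 5)]

# ROLAP-style: facts stay in a relational table; aggregate with SQL on demand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)


def rolap_total(product):
    """Compute the product total at query time with a SQL aggregate."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE product = ?",
        (product,),
    ).fetchone()
    return row[0]


# MOLAP-style: dense array indexed by dimension position.
array = [[0] * len(regions) for _ in products]
for product, region, amount in facts:
    array[products.index(product)][regions.index(region)] += amount

# Summaries are computed once, ahead of any query.
product_totals = [sum(row) for row in array]


def molap_total(product):
    """Look up the precomputed product total by array position."""
    return product_totals[products.index(product)]


print(rolap_total("toys"), molap_total("toys"))
```

Both paths return the same totals; they differ in where the work happens (at query time vs. at load time) and in how the data are stored.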