Data Warehousing is a topic in Management of Information Technology. This presentation is intended to help students with the subject matter and to serve as a reference for their assigned report.
2. WHAT IS DATA WAREHOUSING?
-The concept of data warehousing was introduced in
1988 by IBM researchers Barry Devlin and Paul
Murphy.
-The term “Data Warehouse” was coined by Bill
Inmon in 1990. According to Inmon, a data warehouse
is a subject-oriented, integrated, time-variant, and
non-volatile collection of data.
-It is the process of constructing and using a data
warehouse.
-It involves data cleaning, data integration, and data
3. WHAT IS A DATA WAREHOUSE?
-It is constructed by integrating data from multiple
heterogeneous sources that support analytical
reporting, structured and/or ad hoc queries, and
decision making.
-It is the secure electronic storage of information
by a business or other organization.
-A vital component of business intelligence.
-An information storage system for historical data
that can be analyzed in numerous ways.
4. •An operational database undergoes frequent changes
on a daily basis on account of transactions that take
place.
•A data warehouse provides generalized and
consolidated data in a multidimensional view. Along
with this generalized and consolidated view of data, a
data warehouse also provides Online Analytical Processing
(OLAP) tools. These tools support interactive and
effective analysis of data in a multidimensional space.
This analysis results in data generalization and data
mining.
•Data mining functions as association, clustering,
5. Understanding a Data Warehouse
• A data warehouse is a database, which is kept separate from the
organization’s operational database;
• There is no frequent updating done in a data warehouse;
• It possesses consolidated historical data, which helps the
organization to analyze its business;
• A data warehouse helps executives to organize, understand, and
use their data to make strategic decisions;
• Data warehouse systems help integrate a diversity of
application systems; and
• A data warehouse system helps in consolidated historical data
analysis.
6. Using Data Warehouse Information
•Tuning Production Strategies – product strategies
can be fine-tuned by repositioning products and
managing product portfolios based on quarterly or
yearly sales comparisons.
•Customer Analysis – Customer analysis is done by
analyzing the customer’s buying preferences, buying
time, budget cycles, etc.
•Operations Analysis – Data warehousing also helps in
customer relationship management, and making
environmental corrections. The information also
7. Why a Data Warehouse is Separated
from Operational Databases
• An operational database is constructed for well-known tasks and
workloads such as searching for particular records, indexing, etc. In
contrast, data warehouse queries are often complex and they
present a general form of data;
• Operational databases support concurrent processing of multiple
transactions. Concurrency control and recovery mechanisms are
required for operational databases to ensure robustness and
consistency of the database;
• An operational database query allows both read and modify
operations, while an OLAP query needs only read-only access to
stored data; and
8. Data Warehouse Features
• Subject Oriented – a data warehouse is subject oriented
because it provides information around a subject rather
than the organization’s ongoing operations.
• Integrated – a data warehouse is constructed by
integrating data from heterogeneous sources such as
relational databases, flat files, etc.
• Time Variant – the data collected in a data warehouse is
identified with a particular time period.
• Non-Volatile – Non-volatile means the previous data is
not erased when new data is added to it.
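The integrated, time-variant, and non-volatile features can be illustrated with a minimal sketch. The source layouts, field names, and the in-memory list standing in for a warehouse table are all assumptions made for illustration; they are not part of the presentation.

```python
from datetime import date

# Hypothetical extracts from two heterogeneous sources with different field names.
crm_rows = [{"cust": "C1", "city": "Manila"}]        # e.g. from a relational database
file_rows = [{"customer_id": "C2", "town": "Cebu"}]  # e.g. from a flat file

def integrate(crm, flat):
    """Integrated: map both source layouts onto one common layout."""
    unified = [{"customer_id": r["cust"], "city": r["city"]} for r in crm]
    unified += [{"customer_id": r["customer_id"], "city": r["town"]} for r in flat]
    return unified

warehouse = []  # append-only list standing in for a warehouse table

def load_snapshot(rows, as_of):
    """Time variant: stamp each row with its period.
    Non-volatile: append only, never overwrite earlier rows."""
    for r in rows:
        warehouse.append({**r, "as_of": as_of})

load_snapshot(integrate(crm_rows, file_rows), date(2023, 1, 1))
# A later load in which customer C1 has moved: the earlier row is kept.
load_snapshot(integrate([{"cust": "C1", "city": "Davao"}], []), date(2024, 1, 1))

c1_history = [(r["as_of"].year, r["city"]) for r in warehouse if r["customer_id"] == "C1"]
print(c1_history)  # [(2023, 'Manila'), (2024, 'Davao')]
```

Both of C1's rows remain available after the second load, which is what allows analysis across time periods.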
9. Stages in Creating a Data Warehouse
• Determining the business objectives and its key performance
indicators.
• Collecting and analyzing the appropriate information.
• Identifying the core business processes that contribute the key
data.
• Constructing a conceptual data model that shows how the
data are displayed to the end-user.
• Locating the sources of data and establishing a process for
feeding data into the warehouse.
• Establishing a tracking duration. Data warehouses can become
unwieldy. Many are built with levels of archiving, so that older
information is retained in less detail.
• Implementing the plan.
10. Maintaining a Data Warehouse
One step is data extraction, which involves gathering
large amounts of data from multiple source points.
After a set of data has been compiled, it goes through
data cleaning, the process of combing through it for
errors and correcting or excluding any that are found.
The cleaned-up data is then converted from a
database format to warehouse format. Once stored in
the warehouse, the data goes through sorting,
consolidating, and summarizing, so that it will be
easier to use. Today, businesses can invest in cloud-
based data warehouse software services from
companies including Microsoft, Google, Amazon, and
Oracle, among others.
11. Data Warehouse Applications
Data warehouses are widely used in the following
fields:
•Financial services
•Banking services
•Consumer goods
•Retail sectors
•Controlled manufacturing
12. Types of Data Warehouse
•Information Processing – a data warehouse allows the
data stored in it to be processed. This data can be
processed by means of querying, basic statistical
analysis, and reporting using crosstabs, tables, charts, or
graphs.
•Analytical Processing – a data warehouse supports
analytical processing of the information stored in it.
The data can be analyzed by means of basic OLAP
operations, including slice-and-dice, drill down, drill
up, and pivoting.
•Data Mining – Data mining supports knowledge
discovery by finding hidden patterns and associations,
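The basic OLAP operations listed under Analytical Processing (slice-and-dice, drill down, drill up, and pivoting) can be sketched in plain Python. The sales facts, dimension names, and helper functions below are hypothetical and stand in for what a real OLAP tool would provide.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions: year, region, product.
facts = [
    {"year": 2023, "region": "North", "product": "A", "sales": 100},
    {"year": 2023, "region": "South", "product": "A", "sales": 150},
    {"year": 2023, "region": "North", "product": "B", "sales": 80},
    {"year": 2024, "region": "North", "product": "A", "sales": 120},
    {"year": 2024, "region": "South", "product": "B", "sales": 60},
]

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]

def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall within the given value sets."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

def roll_up(rows, *dims):
    """Aggregate sales to the level given by dims (fewer dims = drill up)."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)

def pivot(rows, row_dim, col_dim):
    """Pivot: rotate the view into a row_dim x col_dim table of summed sales."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r["sales"]
    return {k: dict(v) for k, v in table.items()}

sales_2023 = slice_(facts, "year", 2023)
north_a = dice(facts, region={"North"}, product={"A"})
by_year = roll_up(facts, "year")                   # drill up to yearly totals
by_year_region = roll_up(facts, "year", "region")  # drill down one level
year_by_region = pivot(facts, "year", "region")
```

Drill down and drill up are the same aggregation run at a finer or coarser level, which is why one `roll_up` helper covers both here.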
13. Sr. No. | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | It involves historical processing of information. | It involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | It is used to analyze the business. | It is used to run the business.
4 | It focuses on Information out. | It focuses on Data in.
5 | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on Entity Relationship Model.
6 | It is subject oriented. | It is application oriented.
7 | It contains historical data. | It contains current data.
8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data.
9 | It provides summarized and multidimensional view of data. | It provides detailed and flat relational view of data.
10 | The number of users is in hundreds. | The number of users is in thousands.
11 | The number of records accessed is in millions. | The number of records accessed is in tens.
12 | The database size is from 100 GB to 100 TB. | The database size is from 100 MB to 100
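To make the Star Schema entry in the comparison more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and rows are invented for illustration; a production warehouse would have many more dimensions and far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the context of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    -- The central fact table holds measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_date    VALUES (1, 2023, 4), (2, 2024, 1);
    INSERT INTO fact_sales  VALUES (1, 1, 100.0), (1, 2, 40.0), (2, 2, 75.0);
""")

# A read-only analytical query: summarized sales per product and year.
summary = conn.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d    ON d.date_id = f.date_id
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()
print(summary)  # [('Gadget', 2024, 75.0), ('Widget', 2023, 100.0), ('Widget', 2024, 40.0)]
```

The fact table sits at the center and each dimension joins to it directly, which is the shape that gives the star schema its name.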
14. 5 Steps of Data Mining
1. An organization collects data and loads it into a data
warehouse.
2. The data are then stored and managed, either on in-
house servers or in a cloud service.
3. Business analysts, management teams, and
information technology professionals access and
organize the data.
4. Application software sorts the data.
5. The end-user presents the data in an easy-to-share
format, such as a graph or table.
15. Functions of Data Warehouse Tools and
Utilities
• Data Extraction – involves gathering data from multiple
heterogeneous sources.
• Data Cleaning – involves finding and correcting the errors in
data.
• Data Transformation – involves converting the data from
legacy format to warehouse format.
• Data Loading – involves sorting, summarizing, consolidating,
checking integrity, and building indices and partitions.
• Refreshing – involves propagating updates from the data
sources to the warehouse.
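The functions above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the CSV text, column names, cleaning rules, and the SQLite table standing in for the warehouse are all hypothetical, and refreshing would amount to re-running the pipeline on newly arrived source data.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export; a real source would be a file or another database.
LEGACY_CSV = """id,name,amount
1, alice ,10.50
2,BOB,not-a-number
3,carol,4.25
3,carol,4.25
"""

def extract(text):
    """Data extraction: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Data cleaning: drop rows with unparseable amounts and exact duplicates."""
    seen, good = set(), []
    for r in rows:
        try:
            float(r["amount"])
        except ValueError:
            continue  # exclude the erroneous record
        key = tuple(r.values())
        if key not in seen:
            seen.add(key)
            good.append(r)
    return good

def transform(rows):
    """Data transformation: convert legacy strings to typed warehouse columns."""
    return [(int(r["id"]), r["name"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Data loading: insert the rows and build an index."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_name ON sales(name)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract(LEGACY_CSV))))
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded)  # [(1, 'Alice', 10.5), (3, 'Carol', 4.25)]
```

Keeping each function separate mirrors the list above and makes it easy to swap in a different source or a stricter cleaning rule.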
16. Data Warehouse Architecture
Single-tier Architecture: Single-tier architecture is
hardly used in the creation of data warehouses for
real-time systems.
Two-tier Architecture: In a two-tier architecture
design, the analytical process is separated from the
business process.
Three-tier Architecture: A three-tier architecture
design has a top, middle, and bottom tier; these are
17. Data Warehouse vs. Database
A data warehouse is not the same as a database:
• A database is a transactional system that monitors and
updates real-time data in order to have only the most recent
data available.
• A data warehouse is programmed to aggregate structured
data over time.
For example, a database might only have the most recent
address of a customer, while a data warehouse might have all
the addresses of the customer for the past 10 years.
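The address example can be sketched with two SQLite tables. The table names, columns, and addresses are hypothetical, and writing to both tables in one function is a simplification: in practice the warehouse would be fed by a separate load process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: one row per customer, overwritten on change.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT);
    -- Warehouse table: one row per customer per address period.
    CREATE TABLE customer_address_history (id INTEGER, address TEXT, valid_from TEXT);
""")

def change_address(cust_id, address, when):
    """Overwrite the operational row, but append to the warehouse history."""
    conn.execute("INSERT OR REPLACE INTO customer VALUES (?, ?)", (cust_id, address))
    conn.execute(
        "INSERT INTO customer_address_history VALUES (?, ?, ?)",
        (cust_id, address, when),
    )

change_address(1, "12 Oak St", "2015-03-01")
change_address(1, "7 Pine Ave", "2021-08-15")

current = conn.execute("SELECT address FROM customer WHERE id = 1").fetchall()
history = conn.execute(
    "SELECT address, valid_from FROM customer_address_history "
    "WHERE id = 1 ORDER BY valid_from"
).fetchall()
print(current)  # [('7 Pine Ave',)]
print(history)  # [('12 Oak St', '2015-03-01'), ('7 Pine Ave', '2021-08-15')]
```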
Data mining relies on the data warehouse. The data in the
18. Data Warehouse vs. Data Lake
A data lake holds raw data whose purpose has
not yet been determined, while a data warehouse
holds refined data that has been filtered to be
used for a specific purpose.
Data lakes are primarily used by data scientists
while data warehouses are most often used by
business professionals. Data lakes are also more
easily accessible and easier to update while data
warehouses are more structured and any changes
19. Data Warehouse vs. Data Mart
A data mart is just a smaller version of a data warehouse.
A data mart collects data from a small number of
sources and focuses on one subject area. Data marts
are faster and easier to use than data warehouses.
Data marts typically function as a subset of a data
warehouse to focus on one area for analytical purposes,
such as a specific department within an organization.
Data marts are used to help make business decisions
by helping with analysis and reporting.
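The idea of a data mart as a single-subject subset of a warehouse can be illustrated with a SQL view. This is a simplification under stated assumptions: the table, view, and department names are invented, and a real data mart is often a physically separate store rather than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_sales (department TEXT, product TEXT, year INTEGER, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('Marketing', 'Campaign A', 2023, 500.0),
        ('Marketing', 'Campaign B', 2024, 300.0),
        ('Finance',   'Audit',      2024, 900.0);
    -- The "data mart": only the rows for one department.
    CREATE VIEW marketing_mart AS
        SELECT product, year, amount
        FROM warehouse_sales
        WHERE department = 'Marketing';
""")

mart_rows = conn.execute(
    "SELECT product, year, amount FROM marketing_mart ORDER BY year"
).fetchall()
print(mart_rows)  # [('Campaign A', 2023, 500.0), ('Campaign B', 2024, 300.0)]
```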
20. Advantages and Disadvantages of Data
Warehousing
Advantages
• Provides fact-based analysis on past company performance to inform
decision-making.
• Serves as a historical archive of relevant data.
• Can be shared across key departments for maximum usefulness.
Disadvantages
• Creating and maintaining the warehouse is resource-heavy.
• Input errors can damage the integrity of the information archived.
• Use of multiple sources can cause inconsistencies in the data.
21. Is SQL a Data Warehouse?
No. SQL, or Structured Query Language, is a computer language
that is used to interact with a database in terms that it can
understand and respond to. It contains a number of commands
such as “select”, “insert”, and “update”. It is the standard
language for relational database management systems, and it is
commonly used to query a data warehouse rather than being one.
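The three commands named above can be tried with Python's built-in sqlite3 module. The table and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# "insert": add a new row.
conn.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (1, "Ana", "Manila"))
# "update": change an existing row.
conn.execute("UPDATE customer SET city = ? WHERE id = ?", ("Cebu", 1))
# "select": read rows back.
rows = conn.execute("SELECT id, name, city FROM customer").fetchall()
print(rows)  # [(1, 'Ana', 'Cebu')]
```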
What is ETL in a Data Warehouse?
“ETL” stands for “extract, transform, and load”. ETL is a data
process that combines data from multiple sources into one
single data storage unit, which is then loaded into a data
warehouse or similar data system. It is used in data analytics
and machine learning.