DATA
WAREHOUSING
WHAT IS DATA WAREHOUSING?
-The concept of data warehousing was introduced in
1988 by IBM researchers Barry Devlin and Paul
Murphy.
-The term “Data Warehouse” was first coined by Bill
Inmon in 1990. According to Inmon, a data warehouse
is a subject-oriented, integrated, time-variant, and
non-volatile collection of data.
-Data warehousing is the process of constructing and
using a data warehouse.
-It involves data cleaning, data integration, and data
WHAT IS A DATA WAREHOUSE?
-It is constructed by integrating data from multiple
heterogeneous sources to support analytical
reporting, structured and/or ad hoc queries, and
decision making.
-It is the secure electronic storage of information
by a business or other organization.
-A vital component of business intelligence.
-An information storage system for historical data
that can be analyzed in numerous ways.
•An operational database undergoes frequent changes
on a daily basis on account of transactions that take
place.
•A data warehouse provides generalized and
consolidated data in a multidimensional view. Along
with this generalized and consolidated view, a data
warehouse also provides Online Analytical Processing
(OLAP) tools. These tools support interactive and
effective analysis of data in a multidimensional space.
This analysis results in data generalization and data
mining.
•Data mining functions as association, clustering,
Understanding a Data Warehouse
• A data warehouse is a database, which is kept separate from the
organization’s operational database;
• There is no frequent updating done in a data warehouse;
• It possesses consolidated historical data, which helps the
organization to analyze its business;
• A data warehouse helps executives to organize, understand, and
use their data to make strategic decisions;
• Data warehouse systems help integrate a diversity of
application systems; and
• A data warehouse system helps in consolidated historical data
analysis.
Using Data Warehouse Information
•Tuning Production Strategies – product strategies
can be fine-tuned by repositioning products and
managing product portfolios, based on comparing
sales quarterly or yearly.
•Customer Analysis – Customer analysis is done by
analyzing the customer’s buying preferences, buying
time, budget cycles, etc.
•Operations Analysis – Data warehousing also helps in
customer relationship management, and making
environmental corrections. The information also
Why a Data Warehouse is Separated
from Operational Databases
• An operational database is constructed for well-known tasks and
workloads such as searching particular records, indexing, etc. In
contrast, data warehouse queries are often complex and
present a general form of data;
• Operational databases support concurrent processing of multiple
transactions. Concurrency control and recovery mechanisms are
required for operational databases to ensure robustness and
consistency of the database;
• An operational database query allows both read and modify
operations, while an OLAP query needs only read-only access to
stored data; and
Data Warehouse Features
• Subject Oriented – a data warehouse is subject oriented
because it provides information around a subject rather
than the organization’s ongoing operations.
• Integrated – a data warehouse is constructed by
integrating data from heterogeneous sources such as
relational databases, flat files, etc.
• Time Variant – the data collected in a data warehouse is
identified with a particular time period.
• Non-Volatile – Non-volatile means the previous data is
not erased when new data is added to it.
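The time-variant and non-volatile properties can be illustrated with a minimal Python sketch. The `Row` and `Warehouse` classes, the source names, and the figures are hypothetical and not from the slides; the point is only that each row carries its time period and that loading appends rather than overwrites.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Row:
    subject: str   # subject area, e.g. "sales" (subject oriented)
    source: str    # originating system (integrated from several sources)
    period: date   # time period the value belongs to (time variant)
    value: float


@dataclass
class Warehouse:
    rows: list = field(default_factory=list)

    def load(self, new_rows):
        # Non-volatile: new data is appended; existing rows are never erased.
        self.rows.extend(new_rows)

    def as_of(self, subject, period):
        # Time variant: every row can be selected by its period.
        return [r for r in self.rows if r.subject == subject and r.period == period]


wh = Warehouse()
wh.load([Row("sales", "pos_db", date(2023, 1, 1), 120.0)])
wh.load([Row("sales", "web_csv", date(2023, 2, 1), 95.5)])
january = wh.as_of("sales", date(2023, 1, 1))
```

After the second load, the January row is still present and can still be queried by its period.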
Stages in Creating a Data Warehouse
• Determining the business objectives and their key performance
indicators.
• Collecting and analyzing the appropriate information.
• Identifying the core business processes that contribute the key
data.
• Constructing a conceptual data model that shows how the
data are displayed to the end-user.
• Locating the sources of data and establishing a process for
feeding data into the warehouse.
• Establishing a tracking duration. Data warehouses can become
unwieldy. Many are built with levels of archiving, so that older
information is retained in less detail.
• Implementing the plan.
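The archiving idea in the tracking-duration stage (older information retained in less detail) might look like the following Python sketch. The `archive` function, the cutoff date, and the sample records are assumptions made for illustration; real warehouses usually do this with partitions or summary tables.

```python
from collections import defaultdict
from datetime import date


def archive(records, cutoff):
    """Keep records on or after `cutoff` in full detail; roll older ones up to monthly totals."""
    recent = [r for r in records if r[0] >= cutoff]
    monthly = defaultdict(float)
    for day, amount in records:
        if day < cutoff:
            monthly[(day.year, day.month)] += amount
    return recent, dict(monthly)


# Hypothetical (date, amount) records.
records = [
    (date(2022, 1, 5), 10.0),
    (date(2022, 1, 20), 15.0),
    (date(2022, 2, 3), 7.5),
    (date(2023, 6, 1), 30.0),
]
recent, summary = archive(records, cutoff=date(2023, 1, 1))
```

Here the 2022 records survive only as one total per month, while the 2023 record keeps its daily detail.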
Maintaining a Data Warehouse
One step is data extraction, which involves gathering
large amounts of data from multiple source points.
After a set of data has been compiled, it goes through
data cleaning, the process of combing through it for
errors and correcting or excluding any that are found.
The cleaned-up data is then converted from a
database format to warehouse format. Once stored in
the warehouse, the data goes through sorting,
consolidating, and summarizing, so that it will be
easier to use. Today, businesses can invest in
cloud-based data warehouse software services from
companies including Microsoft, Google, Amazon, and
Oracle, among others.
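The cleaning and summarizing steps described above can be sketched in a few lines of Python. The field names, sample rows, and the rules for what counts as an error are assumptions chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical raw rows gathered from several source systems.
raw = [
    {"region": " north ", "amount": "100.50"},
    {"region": "South", "amount": "n/a"},   # unparseable amount: excluded
    {"region": "NORTH", "amount": "20"},
    {"region": "", "amount": "5"},          # missing region: excluded
]


def clean(rows):
    """Correct what can be corrected, exclude what cannot."""
    cleaned = []
    for row in rows:
        region = row["region"].strip().lower()   # correct spacing and case
        try:
            amount = float(row["amount"])
        except ValueError:
            continue                              # exclude bad amounts
        if region:
            cleaned.append({"region": region, "amount": amount})
    return cleaned


def summarize(rows):
    """Consolidate cleaned rows into one total per region, sorted by region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(sorted(totals.items()))


totals = summarize(clean(raw))
```

Two of the four rows are excluded during cleaning, and the remaining two are consolidated under a single normalized region name.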
Data Warehouse Applications
Data warehouses are widely used in the following
fields:
•Financial services
•Banking services
•Consumer goods
•Retail sectors
•Controlled manufacturing
Types of Data Warehouse Applications
•Information Processing – a data warehouse allows the
data stored in it to be processed. This data can be
processed by means of querying, basic statistical
analysis, and reporting using crosstabs, tables, charts, or
graphs.
•Analytical Processing – a data warehouse supports
analytical processing of the information stored in it.
The data can be analyzed by means of basic OLAP
operations, including slice-and-dice, drill down, drill
up, and pivoting.
•Data Mining – Data mining supports knowledge
discovery by finding hidden patterns and associations,
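The basic OLAP operations named above (slice-and-dice, drill down, drill up, and pivoting) can be sketched over a tiny in-memory fact list. This is an illustrative Python sketch, not how an OLAP engine is implemented; the helper names and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts with dimensions year, quarter, region, and product.
facts = [
    {"year": 2023, "quarter": "Q1", "region": "east", "product": "tea", "sales": 10},
    {"year": 2023, "quarter": "Q1", "region": "west", "product": "tea", "sales": 20},
    {"year": 2023, "quarter": "Q2", "region": "east", "product": "coffee", "sales": 30},
    {"year": 2023, "quarter": "Q2", "region": "west", "product": "coffee", "sales": 40},
]


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def aggregate(rows, dims):
    """Sum sales grouped by `dims`; fewer dims is a drill up, more dims a drill down."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[d] for d in dims)] += r["sales"]
    return dict(out)


def pivot(rows, row_dim, col_dim):
    """Pivot: rotate two dimensions into a row-by-column table of totals."""
    table = defaultdict(dict)
    for r in rows:
        cell = table[r[row_dim]]
        cell[r[col_dim]] = cell.get(r[col_dim], 0) + r["sales"]
    return dict(table)


q1 = slice_(facts, "quarter", "Q1")
east_tea = dice(facts, region={"east"}, product={"tea"})
by_year = aggregate(facts, ["year"])                # drill up
by_quarter = aggregate(facts, ["year", "quarter"])  # drill down
by_region_quarter = pivot(facts, "region", "quarter")
```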
No. | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | It involves historical processing of information. | It involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | It is used to analyze the business. | It is used to run the business.
4 | It focuses on information out. | It focuses on data in.
5 | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model.
6 | It is subject oriented. | It is application oriented.
7 | It contains historical data. | It contains current data.
8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data.
9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data.
10 | The number of users is in hundreds. | The number of users is in thousands.
11 | The number of records accessed is in millions. | The number of records accessed is in tens.
12 | The database size is from 100 GB to 100 TB. | The database size is from 100 MB to 100
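Row 5 mentions the star schema. A minimal sketch of one, using Python's built-in `sqlite3` module, is shown here; the table names, columns, and values are hypothetical, and SQLite merely stands in for a real warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the context of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    -- The fact table holds measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'tea'), (2, 'coffee');
    INSERT INTO dim_date    VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
    INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 2, 30.0);
""")

# An OLAP-style query: read-only, joins facts to dimensions, and summarizes.
summary = conn.execute("""
    SELECT p.name, d.quarter, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date    AS d ON d.date_id = f.date_id
    GROUP BY p.name, d.quarter
    ORDER BY p.name, d.quarter
""").fetchall()
```

The central fact table joined to surrounding dimension tables is what gives the schema its star shape.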
5 Steps of Data Mining
1. An organization collects data and loads it into a data
warehouse.
2. The data are then stored and managed, either on
in-house servers or in a cloud service.
3. Business analysts, management teams, and
information technology professionals access and
organize the data.
4. Application software sorts the data.
5. The end-user presents the data in an easy-to-share
format, such as a graph or table.
Functions of Data Warehouse Tools and
Utilities
• Data Extraction – involves gathering data from multiple
heterogeneous sources.
• Data Cleaning – involves finding and correcting the errors in
data.
• Data Transformation – involves converting the data from
legacy format to warehouse format.
• Data Loading – involves sorting, summarizing, consolidating,
checking integrity, and building indices and partitions.
• Refreshing – involves updating the warehouse from the
data sources.
Data Warehouse Architecture
Single-tier Architecture: Single-tier architecture is
hardly used in the creation of data warehouses for
real-time systems.
Two-tier Architecture: In a two-tier architecture
design, the analytical process is separated from the
business process.
Three-tier Architecture: A three-tier architecture
design has a top, middle, and bottom tier; these are
Data Warehouse vs. Database
A data warehouse is not the same as a database:
• A database is a transactional system that monitors and
updates real-time data in order to have only the most recent
data available.
• A data warehouse is programmed to aggregate structured
data over time.
For example, a database might only have the most recent
address of a customer, while a data warehouse might have all
the addresses of the customer for the past 10 years.
Data mining relies on the data warehouse. The data in the
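The customer-address example above can be sketched in Python as two stores updated by the same event. The `change_address` helper, the customer ID, and the addresses are hypothetical.

```python
from datetime import date

# Operational database: one current address per customer, overwritten on change.
operational = {}
# Warehouse: every address the customer has had, with the date it took effect.
warehouse = []


def change_address(customer_id, address, effective):
    operational[customer_id] = address                   # overwrite (current only)
    warehouse.append((customer_id, address, effective))  # append (full history)


change_address(42, "1 Old Road", date(2015, 3, 1))
change_address(42, "9 New Street", date(2021, 7, 15))

history = [addr for cid, addr, _ in warehouse if cid == 42]
```

The operational store answers "where does this customer live now?", while the warehouse can also answer "where did they live in 2016?".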
Data Warehouse vs. Data Lake
A data lake holds raw data whose purpose has
not yet been determined, while a data warehouse
holds refined data that has been filtered to be
used for a specific purpose.
Data lakes are primarily used by data scientists
while data warehouses are most often used by
business professionals. Data lakes are also more
easily accessible and easier to update while data
warehouses are more structured and any changes
Data Warehouse vs. Data Mart
A data mart is just a smaller version of a data warehouse.
A data mart collects data from a small number of
sources and focuses on one subject area. Data marts
are faster and easier to use than data warehouses.
Data marts typically function as a subset of a data
warehouse to focus on one area for analytical purposes,
such as a specific department within an organization.
Data marts are used to help make business decisions
by helping with analysis and reporting.
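One simple way to picture a data mart as a subset of a warehouse is a view that exposes only one department's rows. The sketch below uses Python's built-in `sqlite3` module; the table, view, and department names are assumptions, and production data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL);
    INSERT INTO warehouse_facts VALUES
        ('sales',   'revenue',   1200.0),
        ('sales',   'orders',      85.0),
        ('finance', 'expenses',   640.0),
        ('hr',      'headcount',   31.0);
    -- The mart exposes only the rows for one subject area.
    CREATE VIEW sales_mart AS
        SELECT metric, value FROM warehouse_facts WHERE department = 'sales';
""")

mart_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
```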
Advantages and Disadvantages of Data
Warehousing
Advantages
• Provides fact-based analysis on past company performance to inform
decision-making.
• Serves as a historical archive of relevant data.
• Can be shared across key departments for maximum usefulness.
Disadvantages
• Creating and maintaining the warehouse is resource-heavy.
• Input errors can damage the integrity of the information archived.
• Use of multiple sources can cause inconsistencies in the data.
Is SQL a Data Warehouse?
No. SQL, or Structured Query Language, is a computer language
that is used to interact with a database in terms that it can
understand and respond to. It contains a number of commands
such as “select”, “insert”, and “update”. It is the standard
language for relational database management systems, and it
is commonly used to query data warehouses.
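The three commands named above can be tried with Python's built-in `sqlite3` module. The `customer` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT adds a row; the ? placeholders keep values separate from the SQL text.
conn.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (1, "Ana", "Cebu"))

# UPDATE modifies the rows that match the WHERE clause.
conn.execute("UPDATE customer SET city = ? WHERE id = ?", ("Manila", 1))

# SELECT reads rows back.
rows = conn.execute("SELECT id, name, city FROM customer").fetchall()
```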
What is ETL in a Data Warehouse?
“ETL” stands for “extract, transform, and load”. ETL is a data
process that combines data from multiple sources into one
single data storage unit, which is then loaded into a data
warehouse or similar data system. It is used in data analytics
and machine learning.
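A minimal ETL sketch in Python, assuming two hypothetical sources (a CSV export and a JSON feed) with different field names, and an in-memory SQLite table standing in for the warehouse:

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources in different formats and layouts.
csv_source = "id,name,amount\n1,tea,10.5\n2,coffee,30\n"
json_source = '[{"sku": 3, "title": "Cocoa", "total": "12.25"}]'


def extract():
    """Extract: read raw records from each source."""
    return list(csv.DictReader(io.StringIO(csv_source))), json.loads(json_source)


def transform(csv_rows, json_rows):
    """Transform: map both layouts onto one (id, name, amount) shape."""
    unified = [(int(r["id"]), r["name"].lower(), float(r["amount"])) for r in csv_rows]
    unified += [(int(r["sku"]), r["title"].lower(), float(r["total"])) for r in json_rows]
    return unified


def load(rows, conn):
    """Load: write the unified rows into a single warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Keeping the three stages as separate functions makes it easy to swap a source or a target without touching the transformation rules.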
Thank you!!!
