2. DATA IN THE DATA WAREHOUSE
A data warehouse is a collection of data marts, as shown in the figure.
Data in the data warehouse come from different sources.
They are integrated together.
3. TYPES OF DATA IN THE DATA WAREHOUSE
[Figure: types of data in the data warehouse: records, primary data, secondary data, images, charts]
5. OPERATIONS ON DATA
The available data are processed in the staging area, i.e. they go through the ETL (extract, transform, load) process.
This is done to increase data consistency and to widen the scope of the data for strategic information.
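The ETL step described above can be sketched in a few lines. This is only a minimal illustration, assuming two hypothetical in-memory sources (`SALES_ROWS`, `CRM_ROWS`) and made-up field names; a real staging area would read from files or databases.

```python
# Hypothetical source rows; names and fields are illustrative assumptions.
SALES_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "date": "2023-01-05"},
    {"customer": "BOB", "amount": "80", "date": "2023-01-06"},
]
CRM_ROWS = [
    {"customer": "alice", "city": "Pune"},
    {"customer": "bob", "city": "Delhi"},
]


def extract():
    """Extract: read raw rows from each source (here, in-memory lists)."""
    return SALES_ROWS, CRM_ROWS


def transform(sales, crm):
    """Transform: normalize names and types, then integrate both sources."""
    city_by_customer = {
        row["customer"].strip().lower(): row["city"] for row in crm
    }
    cleaned = []
    for row in sales:
        name = row["customer"].strip().lower()
        cleaned.append({
            "customer": name,
            "amount": float(row["amount"]),  # text -> number for consistency
            "date": row["date"],
            "city": city_by_customer.get(name),  # None if no CRM match
        })
    return cleaned


def load(rows, warehouse):
    """Load: append the consistent rows to the warehouse table (a list)."""
    warehouse.extend(rows)
    return warehouse


warehouse_table = load(transform(*extract()), [])
```

After the run, `warehouse_table` holds rows with one consistent name format and numeric amounts, which is the consistency gain the slide refers to.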
6. DATA AFTER ETL PROCESS
Even though the data are processed in the staging area and made available to the end user, data purity cannot be measured and brought to 100%.
High data quality is rare.
Thus the data purification process is important.
7. PURIFICATION PROCESS
The purification process is unpredictable, i.e. we cannot know in advance how to purify particular data or when to stop the purification process on it, since the data in the data warehouse are large in number.
8. WAY TO PURIFY HUGE DATA
STEP 1
THE DATA ARE DIVIDED INTO DIFFERENT CATEGORIES ACCORDING TO THEIR PRIORITY
[Figure: huge data divided by priority: high, medium, low]
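Step 1 can be sketched as a simple bucketing function. The slides do not say how priority is decided, so the rule below (priority by source table, in a hypothetical `TABLE_PRIORITY` mapping) is purely an assumption for illustration.

```python
from collections import defaultdict

# Assumption: priority is decided by which table a record belongs to.
# The table names are hypothetical.
TABLE_PRIORITY = {"customers": "high", "orders": "medium", "web_logs": "low"}


def categorize(records):
    """Group records into high / medium / low priority buckets."""
    buckets = defaultdict(list)
    for record in records:
        # Unknown tables fall back to low priority.
        priority = TABLE_PRIORITY.get(record["table"], "low")
        buckets[priority].append(record)
    return dict(buckets)


records = [
    {"table": "customers", "id": 1},
    {"table": "web_logs", "id": 2},
    {"table": "orders", "id": 3},
    {"table": "customers", "id": 4},
]
buckets = categorize(records)
```

Any other rule (business value, usage frequency) could replace the mapping without changing the structure of the function.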
10. STEP 2
Process each data item according to its priority, such as:
Data in the high priority category should be purified 100%.
Data in the medium priority category should be purified 50%.
Data in the low priority category can be left as is.
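One possible reading of Step 2 is that the percentage is the share of records in each category that gets cleaned. The sketch below follows that reading; the interpretation, the `clean` helper, and the sample records are all assumptions, not something the slides specify.

```python
# Share of each bucket to purify, taken from the slide: 100%, 50%, 0%.
PURIFY_FRACTION = {"high": 1.0, "medium": 0.5, "low": 0.0}


def clean(record):
    """Hypothetical cleaning step: trim whitespace in string fields."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def purify_by_priority(buckets):
    """Clean the first N records of each bucket; N follows the fraction."""
    result = {}
    for priority, records in buckets.items():
        fraction = PURIFY_FRACTION.get(priority, 0.0)
        n_to_clean = int(len(records) * fraction)
        result[priority] = (
            [clean(r) for r in records[:n_to_clean]] + records[n_to_clean:]
        )
    return result


buckets = {
    "high": [{"name": " Ann "}, {"name": "Raj "}],
    "medium": [{"name": " Li "}, {"name": " Sam "}],
    "low": [{"name": " Zed "}],
}
purified = purify_by_priority(buckets)
```

Here every high priority record is cleaned, half of the medium priority records are cleaned, and low priority records pass through unchanged.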
11. STEP 3
ELIMINATION OF REDUNDANT DATA
The main cause of data corruption, i.e. impurity of data, is duplication of data.
Example: the record of one person stored under multiple names or in different formats.
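The duplicate example above can be sketched as follows. This is a minimal illustration that assumes each record has hypothetical `name` and `phone` fields and that two records are duplicates when their normalized name and phone match; real matching rules are usually more involved.

```python
import re


def normalize_key(record):
    """Comparison key: sorted lowercase name tokens plus digits-only phone."""
    tokens = re.findall(r"[a-z]+", record["name"].lower())
    phone = re.sub(r"\D", "", record["phone"])
    return (tuple(sorted(tokens)), phone)


def eliminate_duplicates(records):
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


people = [
    {"name": "John Smith", "phone": "555-0101"},
    {"name": "SMITH, John", "phone": "(555) 0101"},  # same person, new format
    {"name": "Jane Doe", "phone": "555-0199"},
]
unique_people = eliminate_duplicates(people)
```

Sorting the name tokens makes "John Smith" and "SMITH, John" compare equal, so only the first of the two records is kept.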
12. Necessary things during purification of data:
Knowledge to differentiate data.
Select tools for data purification.
Review each data item after purification.
Data is ready to use with high scope.
Priority should be maintained.
The schedule, i.e. the time period of purification, should be confirmed.