DATA WAREHOUSING

PURIFICATION OF DATA
NADAR MISPA PAULRAJ
DATA IN THE DATA WAREHOUSE

A data warehouse is a collection of data marts, as shown in the figure.

Data in the data warehouse come from different sources.

They are integrated together.
TYPES OF DATA IN THE DATA WAREHOUSE

[Figure: primary data, secondary data, records, images, charts]
OPERATIONS ON DATA
The available data are processed in the staging area, i.e. by the ETL process.

This increases data consistency and the scope of the data for strategic information.
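A minimal sketch of what processing in the staging area can look like, assuming two hypothetical source systems (`crm_rows`, `billing_rows`) that describe customers with different field names and formats. The sources, field names and cleaning rules are illustrative assumptions, not part of the original slides.

```python
# Hypothetical raw extracts from two source systems with inconsistent formats.
crm_rows = [{"name": " Alice Smith ", "country": "in"}]
billing_rows = [{"customer": "BOB JONES", "country_code": "IN"}]


def extract():
    """Copy raw rows from each (hypothetical) source into the staging area."""
    return {"crm": list(crm_rows), "billing": list(billing_rows)}


def transform(staged):
    """Map both sources onto one consistent schema: name, country."""
    unified = []
    for row in staged["crm"]:
        unified.append({
            "name": row["name"].strip().title(),
            "country": row["country"].upper(),
        })
    for row in staged["billing"]:
        unified.append({
            "name": row["customer"].strip().title(),
            "country": row["country_code"].upper(),
        })
    return unified


def load(rows, warehouse):
    """Append the consistent rows to the warehouse (here just a list)."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(extract()), [])
```

After the transform step both sources share the same field names and formatting, which is the consistency gain the slide refers to.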
DATA AFTER ETL PROCESS
Even though the data are processed in the staging area and made available to the end user, the data purity cannot be calculated and set to 100%.

A high level of data quality is rare.

Thus the data purification process is important.
PURIFICATION PROCESS
The purification process is unpredictable, i.e. we cannot know in advance how to purify particular data or when to stop purifying it, since the data in the data warehouse is large in volume.
WAY TO PURIFY HUGE DATA
STEP 1

THE DATA IS DIVIDED INTO DIFFERENT
CATEGORIES ACCORDING TO THEIR
PRIORITY
[Figure: HUGE DATA split by PRIORITY into HIGH, MEDIUM and LOW]

[Figure: HUGE DATA becomes DIVIDED DATA]
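Step 1 can be sketched as a simple bucketing pass. The mapping from table name to priority (`PRIORITY_OF_TABLE`) is a hypothetical example; the slides do not say how priority is assigned, and in practice it would be a business decision.

```python
from collections import defaultdict

# Hypothetical priority assignment per source table (an assumption for illustration).
PRIORITY_OF_TABLE = {
    "customers": "HIGH",
    "orders": "HIGH",
    "surveys": "MEDIUM",
    "web_logs": "LOW",
}


def divide_by_priority(records, default="LOW"):
    """Group records into HIGH / MEDIUM / LOW buckets by their table's priority."""
    buckets = defaultdict(list)
    for record in records:
        priority = PRIORITY_OF_TABLE.get(record["table"], default)
        buckets[priority].append(record)
    return dict(buckets)


records = [
    {"table": "customers", "id": 1},
    {"table": "web_logs", "id": 2},
    {"table": "surveys", "id": 3},
    {"table": "orders", "id": 4},
]
divided = divide_by_priority(records)
```

Tables that are not listed fall into the `default` bucket, so unknown data is never silently treated as high priority.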
STEP 2

Process each data category according to its priority, such as:

Data in the high priority should be purified 100%

Data in the medium priority should be purified 50%

Data in the low priority can be left as it is without any problem
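One possible reading of Step 2, sketched below, treats the percentage as the share of records in each category that get cleaned. That interpretation, the `PURIFY_SHARE` table and the `clean` function (which only trims whitespace) are assumptions made for illustration; the slides do not define what "purified 50%" means in detail.

```python
import math

# Assumed reading: fraction of each category's records that are cleaned.
PURIFY_SHARE = {"HIGH": 1.0, "MEDIUM": 0.5, "LOW": 0.0}


def clean(record):
    """Hypothetical cleaning step: trim whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def purify_by_priority(divided):
    """Clean the leading share of each bucket and leave the rest untouched."""
    result = {}
    for priority, records in divided.items():
        share = PURIFY_SHARE.get(priority, 0.0)
        n_to_clean = math.ceil(len(records) * share)
        cleaned = [clean(r) for r in records[:n_to_clean]]
        result[priority] = cleaned + records[n_to_clean:]
    return result


divided = {
    "HIGH": [{"name": " a "}, {"name": " b "}],
    "MEDIUM": [{"name": " c "}, {"name": " d "}],
    "LOW": [{"name": " e "}],
}
purified = purify_by_priority(divided)
```

Cleaning the first records of a bucket is arbitrary here; a real process would more likely pick the records or fields that matter most within the category.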
STEP 3

ELIMINATION OF REDUNDANT DATA

The main cause of data corruption, i.e. impurity of data, is duplication of data.

Example: a record of a person under multiple names or in different formats
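A minimal sketch of removing such duplicates, assuming each person record carries a `name` and a `birth_date` field (both hypothetical). Names are normalized so that different spellings and orderings of the same name compare equal; real data would often also need fuzzy matching, which this sketch does not attempt.

```python
import re


def normalize_name(name):
    """Lowercase, drop punctuation, and sort name parts so that
    'SMITH, John' and 'john  smith' produce the same key."""
    parts = re.sub(r"[^a-z\s]", " ", name.lower()).split()
    return " ".join(sorted(parts))


def deduplicate(records):
    """Keep the first record seen for each (normalized name, birth date) pair."""
    seen = {}
    for record in records:
        key = (normalize_name(record["name"]), record["birth_date"])
        seen.setdefault(key, record)
    return list(seen.values())


records = [
    {"name": "John Smith", "birth_date": "1990-01-05"},
    {"name": "SMITH, John", "birth_date": "1990-01-05"},
    {"name": "john  smith", "birth_date": "1990-01-05"},
    {"name": "Jane Smith", "birth_date": "1985-07-12"},
]
unique = deduplicate(records)
```

Including the birth date in the key keeps two different people who share a name from being merged.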
Necessary things during purification of data:

Knowledge to differentiate data

Select tools for data purification

Review each data item after purification.

Priority should be maintained.

Schedule, i.e. the time period of purification, should be confirmed.

Data is ready to use with high scope
Data is ready to use…
THANK YOU…!!!
