Data Quality: A Raising Data
Warehousing Concern
Presented by: Chowdhury, Mohammad Aminul Hoque
http://aminchowdhury.info
Data Warehousing
Characteristics of a Data Warehouse
• A data warehouse supports management decision making
• It is Subject Oriented: it is organized around particular subjects
rather than around a company's ongoing operations
• It is Integrated: data is gathered into the data warehouse
from a variety of sources and merged into a coherent whole
• It is Time-variant: all data is identified with a
particular time period
• It is Non-volatile, meaning the data is stable once loaded.
Benefits of a data warehouse
• Maintain data history
• Integrate data from multiple source systems, enabling a central
view
• Improve data quality by providing consistent codes and descriptions, or
even fixing bad data
• Present the organization's information consistently
• Provide a single common data model for all data sources
• Restructure the data so that it makes sense to the users
• Restructure the data so that it delivers excellent query performance
• Make decision-support queries easier.
Designing a Data Warehouse
• Top-down approach, bottom-up approach, or a combination of both
• From a software engineering point of view: Waterfall and Spiral
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
1. Star schema
2. Snowflake schema
3. Fact constellations
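As a rough illustration of the first option, a star schema puts one fact table (the measures) at the centre and links it to denormalized dimension tables. The sketch below uses Python's built-in sqlite3 module. The table and column names (dim_store, dim_date, fact_sales, amount) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive context for each fact (hypothetical names)
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- Fact table: numeric measures plus one foreign key per dimension
    CREATE TABLE fact_sales (
        store_id INTEGER REFERENCES dim_store(store_id),
        date_id  INTEGER REFERENCES dim_date(date_id),
        amount   REAL
    );
""")
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                 [(1, "Leeds", "North"), (2, "Bristol", "South")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(10, 2015, 1), (11, 2015, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 10, 100.0), (1, 11, 50.0), (2, 10, 70.0)])

# A typical decision-support query: join the fact table to a dimension
# and aggregate a measure.
sales_by_region = dict(conn.execute("""
    SELECT s.region, SUM(f.amount)
    FROM fact_sales f JOIN dim_store s ON s.store_id = f.store_id
    GROUP BY s.region
""").fetchall())
```

In a snowflake schema the region column would typically move into its own table referenced by dim_store. A fact constellation would add further fact tables that share these dimensions.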
Extract, Transform, Load (ETL)
Extract
• ETL process involves extracting the data from
the source systems.
• ETL Architecture Pattern
• Most data warehousing projects consolidate
data from different source systems
• Each separate system may also use a different
data organization and/or format
• The goal of the extraction phase is to convert
the data into a single format appropriate for
transformation processing.
Transform
• This stage applies a series of rules to the data extracted from the source
to derive the data for loading into the end target
• Selecting only certain columns to load
• Translating coded values (e.g., the source system stores 1 for male and 2 for female)
• Encoding free-form values (e.g., mapping "Male" to "M")
• Deriving a new calculated value
• Sorting
• Joining data from multiple sources (e.g., lookup, merge) and
de-duplicating the data
• Aggregation (e.g., summarizing multiple rows of data: total
sales for each store, for each region, etc.)
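Several of the transformations listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the field names (order_id, store, region, gender, amount, note), the sample rows, and the "U" code for unknown values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracted rows; field names are assumptions for illustration.
rows = [
    {"order_id": 1, "store": "A", "region": "North", "gender": 1, "amount": 20.0, "note": "x"},
    {"order_id": 2, "store": "A", "region": "North", "gender": 2, "amount": 30.0, "note": "y"},
    {"order_id": 2, "store": "A", "region": "North", "gender": 2, "amount": 30.0, "note": "y"},
    {"order_id": 3, "store": "B", "region": "South", "gender": 1, "amount": 15.0, "note": "z"},
]

GENDER_CODES = {1: "M", 2: "F"}  # translating coded values
KEEP = ("order_id", "store", "region", "gender", "amount")  # columns to load


def transform(rows):
    """Select columns, translate codes, de-duplicate and sort."""
    seen, out = set(), []
    for row in rows:
        if row["order_id"] in seen:  # de-duplicate on the business key
            continue
        seen.add(row["order_id"])
        clean = {k: row[k] for k in KEEP}  # drops the "note" column
        # "U" for unknown codes is an assumed convention
        clean["gender"] = GENDER_CODES.get(row["gender"], "U")
        out.append(clean)
    return sorted(out, key=lambda r: r["order_id"])


def total_sales(rows):
    """Aggregation: total sales per (store, region)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["store"], r["region"])] += r["amount"]
    return dict(totals)


clean_rows = transform(rows)
totals = total_sales(clean_rows)
```

De-duplicating before aggregating matters here: summing the raw rows would count order 2 twice and overstate the sales of store A.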
Transform
• Generating surrogate-key values
• Transposing or pivoting
• Splitting a column into multiple columns
• Looking up and validating the relevant data from tables or
reference files for slowly changing dimensions
• Applying any form of simple or complex data validation.
Load
• This phase loads the data into the data warehouse.
• This process varies widely. Some data warehouses may
overwrite existing information with cumulative information.
• However, the entry of data for any one-year window is made in
a historical manner.
• Because the load phase interacts with a database, the constraints defined in
the database schema apply, and these contribute to the
overall data quality performance of the ETL process
• ETL can be used to transform the data into a format suitable
for the new application to use.
Data Quality
Data quality is an essential characteristic that determines the
reliability of data for making decisions.
High-quality data:
Complete
Accurate
Available
Timely
Classification Of Data Quality Issues
Data Quality Issues at Data Sources
Data Quality Issues at Data Profiling Stage
Data Quality Issues at Data Staging ETL
Data Quality Problems at Data Modelling
DATA SOURCE
• The sources of dirty data include data entry errors and
update errors
• Part of the data comes from text files, part from MS Excel,
and part from other sources
• Some files are the result of manual consolidation of multiple
files, which might compromise data quality.
DATA PROFILE
• A process of developing information about data
instead of information from data.
Cont...
Example of Data Profiling
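A minimal sketch of what profiling a single column can look like, assuming the column arrives as a plain Python list in which None and the empty string both mean "missing". The function name profile_column and the sample values are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, most common values."""
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top": counts.most_common(3),
    }


gender = ["M", "F", "m", "Male", None, "F", "", "F"]
report = profile_column(gender)
```

The report describes the data rather than what the data says: here it shows two missing values and four distinct spellings in a field that should hold only two or three codes.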
DATA STAGING ETL
• A data cleaning process is executed in the data
staging area to improve the accuracy of the data
• The data staging area is the place where all
grooming is done on data after it is culled from
the source systems
• It is a prime location for validating data quality
from source or auditing and tracking down data
issues.
Cont..
DATA MODELLING
• The schema design greatly influences the
quality of the analysis
• Operational applications use UML class models
for conceptual data modelling
• Issues include slowly changing dimensions, rapidly
changing dimensions, and multi-valued
dimensions.
Cont..
Causes Of Data Quality
CAUSES OF DATA QUALITY PROBLEMS AT DATA SOURCES
• Wrong information entered into the source system
• As time and distance from the source increase, the
chances of getting correct data decrease
• Inability to cope with ageing data contributes to data
quality problems
• Varying timeliness of data sources
• System fields designed to allow free-form entry (fields not
having adequate length)
• Missing values in data sources
• Additional columns
• Use of different representation formats in data sources
Causes Of Data Quality
CAUSES OF DATA QUALITY PROBLEMS AT DATA
PROFILING
• Unreliable and incomplete metadata of the data sources
• User-generated SQL queries used for data profiling
leave data quality problems behind
• Inability to evaluate data structure, data values,
and data relationships before data integration
propagates poor data quality
• Lack of integration between data profiling and
ETL causes data quality problems
• Inappropriate selection of an automated profiling tool
causes data quality issues
• Insufficient structural analysis of the data sources in
the profiling stage.
Cont..
CAUSES OF DATA QUALITY ISSUES AT DATA STAGING AND ETL PHASE
• Different business rules in the various data sources create
data quality problems
• Business rules that lack currency contribute to DQ problems
• Failure to capture only the changes in source files
• Lack of periodic refreshing of the integrated data storage
• Disabling data integrity constraints in data staging tables
causes wrong data and relationships to be extracted
• Purging of data from the data warehouse causes data quality
problems
• Inability to restart the ETL process from checkpoints
without losing data
• Lack of automatically generated rules for ETL tools to build
mappings that detect and fix data defects
• Unhandled null values cause data quality problems
Cont..
CAUSES OF DATA QUALITY ISSUES AT DATA WAREHOUSE SCHEMA DESIGN
• Incomplete or wrong requirement analysis of the project leads to poor
schema design
• Lack of currency in business rules causes poor requirement analysis
• The choice of dimensional modelling schema
(star, snowflake, fact constellation) contributes to data
quality.
• Late identification of slowly changing dimensions contributes to data
quality problems.
• Late-arriving dimensions cause DQ problems.
• Multi-valued dimensions cause DQ problems
• Incomplete/wrong identification of facts/dimensions, bridge tables or
relationship tables or their
• Inability to support database schema refactoring causes data quality
problems
DQ TOOLS
REAL TIME INFORMATICA TOOL
Impact of Data Quality Issues
Cost of Poor Data Quality
Confidence and Satisfaction-based impacts
• Poor data quality results in low confidence in
forecasting and in inconsistent operational and management
reporting.
• This causes delayed or improper decisions.
• It affects the satisfaction of customers, employees, or
suppliers, which leads to decreased organizational trust.
• Ex: An international bank could not meet
its customer satisfaction goals because agents in its 23
contact centres all followed different operational
processes, using up to 18 different apps, many of which
contained duplicate data, to serve a single customer.
Impact on Productivity
• Workloads: Increased need for reconciliation of reports
• Throughput: Increased time for data gathering and
preparation, reduced time for direct data analysis,
delays in delivering information products, lengthened
production and manufacturing cycles
• Output quality: Mistrusted reports
• Supply chain: Out-of-stock, delivery delays, missed
deliveries, duplicate costs for product delivery
Risk and Compliance impacts
• Risk and compliance impacts are associated with credit
assessment, investment risks, competitive risk, capital
investment and/or development, fraud, and leakage,
and with compliance with government regulations, industry
expectations, or self-imposed policies (such as privacy
policies).
• Ex: Healthcare systems deal with sensitive information
about patients' health conditions. The privacy of this
kind of data should be protected.
Examples of Data Quality Problems
• Retail company found over 1m records contained a
home telephone number of "000000000" and addresses
containing flight numbers
• Insurance company found customer records with
99/99/99 in the creation date field of the policy
• Car rental company discovered duplicate agreement
numbers in their European data warehouse
• Healthcare company found 9 different values in the
gender field
• Food/Beverage retail chain found the same product
was their No 1 and No 2 best seller across their
business
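Checks for problems like the ones above can be quite small. The sketch below is illustrative only: the function names are hypothetical, and the date format (day/month/two-digit year) is an assumption, since the source does not state the format of the creation date field.

```python
import re
from collections import Counter
from datetime import datetime


def is_placeholder_phone(phone):
    """True when the number is a single digit repeated, e.g. '000000000'."""
    digits = re.sub(r"\D", "", phone)
    return len(digits) > 0 and len(set(digits)) == 1


def is_valid_date(text, fmt="%d/%m/%y"):
    """True when the text parses as a real calendar date in the assumed format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def duplicates(values):
    """Return the values that occur more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)
```

For example, is_placeholder_phone("000000000") flags the retail record, is_valid_date("99/99/99") rejects the insurance date, and duplicates() run over agreement numbers would surface the car rental problem.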
Example cont..
What Influences Data Quality?
• Schema design influences the quality of the analysis
• Poor data handling procedures and processes
• Failure to adhere to data entry and maintenance
procedures
• Errors in the migration process from one system to
another
• External and third-party data that may not fit
Causes of Data Quality Problems
• Choice of dimensional modelling schema (star, snowflake, fact
constellation)
• Multi-valued dimensions
• Incomplete/wrong identification of facts/dimensions, bridge tables or
relationship tables
• Incomplete/missing values
• Corrupted values
• Out-of-range values
• Wrong data
• Duplicate data
• Dissimilar data formats
• Incompatible structures
Missing Data
• Nonresponse: no information is provided
• Data is collected improperly
• Mistakes are made in data entry
How to deal
• Imputation
• Reconstruction
• Denial/Remove
• Interpolation
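Three of the options above (imputation, removal, and interpolation) can be sketched as follows, assuming a numeric series in which None marks a missing value. Mean imputation and linear interpolation are only one choice each among several, and the function names are hypothetical. Reconstruction is not shown because it depends on having another source to rebuild the value from.

```python
from statistics import mean


def impute_mean(values):
    """Imputation: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # raises StatisticsError if nothing was observed
    return [fill if v is None else v for v in values]


def interpolate_linear(values):
    """Interpolation: fill interior gaps on a straight line between the
    neighbouring observed values. Leading and trailing gaps stay None."""
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


def drop_missing(values):
    """Removal: discard the missing entries altogether."""
    return [v for v in values if v is not None]
```

Interpolation only makes sense when the values are ordered (for example, readings over time), whereas mean imputation ignores order entirely.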
Data Corruption
• Undetected/Silent
• Detected
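One common way to turn silent corruption into detected corruption is to record a checksum when the data is extracted and compare it again before loading. The sketch below uses SHA-256 from Python's hashlib. The sample file content and the idea of storing the digest alongside the file are assumptions for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """True when the data still matches the checksum recorded earlier."""
    return checksum(data) == expected


original = b"policy_id,created\n1001,2015-03-21\n"
recorded = checksum(original)  # stored alongside the file at extract time

# A silent one-character change: the digit 0 became the letter O.
corrupted = original.replace(b"1001", b"1O01")
```

Here is_intact(original, recorded) holds while is_intact(corrupted, recorded) does not, so the change is caught before the load instead of surfacing later in a report.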
Out of Range error
• Use the specific business rules of the various data sources
• Enable data integrity constraints in data staging
• Provide internal profiling or integration with third-party
data profiling and cleansing tools
• Automatically generate rules for ETL tools to
build mappings
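Enabling integrity constraints in a staging table can be sketched with Python's built-in sqlite3 module. The table staging_policy, its columns, the allowed gender codes, and the 0 to 120 age range are hypothetical business rules chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staging_policy (
        policy_id INTEGER PRIMARY KEY,   -- rejects duplicate identifiers
        gender    TEXT NOT NULL CHECK (gender IN ('M', 'F', 'U')),
        age       INTEGER NOT NULL CHECK (age BETWEEN 0 AND 120)
    )
""")

incoming = [(1, "M", 34), (2, "F", 250), (1, "F", 41), (3, "Male", 29), (4, "F", 52)]
rejected = []
for row in incoming:
    try:
        conn.execute("INSERT INTO staging_policy VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row, str(exc)))  # keep the reason for auditing

accepted = [r[0] for r in conn.execute(
    "SELECT policy_id FROM staging_policy ORDER BY policy_id")]
```

The out-of-range age, the duplicate identifier, and the unencoded gender value are all stopped in staging, and the rejected list keeps each row with the database's reason so the defect can be traced back to its source.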
Techniques of Data Quality Control
Data warehousing security
• Appropriate to summaries and aggregates of data
• Exploration data warehouse
• Data encryption and enhancing privacy.
For more information visit
http://aminchowdhury.info

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 

Data Quality: A Raising Data Warehousing Concern

  • 1. Data Quality: A Raising Data Warehousing Concern Presented by: Chowdhury, Mohammad Aminul Hoque http://aminchowdhury.info
  • 3. Characteristics of Data Warehouse
      • A data warehouse supports management decision making.
      • It is subject-oriented: it gives information about a particular subject rather than about a company's ongoing operations.
      • It is integrated: data is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
      • It is time-variant: all data in the warehouse is identified with a particular time period.
      • It is non-volatile, meaning the data is stable once loaded.
  • 4. Benefits of a data warehouse
      • Maintain data history
      • Integrate data from multiple source systems, enabling a central view
      • Improve data quality by providing consistent codes and descriptions, or even fixing bad data
      • Present the organization's information consistently
      • Provide a single common data model for all data sources
      • Restructure the data so that it makes sense to the users
      • Restructure the data so that it delivers excellent query performance
      • Make decision-support queries easier
  • 5. Designing of Data Warehouse
      • Top-down, bottom-up, or a combination of both approaches
      • From a software engineering point of view: Waterfall and Spiral
    Conceptual Modeling of Data Warehouses
      Modeling data warehouses: dimensions & measures
      1. Star schema
      2. Snowflake schema
      3. Fact constellations
  • 7. Extract
      • The ETL process begins by extracting the data from the source systems.
      • ETL Architecture Pattern
      • Most data warehousing projects consolidate data from different source systems.
      • Each separate system may also use a different data organization and/or format.
      • The goal of the extraction phase is to convert the data into a single format appropriate for transformation processing.
  • 8. Transform
      • This stage applies a series of rules to the data extracted from the source to derive the data for loading into the end target.
      • Selecting only certain columns to load
      • Translating coded values (e.g., the source stores 1 for male and 2 for female)
      • Encoding free-form values (e.g., mapping "Male" to "M")
      • Deriving a new calculated value
      • Sorting
      • Joining data from multiple sources (e.g., lookup, merge) and de-duplicating the data
      • Aggregation (e.g., summarizing multiple rows of data: total sales for each store, for each region, etc.)
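As an illustration of the transform rules listed on this slide, here is a minimal Python sketch covering column selection, code translation, de-duplication, and aggregation. The field names (`store`, `order_id`, `gender`, `amount`), the `GENDER_CODES` mapping, and the sample rows are all hypothetical and not taken from the presentation.

```python
from collections import defaultdict

# Hypothetical mapping from source-system values to warehouse codes.
GENDER_CODES = {"1": "M", "2": "F", "Male": "M", "Female": "F"}


def transform(rows):
    """Keep selected columns, translate coded values, drop duplicate orders."""
    seen_orders = set()
    cleaned = []
    for row in rows:
        if row["order_id"] in seen_orders:  # de-duplicate on order_id
            continue
        seen_orders.add(row["order_id"])
        cleaned.append({
            "store": row["store"],
            "order_id": row["order_id"],
            # Unknown source values become "U" instead of failing the load.
            "gender": GENDER_CODES.get(str(row["gender"]), "U"),
            "amount": float(row["amount"]),
        })
    return cleaned


def total_sales_by_store(records):
    """Aggregate: total sales amount per store."""
    totals = defaultdict(float)
    for record in records:
        totals[record["store"]] += record["amount"]
    return dict(totals)


rows = [
    {"store": "Dhaka", "order_id": 1, "gender": 1, "amount": "10.50", "note": "x"},
    {"store": "Dhaka", "order_id": 2, "gender": "Female", "amount": "4.50", "note": ""},
    {"store": "Sylhet", "order_id": 3, "gender": 2, "amount": "7.00", "note": ""},
    {"store": "Dhaka", "order_id": 2, "gender": "Female", "amount": "4.50", "note": "dup"},
]
clean = transform(rows)
totals = total_sales_by_store(clean)
```

The `note` column is dropped simply because `transform` never copies it; a real pipeline would more likely drive the column list from configuration.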
  • 9. Transform
      • Generating surrogate-key values
      • Transposing or pivoting
      • Splitting a column into multiple columns
      • Looking up and validating the relevant data from tables or referential files for slowly changing dimensions
      • Applying any form of simple or complex data validation
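Two of the transformations on this slide, generating surrogate keys and splitting a column, can be sketched in a few lines of Python. The class name `SurrogateKeyGenerator`, the helper `split_full_name`, and the sample keys are assumptions made for illustration; the presentation does not prescribe an implementation.

```python
import itertools


class SurrogateKeyGenerator:
    """Assign a stable integer surrogate key to each natural (source) key."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the key if this natural key was seen before.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._counter)
        return self._keys[natural_key]


def split_full_name(full_name):
    """Split one 'full name' column into (first, rest) at the first space."""
    first, _, rest = full_name.strip().partition(" ")
    return first, rest.strip()


keys = SurrogateKeyGenerator()
first_key = keys.key_for("CUST-A")
repeat_key = keys.key_for("CUST-A")
second_key = keys.key_for("CUST-B")
```

Keeping the natural-to-surrogate mapping in memory is only workable for small dimensions; a warehouse would normally persist it in the dimension table itself.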
  • 10. Load
      • This phase loads the data into the data warehouse.
      • The process varies widely. Some data warehouses overwrite existing information with cumulative information; others add new data in historical form, so that the entries for any one-year window are kept as history.
      • Because the load phase interacts with a database, the constraints defined in the database schema apply (e.g., uniqueness, referential integrity, mandatory fields), which contributes to the overall data quality performance of the ETL process.
      • ETL can also be used to transform the data into a format suitable for a new application to use.
  • 11. Data Quality
      Data quality is an essential characteristic that determines the reliability of data for making decisions.
      High-quality data is:
      • Complete
      • Accurate
      • Available
      • Timely
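Two of these properties, completeness and timeliness, are easy to measure mechanically. The following Python sketch shows one possible way to do so; the function names, the required-field list, and the one-day freshness threshold are assumptions chosen for the example, not definitions from the slides.

```python
from datetime import datetime, timedelta


def completeness(records, required_fields):
    """Fraction of records in which every required field is filled in."""
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)


def is_timely(loaded_at, now, max_age=timedelta(days=1)):
    """True if the data was loaded no longer than max_age before 'now'."""
    return now - loaded_at <= max_age


customers = [
    {"id": 1, "name": "A. Rahman"},
    {"id": 2, "name": ""},  # missing name: counts as incomplete
]
score = completeness(customers, ["id", "name"])
fresh = is_timely(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 12, 0))
```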
  • 12. Classification Of Data Quality Issues
      • Data Quality Issues at Data Sources
      • Data Quality Issues at Data Profiling Stage
      • Data Quality Issues at Data Staging ETL
      • Data Quality Problems at Data Modelling
  • 13. DATA SOURCE
      • The sources of dirty data include data entry errors and update errors.
      • Part of the data comes from text files, part from MS Excel, and part from other sources.
      • Some files are the result of manual consolidation of multiple files, which may compromise data quality.
    DATA PROFILE
      • A process of developing information about data instead of information from data.
  • 14. Example of Data Profiling
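Since the original example on this slide was an image, here is a small stand-in sketch of what column profiling can look like in Python: it develops information about a column (row count, nulls, distinct values, most frequent values) rather than information from it. The function `profile_column` and the sample `gender` column are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: size, missing values, distinct values, top values."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# A gender column with inconsistent encodings and missing entries.
gender = ["M", "F", "m", "", "Male", "F", None, "F"]
profile = profile_column(gender)
```

A profile like this makes representation problems visible before integration: four distinct values in a field that should hold two is a signal that the source uses mixed encodings.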
  • 15. DATA STAGING ETL
      • A data cleaning process is executed in the data staging area to improve the accuracy of the data.
      • The data staging area is the place where all grooming is done on data after it is culled from the source systems.
      • It is a prime location for validating the quality of source data, and for auditing and tracking down data issues.
  • 16. DATA MODELLING
      • The design of the schema greatly influences the quality of the analysis.
      • Operational applications use the UML class model for conceptual data modelling.
      • Issues include slowly changing dimensions, rapidly changing dimensions, and multi-valued dimensions.
  • 17. Causes Of Data Quality
    CAUSES OF DATA QUALITY PROBLEMS AT DATA SOURCES
      • Wrong information entered into the source system
      • As time and distance from the source increase, the chances of getting correct data decrease
      • Inability to cope with ageing data contributes to data quality problems
      • Varying timeliness of data sources
      • System fields designed to allow free-form entry (or fields without adequate length)
      • Missing values in data sources
      • Additional columns
      • Use of different representation formats in data sources
  • 18. Causes Of Data Quality
    CAUSES OF DATA QUALITY PROBLEMS AT DATA PROFILING
      • Unreliable and incomplete metadata for the data source
      • Relying on user-generated SQL queries for data profiling leaves data quality problems in place
      • Failure to evaluate data structure, data values, and data relationships before data integration propagates poor data quality
      • Lack of integration between data profiling and ETL causes data quality problems
      • Inappropriate selection of an automated profiling tool causes data quality issues
      • Insufficient structural analysis of the data sources in the profiling stage
  • 19. Cont.
    CAUSES OF DATA QUALITY ISSUES AT DATA STAGING AND ETL PHASE
      • Differing business rules across the various data sources create data quality problems
      • Business rules that lack currency contribute to DQ problems
      • Failure to capture only the changes in source files
      • Lack of periodic refreshing of the integrated data storage
      • Disabling data integrity constraints in data staging tables causes wrong data and relationships to be extracted
      • Purging of data from the data warehouse causes data quality problems
      • Inability to restart the ETL process from checkpoints without losing data
      • Lack of automatically generated rules for ETL tools to build mappings that detect and fix data defects
      • Unhandled null values cause data quality problems
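One of the causes listed on this slide is the failure to capture only the changes in source files. A minimal Python sketch of snapshot comparison follows, under the assumption that each row carries a unique key; the functions `row_fingerprint` and `detect_changes` and the sample snapshots are hypothetical, and production systems often rely on database logs or timestamps instead.

```python
import hashlib


def row_fingerprint(row):
    """Stable hash of a row's contents, independent of key order."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous, current, key="id"):
    """Compare two snapshots and return (inserted, updated, deleted_keys)."""
    old = {row[key]: row_fingerprint(row) for row in previous}
    inserted, updated = [], []
    for row in current:
        if row[key] not in old:
            inserted.append(row)
        elif old[row[key]] != row_fingerprint(row):
            updated.append(row)
    current_keys = {row[key] for row in current}
    deleted = [k for k in old if k not in current_keys]
    return inserted, updated, deleted


previous = [{"id": 1, "city": "Dhaka"}, {"id": 2, "city": "Khulna"}, {"id": 3, "city": "Sylhet"}]
current = [{"id": 1, "city": "Dhaka"}, {"id": 2, "city": "London"}, {"id": 4, "city": "Leeds"}]
inserted, updated, deleted = detect_changes(previous, current)
```

Only the three changed rows need to flow through the rest of the ETL process; the unchanged row with `id` 1 is skipped.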
  • 20. Cont.
    CAUSES OF DATA QUALITY ISSUES AT DATA WAREHOUSE SCHEMA DESIGN
      • Incomplete or wrong requirement analysis of the project leads to poor schema design
      • Lack of currency in business rules causes poor requirement analysis
      • The choice of dimensional modelling schema (star, snowflake, fact constellation) contributes to data quality
      • Late identification of slowly changing dimensions contributes to data quality problems
      • Late-arriving dimensions cause DQ problems
      • Multi-valued dimensions cause DQ problems
      • Incomplete/wrong identification of facts/dimensions, bridge tables or relationship tables or their
      • Inability to support database schema refactoring causes data quality problems
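To make the slowly changing dimension issue concrete, here is a hedged Python sketch of one common handling strategy, usually called Type 2, in which a changed attribute closes the current row and appends a new version. The slides do not say which strategy the author has in mind; the function `apply_scd2` and the row layout (`valid_from`, `valid_to`) are assumptions for illustration.

```python
from datetime import date


def apply_scd2(dimension, natural_key, attributes, effective):
    """Type 2 update: close the current row if attributes changed, add a new one."""
    current = next(
        (row for row in dimension
         if row["natural_key"] == natural_key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return dimension  # nothing changed, keep the current version
        current["valid_to"] = effective  # close the old version
    dimension.append({
        "surrogate_key": len(dimension) + 1,
        "natural_key": natural_key,
        "attributes": dict(attributes),
        "valid_from": effective,
        "valid_to": None,  # None marks the current version
    })
    return dimension


customer_dim = []
apply_scd2(customer_dim, "CUST-A", {"city": "Dhaka"}, date(2020, 1, 1))
apply_scd2(customer_dim, "CUST-A", {"city": "Dhaka"}, date(2020, 6, 1))
apply_scd2(customer_dim, "CUST-A", {"city": "London"}, date(2021, 6, 1))
```

If the dimension is identified late, facts loaded earlier have no historical versions to join against, which is one way the late identification mentioned on the slide turns into a data quality problem.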
  • 23. Impact of Data Quality Issues
  • 24. Cost of Poor Data Quality
  • 25. Confidence and Satisfaction-based impacts
      • Bad-quality data results in low confidence in forecasting and in inconsistent operational and management reporting.
      • It causes delayed or improper decisions.
      • It affects the satisfaction of customers, employees, or suppliers, which leads to decreased organizational trust.
      • Example: An international bank could not meet its customer satisfaction goals because agents in its 23 contact centres all followed different operational processes, using up to 18 different apps (many of which contained duplicate data) to serve a single customer.
  • 26. Impact on Productivity
      • Workloads: increased need for reconciliation of reports
      • Throughput: increased time for data gathering and preparation, reduced time for direct data analysis, delays in delivering information products, lengthened production and manufacturing cycles
      • Output quality: mistrusted reports
      • Supply chain: out-of-stock items, delivery delays, missed deliveries, duplicate costs for product delivery
  • 27. Risk and Compliance impacts
      • Risk and compliance impacts are associated with credit assessment, investment risk, competitive risk, capital investment and/or development, fraud, and leakage, and with compliance with government regulations, industry expectations, or self-imposed policies (such as privacy policies).
      • Example: Healthcare systems deal with sensitive information about patients' health conditions. The privacy of this kind of data should be protected.
  • 28. Examples of Data Quality Problem
      • A retail company found over 1m records containing a home telephone number of "000000000" and addresses containing flight numbers
      • An insurance company found customer records with 99/99/99 in the creation date field of the policy
      • A car rental company discovered duplicate agreement numbers in its European data warehouse
      • A healthcare company found 9 different values in the gender field
      • A food/beverage retail chain found that the same product was both its No 1 and No 2 best seller across the business
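Several of these problems can be caught by simple validation rules. The Python sketch below checks for placeholder phone numbers, impossible dates, and duplicate keys; the function names, the assumed `dd/mm/yy` date format, and the sample values are hypothetical and would need adapting to real data.

```python
from collections import Counter
from datetime import datetime


def is_placeholder_phone(phone):
    """True if the number consists of a single repeated digit, e.g. 000000000."""
    digits = [c for c in phone if c.isdigit()]
    return len(digits) > 0 and len(set(digits)) == 1


def is_valid_date(text, fmt="%d/%m/%y"):
    """True if the text parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def duplicate_keys(keys):
    """Return the keys that occur more than once, sorted."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


placeholder = is_placeholder_phone("000000000")
bad_date_ok = is_valid_date("99/99/99")
dupes = duplicate_keys(["AG-1", "AG-2", "AG-1", "AG-3"])
```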
  • 33. Why Data Quality Influences?
      • Schema design influences the quality of the analysis
      • Poor data handling procedures and processes
      • Failure to adhere to data entry and maintenance procedures
      • Errors in the migration process from one system to another
      • External and third-party data that may not fit
  • 34. Causes of Data Quality Problems
      • Choice of dimensional modelling schema (star, snowflake, fact constellation)
      • Multi-valued dimensions
      • Incomplete/wrong identification of facts/dimensions, bridge tables or relationship tables
      • Incomplete/missing values
      • Corrupted values
      • Out-of-range values
      • Wrong data
      • Duplicate data
      • Dissimilar data formats
      • Incompatible structures
  • 35. Missing Data
      • Nonresponse: no information is provided
      • Data collected improperly
      • Mistakes in data entry
    How to deal
      • Imputation
      • Reconstruction
      • Denial/Removal
      • Interpolation
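Two of the listed remedies, imputation and interpolation, can be sketched in Python as follows. Mean imputation and linear interpolation are used here only as the simplest representatives of each family; the slides do not name specific methods, and the function names are hypothetical. Missing values are represented as `None`.

```python
def impute_mean(values):
    """Replace each missing value with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # nothing to compute a mean from
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def interpolate_linear(values):
    """Fill gaps between two known values on a straight line.

    Leading and trailing gaps are left as None because they have
    only one known neighbour.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


imputed = impute_mean([10.0, None, 20.0])
interpolated = interpolate_linear([10.0, None, None, 40.0, None])
```

Interpolation only makes sense for ordered data such as a time series, whereas mean imputation ignores order entirely; which one is appropriate depends on the field.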
  • 37. Out of Range error
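An out-of-range check can be expressed as a table of allowed bounds per field. The Python sketch below is one possible form; the `RANGE_RULES` table, its bounds, and the field names are invented for the example.

```python
# Hypothetical business rules: field name -> (lowest, highest) allowed value.
RANGE_RULES = {"age": (0, 120), "discount_pct": (0.0, 100.0)}


def find_out_of_range(record, rules=RANGE_RULES):
    """Return (field, value) pairs that fall outside their allowed range."""
    violations = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is None:
            continue  # missing values are a separate check
        if not (low <= value <= high):
            violations.append((field, value))
    return violations


bad = find_out_of_range({"age": 230, "discount_pct": 15.0})
good = find_out_of_range({"age": 42, "discount_pct": 0.0})
```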
  • 38. Techniques of Data Quality Control
      • Use specific business rules for the various data sources
      • Enable data integrity constraints in data staging
      • Provide internal profiling, or integration with third-party data profiling and cleansing tools
      • Automatically generate rules for ETL tools to build mappings
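The effect of enabling integrity constraints in a staging table can be demonstrated with Python's built-in `sqlite3` module. This is a sketch under assumed table names (`store`, `staging_sale`) and sample rows; SQLite stands in for whatever database the warehouse actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE store (store_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE staging_sale (
        sale_id  INTEGER PRIMARY KEY,
        store_id INTEGER NOT NULL REFERENCES store(store_id),
        amount   REAL NOT NULL CHECK (amount >= 0)
    )
    """
)
conn.execute("INSERT INTO store VALUES (1, 'Dhaka')")


def load_rows(connection, rows):
    """Insert rows one by one; collect rows the constraints reject."""
    loaded, rejected = 0, []
    for row in rows:
        try:
            connection.execute("INSERT INTO staging_sale VALUES (?, ?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError as exc:
            rejected.append((row, str(exc)))
    connection.commit()
    return loaded, rejected


rows = [
    (1, 1, 25.0),   # valid
    (2, 99, 10.0),  # unknown store: foreign key violation
    (3, 1, -5.0),   # negative amount: CHECK violation
    (1, 1, 30.0),   # duplicate sale_id: primary key violation
    (4, 1, None),   # missing amount: NOT NULL violation
]
loaded, rejected = load_rows(conn, rows)
```

Rejected rows are kept together with the error message so they can be audited rather than silently dropped.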
  • 39. Data warehousing security
      • Appropriate to summaries and aggregates of data
      • Exploration data warehouse
      • Data encryption and enhancing privacy
    For more information visit http://aminchowdhury.info