Data Quality Testing
Agenda
DQ Management: Overview, DQ Testing, Case Study, Close
Overview: DQ Definition
Overview: DQ Stats
Testing :: DQ CheatSheet
Rule #1: Row Counts
The count of records at Source and Target should be the same at a given point in time.

# Example 1
Source_Dept
DeptID | DeptName | DeptStartDate
1 | HR | 22-Aug-2007
2 | Finance | 12-June-1988
4 | Admin | 1-May-1999
5 | IT | 2-June-1997

Target_Dept
DeptID | DeptName | DeptStartDate
1 | Human Resource | 22-Aug-2007
2 | Finance | 12-June-1978
3 | Operations | 11-May-1752

Missing Records: records that are present only at Source
DeptID | DeptName | DeptStartDate
4 | Admin | 1-May-1999
5 | IT | 2-June-1997

Extra Records: records that are present only at Target
DeptID | DeptName | DeptStartDate
3 | Operations | 11-May-1752
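The row-count rule can be sketched in a few lines of Python. This is only an illustration using the Example 1 rows; it assumes DeptID is the key used to match records, and the function name `row_count_check` is made up for this sketch.

```python
# Rows from Example 1, keyed by DeptID (assumed to be the matching key).
source_dept = {
    1: ("HR", "22-Aug-2007"),
    2: ("Finance", "12-June-1988"),
    4: ("Admin", "1-May-1999"),
    5: ("IT", "2-June-1997"),
}
target_dept = {
    1: ("Human Resource", "22-Aug-2007"),
    2: ("Finance", "12-June-1978"),
    3: ("Operations", "11-May-1752"),
}


def row_count_check(source, target):
    """Return (counts_match, missing_keys, extra_keys) for two keyed tables."""
    missing = sorted(source.keys() - target.keys())  # present only at Source
    extra = sorted(target.keys() - source.keys())    # present only at Target
    return len(source) == len(target), missing, extra


counts_match, missing, extra = row_count_check(source_dept, target_dept)
print(counts_match, missing, extra)  # False [4, 5] [3]
```

Equal counts alone do not prove the tables agree (two missing rows and two extra rows would cancel out), which is why the sketch also reports the key differences.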
Rule #2: Completeness
All the data under consideration at the Source and Target should be the same at a given point in time, satisfying the business rules.

Missing Records: records that are present only at Source
DeptID | DeptName | DeptStartDate
4 | Admin | 1-May-1999
5 | IT | 2-June-1997

Extra Records: records that are present only at Target
DeptID | DeptName | DeptStartDate
3 | Operations | 11-May-1752

Mismatched Records: records that contain at least one different value for the same record between Source and Target
DeptID | DeptName | DeptStartDate | DifferenceType
2 | Finance | 12-June-1988 | At Source
2 | Finance | 12-June-1978 | At Target
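A mismatched-record check can be sketched as a column-by-column comparison of the records that exist on both sides. The sketch below reuses the Example 1 rows and assumes DeptID is the matching key; `mismatched_records` is a hypothetical helper name.

```python
COLUMNS = ("DeptName", "DeptStartDate")

# Example 1 rows keyed by DeptID (assumed matching key).
source_dept = {
    1: ("HR", "22-Aug-2007"),
    2: ("Finance", "12-June-1988"),
    4: ("Admin", "1-May-1999"),
    5: ("IT", "2-June-1997"),
}
target_dept = {
    1: ("Human Resource", "22-Aug-2007"),
    2: ("Finance", "12-June-1978"),
    3: ("Operations", "11-May-1752"),
}


def mismatched_records(source, target, columns=COLUMNS):
    """Map each key present on both sides to the columns whose values differ."""
    diffs = {}
    for key in sorted(source.keys() & target.keys()):
        changed = [
            col
            for col, src, tgt in zip(columns, source[key], target[key])
            if src != tgt
        ]
        if changed:
            diffs[key] = changed
    return diffs


print(mismatched_records(source_dept, target_dept))
# {1: ['DeptName'], 2: ['DeptStartDate']}
```

A strict comparison like this also flags DeptID 1, whose DeptName is "HR" at Source and "Human Resource" at Target; the slide lists only DeptID 2 as mismatched and treats the name difference under Rule #3.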
Rule #3: Consistency
This ensures that each user observes a consistent view of the data, including changes made by transactions.
There is data inconsistency between the Source & Target if the same data is stored in different formats or contains different values at different places.

# Example 2
Source_Dept
DeptID | DeptName | Revenue ($) | DeptStartDate
1 | HR | 100 | 22-Aug-2007
2 | Finance | 200 | 12-June-1988

Warehouse_Dept
DeptID | DeptName | Revenue (Euro) | DeptStartDate
1 | HR | 70 | 22/08/2007
2 | Finance | 140 | 12/06/1978

Data Mart_Dept
DeptID | DeptName | Revenue (Euro) | DeptStartDate
1 | Human Resource | 70 | 22/08/2007
2 | Finance | 999999 | 12/06/1978

Example #1: Zip code / Date / Currency formats
a) Same data, inconsistent due to Revenue & Currency format
DeptID | DeptName | Revenue ($ or Euro) | DeptStartDate
1 | HR | 100 | 22-Aug-2007
1 | HR | 70 | 22/08/2007

b) Same data, inconsistent due to different format of Department name
DeptID | DeptName | Revenue ($ or Euro) | DeptStartDate
1 | HR | 100 | 22-Aug-2007
1 | Human Resource | 70 | 22/08/2007

Example #2: Regional Setting, e.g. Language
Same data, inconsistent due to different language used
DeptID | DeptName | Revenue ($ or Euro) | DeptStartDate
1 | Human Resource | 100 | 22/08/2007
1 | 人的資源 | 100 | 22/08/2007

Example #3: Different values at different points
Same data, inconsistent value for Revenue between Warehouse & Mart
DeptID | DeptName | Revenue ($ or Euro) | DeptStartDate
2 | Finance | 140 | 12/06/1978
2 | Finance | 999999 | 12/06/1978
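One way to tell a pure format difference from a real value difference is to normalise both sides before comparing. The sketch below does this for the date layouts that appear in Example 2; the list of formats and the helper names are assumptions for illustration, and month names are parsed with Python's default (English) locale.

```python
from datetime import datetime

# Date layouts seen in Example 2; a real feed may need more (assumption).
DATE_FORMATS = ("%d-%b-%Y", "%d-%B-%Y", "%d/%m/%Y")


def parse_date(text):
    """Parse a date string written in any of the known layouts."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognised date format: {text!r}")


def same_date(source_text, target_text):
    """True when both strings denote the same calendar day."""
    return parse_date(source_text) == parse_date(target_text)


print(same_date("22-Aug-2007", "22/08/2007"))   # True: only the format differs
print(same_date("12-June-1988", "12/06/1978"))  # False: the values differ
```

The same idea applies to currency: convert both revenues to one unit with an agreed rate before deciding whether 100 ($) and 70 (Euro) disagree.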
Rule #4: Validity
Example #1: Measuring “Unemployment” in a country
-> Statistics are collected reliably month-on-month.
-> The definition used when collecting “Unemployment” remains the same.
E.g. the definition of “unemployment” has changed in the past 25 years, hence we can’t compare old data with current data, as the comparison is not valid.

Example #2: Values falling outside a range
DeptID | DeptName | Revenue (Euro) | DeptStartDate
1 | Human Resource | 70 | 22/08/2255
2 | Finance | 999999 | 12/06/1752

Example #3: Dates having valid MM, DD, YYYY
DeptID | DeptName | Revenue (Euro) | DeptStartDate
1 | Human Resource | 70 | 13/13/2007

Example #4: Birth date > Death date
EmpId | EmpName | DOB | DOE
1 | Jack | 13/01/2008 | 24/11/1996
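Examples #2 to #4 can be expressed as simple checks. The following is a sketch only: the range bounds `MIN_DATE` and `MAX_DATE` are hypothetical and would come from the business rules, and the dates are assumed to be in DD/MM/YYYY form as in the tables above.

```python
from datetime import date, datetime

# Hypothetical business bounds for DeptStartDate (assumption, not from the deck).
MIN_DATE = date(1900, 1, 1)
MAX_DATE = date(2100, 12, 31)


def check_date(text, fmt="%d/%m/%Y"):
    """Classify a date string as 'ok', 'malformed' or 'out of range'."""
    try:
        value = datetime.strptime(text, fmt).date()
    except ValueError:
        return "malformed"  # e.g. month 13 does not exist
    if not MIN_DATE <= value <= MAX_DATE:
        return "out of range"
    return "ok"


def birth_before_end(dob, doe, fmt="%d/%m/%Y"):
    """Cross-field rule from Example #4: DOB must not be later than DOE."""
    return datetime.strptime(dob, fmt).date() <= datetime.strptime(doe, fmt).date()


print(check_date("13/13/2007"))  # malformed
print(check_date("22/08/2255"))  # out of range
print(check_date("12/06/1752"))  # out of range
print(check_date("22/08/2007"))  # ok
print(birth_before_end("13/01/2008", "24/11/1996"))  # False
```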
Rule #5: Redundancy
Physical Duplicates: all the column values repeat for at least 2 records in a table.
Logical Duplicates: the Business Key (list of columns) values repeat for at least 2 records in a table.

# Example 3
Employee
EmpID | EmpName | EmpAddress | Age | DeptID
1 | Jim | #22, Jackson St., NY | 23 | 1
2 | Sam | A302, Woodsvilla, WA | 28 | 2
4 | Samuel | No. AA, Andrew Street, Redmond, WA | 22 | 999
5 | Jim | #22, Jackson St., NY | 23 | 1
2 | Sam | A302, Woodsvilla, WA | 28 | 2
7 | Jack | #23, Jackson St., NY | 41 | NULL

Example #1: Physical Duplicates
EmpID | EmpName | EmpAddress | Age | DeptID
2 | Sam | A302, Woodsvilla, WA | 28 | 2
2 | Sam | A302, Woodsvilla, WA | 28 | 2

Example #2: Logical Duplicates
EmpID | EmpName | EmpAddress | Age | DeptID
1 | Jim | #22, Jackson St., NY | 23 | 1
5 | Jim | #22, Jackson St., NY | 23 | 1
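Both kinds of duplicate in Example 3 can be found by counting. This sketch assumes the business key is (EmpName, EmpAddress), which the deck does not state, and it reports as logical duplicates only rows that are not already physical duplicates, to mirror the two example tables.

```python
from collections import Counter

# Employee rows from Example 3: (EmpID, EmpName, EmpAddress, Age, DeptID).
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]


def physical_duplicates(rows):
    """Rows whose complete set of column values occurs more than once."""
    return [row for row, count in Counter(rows).items() if count > 1]


def logical_duplicates(rows, key_columns):
    """Groups of distinct rows that share the same business-key values."""
    groups = {}
    for row in set(rows):  # collapse physical duplicates first
        key = tuple(row[i] for i in key_columns)
        groups.setdefault(key, []).append(row)
    return {
        key: sorted(group, key=lambda r: r[0])  # order each group by EmpID
        for key, group in groups.items()
        if len(group) > 1
    }


print(physical_duplicates(employees))
# [(2, 'Sam', 'A302, Woodsvilla, WA', 28, 2)]
print(logical_duplicates(employees, key_columns=(1, 2)))
# {('Jim', '#22, Jackson St., NY'): [(1, 'Jim', ...), (5, 'Jim', ...)]}
```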
Rule #6: RI (Referential Integrity)
Child records that have no corresponding parent record are called “Orphan Records”.
Logical relationship rules between parent & child tables should be defined by the business.

# Example 4
Child Table :: Employee
EmpID | EmpName | EmpAddress | Age | DeptID (FK)
1 | Jim | #22, Jackson St., NY | 23 | 1
2 | Sam | A302, Woodsvilla, WA | 28 | 2
4 | Samuel | No. AA, Andrew Street, Redmond, WA | 22 | 999
5 | Jim | #22, Jackson St., NY | 23 | 1
7 | Jack | #23, Jackson St., NY | 41 | NULL

Parent Table :: Department
DeptID (PK) | DeptName | DeptStartDate
1 | HR | 22-Aug-2007
2 | Finance | 12-June-1988
3 | Operations | 11-May-1752

Orphan Records
EmpID | EmpName | EmpAddress | Age | DeptID
4 | Samuel | No. AA, Andrew Street, Redmond, WA | 22 | 999
7 | Jack | #23, Jackson St., NY | 41 | NULL
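The orphan check in Example 4 amounts to looking up each child's foreign key in the set of parent keys. A minimal sketch, with `orphan_records` as a hypothetical helper; it follows the example in reporting a NULL foreign key as an orphan, although a database foreign-key constraint would normally allow NULL.

```python
# Child rows from Example 4: (EmpID, EmpName, EmpAddress, Age, DeptID).
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]
department_ids = {1, 2, 3}  # primary keys of the parent table


def orphan_records(children, parent_keys, fk_index):
    """Child rows whose foreign key matches no parent key (None included)."""
    return [row for row in children if row[fk_index] not in parent_keys]


orphans = orphan_records(employees, department_ids, fk_index=4)
print([row[0] for row in orphans])  # [4, 7]
```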
Rule #7: Domain Integrity

DeptID (PK) | DeptName
1 | HR
2 | Finance
3 | Operations
4 | Invalid Dept

Source Table
DeptID (PK) | DeptName (Varchar(50))
1 | HR
2 | Finance
3 | Operations

Target Table
DeptID (PK) | DeptName (Varchar(2))
1 | HR
2 | Fi
3 | Op

Source Table
DeptID (PK) | DeptName (NOT NULL)
1 | HR
2 | Finance
3 | Operations
4 | Invalid Dept

Target Table
DeptID (PK) | DeptName (NOT NULL)
1 | HR
2 | Finance
3 | NULL
4 | NULL
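The tables above suggest three kinds of domain check: values outside an allowed list, values truncated by a narrower column, and NULLs in a NOT NULL column. The sketch below illustrates them; the allowed list, the function names and the prefix-based truncation test are assumptions made for this example.

```python
ALLOWED_DEPT_NAMES = {"HR", "Finance", "Operations"}  # assumed reference list
MAX_NAME_LENGTH = 50                                  # Varchar(50) at the source


def domain_violations(rows, allowed=ALLOWED_DEPT_NAMES, max_length=MAX_NAME_LENGTH):
    """Return {DeptID: [problems]} for (DeptID, DeptName) rows."""
    problems = {}
    for dept_id, name in rows:
        if name is None:
            found = ["NULL in NOT NULL column"]
        else:
            found = []
            if len(name) > max_length:
                found.append("longer than column width")
            if name not in allowed:
                found.append("value outside allowed domain")
        if found:
            problems[dept_id] = found
    return problems


def truncated_keys(source_rows, target_rows):
    """Keys whose target value is a shorter prefix of the source value."""
    target = dict(target_rows)
    return [
        dept_id
        for dept_id, name in source_rows
        if target.get(dept_id) is not None
        and target[dept_id] != name
        and name.startswith(target[dept_id])
    ]


source = [(1, "HR"), (2, "Finance"), (3, "Operations"), (4, "Invalid Dept")]
print(domain_violations(source))
# {4: ['value outside allowed domain']}
print(domain_violations([(1, "HR"), (2, "Finance"), (3, None), (4, None)]))
# {3: ['NULL in NOT NULL column'], 4: ['NULL in NOT NULL column']}
print(truncated_keys(source[:3], [(1, "HR"), (2, "Fi"), (3, "Op")]))
# [2, 3]
```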
Rule #8: Accuracy
The degree to which data reflects real-world objects.
Accuracy is generally measured by comparing the data against something defined as the “true” source of information.
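One possible way to put a number on accuracy is the share of field values that agree with the trusted reference. The sketch below is illustrative only: it treats the Source_Dept rows of Example 1 as the “true” source, which is an assumption, and `field_accuracy` is a made-up name.

```python
def field_accuracy(reference, candidate):
    """Share of candidate field values equal to the reference values,
    measured over the keys present in both tables."""
    total = matched = 0
    for key in reference.keys() & candidate.keys():
        for ref_value, cand_value in zip(reference[key], candidate[key]):
            total += 1
            matched += ref_value == cand_value
    return matched / total if total else None


# DeptID -> (DeptName, DeptStartDate), taken from Example 1.
trusted = {1: ("HR", "22-Aug-2007"), 2: ("Finance", "12-June-1988")}
observed = {1: ("Human Resource", "22-Aug-2007"), 2: ("Finance", "12-June-1978")}

print(field_accuracy(trusted, observed))  # 0.5
```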
Rule #9: Usability
Describes the relevance and the meaning of data.
Example: denotes the ease with which data can be used.

Mart Table
DeptID (PK) | DeptName
1 | HR
2 | Fin
3 | Ops

Represented as: Reporting Table
DeptID (PK) | DeptName
1 | Human Resources
2 | Finance
3 | Operations
Rule #10: Timeliness
Testing :: DQ Case Study
ADQC (Automated Data Quality Check) v2.0
DQ Test Management
DQTM: Test Planning
DQTM: Test Design
DQTM: Test Execution
DQTM: Test Monitoring
DQ Challenges
DQ Best Practices
DQ Jargons
References
Questions & Answers
Thank you

Data Quality Testing Generic (http://www.geektester.blogspot.com/)

  • 1.
  • 2.
  • 3.
  • 4.
  • 5. Testing :: DQ CheatSheet DQ Management Overview DQ Testing Case Study Close
  • 6. Rule #1: Row Counts Count of records at Source and Target should be same at a given point of time. DQ Management Missing Records Extra Records Overview DQ Testing Case Study Close
  • 7. # Example 1 DQ Management Source_Dept Target_Dept Overview DQ Testing Case Study Close DeptID DeptName DeptStartDate 1 HR 22-Aug-2007 2 Finance 12-June-1988 4 Admin 1-May-1999 5 IT 2-June-1997 DeptID DeptName DeptStartDate 1 Human Resource 22-Aug-2007 2 Finance 12-June-1978 3 Operations 11-May-1752
  • 8. Rule #1: Row Counts Missing Records: Records which are only present at Source Extra Records: Records which are only present at Target DQ Management Overview DQ Testing Case Study Close DeptID DeptName DeptStartDate 4 Admin 1-May-1999 5 IT 2-June-1997 DeptID DeptName DeptStartDate 3 Operations 11-May-1752
  • 9. Rule #2: Completeness All the data under consideration at the Source and Target should be same at a given point of time satisfying the business rules. DQ Management Source Table Target Table Overview DQ Testing Case Study Close
  • 10. Rule #2: Completeness Missing Records: Records which are only present at Source Extra Records: Records which are only present at Target Mismatched Records: Which contain at least one different value for the same record between Source and Target DQ Management Overview DQ Testing Case Study Close DeptID DeptName DeptStartDate 4 Admin 1-May-1999 5 IT 2-June-1997 DeptID DeptName DeptStartDate 3 Operations 11-May-1752 DeptID DeptName DeptStartDate DifferenceType 2 Finance 12-June-1988 At Source 2 Finance 12-June-1978 At Target
  • 11. Rule #3: Consistency This ensures that each user observes a consistent view of the data, including changes made by transactions There is data inconsistency between the Source & Target if the same data is stored in different formats or contain different values at different places. DQ Management Overview DQ Testing Case Study Close
  • 12. Example 2
      Source_Dept
        DeptID | DeptName | Revenue ($) | DeptStartDate
        1 | HR | 100 | 22-Aug-2007
        2 | Finance | 200 | 12-June-1988
      Warehouse_Dept
        DeptID | DeptName | Revenue (Euro) | DeptStartDate
        1 | HR | 70 | 22/08/2007
        2 | Finance | 140 | 12/06/1978
      Data Mart_Dept
        DeptID | DeptName | Revenue (Euro) | DeptStartDate
        1 | Human Resource | 70 | 22/08/2007
        2 | Finance | 999999 | 12/06/1978
  • 13. Rule #3: Consistency
      Example #1: Zip code / Date / Currency formats
      a) Difference Point: same data, inconsistent due to Revenue & Currency format
        DeptID | DeptName | Revenue ($ or Euro) | DeptStartDate
        1 | HR | 100 | 22-Aug-2007
        1 | HR | 70 | 22/08/2007
      b) Difference Point: same data, inconsistent due to different format of Department name
        DeptID | DeptName | Revenue ($ or Euro) | DeptStartDate
        1 | HR | 100 | 22-Aug-2007
        1 | Human Resource | 70 | 22/08/2007
  • 14. Rule #3: Consistency
      Example #2: Regional setting, e.g. language
      Difference Point: same data, inconsistent due to different language used
        DeptID | DeptName | Revenue ($ or Euro) | DeptStartDate
        1 | Human Resource | 100 | 22/08/2007
        1 | 人的資源 | 100 | 22/08/2007
      Example #3: Different values at different points
      Difference Point: same data, inconsistent value for Revenue between Warehouse & Mart
        DeptID | DeptName | Revenue ($ or Euro) | DeptStartDate
        2 | Finance | 140 | 12/06/1978
        2 | Finance | 999999 | 12/06/1978
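One way to test consistency is to normalise formats before comparing, then compare the same column across layers. The Python sketch below is illustrative only: the list of accepted date formats, the helper names, and the choice of DeptID as key are assumptions, and month-name parsing assumes an English locale. It does not attempt currency conversion, since the deck gives no exchange rate.

```python
from datetime import datetime

# Assumed set of date formats seen in Example 2 (e.g. 22-Aug-2007, 12-June-1988, 22/08/2007).
DATE_FORMATS = ("%d-%b-%Y", "%d-%B-%Y", "%d/%m/%Y")


def parse_date(text):
    """Parse a date written in any of the assumed formats."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def same_date(a, b):
    """True when two differently formatted strings denote the same day."""
    return parse_date(a) == parse_date(b)


def inconsistent_values(layer_a, layer_b, column):
    """Keys present in both layers whose value in `column` differs."""
    shared = layer_a.keys() & layer_b.keys()
    return sorted(k for k in shared if layer_a[k][column] != layer_b[k][column])


# Warehouse and Data Mart rows from Example 2, keyed by DeptID.
warehouse = {
    1: {"DeptName": "HR", "Revenue": 70},
    2: {"DeptName": "Finance", "Revenue": 140},
}
mart = {
    1: {"DeptName": "Human Resource", "Revenue": 70},
    2: {"DeptName": "Finance", "Revenue": 999999},
}

print(same_date("22-Aug-2007", "22/08/2007"))
print(inconsistent_values(warehouse, mart, "Revenue"))
print(inconsistent_values(warehouse, mart, "DeptName"))
```

Separating format normalisation from value comparison keeps a pure formatting difference (22-Aug-2007 versus 22/08/2007) from being reported as a value difference.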
  • 15.
  • 16. Rule #4: Validity
      Example #1: Measuring “unemployment” in a country. The measure is valid over time only if the statistics are collected reliably month on month and the definition of “unemployment” stays the same. E.g. the definition of “unemployment” has changed in the past 25 years, so old data cannot be compared with current data because the comparison is not valid.
      Example #2: Values falling outside a range
        DeptID | DeptName | Revenue (Euro) | DeptStartDate
        1 | Human Resource | 70 | 22/08/2255
        2 | Finance | 999999 | 12/06/1752
  • 17. Rule #4: Validity
      Example #3: Dates having valid MM, DD, YYYY
        DeptID | DeptName | Revenue (Euro) | DeptStartDate
        1 | Human Resource | 70 | 13/13/2007
      Example #4: Birth date > Death date
        EmpId | EmpName | DOB | DOE
        1 | Jack | 13/01/2008 | 24/11/1996
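Examples #2 to #4 can be expressed as three small checks. The following Python sketch is a hedged illustration: the DD/MM/YYYY format is inferred from the sample values, the range bounds `EARLIEST` and `LATEST` are arbitrary assumptions (the deck does not state an acceptable range), and the function names are hypothetical.

```python
from datetime import date, datetime

# Assumed acceptable range for a department start date; not given in the deck.
EARLIEST = date(1900, 1, 1)
LATEST = date(2100, 12, 31)


def parse_ddmmyyyy(text):
    """Return a date, or None when the text is not a real DD/MM/YYYY date."""
    try:
        return datetime.strptime(text, "%d/%m/%Y").date()
    except ValueError:
        return None


def in_range(text, earliest=EARLIEST, latest=LATEST):
    """True when the date parses and falls inside the accepted range."""
    d = parse_ddmmyyyy(text)
    return d is not None and earliest <= d <= latest


def dates_ordered(dob_text, doe_text):
    """True when both dates parse and DOB is not later than DOE."""
    dob, doe = parse_ddmmyyyy(dob_text), parse_ddmmyyyy(doe_text)
    return dob is not None and doe is not None and dob <= doe


print(parse_ddmmyyyy("13/13/2007"))           # month 13 does not exist
print(in_range("22/08/2255"))                 # beyond the assumed upper bound
print(in_range("12/06/1752"))                 # before the assumed lower bound
print(dates_ordered("13/01/2008", "24/11/1996"))
```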
  • 18. Rule #5: Redundancy
      Physical Duplicates: all column values repeat for at least 2 records in a table
      Logical Duplicates: the Business Key (list of columns) values repeat for at least 2 records in a table
  • 19. Example 3
      Employee
        EmpID | EmpName | EmpAddress | Age | DeptID
        1 | Jim | #22, Jackson St., NY | 23 | 1
        2 | Sam | A302, Woodsvilla, WA | 28 | 2
        4 | Samuel | No. AA, Andrew Street, Redmond, WA | 22 | 999
        5 | Jim | #22, Jackson St., NY | 23 | 1
        2 | Sam | A302, Woodsvilla, WA | 28 | 2
        7 | Jack | #23, Jackson St., NY | 41 | NULL
      Example #1: Physical Duplicates
        EmpID | EmpName | EmpAddress | Age | DeptID
        2 | Sam | A302, Woodsvilla, WA | 28 | 2
        2 | Sam | A302, Woodsvilla, WA | 28 | 2
      Example #2: Logical Duplicates
        EmpID | EmpName | EmpAddress | Age | DeptID
        1 | Jim | #22, Jackson St., NY | 23 | 1
        5 | Jim | #22, Jackson St., NY | 23 | 1
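A short Python sketch of both duplicate checks on the Employee rows from Example 3. The business key (EmpName, EmpAddress) is an assumption chosen to match the slide's result, and the function names are illustrative. Logical duplicates are reported only when the rows sharing a key are not identical, so that physical duplicates are not counted twice.

```python
from collections import Counter, defaultdict

# Employee rows from Example 3: (EmpID, EmpName, EmpAddress, Age, DeptID).
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]


def physical_duplicates(rows):
    """Rows whose every column value occurs more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


def logical_duplicates(rows, key_indexes):
    """Distinct rows that share the same business key."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        groups[key].add(row)
    return {key: sorted(g) for key, g in groups.items() if len(g) > 1}


physical = physical_duplicates(employees)
# Assumed business key: EmpName (index 1) and EmpAddress (index 2).
logical = logical_duplicates(employees, key_indexes=(1, 2))
print(physical)
print(logical)
```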
  • 20. Rule #6: RI. Child records for which no corresponding parent record exists are called “Orphan Records”. The logical relationship rules between parent and child tables should be defined by the business.
  • 21. Example 4
      Child Table: Employee
        EmpID | EmpName | EmpAddress | Age | DeptID (FK)
        1 | Jim | #22, Jackson St., NY | 23 | 1
        2 | Sam | A302, Woodsvilla, WA | 28 | 2
        4 | Samuel | No. AA, Andrew Street, Redmond, WA | 22 | 999
        5 | Jim | #22, Jackson St., NY | 23 | 1
        7 | Jack | #23, Jackson St., NY | 41 | NULL
      Parent Table: Department
        DeptID (PK) | DeptName | DeptStartDate
        1 | HR | 22-Aug-2007
        2 | Finance | 12-June-1988
        3 | Operations | 11-May-1752
      Orphan Records
        EmpID | EmpName | EmpAddress | Age | DeptID
        4 | Samuel | No. AA, Andrew Street, Redmond, WA | 22 | 999
        7 | Jack | #23, Jackson St., NY | 41 | NULL
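The orphan-record check reduces to looking up each child's foreign key in the set of parent keys. The Python sketch below uses the Example 4 rows; `orphan_records` is a hypothetical helper. Following the slide, a NULL foreign key (None here) is reported as an orphan; whether NULL should count is one of the relationship rules the business would need to define.

```python
# Child rows from Example 4: (EmpID, EmpName, EmpAddress, Age, DeptID).
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]
# Parent primary keys (DeptID) from the Department table.
department_ids = {1, 2, 3}


def orphan_records(children, parent_keys, fk_index):
    """Child rows whose foreign key has no matching parent key."""
    return [row for row in children if row[fk_index] not in parent_keys]


orphans = orphan_records(employees, department_ids, fk_index=4)
print([row[0] for row in orphans])
```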
  • 22.
  • 23.
  • 24.
  • 25. Rule #8: Accuracy. The degree to which data reflects real-world objects. Accuracy is generally measured by comparing the data against something defined as the “true” source of information.
  • 26. Rule #9: Usability. Describes the relevance and the meaning of data, and denotes the ease with which data can be used.
      Example: Mart Table
        DeptID (PK) | DeptName
        1 | HR
        2 | Fin
        3 | Ops
      Represented as: Reporting Table
        DeptID (PK) | DeptName
        1 | Human Resources
        2 | Finance
        3 | Operations
  • 27.
  • 28. Testing :: DQ Case Study. ADQC (Automated Data Quality Check) v2.0
  • 29. DQ Test Management
  • 30.
  • 31.
  • 32.
  • 33.
  • 34. DQ Challenges
  • 35. DQ Best Practices
  • 36.
  • 37.
  • 38. Questions & Answers
  • 39. Thank you