Data Quality Testing Generic (http://www.geektester.blogspot.com/)
© All Rights Reserved

Data Quality Testing Generic (http://www.geektester.blogspot.com/) Presentation Transcript

  • Data Quality Testing
    • “Virtually everything in business today is an undifferentiated commodity, except how a company manages its information. How you manage information determines whether you win or lose.” (Bill Gates)
    • Narendra Parihar
    • Bhoomika Goyal
    • Raj Kamal (rajkamal)
    • [email_address]
  • Agenda
    • Data Quality Overview
    • Testing :: DQ Categories / Checks
    • Testing :: DQ Case Study
    • DQ Test Management
    • DQ Benefits & Challenges
    • Q & A
    DQ Management Overview DQ Testing Case Study Close
  • Overview: DQ Definition
      • Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J.M. Juran).
      • The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.
    • DQ Impact: Organizations with poor data quality spend time working with conflicting reports and flawed business plans, and make erroneous decisions based on outdated, inconsistent, and invalid data.
  • Overview: DQ Stats
      • “End users spend as much as 40-50% of a typical IT budget reworking data in one application to make it work with another.” (The high cost of low data quality)
      • The Data Warehousing Institute estimates that bad customer data costs American companies upwards of $600 billion per year (Wayne W. Eckerson).
    • Poor data quality can kill your business!
  • Testing :: DQ CheatSheet
  • Rule #1: Row Counts
    • The count of records at Source and Target should be the same at a given point in time.
    • (Diagram labels: Missing Records, Extra Records)
  • # Example 1
    • Source_Dept
        DeptID  DeptName  DeptStartDate
        1       HR        22-Aug-2007
        2       Finance   12-June-1988
        4       Admin     1-May-1999
        5       IT        2-June-1997
    • Target_Dept
        DeptID  DeptName        DeptStartDate
        1       Human Resource  22-Aug-2007
        2       Finance         12-June-1978
        3       Operations      11-May-1752
  • Rule #1: Row Counts
    • Missing Records: records present only at Source
        DeptID  DeptName  DeptStartDate
        4       Admin     1-May-1999
        5       IT        2-June-1997
    • Extra Records: records present only at Target
        DeptID  DeptName    DeptStartDate
        3       Operations  11-May-1752
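The row count rule can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' tool: the rows come from Example 1, while the dictionary layout (keyed by DeptID) and the function name `row_count_check` are assumptions made for this sketch.

```python
# Rows keyed by DeptID, copied from Example 1 (Source_Dept and Target_Dept).
source_dept = {
    1: ("HR", "22-Aug-2007"),
    2: ("Finance", "12-June-1988"),
    4: ("Admin", "1-May-1999"),
    5: ("IT", "2-June-1997"),
}
target_dept = {
    1: ("Human Resource", "22-Aug-2007"),
    2: ("Finance", "12-June-1978"),
    3: ("Operations", "11-May-1752"),
}


def row_count_check(source, target):
    """Return (counts_match, missing_keys, extra_keys) for two keyed tables."""
    missing = sorted(set(source) - set(target))  # present only at Source
    extra = sorted(set(target) - set(source))    # present only at Target
    return len(source) == len(target), missing, extra


counts_match, missing, extra = row_count_check(source_dept, target_dept)
print("Counts match:", counts_match)
print("Missing at Target:", missing)
print("Extra at Target:", extra)
```

Comparing keys as well as counts matters because equal counts can hide one missing record offset by one extra record.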
  • Rule #2: Completeness
    • All the data under consideration at Source and Target should be the same at a given point in time and should satisfy the business rules.
    • (Diagram labels: Source Table, Target Table)
  • Rule #2: Completeness
    • Missing Records: records present only at Source
        DeptID  DeptName  DeptStartDate
        4       Admin     1-May-1999
        5       IT        2-June-1997
    • Extra Records: records present only at Target
        DeptID  DeptName    DeptStartDate
        3       Operations  11-May-1752
    • Mismatched Records: records that contain at least one different value for the same record between Source and Target
        DeptID  DeptName  DeptStartDate  DifferenceType
        2       Finance   12-June-1988   At Source
        2       Finance   12-June-1978   At Target
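A mismatch check for records present on both sides might look like the sketch below. The data is taken from Example 1; the function name `find_mismatches` and the choice to pass the compared columns explicitly are assumptions for illustration. Restricting the comparison to DeptStartDate reproduces the slide's result (DeptID 2 only). Comparing every column would also flag DeptID 1, whose name differs (HR vs. Human Resource), a case the deck treats under Rule #3.

```python
# Rows from Example 1, keyed by DeptID.
source_dept = {
    1: {"DeptName": "HR", "DeptStartDate": "22-Aug-2007"},
    2: {"DeptName": "Finance", "DeptStartDate": "12-June-1988"},
    4: {"DeptName": "Admin", "DeptStartDate": "1-May-1999"},
    5: {"DeptName": "IT", "DeptStartDate": "2-June-1997"},
}
target_dept = {
    1: {"DeptName": "Human Resource", "DeptStartDate": "22-Aug-2007"},
    2: {"DeptName": "Finance", "DeptStartDate": "12-June-1978"},
    3: {"DeptName": "Operations", "DeptStartDate": "11-May-1752"},
}


def find_mismatches(source, target, columns):
    """Map each key found on both sides to the columns whose values differ."""
    mismatches = {}
    for key in sorted(set(source) & set(target)):
        diffs = [c for c in columns if source[key][c] != target[key][c]]
        if diffs:
            mismatches[key] = diffs
    return mismatches


date_only = find_mismatches(source_dept, target_dept, ["DeptStartDate"])
all_columns = find_mismatches(source_dept, target_dept, ["DeptName", "DeptStartDate"])
print(date_only)
print(all_columns)
```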
  • Rule #3: Consistency
    • Ensures that each user observes a consistent view of the data, including changes made by transactions.
    • Data is inconsistent between Source and Target if the same data is stored in different formats or contains different values in different places.
  • # Example 2
    • Source_Dept
        DeptID  DeptName  Revenue ($)  DeptStartDate
        1       HR        100          22-Aug-2007
        2       Finance   200          12-June-1988
    • Warehouse_Dept
        DeptID  DeptName  Revenue (Euro)  DeptStartDate
        1       HR        70              22/08/2007
        2       Finance   140             12/06/1978
    • Data Mart_Dept
        DeptID  DeptName        Revenue (Euro)  DeptStartDate
        1       Human Resource  70              22/08/2007
        2       Finance         999999          12/06/1978
  • Rule #3: Consistency
    • Example #1: Zip code / Date / Currency formats
      a) Difference Point: same data, inconsistent due to Revenue & Currency format
        DeptID  DeptName  Revenue ($ or Euro)  DeptStartDate
        1       HR        100                  22-Aug-2007
        1       HR        70                   22/08/2007
      b) Difference Point: same data, inconsistent due to different format of Department name
        DeptID  DeptName        Revenue ($ or Euro)  DeptStartDate
        1       HR              100                  22-Aug-2007
        1       Human Resource  70                   22/08/2007
  • Rule #3: Consistency
    • Example #2: Regional Setting, e.g. Language
      Difference Point: same data, inconsistent due to different language used
        DeptID  DeptName        Revenue ($ or Euro)  DeptStartDate
        1       Human Resource  100                  22/08/2007
        1       人的資源         100                  22/08/2007
    • Example #3: Different values at different points
      Difference Point: same data, inconsistent value for Revenue between Warehouse & Mart
        DeptID  DeptName  Revenue ($ or Euro)  DeptStartDate
        2       Finance   140                  12/06/1978
        2       Finance   999999               12/06/1978
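One way to test consistency is to normalize each value to a canonical form before comparing, so that format differences are separated from real value differences. The sketch below is an assumption-laden illustration: the list of accepted date formats and the helper names `parse_date` and `inconsistent_keys` are invented here, and month names are parsed under the default (English) locale. The values come from Example 2.

```python
from datetime import date, datetime

# Formats seen in Example 2: "22-Aug-2007", "12-June-1988", "22/08/2007".
DATE_FORMATS = ("%d-%b-%Y", "%d-%B-%Y", "%d/%m/%Y")


def parse_date(text):
    """Parse a date written in any of the known formats into a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def inconsistent_keys(left, right):
    """Return keys present in both mappings whose values differ."""
    return sorted(k for k in left.keys() & right.keys() if left[k] != right[k])


# Same date, different formats: equal once normalized.
same_date = parse_date("22-Aug-2007") == parse_date("22/08/2007")

# Different dates: a real value difference, not a format difference.
different_date = parse_date("12-June-1988") != parse_date("12/06/1978")

# Revenue (Euro) by DeptID in the Warehouse and the Data Mart.
warehouse_revenue = {1: 70, 2: 140}
mart_revenue = {1: 70, 2: 999999}
revenue_issues = inconsistent_keys(warehouse_revenue, mart_revenue)

print(same_date, different_date, revenue_issues)
```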
  • Rule #4: Validity
    • Validity is the correctness and reasonableness of data.
    • A valid measure must be reliable, but a reliable measure need not be valid.
    • Questions:
      • Is the information reliable?
      • How is the information measured?
  • Rule #4: Validity
    • Example #1: Measuring “Unemployment” in a country
      • Statistics are collected reliably month-on-month.
      • The definition of “Unemployment” must remain the same for comparisons to be valid.
      • e.g. the definition of “unemployment” has changed in the past 25 years, so old data cannot be compared with current data; the comparison is not valid.
    • Example #2: Values falling outside a range
        DeptID  DeptName        Revenue (Euro)  DeptStartDate
        1       Human Resource  70              22/08/2255
        2       Finance         999999          12/06/1752
  • Rule #4: Validity
    • Example #3: Dates having valid MM, DD, YYYY
        DeptID  DeptName        Revenue (Euro)  DeptStartDate
        1       Human Resource  70              13/13/2007
    • Example #4: Birth date > Death date
        EmpId  EmpName  DOB         DOE
        1      Jack     13/01/2008  24/11/1996
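The validity examples above (out-of-range values, impossible calendar dates, and a birth date after the end date) can each be expressed as a small check. This is only a sketch: the bounds `EARLIEST` and `LATEST` are hypothetical, since the slides do not state an accepted range, and the function names are invented for illustration.

```python
from datetime import date, datetime

# Hypothetical bounds for a plausible DeptStartDate (not given in the slides).
EARLIEST = date(1900, 1, 1)
LATEST = date(2100, 1, 1)


def parse_ddmmyyyy(text):
    """Return a date, or None if the text is not a real DD/MM/YYYY date."""
    try:
        return datetime.strptime(text, "%d/%m/%Y").date()
    except ValueError:
        return None


def is_in_range(value, low=EARLIEST, high=LATEST):
    """True if the value parsed successfully and lies within the bounds."""
    return value is not None and low <= value <= high


def dates_in_order(first, second):
    """True if both dates parsed and the first is not after the second."""
    return first is not None and second is not None and first <= second


print(parse_ddmmyyyy("13/13/2007"))               # month 13 does not exist
print(is_in_range(parse_ddmmyyyy("22/08/2255")))  # beyond the assumed upper bound
print(is_in_range(parse_ddmmyyyy("12/06/1752")))  # before the assumed lower bound
print(dates_in_order(parse_ddmmyyyy("13/01/2008"), parse_ddmmyyyy("24/11/1996")))
```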
  • Rule #5: Redundancy
    • Physical Duplicates: all column values repeat for at least 2 records in a table.
    • Logical Duplicates: Business Key (list of columns) values repeat for at least 2 records in a table.
    • (Diagram labels: Logical Dups, Physical Dups)
  • # Example 3
    • Employee
        EmpID  EmpName  EmpAddress                          Age  DeptID
        1      Jim      #22, Jackson St., NY                23   1
        2      Sam      A302, Woodsvilla, WA                28   2
        4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
        5      Jim      #22, Jackson St., NY                23   1
        2      Sam      A302, Woodsvilla, WA                28   2
        7      Jack     #23, Jackson St., NY                41   NULL
    • Example #1: Physical Duplicates
        EmpID  EmpName  EmpAddress            Age  DeptID
        2      Sam      A302, Woodsvilla, WA  28   2
        2      Sam      A302, Woodsvilla, WA  28   2
    • Example #2: Logical Duplicates
        EmpID  EmpName  EmpAddress            Age  DeptID
        1      Jim      #22, Jackson St., NY  23   1
        5      Jim      #22, Jackson St., NY  23   1
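Both kinds of duplicate can be found by grouping rows. In the sketch below the rows come from the Employee table in Example 3. The business key (EmpName, EmpAddress) is an assumption, since the slides do not say which columns form it, and the function names are illustrative. Physical duplicates are collapsed before the logical check so that the two results match the two example tables.

```python
from collections import Counter, defaultdict

# Columns: EmpID, EmpName, EmpAddress, Age, DeptID (None stands for NULL).
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]

# Assumed business key: EmpName and EmpAddress (column positions 1 and 2).
BUSINESS_KEY = (1, 2)


def physical_duplicates(rows):
    """Return each row whose full set of column values occurs more than once."""
    return [row for row, n in Counter(rows).items() if n > 1]


def logical_duplicates(rows, key_indexes):
    """Return groups of distinct rows that share the same business key."""
    groups = defaultdict(list)
    for row in dict.fromkeys(rows):  # distinct rows, original order kept
        groups[tuple(row[i] for i in key_indexes)].append(row)
    return [group for group in groups.values() if len(group) > 1]


physical = physical_duplicates(employees)
logical = logical_duplicates(employees, BUSINESS_KEY)
print([row[0] for row in physical])
print([[row[0] for row in group] for group in logical])
```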
  • Rule #6: RI
    • Child records for which no corresponding parent record exists are called “Orphan Records”.
    • Logical relationship rules between parent & child tables should be defined by the business.
  • # Example 4
    • Child Table:: Employee
        EmpID  EmpName  EmpAddress                          Age  DeptID (FK)
        1      Jim      #22, Jackson St., NY                23   1
        2      Sam      A302, Woodsvilla, WA                28   2
        4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
        5      Jim      #22, Jackson St., NY                23   1
        7      Jack     #23, Jackson St., NY                41   NULL
    • Parent Table:: Department
        DeptID (PK)  DeptName    DeptStartDate
        1            HR          22-Aug-2007
        2            Finance     12-June-1988
        3            Operations  11-May-1752
    • Orphan Records
        EmpID  EmpName  EmpAddress                          Age  DeptID
        4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
        7      Jack     #23, Jackson St., NY                41   NULL
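An orphan check amounts to looking up each child's foreign key in the set of parent keys. The sketch below uses the rows from Example 4; the function name `find_orphans` is an assumption. It follows the slide in reporting a NULL foreign key as an orphan, although many schemas allow NULL foreign keys, so that choice should come from the business rules.

```python
# Child rows: EmpID, EmpName, EmpAddress, Age, DeptID (None stands for NULL).
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]

# Parent primary keys from the Department table.
department_ids = {1, 2, 3}


def find_orphans(children, parent_keys, fk_index):
    """Return child rows whose foreign key has no matching parent key.

    A NULL (None) foreign key is reported as an orphan, as on the slide.
    """
    return [row for row in children if row[fk_index] not in parent_keys]


orphans = find_orphans(employees, department_ids, fk_index=4)
print([row[0] for row in orphans])
```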
  • Rule #7: Domain Integrity
    • Defines the possible values allowed in a data element.
  • Rule #7: Domain Integrity
    • Example #1: Invalid Lookup Table Values (Valid:: HR, Finance, Operations)
        DeptID (PK)  DeptName
        1            HR
        2            Finance
        3            Operations
        4            Invalid Dept
    • Example #2: Truncation:: Data Types, Data Length etc.
      Source Table
        DeptID (PK)  DeptName (Varchar(50))
        1            HR
        2            Finance
        3            Operations
      Target Table
        DeptID (PK)  DeptName (Varchar(2))
        1            HR
        2            Fi
        3            Op
  • Rule #7: Domain Integrity
    • Example #3: Constraints: NOT NULL, CHECK, PK, UK etc.
      Source Table
        DeptID (PK)  DeptName (NOT NULL)
        1            HR
        2            Finance
        3            Operations
        4            Invalid Dept
      Target Table
        DeptID (PK)  DeptName (NOT NULL)
        1            HR
        2            Finance
        3            NULL
        4            NULL
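The three domain integrity examples (invalid lookup values, truncation, and NOT NULL violations) can be sketched as three small checks. The data is taken from the tables above; the function names and the dictionary layout keyed by DeptID are assumptions. The truncation check uses a simple heuristic: a target value counts as truncated if it is a shorter prefix of the source value.

```python
VALID_DEPT_NAMES = {"HR", "Finance", "Operations"}


def invalid_lookup_keys(rows, valid_values):
    """Return keys whose value is not in the list of allowed values."""
    return sorted(k for k, v in rows.items() if v not in valid_values)


def truncated_keys(source, target):
    """Return keys whose target value is a shorter prefix of the source value."""
    return sorted(
        k for k in source.keys() & target.keys()
        if target[k] is not None
        and target[k] != source[k]
        and source[k].startswith(target[k])
    )


def null_keys(rows):
    """Return keys whose value is NULL (None) in a NOT NULL column."""
    return sorted(k for k, v in rows.items() if v is None)


# Example #1: lookup values.
lookup_rows = {1: "HR", 2: "Finance", 3: "Operations", 4: "Invalid Dept"}
# Example #2: Varchar(50) source loaded into a Varchar(2) target.
wide_source = {1: "HR", 2: "Finance", 3: "Operations"}
narrow_target = {1: "HR", 2: "Fi", 3: "Op"}
# Example #3: target rows for a NOT NULL column.
not_null_target = {1: "HR", 2: "Finance", 3: None, 4: None}

print(invalid_lookup_keys(lookup_rows, VALID_DEPT_NAMES))
print(truncated_keys(wide_source, narrow_target))
print(null_keys(not_null_target))
```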
  • Rule #8: Accuracy
    • The degree to which data reflects real-world objects.
    • Accuracy is generally measured by comparing data against something defined as the “true” source of information.
  • Rule #9: Usability
    • Describes the relevance and the meaning of data.
    • Denotes the ease with which data can be used.
    • Example: Mart Table represented as Reporting Table
      Mart Table
        DeptID (PK)  DeptName
        1            HR
        2            Fin
        3            Ops
      Reporting Table
        DeptID (PK)  DeptName
        1            Human Resources
        2            Finance
        3            Operations
  • Rule #10: Timeliness
    • Defines whether the required data is available when needed, as per the SLA.
    • Example #1: Data Freshness
      • If data is pulled 24 times every day and the target misses even one cycle, “data freshness” is impacted and users see old data, which can affect business decisions.
    • For decision-making and mission-critical systems, timely availability of information is a must.
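A freshness check compares the time of the last successful load with the expected load interval. The sketch below assumes that 24 pulls per day means one pull per hour and that the timestamp of the last load is recorded somewhere. The timestamps and the names `is_fresh` and `missed_cycles` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 24 pulls per day means one load cycle per hour.
LOAD_INTERVAL = timedelta(hours=1)


def is_fresh(last_loaded_at, now, max_age=LOAD_INTERVAL):
    """True if the target was loaded within the allowed interval."""
    return now - last_loaded_at <= max_age


def missed_cycles(last_loaded_at, now, interval=LOAD_INTERVAL):
    """Number of full load intervals that have passed since the last load."""
    return (now - last_loaded_at) // interval


# Hypothetical timestamps for illustration.
last_load = datetime(2024, 1, 15, 10, 0)
on_time = datetime(2024, 1, 15, 10, 45)
late = datetime(2024, 1, 15, 12, 30)

print(is_fresh(last_load, on_time), missed_cycles(last_load, on_time))
print(is_fresh(last_load, late), missed_cycles(last_load, late))
```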
  • Testing :: DQ Case Study: ADQC (Automated Data Quality Check) v2.0
  • DQ Test Management
  • DQTM: Test Planning
    • DQ Test Management: Planning
      • 1. Ensure DQ Requirements are covered in the following documents:
          • BRD
          • FSD
          • Test Plan
      • 2. Ensure DQ Requirements are clarified by Business / PDMs.
  • DQTM: Test Design
    • DQ Test Management: Test Case Design
      • 1. Ensure DQ Requirements are covered in Test Scenarios and Test Cases.
      • 2. Ensure DQ Test Cases are automated.
  • DQTM: Test Execution
    • DQ Test Management: Test Execution
      • 1. Ensure Test Cases related to DQ Requirements are executed in Test cycles.
      • 2. Ensure DQ Test results & DQ Bugs are shared with the Business / PDM in the triage meeting so that each bug gets the correct priority based on its impact.
  • DQTM: Test Monitoring
    • DQ Test Management: Test Monitoring
      • 1. Regularly collect DQ Metrics to depict the trend.
      • 2. If the DQ Issues trend is upward, immediate action needs to be taken.
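Spotting an upward trend in DQ issues can be as simple as checking whether the most recent measurements keep rising. This is a minimal sketch under stated assumptions: the issue counts are hypothetical, and defining "upward" as three consecutive increases is an arbitrary choice made for illustration, not a rule from the slides.

```python
def is_trending_up(counts, window=3):
    """True if the last `window` changes in the series are all increases."""
    recent = counts[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough history to judge
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical DQ issue counts collected once per test cycle.
rising = [12, 9, 10, 14, 19]
stable = [12, 15, 11, 13, 12]

print(is_trending_up(rising))
print(is_trending_up(stable))
```

A real monitoring setup would likely track several metrics (for example, counts per DQ rule) and use a threshold agreed with the business rather than a fixed window.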
  • DQ Challenges
  • DQ Best Practices
  • DQ Jargons
    • DATA GOVERNANCE
      • Data governance (DG) refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise.
      • A data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures.
    • DATA STEWARDS
      • Data Stewards are the individuals ultimately responsible for the definition, management, control, integrity or maintenance of enterprise data.
    • DATA INTEGRITY
      • Data integrity is the assurance that data is correct and consistent, that is, that the data correctly reflects the "real" world.
  • Questions & Answers
  • Thank you