Test2008: Resurrecting The Prodigal Son - Data Quality (http://www.geektester.blogspot.com)
Published in: Technology, Business

  • Transcript

    • 1. Resurrecting the Prodigal Son - Data Quality: “Rise from Ashes: Battle of Data Quality Testing”
    • 2. Speakers
      • Bhoomika Goyal
        • Working @ Microsoft for over a year
        • Engineer from Mumbai
        • Loves playing Chess, Solving Puzzles and Reading
      • Raj
        • Working @ Microsoft Business Intelligence COE
        • 5.5+ years of testing experience
        • Loves watching movies, reading suspense thrillers & playing cricket
        • Passion - Testing ( http://www.itest.co.nr )
      www.Test2008.in
    • 3. Horror Story
      • NASA Mars Climate Orbiter spacecraft LOST
      • Loss: $125 million
      • Reason: Discrepancy between two units of measure (thruster data in pound-force units was read as newtons)
    • 4. Bad, Bad, Bad Data Quality
      • Erroneous mailings cost US businesses $611 billion in 2002
    • 5. DQ is not my problem? Think again!
    • 6. DQ Hot Candidates
      • Data Movement, Migrations, Backups, Restore, Import, Export
      • Data Warehousing, Business Intelligence, OLTP, OLAP
      • CRM, ERP
    • 7. DQ Ishikawa Diagram
      • Effect: Bad decisions (loss of $ & customers)
      • Causes:
        • DQ requirements not documented
        • Lack of white box testing
        • Data is dynamic
        • CRM & ERP implementations
        • Mergers / takeovers
    • 8. Data Quality
      • DQ is an indicator of the health of the data
    • 9. GOOD Data Quality
      • DQ is good if the data is fit to use for decision making
    • 10. Data Quality Testing
        • Involves validating, monitoring & reporting various attributes of data, such as accuracy, validity and timeliness
    • 11. DQ Checks
      • Row Counts
      • Consistency
      • Referential Integrity
      • Redundancy
      • Usability
      • Completeness
      • Domain Integrity
      • Timeliness
      • Accuracy
      • Validity
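Several of the checks listed on this slide reduce to counting the rows that break a rule. The following is a minimal sketch of a completeness check and a domain integrity check using Python's built-in sqlite3 module. The `emp` table, its columns, and the allowed gender codes are hypothetical and only illustrate the idea; they are not taken from the talk.

```python
import sqlite3

# Hypothetical sample data: table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Asha", "F"), (2, None, "M"), (3, "Ravi", "X")],
)


def completeness_failures(conn, table, column):
    """Count rows where a required column is NULL or empty."""
    # Table and column names are interpolated, so they must come from trusted metadata.
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL OR {column} = ''"
    return conn.execute(sql).fetchone()[0]


def domain_failures(conn, table, column, allowed):
    """Count rows whose non-NULL value falls outside the allowed set."""
    placeholders = ", ".join("?" for _ in allowed)
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} NOT IN ({placeholders})"
    return conn.execute(sql, list(allowed)).fetchone()[0]


print(completeness_failures(conn, "emp", "name"))          # rows missing a name
print(domain_failures(conn, "emp", "gender", ["M", "F"]))  # rows with an unknown code
```

A count of zero would be reported as a pass; any other value is the number of offending rows.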
    • 12. Row Count Check
    • 13. Completeness Check
    • 14. Among Voters seen Dead People
      • US General Election: 4,755 deceased people voted
    • 15. Consistency Check
    • 16. A One-House, $400 Million Bubble Goes Pop
      • A house worth $121,000 was overvalued at $400 million
      • The government expected $8 million in tax revenue
    • 17. Accuracy Check
    • 18. Validity Check
    • 19. CD Mail Fraud
      • A man received 22,260 CDs at a discounted price by making each address just different enough
      • David Loshin, 123 Main Street, Any town, NY 11787
      • David Loshin, 123 Main Street, Near Wal-Mart, Any town, NY 11787
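An exact-match comparison would treat the two addresses on this slide as different customers. One possible way to catch such near duplicates is to normalize the text and compare it with a similarity ratio. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the helper names and the 0.8 threshold are assumptions for illustration, not the method used in the talk.

```python
import re
from difflib import SequenceMatcher


def normalize(address):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", address.lower())
    return cleaned.strip()


def is_near_duplicate(a, b, threshold=0.8):
    """Flag two addresses whose normalized text is mostly the same.

    The 0.8 threshold is an arbitrary assumption and would need tuning
    against real data.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


first = "David Loshin 123 Main Street Any town, NY 11787"
second = "David Loshin 123 Main Street, Near Wal-Mart Any town, NY 11787"
print(is_near_duplicate(first, second))
```

A redundancy check built this way trades precision for recall: a lower threshold catches more disguised duplicates but also flags more genuinely different addresses.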
    • 20. Redundancy Check
    • 21. Referential Integrity Check
    • 22. Domain Integrity Check
    • 23. Timeliness
    • 24. How do we test DQ?
      • DQ Rule Engine: reads rules from Metadata and writes Results
      • Row Count Logic
          Create Procedure RowCount (SrcTbl, TgtTbl)
          Begin
            Declare SRC, TGT Integer
            Select SRC = Count(*) From SrcTbl
            Select TGT = Count(*) From TgtTbl
            If SRC = TGT Then
              Return "PASS"
            Else
              Return SRC - TGT
            End If
          End
      • Duplicate Logic
          Create Procedure Duplicate (Tbl)
          Begin
            Declare Dup Integer
            Select Dup = Count(*) From (Select <<ColumnList>> From Tbl Group By <<ColumnList>> Having Count(*) > 1)
            If Dup = 0 Then
              Return "PASS"
            Else
              Return Dup
            End If
          End
      • Metadata
          Rule  Tbl1  Tbl2
          RC    Emp   Emp
          RI    Emp   Dept
          DC    HR    HR
      • Results
          Rule  Result  Comment
          RC    Pass    -
          RI    Fail    10
          DC    Pass    -
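The pseudocode on this slide can be turned into something runnable. The following is a rough sketch of a metadata-driven rule engine in Python with the built-in sqlite3 module, assuming that RC and DC in the metadata table stand for the row count and duplicate checks. The table names `emp_src` and `emp_tgt`, the sample rows, and the in-memory `rules` list are hypothetical stand-ins for the slide's Metadata and Results tables.

```python
import sqlite3


def row_count(conn, src, tgt):
    """Return 'PASS' if both tables have the same row count, else the difference."""
    s = conn.execute(f"SELECT COUNT(*) FROM {src}").fetchone()[0]
    t = conn.execute(f"SELECT COUNT(*) FROM {tgt}").fetchone()[0]
    return "PASS" if s == t else s - t


def duplicate(conn, tbl, columns):
    """Return 'PASS' if no combination of `columns` repeats, else the number of repeated groups."""
    cols = ", ".join(columns)
    sql = (
        f"SELECT COUNT(*) FROM (SELECT {cols} FROM {tbl} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    dup = conn.execute(sql).fetchone()[0]
    return "PASS" if dup == 0 else dup


# Hypothetical source and target tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp_src (id INTEGER, name TEXT);
    CREATE TABLE emp_tgt (id INTEGER, name TEXT);
    INSERT INTO emp_src VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO emp_tgt VALUES (1, 'A'), (2, 'B'), (2, 'B');
    """
)

# Metadata: (rule name, check function, arguments). Adding a rule means adding a row.
rules = [
    ("RC", row_count, ("emp_src", "emp_tgt")),
    ("DC", duplicate, ("emp_tgt", ["id", "name"])),
]

# Results: one (rule, outcome) pair per metadata row.
results = [(name, check(conn, *args)) for name, check, args in rules]
for name, outcome in results:
    print(name, outcome)
```

Here the row counts match even though the target holds a duplicated row, which is why the row count check alone is not enough and the duplicate check is run alongside it.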
    • 25. You can’t improve what you can’t measure
      • Chart: Data Quality plotted over Time against Threshold levels of 5 %, 10 % and 100 %
      • Red: BAD DQ
      • Yellow: Watch it
      • Green: Good DQ
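The slide does not say what the percentages measure or which band maps to which colour. As a sketch only, assume they are failure rates: up to 5 % is green, up to 10 % is yellow, and anything above that is red. The function name and the sample history below are hypothetical.

```python
def dq_status(failure_pct, green_max=5.0, yellow_max=10.0):
    """Map a failure percentage to a traffic-light status.

    Assumption: up to 5 % is green, up to 10 % is yellow, anything above is red.
    """
    if not 0.0 <= failure_pct <= 100.0:
        raise ValueError("failure_pct must be between 0 and 100")
    if failure_pct <= green_max:
        return "Green: Good DQ"
    if failure_pct <= yellow_max:
        return "Yellow: Watch it"
    return "Red: BAD DQ"


# Hypothetical measurements over time: (failed rows, total rows).
history = [(2, 100), (8, 100), (30, 100)]
for failed, total in history:
    print(dq_status(100.0 * failed / total))
```

Recording the status for each run gives the trend over time that the chart on this slide plots.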
    • 26. DQ Testing is your friend!
      • High Data (Test) Coverage
      • Automation (Manual Effort Reduction)
      • High confidence about your data
      • Accurate Decisions
    • 27. References
      • http://www.dataqualitysolutions.com/data/index.shtml
      • http://searchdatamanagement.techtarget.com/generic/0,295582,sid91_gci1251808,00.html
      • http://en.wikipedia.org/wiki/Effect_of_Hurricane_Katrina_on_New_Orleans
    • 28. Thank you.