Bcs 20080228 Ku

A discussion of the five types of data and information quality defects and the ways in which they can arise. First given at a BCS meeting at Solent University, 200803.


Published in: Technology, Business

Transcript

  • 1. The quality of information and data is strained
      Keith Underdown, Convenor, British Community of Practice
      International Association for Information and Data Quality
  • 2. Shameless Plug
      • International Association for Information & Data Quality: www.iaidq.org
          ◦ Student Membership: $25
          ◦ Personal Membership: $85
          ◦ Corporate Membership available
          ◦ Extensive conference discounts
      • www.justgiving.com/keithunderdown
          ◦ My fundraising page
          ◦ Reward me if you enjoy my presentation
  • 3. Data
      • “Everybody knows what data is”!
          ◦ “Define:data” in a Google search gives 41 results
          ◦ Mix of
              ▪ “Data processing” biased
              ▪ Philosophical
              ▪ Irrelevant (Data is an android in Star Trek: TNG)
      • My preference:
          ◦ A collection of facts held in a formalized manner suitable for processing by automatic or human means.
  • 4. Fundamental Data Quality
      • The facts in the case can be:
          ◦ Inaccurate
          ◦ Incomplete
          ◦ Inconsistent
          ◦ Invalid
          ◦ Incomprehensible
  • 5. The Five “I’s”
      • Incomplete data
          ◦ Mandatory fields with null, empty string, etc.
      • Invalid data
          ◦ Values outside the allowed value set, or values that fail tests against rules
      • Inconsistent data
          ◦ Intra-record inconsistency
          ◦ Inter-record inconsistency
          ◦ Inter-datastore inconsistency
      • Inaccurate data
          ◦ Statistical outliers and other “sore thumbs”
              ▪ E.g. price 10 times higher than similar models
      • Incomprehensible data
          ◦ Without full and accurate context
  • 6. Incomplete Data
      • Facts essential to the business process are missing
          ◦ Implies that data validation is incorrect
          ◦ Often arises during bulk import of data
              ▪ Data not immediately available, so validation relaxed
              ▪ Follow-up not completed
              ▪ Database field cannot be made mandatory
  • 7. Example
      • Change in law made knowledge of Social Security number mandatory
          ◦ Too expensive to go to customers
          ◦ Populate at need
          ◦ Telephone agents entered their own numbers instead
      • Customer failed to fill in DoB field
          ◦ Data entry clerk guessed!
          ◦ Customer has high-value transaction turned down
          ◦ Lots of adverse publicity
  • 8. How can we avoid these?
      • Plan for their absence
          ◦ When creating new databases, plan to populate fields
          ◦ When bulk updates are required, bite the bullet
          ◦ Ensure agents have time and understand the need to collect data
      • Check for likely “cheats”
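The completeness checks and the hunt for "cheats" can be sketched in a few lines of Python. This is only an illustration: the field names and the list of filler values are hypothetical, and a real list would have to come from profiling the data in question.

```python
from typing import Any

# Hypothetical field names and filler values, chosen for illustration only.
MANDATORY_FIELDS = ("name", "date_of_birth", "social_security_number")
SUSPECT_PLACEHOLDERS = {"n/a", "na", "none", "unknown", "tbc", "1900-01-01"}


def incomplete_fields(record: dict[str, Any]) -> list[str]:
    """Return the mandatory fields that are missing, null, or blank."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def suspect_fields(record: dict[str, Any]) -> list[str]:
    """Return the mandatory fields whose value looks like a filler ("cheat")."""
    suspects = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None:
            continue
        if str(value).strip().lower() in SUSPECT_PLACEHOLDERS:
            suspects.append(field)
    return suspects
```

A populated field is not necessarily a correct one, which is why the second function exists: it catches values that satisfy a "mandatory" rule without carrying any information.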
  • 9. Invalid Data
      • Data that fails genuine business rules
      • Or fails unstated real-world validation
          ◦ Company name info spills over into address fields
  • 10. Examples
      • 01222 535681 looks like a valid phone no.
          ◦ But Cardiff is an exception
              ▪ 029 2053 5681
              ▪ Human being might work it out
              ▪ Power dialler won’t
      • 02/03/08
          ◦ US = 3rd February 08
          ◦ UK = 2nd March 08
          ◦ Which century?
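The date ambiguity is easy to reproduce. The sketch below parses the same string under the two conventions using Python's standard `datetime.strptime`. Note that the "which century?" question is answered silently by the library's pivot rule for `%y` rather than by anything in the data.

```python
from datetime import datetime

raw = "02/03/08"

# Same characters, two conventions, two different dates.
as_us = datetime.strptime(raw, "%m/%d/%y").date()  # month first
as_uk = datetime.strptime(raw, "%d/%m/%y").date()  # day first

print(as_us.isoformat())  # 2008-02-03
print(as_uk.isoformat())  # 2008-03-02
```

Neither parse fails, so a validation rule that only asks "is this a date?" cannot catch the problem.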
  • 11. How do we avoid these?
      • Make field syntax as tight as possible
          ◦ E.g. always use date-stamp fields for dates
          ◦ Use external validation systems
              ▪ E.g. Postal Address File
          ◦ Use masks to validate input patterns
              ▪ Use carefully, still allows cheating
          ◦ Use drop-down lists from reference tables
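A minimal sketch of three of these controls: a typed date, an input mask, and a reference table. The postcode pattern is a deliberately loose, hypothetical mask that checks shape only, and the reference values are borrowed from the relationship-status slide purely for illustration.

```python
import re
from datetime import date

# Hypothetical mask: checks the general shape of a UK-style postcode only.
POSTCODE_MASK = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")

# Hypothetical reference table backing a drop-down list.
RELATIONSHIP_STATUSES = frozenset(
    {"Legally Married", "In Civil Partnership", "Unmarried", "Divorced"}
)


def matches_mask(postcode: str) -> bool:
    """True if the text has the right shape; says nothing about existence."""
    return POSTCODE_MASK.fullmatch(postcode.strip().upper()) is not None


def in_reference_table(status: str) -> bool:
    """True only for values present in the reference table."""
    return status in RELATIONSHIP_STATUSES


def parse_iso_date(text: str) -> date:
    """Parse a YYYY-MM-DD date; raises ValueError for anything else."""
    return date.fromisoformat(text)
```

The mask accepts a made-up value such as `ZZ99 9ZZ`, which is the "still allows cheating" caveat in practice; only an external check against something like the Postal Address File can confirm that an address exists.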
  • 12. Inconsistent Data
      • Intra-record inconsistency
          ◦ Gender=“m”, Marital-Status=“Wife”
      • Inter-record inconsistency
          ◦ R1: VIN=VF7N1KFXF36772582; Registration Mark=T87BRB
          ◦ R2: VIN=VF7N1KFXF36772582; Registration Mark=CC04PNL
      • Inter-datastore inconsistency
          ◦ E.g. customer data in many data stores
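The first two kinds of inconsistency can be detected mechanically. The following sketch assumes records are plain dictionaries with hypothetical keys (`gender`, `marital_status`, `vin`, `registration_mark`) and hard-codes the single intra-record rule from the slide.

```python
from collections import defaultdict


def intra_record_issues(record: dict[str, str]) -> list[str]:
    """Check one record against a (hypothetical) cross-field rule."""
    issues = []
    if record.get("gender") == "m" and record.get("marital_status") == "Wife":
        issues.append("gender 'm' conflicts with marital status 'Wife'")
    return issues


def conflicting_registrations(records: list[dict[str, str]]) -> dict[str, set[str]]:
    """Return each VIN that appears with more than one registration mark."""
    marks_by_vin: dict[str, set[str]] = defaultdict(set)
    for record in records:
        marks_by_vin[record["vin"]].add(record["registration_mark"])
    return {vin: marks for vin, marks in marks_by_vin.items() if len(marks) > 1}
```

Detection only reports the conflict. Deciding which of the two registration marks is correct still needs a source of truth outside the data store.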
  • 13. How do we avoid these?
      • “Common sense validation”
          ◦ Men cannot be wives
      • But: what is correct value?
      • So: don’t over-specify
          ◦ Marital status?
          ◦ Better: Relationship Status
              ▪ Legally Married
              ▪ In Civil Partnership
              ▪ Unmarried
              ▪ Divorced
  • 14. Careful of surrogate keys
      • Entities can often be identified in different ways
          ◦ NI Number
          ◦ NHS Number
      • These are surrogate keys
      • All key fields should be unique
      • VIN example could not have arisen if the field were required to be unique
      • Nor would the SSN example earlier
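To illustrate the uniqueness point, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are hypothetical; the two rows are the VIN records from the inconsistency slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring the VIN column UNIQUE makes the database reject a second row
# for the same vehicle instead of silently storing a contradiction.
conn.execute(
    "CREATE TABLE vehicle ("
    "vin TEXT NOT NULL UNIQUE, "
    "registration_mark TEXT NOT NULL)"
)
conn.execute("INSERT INTO vehicle VALUES (?, ?)", ("VF7N1KFXF36772582", "T87BRB"))

try:
    conn.execute("INSERT INTO vehicle VALUES (?, ?)", ("VF7N1KFXF36772582", "CC04PNL"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM vehicle").fetchone()[0]
```

The constraint turns a silent inconsistency into an explicit error at entry time, when someone is still in a position to find out which value is right.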
  • 15. Root Cause
      • Often historically poor data quality
          ◦ NI numbers poorly administered
              ▪ Many-to-many relationships!
      • Keys not unique in practice
          ◦ Allows for new errors in data entry
  • 16. An Aside: Checksums
      • Checksums are an ancient technique to validate input data
          ◦ Additional digit attached to key
          ◦ Derived from key bytes
          ◦ Mis-keying almost always generates a mismatch
      • Not part of key, so store separately if at all
      • Better to generate the key automatically and validate against existing keys
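As a concrete instance, the sketch below uses the Luhn algorithm, one widely used check-digit scheme. The slides do not name a scheme, so this choice is an assumption. Luhn detects every single-digit error and most adjacent transpositions, but not all of them: swapping an adjacent 0 and 9 goes unnoticed.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit, starting with the
    # rightmost payload digit (the one that will sit next to the check digit).
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def is_valid(number: str) -> bool:
    """True if the last digit is the Luhn check digit of the rest."""
    if not number.isdigit() or len(number) < 2:
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])
```

For example, `is_valid("79927398713")` is true, while the mis-keyed `"79927389713"` (two adjacent digits swapped) is rejected.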
  • 17. Inaccurate Data
      • Statistical outliers and other “sore thumbs”
          ◦ E.g. price 10 times higher than similar model
          ◦ River temperature >100 °C
          ◦ Gas bill orders of magnitude too high
      • Transposed digits
          ◦ Accountancy packages have lots of tricks to find these
      • Spurious accuracy
          ◦ Wall length in mm
          ◦ Averages computed to too many places
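Two of these checks are simple enough to sketch. The first flags prices that sit a fixed factor away from the median of comparable items; the factor of 10 echoes the slide but is otherwise an arbitrary assumption. The second is one well-known bookkeeping trick, offered here as an example and not as what any particular package does: swapping two adjacent digits changes a number by a multiple of 9.

```python
from statistics import median


def price_outliers(prices: dict[str, float], factor: float = 10.0) -> list[str]:
    """Flag items priced at least `factor` times above or below the median.

    Assumes all prices are positive.
    """
    mid = median(prices.values())
    return [
        name
        for name, price in prices.items()
        if price >= mid * factor or price <= mid / factor
    ]


def could_be_transposition(expected: int, entered: int) -> bool:
    """A non-zero difference divisible by 9 hints at swapped adjacent digits."""
    difference = abs(expected - entered)
    return difference != 0 and difference % 9 == 0
```

The divisibility test is a hint, not proof: other errors can also produce a difference divisible by 9, so a hit should prompt a look at the source document.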
  • 18. Incomprehensible Data
      • The facts could meet all the previous strictures but still be useless
      • They must be put in context
  • 19. Data in Context
      • 3.142 is a fact
      • Gertie 3.142 2005-02-02 is data
      • With field names it is becoming “Data in Context”:
            Name    Height  Measurement Date
            Gertie  3.142   2005-02-02
      • Still need
          ◦ Units for Height (metres)
          ◦ Date rules (ISO 8601)
          ◦ …
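One way to carry that context with the data is to put it in the type. The sketch below is a hypothetical record definition in which the unit lives in the field name and the date is parsed as an ISO 8601 calendar date instead of being kept as free text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HeightMeasurement:
    """A fact plus the context needed to interpret it."""

    name: str
    height_m: float     # unit is part of the field name: metres
    measured_on: date   # a real date value, not a string


def from_raw(name: str, height: str, measured: str) -> HeightMeasurement:
    """Build a record from raw text; raises ValueError on malformed input."""
    return HeightMeasurement(name, float(height), date.fromisoformat(measured))


gertie = from_raw("Gertie", "3.142", "2005-02-02")
```

The bare triple `Gertie 3.142 2005-02-02` and the `gertie` object hold the same facts, but only the second tells a later reader what the 3.142 measures and in which unit.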
  • 20. No Context => Expensive errors
      • Mars Climate Orbiter
          ◦ Discrepancies observed in approach but not formally noted
          ◦ Spacecraft vanished during insertion into orbit
          ◦ Engineers specified forces to be applied in pounds-force (lbf), not newtons
          ◦ Factor of 4.45 difference!
          ◦ Mars Polar Lander was lost a few months later (most likely to a different fault)
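The factor of 4.45 is the number of newtons in one pound-force. A minimal sketch of one defence is to make the unit explicit at the point of entry so that a bare number cannot be passed along unlabelled. The `Force` class here is hypothetical and is not a description of any flight software.

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605  # one pound-force expressed in newtons


@dataclass(frozen=True)
class Force:
    """A force stored internally in newtons, whatever unit it arrived in."""

    newtons: float

    @classmethod
    def from_newtons(cls, value: float) -> "Force":
        return cls(value)

    @classmethod
    def from_pounds_force(cls, value: float) -> "Force":
        return cls(value * LBF_TO_NEWTON)


# The same bare number means two very different forces.
as_si = Force.from_newtons(100.0)
as_imperial = Force.from_pounds_force(100.0)
ratio = as_imperial.newtons / as_si.newtons
```

Callers have to say which unit they mean when they construct a `Force`, so the mismatch surfaces where the value is created and not downstream.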
  • 21. More examples
      • Redefining field usage on the fly
          ◦ 2-byte field in database but highest value <256
          ◦ Project team seeks to avoid cost of inserting new field
          ◦ Redefines field in code to be two 1-byte fields
          ◦ Existing reports start giving odd results but nobody notices
          ◦ Wrong business decisions made
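The effect on the existing reports can be reproduced with Python's `struct` module. The sketch assumes the legacy field was an unsigned 16-bit big-endian integer; the slide does not specify the layout, and the values 7 and 3 are arbitrary.

```python
import struct

# Legacy layout (assumed): one unsigned 16-bit big-endian integer.
LEGACY = struct.Struct(">H")
# Redefined layout: the same two bytes read as two unsigned 8-bit values.
SPLIT = struct.Struct("BB")


def write_split(first: int, second: int) -> bytes:
    """What the new code stores in the old 2-byte field."""
    return SPLIT.pack(first, second)


def legacy_report_value(raw: bytes) -> int:
    """What an unchanged report reads back from the same field."""
    return LEGACY.unpack(raw)[0]


raw = write_split(7, 3)
odd_result = legacy_report_value(raw)  # 7 * 256 + 3 = 1795
```

Nothing fails and no error is logged. The report simply shows 1795 where a value below 256 used to appear, which is how the problem goes unnoticed.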
  • 22. Information
      • Information is
          ◦ What sentient beings use to:
              ▪ Facilitate decision-making
              ▪ Communicate
              ▪ Only humans are sentient (so far)
          ◦ Information only exists when humans are in the value chain
      • Machine-machine communication
          ◦ Data in context
  • 23. What is Quality Information?
      • Conveys the right “impression”
          ◦ Trespassing on Conrad’s territory
          ◦ We’ll look at some graphical examples
      • Takes into account cultural differences
          ◦ “Wait while the red light flashes”
  • 24. Phone Number example again
      • 01222 331988
          ◦ I see that and “know” that it is wrong
          ◦ I could programme the rule to convert an erroneously converted number
              ▪ 01222 => 029
              ▪ Prefix subscriber number with 20
          ◦ But 029 is officially the code for Wales and other prefixes will appear; 21 is already in use
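The conversion rule itself is short, as this sketch shows. The function name is hypothetical, and the rule is only safe for numbers that really were issued under 01222; it must not be used to guess the prefix of any other 029 number.

```python
def convert_cardiff_number(old: str) -> str:
    """Rewrite an old 01222 xxxxxx number as 029 20xx xxxx."""
    digits = "".join(char for char in old if char.isdigit())
    if len(digits) != 11 or not digits.startswith("01222"):
        raise ValueError(f"not an old-style 01222 number: {old!r}")
    subscriber = digits[5:]  # the six-digit local number
    return f"029 20{subscriber[:2]} {subscriber[2:]}"


print(convert_cardiff_number("01222 535681"))  # 029 2053 5681
print(convert_cardiff_number("01222 331988"))  # 029 2033 1988
```

The rule encodes what a person "knows" on sight, and it is exactly this kind of embedded knowledge that goes stale when new prefixes come into use.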
  • 25. Information Presentation
      • Which of these companies would you rather buy into?
      [Chart not reproduced in transcript; labels 1 to 8]
  • 26. Illegality
      • US accounting rules now outlaw chart manipulations
      • Money laundering rules
          ◦ Managers could go to prison
      • Basel II and Sarbanes-Oxley
          ◦ Directors could go to prison
  • 27. Data Quality is Free
      • Poor data quality routinely costs 10-30% of turnover
      • Particular issues can be catastrophic
          ◦ Regulator can fine companies
          ◦ People can sue
          ◦ Officers and directors could go to jail
      • Data quality is better than free
      • But needs to be worked at
  • 28. No IQ without DQ
      • Cannot have good Information Quality
          ◦ Without good quality data
      • Information Quality is a business issue
          ◦ Needs complete commitment
          ◦ Very strong management process
      • Information is the Third Asset
          ◦ It is not a cost centre
          ◦ It is not reflected on the bottom line
          ◦ Yet
  • 29. Any Questions? keith.underdown@iaidq.org
