Data-Ed Online: Engineering Solutions to Data Quality Challenges

This webinar originally aired on Tuesday, October 9th, 2012. It is part of Data Blueprint's ongoing webinar series on data management with Dr. Aiken.

Sign up for future sessions at http://www.datablueprint.com/webinar-schedule.

Abstract:
This presentation provides guidance to organizations considering or preparing for data quality initiatives. We will illustrate how organizations with chronic business challenges can often trace the root of the problem to poor data quality. Showing how data quality can be engineered provides a useful framework for developing an organizational approach. This in turn allows organizations to identify more quickly which data problems are caused by structural issues and which by practice-oriented defects. Participants will also learn the importance of practicing data quality engineering quantification.

Upload details: uploaded as an Apple Keynote presentation.

Usage rights: CC Attribution-NonCommercial-NoDerivs License

Data-Ed Online: Engineering Solutions to Data Quality Challenges Presentation Transcript

  • 1. Data Quality Engineering. Date: October 9, 2012. Time: 2:00 PM ET. Presented by: Dr. Peter Aiken. [Figure: data quality engineering cycle, with separate starting points for new system development and for existing systems. Stages: Metadata Creation (define data architecture, define data model structures); Metadata Structuring (implement data model views, populate data model views); Data Creation (create data, verify data values); Data Storage; Data Utilization (inspect data, present data); Data Manipulation (manipulate data, update data); Data Assessment (assess data values, assess metadata); Data Refinement (correct data value defects, re-store data values); Metadata Refinement (correct structural defects, update implementation).] Produced by Data Blueprint, 10124-C W. Broad St, Glen Allen, VA 23060. Classification: Education. Date: 10/09/12. © Copyright this and previous years by Data Blueprint - all rights reserved!
  • 2–4. Commonly Asked Questions: 1) Will I get copies of the slides after the event? 2) Is this being recorded so I can view it afterwards?
  • 5. Get Social With Us! Join the conversation! • Live Twitter feed: follow us @datablueprint and @paiken; ask questions and submit your comments: #dataed • Like us on Facebook (www.facebook.com/datablueprint): post questions and comments; find industry news, insightful content and event updates • Join the group Data Management & Business Intelligence: ask questions, gain insights and collaborate with fellow data management professionals
  • 7. Meet Your Presenter: Dr. Peter Aiken • Internationally recognized thought-leader in the data management field – 30 years of experience – Recipient of multiple international awards – Founder, Data Blueprint (http://datablueprint.com) • 7 books and dozens of articles • Experienced with 500+ data management practices in 20 countries • Multi-year immersions with organizations as diverse as the US DoD, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia and Walmart
  • 8–11. Data Quality Engineering
  • 12–21. Outline: 1. Data Management Introduction 2. Data Quality Definitions & Overview 3. DQM Cycle 4. DQ Awareness & Requirements 5. DQ Dimensions 6. Data Quality Tools 7. Guiding Principles 8. References and Q&A. Tweeting now: #dataed
  • 22–28. The DAMA Guide to the Data Management Body of Knowledge: Published by DAMA International • The professional association for data managers (40 chapters worldwide). DMBoK organized around: • Primary data management functions focused on data delivery to the organization • Several environmental elements. [Figure: Data Management Functions]
  • 29–31. The DAMA Guide to the Data Management Body of Knowledge. [Figure: Environmental Elements] Amazon: http://www.amazon.com/DAMA-Guide-Management-Knowledge-DAMA-DMBOK/dp/0977140083, or enter the terms "dama dm bok" in the Amazon search engine
  • 32. What is the CDMP? • Certified Data Management Professional • DAMA International and ICCP • Membership in a distinct group made up of your fellow professionals • Recognition for your specialized knowledge in a choice of 17 specialty areas • Series of 3 exams • For more information, please visit: – http://www.dama.org/i4a/pages/index.cfm?pageid=3399 – http://iccp.org/certification/designations/cdmp
  • 33–40. Data Management • Data Program Coordination: manage data coherently • Organizational Data Integration: share data across boundaries • Data Stewardship: assign responsibilities for data • Data Development: engineer data delivery systems • Data Support Operations: maintain data availability
  • 41–42. Overview: Data Quality Engineering (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 43–44. Outline
  • 45–49. Definitions (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International). Data Quality Management • Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure the fitness of data for use • Entails the establishment and deployment of roles and responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data (http://www2.sas.com/proceedings/sugi29/098-29.pdf) • Critical support process in organizational change management • Continuous process for defining the parameters that specify acceptable levels of data quality to meet business needs, and for ensuring that data quality meets these levels. Data Quality • Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance
  • 50. Overview: DQM Concepts and Activities 1) Data Quality Management Approach 2) Develop and promote data quality awareness 3) Define data quality requirements 4) Profile, analyze and assess data quality 5) Define data quality metrics 6) Define data quality business rules 7) Test and validate data quality requirements 8) Set and evaluate data quality service levels 9) Measure and monitor data quality 10) Manage data quality issues 11) Clean and correct data quality defects 12) Design and implement operational DQM procedures 13) Monitor operational DQM procedures and performance (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 51–53. Concepts and Activities: Data quality expectations provide the inputs necessary to define the data quality framework: – Requirements – Inspection policies – Measures and monitors that reflect changes in data quality and performance • The data quality framework requirements reflect 3 aspects of business data expectations: 1) A manner to record the expectation in business rules 2) A way to measure the quality of data within that dimension 3) An acceptability threshold (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
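The three aspects of a business data expectation (a business rule, a measure within a dimension, and an acceptability threshold) can be sketched in Python as follows. This is a minimal illustration, not a method taken from the DMBoK or from the presenter: the class name `DataQualityRule`, the choice to measure conformance as the fraction of records that pass the rule, and the sample customer data are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping

Record = Mapping[str, Any]


@dataclass(frozen=True)
class DataQualityRule:
    """Hypothetical container for one business data expectation."""
    name: str
    dimension: str                   # illustrative label, e.g. "completeness"
    check: Callable[[Record], bool]  # 1) the expectation recorded as a business rule
    threshold: float                 # 3) minimum acceptable conformance, 0.0 to 1.0

    def measure(self, records: Iterable[Record]) -> float:
        """2) Measure quality in this dimension as the fraction of records that pass."""
        results = [self.check(r) for r in records]
        # Assumption: an empty data set is treated as fully conforming.
        return sum(results) / len(results) if results else 1.0

    def is_acceptable(self, records: Iterable[Record]) -> bool:
        return self.measure(records) >= self.threshold


# Usage with made-up sample data: 3 of 4 customers have an email address.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
email_present = DataQualityRule(
    name="customer email populated",
    dimension="completeness",
    check=lambda r: bool(r.get("email")),
    threshold=0.95,
)
print(email_present.measure(customers))        # 0.75
print(email_present.is_acceptable(customers))  # False
```

Keeping the rule, the measure, and the threshold together in one object makes each expectation individually testable and lets the same rule be reused when monitoring later runs.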
  • 54–55. Outline
  • 56–57. The DQM Cycle: The general approach to DQM is a version of the Deming cycle. Deming proposed a problem-solving model known as “plan-do-study-act” or “plan-do-check-act”. The cycle begins by: 1) Identifying data issues that are critical to the achievement of business objectives 2) Defining business requirements for data quality 3) Identifying key data quality dimensions 4) Defining business rules critical to ensuring high-quality data (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 58. The DQM Cycle: (1) Plan: Plan for the assessment of the current state and the identification of key metrics for measuring quality • The data quality team assesses the scope of known issues • This involves: – Determining their cost and impact – Evaluating alternatives for addressing them (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
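One way to make "determining cost and impact" concrete during planning is to rank known issues by an estimated cost. The sketch below assumes the business can supply a rough cost per defective record; the `KnownIssue` type, the `prioritize` helper, and every figure in it are hypothetical, and a real impact analysis usually weighs more than a single number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIssue:
    """Hypothetical record of a known data quality issue."""
    description: str
    affected_records: int
    cost_per_record: float  # assumed business estimate of cost per defective record

    @property
    def estimated_cost(self) -> float:
        return self.affected_records * self.cost_per_record


def prioritize(issues: list[KnownIssue]) -> list[KnownIssue]:
    """Order known issues by estimated cost, highest first."""
    return sorted(issues, key=lambda i: i.estimated_cost, reverse=True)


# Illustrative, made-up figures.
issues = [
    KnownIssue("duplicate customer records", 1_200, 4.50),
    KnownIssue("missing postal codes", 8_000, 0.25),
    KnownIssue("invalid product codes on orders", 300, 40.00),
]
for issue in prioritize(issues):
    print(f"{issue.description}: ${issue.estimated_cost:,.2f}")
```

Note that the issue touching the fewest records ranks first here, which is why counting affected records alone can mislead when evaluating alternatives.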
  • 59. The DQM Cycle: (2) Deploy: Deploy processes for measuring and improving the quality of data: • Data profiling • Institute inspections and monitors to identify data issues when they occur • Fix flawed processes that are the root cause of data errors, or correct errors downstream • When it is not possible to correct errors at their source, correct them at the earliest point in the data flow (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
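Data profiling, the first deployment step listed above, typically starts with simple column statistics. This hypothetical `profile_column` helper is a minimal Python sketch, not any particular tool's API; it assumes records are dictionaries with hashable values and treats both `None` and the empty string as missing.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def profile_column(records: Iterable[Mapping[str, Any]], column: str, top_n: int = 3) -> dict:
    """Basic column profile: row count, missing values, distinct values, most frequent values."""
    values = [r.get(column) for r in records]
    # Assumption: None and "" both count as missing.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "top_values": counts.most_common(top_n),  # ties keep first-seen order
    }


# Usage with made-up order data.
orders = [
    {"state": "VA"}, {"state": "VA"}, {"state": "va"},
    {"state": None}, {"state": "MD"}, {"state": ""},
]
print(profile_column(orders, "state"))
# {'rows': 6, 'nulls': 2, 'distinct': 3, 'top_values': [('VA', 2), ('va', 1), ('MD', 1)]}
```

In this sample the profile surfaces two missing values and a `va`/`VA` case inconsistency, the kind of finding that inspections and monitors would then target.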
  • 60. The DQM Cycle: (3) Monitor: Monitor the quality of data as measured against the defined business rules • If data quality meets defined thresholds for acceptability, the processes are in control and the level of data quality meets the business requirements • If data quality falls below acceptability thresholds, notify data stewards so they can take action during the next stage (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
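The monitoring step compares each measured score against its acceptability threshold and notifies data stewards only when a rule falls below it. The sketch below assumes scores and thresholds are fractions between 0 and 1; the `Measurement` type, the `monitor` function, the steward names, and the values are hypothetical, and "notify" is reduced to returning message strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical result of measuring one rule in one monitoring run."""
    rule: str
    score: float      # measured conformance, 0.0 to 1.0
    threshold: float  # acceptability threshold for the rule
    steward: str      # hypothetical owner to notify when the rule fails


def monitor(measurements: list[Measurement]) -> list[str]:
    """Return notifications for rules below threshold; an empty list means in control."""
    return [
        f"notify {m.steward}: '{m.rule}' at {m.score:.1%} is below threshold {m.threshold:.1%}"
        for m in measurements
        if m.score < m.threshold
    ]


# Usage with made-up results from one run.
run = [
    Measurement("customer email populated", 0.97, 0.95, "crm-steward"),
    Measurement("order total matches line items", 0.91, 0.99, "finance-steward"),
]
alerts = monitor(run)
print(alerts or "processes in control")
```

Only the second rule produces a notification, which would hand it to the steward for the Act stage.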
  • 61. The DQM Cycle: (4) Act: Act to resolve any identified issues to improve data quality and better meet business expectations • New cycles begin as new data sets come under investigation or as new data quality requirements are identified for existing data sets (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 62–63. Outline
  • 68. Develop and Promote DQ Awareness • Promoting data quality awareness is essential to ensure buy-in from the necessary stakeholders in the organization • Ensure that the right people in the organization are aware that data quality issues exist • Awareness increases the chance of success of any DQM program • Awareness includes: – Relating material impacts to data issues – Ensuring systematic approaches to regulators – Oversight of the quality of organizational data – Socializing the concept that data quality problems cannot be addressed by technology solutions alone (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 69. Polling Question #1: Which is not a step to promote data quality awareness? a) Training on the core concepts of data quality b) Establish data governance framework for data quality c) Create a data architecture map
  • 73. Develop and Promote DQ Awareness: Steps 1) Training on the core concepts of data quality 2) Establish data governance framework for data quality 3) Create a data quality oversight board that has a reporting hierarchy associated with the different data governance roles (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 74. Define DQ Requirements • Data quality must be understood within the context of ‘fitness for use’ • Data quality requirements are often hidden within defined business policies • Incremental detailed review and iterative refinement of business policies help identify the information requirements that become data quality rules • Steps for incremental detailed review: – Identify key data components associated with business policies – Determine how identified data assertions affect the business – Evaluate how data errors are categorized within a set of data quality dimensions – Specify the business rules that measure the occurrence of data errors – Provide a means for implementing measurement processes that assess conformance to those business rules (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 75. Data Quality Dimensions (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 76. Profile, Analyze and Assess DQ • Data assessment uses two different approaches: 1) bottom-up and 2) top-down • Bottom-up assessment: – Inspection and evaluation of the data sets – Highlight potential issues based on the results of automated processes • Top-down assessment: – Engage business users to document their business processes and the corresponding critical data dependencies – Understand how their processes consume data and which data elements are critical to the success of the business application (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 77. Define DQ Metrics • Metrics development occurs as part of the strategy/design/plan step • Process for defining data quality metrics: 1) Select one of the identified critical business impacts 2) Evaluate the dependent data elements, and the data create and update processes associated with that business impact 3) List any associated data requirements 4) Specify the associated dimension of data quality and one or more business rules to use to determine conformance of the data to expectations 5) Describe the process for measuring conformance 6) Specify an acceptability threshold (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
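A minimal Python sketch of how steps 4 to 6 might be expressed in code, assuming records are plain dictionaries. The `DQMetric` class, the `order_has_customer` rule, and the 95% threshold are hypothetical illustrations, not part of the DMBOK or of any particular tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping

Record = Mapping[str, Any]


@dataclass
class DQMetric:
    """One data quality metric: a business rule scored against a threshold."""

    name: str
    dimension: str                  # data quality dimension, e.g. "completeness"
    rule: Callable[[Record], bool]  # returns True when a record conforms
    threshold: float                # minimum acceptable conformance ratio (0..1)

    def measure(self, records: Iterable[Record]) -> float:
        """Measure conformance as the fraction of records that satisfy the rule."""
        results = [self.rule(r) for r in records]
        if not results:
            return 1.0  # assumption: an empty data set is treated as conforming
        return sum(results) / len(results)

    def is_acceptable(self, records: Iterable[Record]) -> bool:
        """Compare measured conformance with the acceptability threshold."""
        return self.measure(records) >= self.threshold


# Hypothetical business impact: orders that cannot be linked to a customer
# cannot be billed, so every order should carry a customer_id.
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": None},
    {"order_id": 3, "customer_id": "C3"},
    {"order_id": 4, "customer_id": "C4"},
]
order_has_customer = DQMetric(
    name="order_has_customer",
    dimension="completeness",
    rule=lambda r: r.get("customer_id") is not None,
    threshold=0.95,
)
print(order_has_customer.measure(orders))        # 0.75
print(order_has_customer.is_acceptable(orders))  # False
```

Here conformance is 75%, below the assumed 95% threshold, so the metric reports the data as unacceptable for that business impact.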
  • 78. Test and Validate DQ Requirements • Data profiling tools analyze data to find potential anomalies • Use the same tools for rule validation • Rules discovered or defined during the data quality assessment phase are referenced in measuring conformance as part of the operational process (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 79. Set and Evaluate DQ Service Levels • Data quality inspection and monitoring are used to measure and monitor compliance with defined data quality rules • Data quality SLAs specify the organization’s expectations for response and remediation • Operational data quality control defined in data quality SLAs includes: – Data elements covered by the agreement – Business impacts associated with data flaws – Data quality dimensions associated with each data element – Quality expectations for each data element for each of the identified dimensions in each application or system in the value chain – Methods for measuring against those expectations – (…) (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 80. Measure and Monitor DQ • DQM procedures depend on available data quality measuring and monitoring services • 2 contexts exist for control/measurement of conformance to data quality business rules: – In-stream: collect in-stream measurements while creating data – In batch: perform batch activities on collections of data instances assembled in a data set • Apply measurements at 3 levels of granularity: – Data element value – Data instance or record – Data set (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
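The two control contexts and three levels of granularity can be illustrated with a short Python sketch. The ZIP-code rule, the record layout, and the function names are assumptions chosen for illustration: `capture` plays the in-stream role by checking each record as it is created, while `check_data_set` runs as a batch over an assembled data set.

```python
import re
from typing import Any, Dict, List

# Hypothetical rule for a customer record: ZIP codes are five digits.
ZIP_PATTERN = re.compile(r"\d{5}")


def check_element(value: Any) -> bool:
    """Data element level: does one value conform (a five-digit ZIP code)?"""
    return isinstance(value, str) and ZIP_PATTERN.fullmatch(value) is not None


def check_record(record: Dict[str, Any]) -> bool:
    """Record level: do the values of one record conform together?"""
    return check_element(record.get("zip")) and bool(record.get("name"))


def check_data_set(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Data set level: aggregate conformance plus a cross-record rule (unique ids)."""
    ids = [r.get("id") for r in records]
    conforming = sum(check_record(r) for r in records)
    return {
        "record_conformance": conforming / len(records) if records else 1.0,
        "ids_unique": len(ids) == len(set(ids)),
    }


def capture(record: Dict[str, Any], accepted: List, rejected: List) -> None:
    """In-stream control: check each record at the moment it is created."""
    (accepted if check_record(record) else rejected).append(record)


batch = [
    {"id": 1, "name": "Ada", "zip": "23060"},
    {"id": 2, "name": "", "zip": "23060"},
    {"id": 2, "name": "Bob", "zip": "2306"},
    {"id": 4, "name": "Cy", "zip": "23233"},
]
# Batch control: measure a collection of records assembled in a data set.
print(check_data_set(batch))  # {'record_conformance': 0.5, 'ids_unique': False}
```

The uniqueness check only makes sense at the data set level: no single record can tell you whether its `id` is duplicated.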
  • 81. Manage DQ Issues: • Supporting the enforcement of the data quality SLA requires a mechanism for reporting and tracking data quality incidents and activities for researching and resolving those incidents • A data quality incident reporting system can provide this capability • It can log the evaluation, initial diagnosis, and actions associated with data quality events. Clean & Correct DQ Defects: • Perform data correction in 3 ways: 1) Automated correction 2) Manual directed correction 3) Manual correction (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 82. Manage DQ Issues: Example • Data quality incident tracking focuses on training staff to recognize when data issues appear and how they are to be classified, logged and tracked according to the data quality SLA (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 83. Design and Implement Operational DQM Procedures: 1) Inspection and monitoring 2) Diagnosis and evaluation of remediation alternatives 3) Resolve issues 4) Reporting. Monitor Operational DQM Procedures and Performance: 1) Accountability is critical to governance protocols overseeing data quality control 2) All issues must be assigned 3) The tracking process should specify and document the ultimate issue accountability (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
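The incident reporting and tracking described in the last few slides can be sketched as a small in-memory log in Python. Everything here (`DQIncident`, `IncidentLog`, the field names, and the sample incident) is hypothetical; a real incident reporting system would persist records and tie them to the data quality SLA.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DQIncident:
    """One logged data quality event; the fields are illustrative assumptions."""

    incident_id: int
    description: str
    data_element: str
    dimension: str
    assigned_to: Optional[str] = None
    diagnosis: Optional[str] = None
    actions: List[str] = field(default_factory=list)
    status: str = "open"
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class IncidentLog:
    """Minimal in-memory tracker: report, diagnose and assign, act, review."""

    def __init__(self) -> None:
        self._incidents: List[DQIncident] = []

    def report(self, description: str, data_element: str, dimension: str) -> DQIncident:
        incident = DQIncident(len(self._incidents) + 1, description, data_element, dimension)
        self._incidents.append(incident)
        return incident

    def diagnose(self, incident_id: int, diagnosis: str, assignee: str) -> None:
        """Record the initial diagnosis and make one person accountable."""
        incident = self._get(incident_id)
        incident.diagnosis = diagnosis
        incident.assigned_to = assignee

    def act(self, incident_id: int, action: str, resolved: bool = False) -> None:
        """Log a remediation action; optionally close the incident."""
        incident = self._get(incident_id)
        incident.actions.append(action)
        if resolved:
            incident.status = "resolved"

    def unassigned(self) -> List[DQIncident]:
        """Incidents that still lack an accountable owner."""
        return [i for i in self._incidents if i.assigned_to is None]

    def _get(self, incident_id: int) -> DQIncident:
        return self._incidents[incident_id - 1]  # ids are assigned sequentially from 1


log = IncidentLog()
inc = log.report("ZIP codes truncated to four digits", "customer.zip", "validity")
print(len(log.unassigned()))  # 1
log.diagnose(inc.incident_id, "Entry form field limited to 4 characters", "billing data steward")
log.act(inc.incident_id, "Widened form field and reloaded affected rows", resolved=True)
print(inc.status, len(log.unassigned()))  # resolved 0
```

The `unassigned()` query reflects the principle that all issues must be assigned: anything it returns has no accountable owner yet.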
  • 84. Outline: 1. Data Management Introduction 2. Data Quality Definitions & Overview 3. DQM Cycle 4. DQ Awareness & Requirements 5. DQ Dimensions 6. Data Quality Tools 7. Guiding Principles 8. References and Q&A (Tweeting now: #dataed)
  • 86. Example: Data Quality Interview Session Summary • During mid-February, the Data Governance Team and Data Blueprint conducted ten qualitative interview sessions with groups of individuals who interact with data on a regular basis • A series of patterns emerged as participants shared stories about the impact of poor data quality on the client, its products, and its customers • These patterns highlight gaps in best practices for ensuring data quality, i.e. the extent to which data is “fit for use” • Our preliminary analysis evaluated these stories against attributes of four data quality dimensions • At this early stage of the post-interview process, we are seeking confirmation of our assumptions and method
  • 87. Which Activities Support Quality Data? • Data quality best practices depend on both: – Practice-oriented activities, which focus on the capture and manipulation of data – Structure-oriented activities, which focus on the data implementation
  • 88. Quality Dimensions • Practice-oriented causes: – Stem from a failure to apply rigor when capturing and manipulating data, such as edit masking, range checking of input data, and CRC-checking of transmitted data • Structure-oriented causes: – Occur because data and metadata have been arranged imperfectly, for example: when the data is in the system but we just can't access it; when a correct data value is provided as the wrong response to a query; or when data is not provided because it is unavailable or inaccessible to the customer – Developers focus within system boundaries instead of within organizational boundaries
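Each practice-oriented safeguard named above is small enough to show directly. This is a Python sketch assuming a hypothetical US phone-number mask and an assumed 0 to 130 age range; the CRC uses the standard library's `zlib.crc32`.

```python
import re
import zlib
from typing import Tuple

# Edit mask for a hypothetical US phone format such as "(804) 555-0123".
PHONE_MASK = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def passes_edit_mask(value: str) -> bool:
    """Accept the value only if it matches the mask exactly."""
    return PHONE_MASK.fullmatch(value) is not None


def in_range(value: float, low: float, high: float) -> bool:
    """Range check on input data, e.g. an age between 0 and 130 (assumed bounds)."""
    return low <= value <= high


def send(payload: bytes) -> Tuple[bytes, int]:
    """Sender side: transmit the payload together with its CRC-32 checksum."""
    return payload, zlib.crc32(payload)


def crc_ok(payload: bytes, checksum: int) -> bool:
    """Receiver side: recompute CRC-32 and compare it with the transmitted value."""
    return zlib.crc32(payload) == checksum


print(passes_edit_mask("(804) 555-0123"), passes_edit_mask("804-555-0123"))  # True False
print(in_range(42, 0, 130), in_range(-3, 0, 130))                            # True False
data, checksum = send(b"ORDER|1001|49.95")
print(crc_ok(data, checksum), crc_ok(b"ORDER|1001|94.95", checksum))        # True False
```

The CRC only detects corruption in transit; it says nothing about whether the value was correct when it was sent, which is why it complements rather than replaces the edit mask and range check.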
  • 89. Practice-Oriented Activities • Affect Data Value Quality and Data Representation Quality • Examples of improper practice-oriented activities: – Allowing imprecise or incorrect data to be collected when requirements specify otherwise – Presenting data out of sequence • Typically diagnosed in a bottom-up manner: find and fix the resulting problem • Addressed by imposing more rigorous data-handling governance
  • 90. Structure-Oriented Activities • Affect Data Model Quality and Data Architecture Quality • Examples of improper structure-oriented activities: – Providing a correct response but incomplete data to a query because the user did not comprehend the system data structure – Costly maintenance of inconsistent data used by redundant systems • Typically diagnosed in a top-down manner, leading to root-cause fixes • Addressed through fundamental data structure governance
  • 91. 4 Dimensions of Data Quality • An organization’s overall data quality is a function of four distinct components, each with its own attributes: • Practice-oriented: – Data Value: the quality of data as stored & maintained in the system – Data Representation: the quality of representation for stored values; perfect data values stored in a system that are inappropriately represented can be harmful • Structure-oriented: – Data Model: the quality of data logically representing user requirements related to data entities, associated attributes, and their relationships; essential for effective communication among data suppliers and consumers – Data Architecture: the coordination of data management activities in cross-functional system development and operations
  • 92. Effective Data Quality Engineering • Data quality engineering has been focused on operational problem correction – Directing attention to practice-oriented data imperfections • Data quality engineering is more effective when also focused on structure-oriented causes – Ensuring the quality of shared data across system boundaries • Diagram, spanning from closer to the user to closer to the architect: – Data Representation Quality: as presented to the user – Data Value Quality: as maintained in the system – Data Model Quality: as understood by developers – Data Architecture Quality: as an organizational asset
  • 93. Full Set of Data Quality Attributes
  • 94. Data Value Quality
  • 95. Data Representation Quality
  • 96. Data Model Quality
  • 97. Data Architecture Quality
  • 98. Extended data life cycle model with metadata sources and uses • The diagram marks two starting points, one for new system development and one for existing systems • Phases and their activities: – Metadata Creation: Define Data Architecture; Define Data Model Structures – Metadata Structuring: Implement Data Model Views; Populate Data Model Views – Metadata Refinement: Correct Structural Defects; Update Implementation – Data Creation: Create Data; Verify Data Values – Data Utilization: Inspect Data; Present Data – Data Manipulation: Manipulate Data; Update Data – Data Assessment: Assess Data Values; Assess Metadata – Data Refinement: Correct Data Value Defects; Re-store Data Values – Metadata & Data Storage • Flows between phases include data architecture, architecture refinements, data architecture and data models, corrected data, data performance metadata, facts & meanings, shared data, and updated data
  • 99. Data Quality Engineering (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 100. Goals and Principles • To measurably improve the quality of data in relation to defined business expectations • To define requirements and specifications for integrating data quality control into the system development life cycle • To provide defined processes for measuring, monitoring, and reporting conformance to acceptable levels of data quality (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 101. Activities • Develop and Promote Data Quality Awareness • Set and Evaluate Data Quality Service Levels • Test and Validate Data Quality Requirements • Profile, Analyze, and Assess Data Quality • Continuously Measure and Monitor Data Quality • Monitor Operational DQM Procedures and Performance • Define Data Quality Business Rules • Define Data Quality Metrics • Manage Data Quality Issues • Clean and Correct Data Quality Defects • Define Data Quality Requirements • Design and Implement Operational DQM Procedures (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 102. Primary Deliverables • Improved Quality Data • Data Management Operational Analysis • Data profiles • Data Quality Certification Reports • Data Quality Service Level Agreements (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 103. Roles and Responsibilities • Suppliers: – External Sources – Regulatory Bodies – Business Subject Matter Experts – Information Consumers – Data Producers – Data Architects – Data Modelers – Data Stewards • Participants: – Data Quality Analysts – Data Analysts – Database Administrators – Data Stewards – Other Data Professionals – DRM Director – Data Stewardship Council • Consumers: – Data Stewards – Data Professionals – Other IT Professionals – Knowledge Workers – Managers and Executives – Customers (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 104. Polling Question #2: What is one guiding principle for data quality? a) Business process owners will agree to and abide by data quality SLAs b) Identify a blue record for all data elements c) Upstream data consumers specify data quality expectations
  • 105. Outline: 1. Data Management Introduction 2. Data Quality Definitions & Overview 3. DQM Cycle 4. DQ Awareness & Requirements 5. DQ Dimensions 6. Data Quality Tools 7. Guiding Principles 8. References and Q&A (Tweeting now: #dataed)
  • 107. Technology • Data Profiling Tools • Statistical Analysis Tools • Data Cleansing Tools • Data Integration Tools • Issue and Event Management Tools (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 108. Overview: Data Quality Tools • 4 categories of activities: 1) Analysis 2) Cleansing 3) Enhancement 4) Monitoring • Principal tools: 1) Data Profiling 2) Parsing and Standardization 3) Data Transformation 4) Identity Resolution and Matching 5) Enhancement 6) Reporting (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 109. DQ Tool #1: Data Profiling • Data profiling is the assessment of value distribution and clustering of values into domains • You need to be able to distinguish between good and bad data before making any improvements • Data profiling is a set of algorithms for 2 purposes: – Statistical analysis and assessment of the quality of data values within a data set – Exploring relationships that exist between value collections within and across data sets • At its most advanced, data profiling takes a series of prescribed rules from data quality engines, assesses the data, and annotates and tracks violations to determine whether they comprise new or inferred data quality rules
  • 110. DQ Tool #1: Data Profiling, cont’d • Data profiling vs. data quality (business context and semantic/logical layers): – Data quality is concerned with prescriptive rules – Data profiling looks for patterns both when rules are adhered to and when they are violated, and can provide input into the business context layer • It is incumbent on data profiling services to notify all concerned parties of whatever is discovered • Profiling can be used to… – …notify the help desk that valid changes in the data are about to cause an avalanche of “skeptical user” calls – …notify business analysts of precisely where they should be working today in terms of shifts in the data
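As a rough illustration of value distribution and clustering of values into domains, the Python sketch below profiles one column by counting nulls, distinct values, the most frequent values, and value shapes (digits mapped to `9`, letters to `A`). The function names and sample ZIP codes are hypothetical; commercial profiling tools do considerably more, including relationship discovery within and across data sets.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: each digit becomes 9, each letter becomes A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile_column(values: Iterable[Optional[str]]) -> Dict[str, object]:
    """Value distribution and pattern clustering for one column."""
    values = list(values)
    present = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        "distinct": len(set(present)),
        "top_values": Counter(present).most_common(3),
        "patterns": Counter(value_pattern(v) for v in present).most_common(),
    }


zips = ["23060", "23233", "23060", "2306", None, "VA-23060"]
profile = profile_column(zips)
print(profile["null_count"], profile["distinct"])  # 1 4
print(profile["patterns"])  # [('99999', 3), ('9999', 1), ('AA-99999', 1)]
```

Rare shapes such as `9999` or `AA-99999` are the kind of anomaly a profiler would surface; whether each one is a defect or evidence of a new rule is a decision for a data steward.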
  • 111. DQ Tool #2: Parsing & Standardization • Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values • Actions are triggered upon matching a specific pattern • When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations • Data standardization is the process of conforming data to a set of business rules and formats that are set up by data stewards and administrators • Data standardization example: – Bringing all the different formats of “street” (e.g. “STR”, “ST.”, “STRT”, “STREET”) into a single format
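A minimal parsing and standardization sketch in Python for the “street” example: a regular expression parses the address into number, name, and suffix, and a lookup table (standing in for steward-defined rules) maps suffix variants to one form. Choosing `ST` as the standard form and the regex itself are assumptions for illustration; real address standardizers handle far more cases.

```python
import re

# Hypothetical steward-maintained rules: variant suffix -> standard suffix.
# "ST" as the target form is an assumption for this sketch.
STREET_SUFFIXES = {"STR": "ST", "ST.": "ST", "STRT": "ST", "STREET": "ST"}

# Parsing pattern: house number, street name, final suffix token.
ADDRESS = re.compile(r"(?P<number>\d+)\s+(?P<name>.+?)\s+(?P<suffix>\S+)")


def standardize_address(raw: str) -> str:
    cleaned = " ".join(raw.upper().split())  # uppercase and collapse whitespace
    match = ADDRESS.fullmatch(cleaned)
    if match is None:
        return cleaned  # pattern not recognized: leave for review
    suffix = STREET_SUFFIXES.get(match["suffix"], match["suffix"])
    return f"{match['number']} {match['name']} {suffix}"


for raw in ["10124 W. Broad Street", "5 main  strt", "12 Oak St.", "Broad Str"]:
    print(standardize_address(raw))
# 10124 W. BROAD ST
# 5 MAIN ST
# 12 OAK ST
# BROAD STR
```

The last input does not match the parsing pattern (it has no house number), so it is returned uppercased but otherwise unchanged for review rather than being guessed at.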
  • 112. DQ Tool #3: Data Transformation • Upon identification of data errors, trigger data rules to transform the flawed data • Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation • Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
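Rule-based transformation from several source formats into one target representation can be sketched with dates. In this Python sketch the list of source formats stands in for the knowledge base; the formats, their order, and ISO 8601 as the target are assumptions.

```python
from datetime import datetime
from typing import Optional

# Hypothetical knowledge base of recognized source formats, tried in order.
# Month-first US formats are listed first; that ordering is an assumption.
SOURCE_FORMATS = ["%m/%d/%Y", "%m/%d/%y", "%Y%m%d", "%d-%b-%Y"]


def to_iso_date(raw: str) -> Optional[str]:
    """Map a date in any recognized source format to the ISO 8601 target (YYYY-MM-DD)."""
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this rule does not apply; try the next one
    return None  # no rule matched: route the value to manual correction


for raw in ["10/09/2012", "10/09/12", "20121009", "09-Oct-2012", "2012/31/31"]:
    print(raw, "->", to_iso_date(raw))
```

The order of the rules matters: because month-first formats are tried first, an ambiguous value such as `03/04/2012` becomes `2012-03-04`. Values no rule recognizes return `None` so they can be routed to manual correction.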
  • 113. DQ Tool #4: Identity Resolution & Matching • Data matching enables analysts to identify relationships between records for de-duplication or group-based processing • Matching is central to maintaining data consistency and integrity throughout the enterprise • The matching process should be used in the initial migration of data into a single repository • 2 basic approaches to matching: – Deterministic: relies on defined patterns/rules for assigning weights and scores to determine similarity; predictable; dependent on rule developers’ anticipations – Probabilistic: relies on statistical techniques for assessing the probability that any pair of records represents the same entity; not reliant on rules; probabilities can be refined based on experience, so matchers can improve precision as more data is analyzed
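A small Python sketch of the deterministic approach: the rule developer fixes a weight for agreement on each field and a match threshold. The fields, weights, and threshold are hypothetical. Integer weights are used to avoid floating-point rounding at the threshold.

```python
from typing import Dict

# Deterministic matching: the rule developer assigns a weight to each field
# agreement and a threshold for declaring a match. Values here are hypothetical.
WEIGHTS = {"last_name": 40, "birth_date": 40, "zip": 20}
MATCH_THRESHOLD = 80


def normalize(value: str) -> str:
    """Uppercase and drop whitespace so trivial formatting differences don't block a match."""
    return "".join(value.upper().split())


def match_score(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Sum the weights of the fields on which both records agree after normalization."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if a.get(field) and b.get(field) and normalize(a[field]) == normalize(b[field])
    )


def is_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    return match_score(a, b) >= MATCH_THRESHOLD


a = {"last_name": "Smith", "birth_date": "1970-01-31", "zip": "23060"}
b = {"last_name": "SMITH ", "birth_date": "1970-01-31", "zip": "23233"}
c = {"last_name": "Smyth", "birth_date": "1970-01-31", "zip": "23060"}
print(match_score(a, b), is_match(a, b))  # 80 True
print(match_score(a, c), is_match(a, c))  # 60 False
```

The `Smyth` record fails to match only because no rule anticipated spelling variants, which is the dependence on rule developers' anticipations noted above. A probabilistic matcher would instead estimate from the data how likely each field is to agree for true matches versus non-matches (the Fellegi–Sunter model is a classic formulation) and refine those estimates as more data is analyzed.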
  • 114. DQ Tool #5: Enhancement • Definition: a method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view; improves master data • Benefits: – Enables use of third-party data sources – Allows you to take advantage of the information and research carried out by external data vendors to make data more meaningful and useful • Examples of data enhancements: – Time/date stamps – Auditing information – Contextual information – Geographic information – Demographic information – Psychographic information
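Enhancement can be sketched as a keyed merge of third-party reference data into a base set of entities. In this Python sketch the vendor extract, the `zip` key, and the `enhanced_from`/`enhanced_at` audit fields are hypothetical; the merge also adds the time/date stamp and auditing information listed as example enhancements.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def enhance(base: List[Dict[str, Any]], reference: Dict[str, Dict[str, Any]],
            key: str, source: str) -> List[Dict[str, Any]]:
    """Merge reference attributes into each base record that has a matching key.

    Base values take precedence over reference values; each enhanced record is
    stamped with the source name and a UTC timestamp for auditing.
    """
    stamp = datetime.now(timezone.utc).isoformat()
    enhanced = []
    for record in base:
        extra = reference.get(record.get(key), {})
        merged = {**extra, **record}  # base values win on conflict
        if extra:
            merged["enhanced_from"] = source
            merged["enhanced_at"] = stamp
        enhanced.append(merged)
    return enhanced


customers = [{"id": 1, "zip": "23060"}, {"id": 2, "zip": "99999"}]
zip_reference = {"23060": {"city": "Glen Allen", "state": "VA"}}  # hypothetical vendor extract
for row in enhance(customers, zip_reference, key="zip", source="zip_vendor"):
    print(row)
```

Because base values take precedence, a third-party source can add attributes but never silently overwrite the organization's own data; records without a reference match pass through unchanged.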
  • 115. DQ Tool #6: Reporting • Good reporting supports: – Inspection and monitoring of conformance to data quality expectations – Monitoring performance of data stewards conforming to data quality SLAs – Workflow processing for data quality incidents – Manual oversight of data cleansing and correction • Data quality tools provide dynamic reporting and monitoring capabilities • These capabilities enable analysts and data stewards to support and drive the methodology for ongoing DQM and improvement with a single, easy-to-use solution • Associate report results with: – Data quality measurement – Metrics – Activity
  • 116. Outline: 1. Data Management Introduction 2. Data Quality Definitions & Overview 3. DQM Cycle 4. DQ Awareness & Requirements 5. DQ Dimensions 6. Data Quality Tools 7. Guiding Principles 8. References and Q&A (Tweeting now: #dataed)
  • 118. Guiding Principles 1) Manage data as a core organizational asset. 2) Identify a gold record for all data elements. 3) All data elements will have a standardized data definition, data type, and acceptable value domain. 4) Leverage data governance for the control and performance of DQM. 5) Use industry and international data standards whenever possible. 6) Downstream data consumers specify data quality expectations. 7) Define business rules to assert conformance to data quality expectations. 8) Validate data instances and data sets against defined business rules. 9) Business process owners will agree to and abide by data quality SLAs. 10) Apply data corrections at the original source if possible. 11) If it is not possible to correct data at the source, forward data corrections to the owner of the original source. Influence on data brokers to conform to local requirements may be limited. 12) Report measured levels of data quality to appropriate data stewards, business process owners, and SLA managers. (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
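Principles 3, 7, and 8 (acceptable value domains, business rules that assert conformance, and validation of both individual data instances and whole data sets) can be illustrated with a minimal Python sketch. The fields, rule names, and domain values below are hypothetical examples chosen for illustration, not rules taken from the webinar or the DAMA guide.

```python
import re
from typing import Callable, Dict

Record = dict

# Instance-level business rules: each takes one record and returns True if it conforms.
# Rule names, fields, and the status domain are hypothetical examples.
INSTANCE_RULES: Dict[str, Callable[[Record], bool]] = {
    "zip_is_5_digits": lambda r: bool(re.fullmatch(r"\d{5}", str(r.get("zip", "")))),
    "status_in_domain": lambda r: r.get("status") in {"active", "inactive", "prospect"},
}


def validate_instances(records):
    """Return {rule_name: [indexes of records that violate it]}."""
    violations = {name: [] for name in INSTANCE_RULES}
    for i, record in enumerate(records):
        for name, rule in INSTANCE_RULES.items():
            if not rule(record):
                violations[name].append(i)
    return violations


def validate_unique_key(records, key):
    """Data-set-level rule: each key value must appear only once; return duplicated values."""
    seen, duplicates = set(), set()
    for record in records:
        value = record.get(key)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


records = [
    {"id": 1, "zip": "23060", "status": "active"},
    {"id": 2, "zip": "2306", "status": "active"},
    {"id": 2, "zip": "10001", "status": "archived"},
]
violations = validate_instances(records)
duplicates = validate_unique_key(records, "id")
print(violations)  # record 1 fails the ZIP rule, record 2 falls outside the status domain
print(duplicates)  # id 2 appears twice
```

Separating instance rules from data-set rules matters because some expectations, such as key uniqueness, cannot be checked by looking at one record at a time.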
  • 119. Interdependencies: Tools alone cannot do the job! • Education and Training (People) • Data Cleansing and Prevention (Process) • Data Quality Tools (Technology)
  • 120. Summary: Data Quality Engineering (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
  • 121. Outline 1. Data Management Introduction 2. Data Quality Definitions & Overview 3. DQM Cycle 4. DQ Awareness & Requirements 5. DQ Dimensions 6. Data Quality Tools 7. Guiding Principles 8. References and Q&A Tweeting now: #dataed
  • 123. Recommended Reading
  • 124. Questions? It's your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.
  • 125. Upcoming Events • November Webinar: Get the Most Out of Your Tools: Data Management Technologies, November 13, 2012 @ 2:00 PM – 3:30 PM ET (11:00 AM – 12:30 PM PT) • December Webinar: Show Me the Money: The Business Value of Data and ROI, December 11, 2012 @ 2:00 PM – 3:30 PM ET (11:00 AM – 12:30 PM PT) • Sign up here: www.datablueprint.com/webinar-schedule or www.Dataversity.net