Data quality architecture

The following is a simple presentation of a data quality framework. For details, see www.data4USA.com

Published in: Technology

  1. Data Quality Architecture
     Phase 1 – Account Verification
     Art Nicewick
  2. Project Scope
     • Define an architectural flow diagram that provides a basis for data governance, data quality, and impact analysis
     • Create a framework to report on inconsistencies in data (initial emphasis on accounts)
  3. FISMA
     • The architecture provides a foundation for verifying that accounts are deleted after employees leave the Gallery
     • The Exceptions Facility lets an application administrator request that a non-AD account be left on file
       – Audit trails
       – Non-standard accounts (e.g. TDP as Custodian)
       – CIO can approve/deny and give timelines for resolution
     • Focus of the first phase of the initiative
  4. Why Consistency Reports
     • Common practice (Asset Inventory, …)
     • Ensures that data is corrected in the correct manner
     • Re-validates automated processes
     • Some changes need to be communicated to the system manager (e.g. they should know if someone has a new last name)
     • Links into existing manual practices
  5. General Data Quality Process
     1. Identify data stores (based on priority)
     2. Identify authoritative data
     3. Identify interfaces/replicated/redundant data
     4. Identify consistency analysis process
     5. Correct and continuous monitoring
  6. Identify data stores
     • 1.1. Create a list of all known data applications
       – Define the name of the data application
       – Define the contacts related to the application
         • TDP Contact
         • Application Administrator
       – Categorize the application
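One way to hold the inventory described in step 1.1 is a small record per application. The sketch below is only illustrative: the class name, field names, the numeric `priority` field, and the sample values are assumptions, not part of the original framework.

```python
from dataclasses import dataclass


@dataclass
class DataApplication:
    """One entry in the data application inventory (hypothetical layout)."""
    name: str
    tdp_contact: str
    application_administrator: str
    category: str
    priority: int = 0  # assumption: lower number means reviewed first


def by_priority(apps):
    """Return the inventory ordered so high-priority stores come first."""
    return sorted(apps, key=lambda app: app.priority)


# Placeholder entries for illustration only.
inventory = [
    DataApplication("ExampleAppB", "tdp.b", "admin.b", "Collections", priority=2),
    DataApplication("ExampleAppA", "tdp.a", "admin.a", "Accounts", priority=1),
]
ordered_names = [app.name for app in by_priority(inventory)]
```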
  7. Identify data stores
     • 1.2. Link data into a data flow representation for visual analysis of enterprise data flows
  8. Identify authoritative data
     • 2.1. Review application data to determine
       – What type of data is supported
       – Whether the data is authoritative
  9. Identify interfaces/replicated/redundant data
     • 3.1. Review application data to determine
       – Where the data is sent
       – Where the data is received from
       – Data quality
       – Note: the source is assumed by reverse lookup of target definitions
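The note about reverse lookup can be read as follows: if each application records only where it sends data, the "received from" side can be derived by inverting that mapping. A minimal sketch under that reading, with hypothetical store names and a made-up `targets` structure:

```python
from collections import defaultdict

# Hypothetical target definitions: each data store lists where it sends data.
targets = {
    "HR": ["ActiveDirectory", "Payroll"],
    "ActiveDirectory": ["Email"],
}


def sources_by_reverse_lookup(target_definitions):
    """Invert 'sender -> receivers' into 'receiver -> senders'."""
    sources = defaultdict(list)
    for sender, receivers in target_definitions.items():
        for receiver in receivers:
            sources[receiver].append(sender)
    return dict(sources)


sources = sources_by_reverse_lookup(targets)
```

A store that never appears as a target (here `HR`) has no derived source, which may be a hint that it is authoritative for its data or that a target definition is missing.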
  10. Identify interfaces/replicated/redundant data
      • Diagram linkages between data stores for visual review and impact analysis
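Once the linkages are captured as a directed graph, impact analysis can be approximated as "everything reachable downstream of a given store". The sketch below uses a breadth-first traversal; the `flows` data and store names are hypothetical, and the original tooling may work differently.

```python
from collections import deque

# Hypothetical data flows: store -> stores it feeds.
flows = {
    "HR": ["ActiveDirectory"],
    "ActiveDirectory": ["Email", "Collections"],
    "Email": [],
}


def downstream_impact(flow_graph, start):
    """Return every store that directly or indirectly receives data from start."""
    impacted = set()
    queue = deque([start])
    while queue:
        store = queue.popleft()
        for target in flow_graph.get(store, []):
            # Skip stores already visited so cycles cannot loop forever.
            if target != start and target not in impacted:
                impacted.add(target)
                queue.append(target)
    return impacted


impact_of_hr = downstream_impact(flows, "HR")
```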
  11. Identify consistency analysis process
      • Review participating data sources and determine how to define consistency
      * At this point only “SQL” methods are used.
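For account verification, an SQL-based consistency check might look for application userids that have no matching Active Directory account. The following is a sketch only: the table names, column names, and sample rows are assumptions, and an in-memory SQLite database stands in for the real data sources.

```python
import sqlite3


def find_accounts_missing_from_ad(conn):
    """Return (application, userid) pairs with no matching AD account."""
    cursor = conn.execute(
        """
        SELECT app.application, app.userid
        FROM app_accounts AS app
        LEFT JOIN ad_accounts AS ad ON ad.userid = app.userid
        WHERE ad.userid IS NULL
        """
    )
    return cursor.fetchall()


# Hypothetical schema and sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ad_accounts (userid TEXT PRIMARY KEY);
    CREATE TABLE app_accounts (application TEXT, userid TEXT);
    INSERT INTO ad_accounts VALUES ('asmith'), ('bjones');
    INSERT INTO app_accounts VALUES
        ('ExampleApp', 'asmith'),
        ('ExampleApp', 'cdoe'),
        ('ExampleApp', 'TMSAdmin');
    """
)
inconsistencies = find_accounts_missing_from_ad(conn)
```

Here `cdoe` and `TMSAdmin` would be reported; whether each is a real problem or a valid exception is decided in the monitoring step.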
  12. Correct and continuous monitoring
      • Inconsistencies are periodically sent to end users for “correction” or “exceptions”
      • Valid exceptions may be
        – “Supervisor Accounts Outside Active Directory” (e.g. TMSAdmin)
        – Ex-employees with data attached to a userid
        – Contractor or testing userids
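A report run could suppress inconsistencies that are covered by an approved exception and keep everything else on the list. In this sketch the exception is assumed to carry an expiry date (a guess based on the CIO giving timelines for resolution); the function name, data layout, and sample values are hypothetical.

```python
from datetime import date


def unresolved(inconsistencies, approved_exceptions, today):
    """Keep inconsistencies that have no approved, unexpired exception."""
    remaining = []
    for application, userid in inconsistencies:
        expires = approved_exceptions.get((application, userid))
        # No exception on file, or the exception has lapsed: still report it.
        if expires is None or expires < today:
            remaining.append((application, userid))
    return remaining


# Hypothetical sample data.
found = [
    ("ExampleApp", "TMSAdmin"),   # supervisor account outside Active Directory
    ("ExampleApp", "cdoe"),       # no exception requested
    ("ExampleApp", "oldtester"),  # testing userid whose exception has expired
]
approved = {
    ("ExampleApp", "TMSAdmin"): date(2030, 1, 1),
    ("ExampleApp", "oldtester"): date(2020, 1, 1),
}
to_report = unresolved(found, approved, today=date(2025, 1, 1))
```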
  13. Correct and continuous monitoring
      • Users can review and update exceptions online
  14. Correct and continuous monitoring
      • Administrators can create schedules and email recipient lists
  15. Correct and continuous monitoring
      • Email can be sent to as many people as desired and as frequently (or infrequently) as desired
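Schedules and recipient lists could be modeled as a small record with an interval and a list of addresses. The sketch below only builds the message and does not send it; the class, its fields, the sender address, and the sample schedule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from email.message import EmailMessage


@dataclass
class ReportSchedule:
    """Hypothetical schedule: who gets a report and how often."""
    report_name: str
    recipients: list
    every_days: int
    last_sent: date

    def is_due(self, today):
        """True once the configured interval has elapsed since the last send."""
        return today >= self.last_sent + timedelta(days=self.every_days)


def build_message(schedule, body, sender="dq-reports@example.org"):
    """Build (but do not send) the report email for one schedule."""
    message = EmailMessage()
    message["Subject"] = f"Consistency report: {schedule.report_name}"
    message["From"] = sender
    message["To"] = ", ".join(schedule.recipients)
    message.set_content(body)
    return message


weekly = ReportSchedule(
    report_name="Account verification",
    recipients=["admin.a@example.org", "admin.b@example.org"],
    every_days=7,
    last_sent=date(2025, 1, 1),
)
message = build_message(weekly, "2 inconsistencies need review.")
```

Sending would typically go through `smtplib` or an existing mail gateway, which is left out here because the original slides do not say how email is delivered.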
  16. Target Data
      • Userids (first phase and proof of concept)
      • Object data
      • Location data
      • Employee names and titles
      • Other …
  17. Challenges
      • Object data (Portfolio)
      • Non-SQL data (Filemaker)
      • Secure data (Tradewin)
      • Desktop data (Excel)
      • Offsite data (FMS)
      • Other …
