Managing Enterprise Data Quality
using SAP Information Steward
Vinny Ahuja, Cheryl Johnson
Intel Corporation
SESSION CODE: BI593
Disclaimer
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN
THIS SUMMARY.
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the results to
vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
For a list of Intel trademarks, go to http://legal.intel.com/Trademarks/NamesDb.htm
* Other names and brands may be claimed as the property of others.
Copyright © 2015, Intel Corporation. All rights reserved.
Learning Points
• Data Quality (DQ) challenges within the information pipeline for Business Intelligence (BI)
• Provide visibility into DQ issues within a heterogeneous landscape
• Role of Information Steward in addressing DQ
• Share implementation experience
• Use a DQ tool as a regression test tool
About Intel
Business Intelligence Information Pipeline
• Data quality starts with systems of record
• Data movement can introduce data quality issues
• Don’t wait for the customer to find data quality issues
• Build instrumentation into the pipeline to monitor quality
Pipeline: Source Systems → Operational Data Store (ODS) → Extract, Transform & Load (ETL) → Enterprise Data Warehouse (EDW) → Data Marts → BI Platforms
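The instrumentation called for in the last bullet can start as simply as reconciling record counts between adjacent stages after each load. Below is a minimal sketch under assumed names; the stage list, the get_row_count() stub, and the tolerance are placeholders for illustration, not part of any specific tool.

```python
# Minimal sketch: reconcile row counts between adjacent pipeline stages.
# Stage names and get_row_count() are hypothetical stand-ins; in practice
# the counts would come from the ODS, EDW, data marts, and so on.
from datetime import datetime, timezone

PIPELINE = ["source", "ods", "edw", "data_mart"]  # assumed stage order

def get_row_count(stage: str, table: str) -> int:
    """Stub: return the row count for a table at a given stage."""
    sample = {"source": 1_000_000, "ods": 1_000_000, "edw": 999_950, "data_mart": 999_950}
    return sample[stage]

def reconcile(table: str, tolerance: float = 0.001) -> list[dict]:
    """Compare each stage's count against the previous stage and flag gaps."""
    results = []
    for upstream, downstream in zip(PIPELINE, PIPELINE[1:]):
        up, down = get_row_count(upstream, table), get_row_count(downstream, table)
        gap = (up - down) / up if up else 0.0
        results.append({
            "checked_at": datetime.now(timezone.utc).isoformat(),
            "table": table, "from": upstream, "to": downstream,
            "gap_pct": round(gap * 100, 3), "ok": gap <= tolerance,
        })
    return results

if __name__ == "__main__":
    for row in reconcile("customer_address"):
        print(row)
```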
Key Data Characteristics (Dimensions)
• Accuracy: data was entered or derived correctly, as measured by a physical assessment
• Completeness: data is not missing
• Consistency: data that should be the same in various systems is, in fact, the same
• Timeliness: data is available for use when the business requires it
• Validity: data conforms to business rules (constraints)
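Several of these dimensions reduce to simple measurements: completeness is the share of records with a value present, and validity is the share of populated records that satisfy a business-rule constraint. A self-contained illustration follows; the sample records and the postal-code rule are invented for the example.

```python
# Illustrative completeness and validity metrics over a small in-memory dataset.
records = [
    {"customer_id": "C001", "country": "US", "postal_code": "97124"},
    {"customer_id": "C002", "country": "US", "postal_code": None},
    {"customer_id": "C003", "country": "US", "postal_code": "ABCDE"},
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def validity(rows, field, rule):
    """Share of populated rows that satisfy the business rule (constraint)."""
    populated = [r for r in rows if r.get(field) not in (None, "")]
    return sum(1 for r in populated if rule(r[field])) / len(populated)

# Hypothetical rule: US postal codes are exactly 5 digits.
is_us_zip = lambda v: v.isdigit() and len(v) == 5

print(f"postal_code completeness: {completeness(records, 'postal_code'):.0%}")  # 67%
print(f"postal_code validity:     {validity(records, 'postal_code', is_us_zip):.0%}")  # 50%
```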
What Makes Managing Data Quality Hard?
• Lack of ownership/accountability
• Incomplete or no checks during data entry
• Heterogeneous platforms
• Purchased and homegrown applications
• Limited or no documentation
• Limited resources (run vs. grow the business)
• Mergers & acquisitions
Managing Data Quality
Processes to Assess, Define, Monitor and Improve Data Quality
• Discover & Understand Data: assess/profile data; assess risks and impact; catalog data assets
• Define: data ownership, roles & responsibilities; data specifications; data quality requirements; workflows with R&R for accountability to resolve data quality issues
• Deploy: governance processes; operational processes; DQ audit and monitors
• Monitor & Remediate: analyze monitor results; execute workflows to fix DQ issues
These phases form a continuous cycle around the enterprise data, involving the Analyst, Data Steward, Product Data Manager (PdM) and Developer roles.
DQ Management Capability Stack
• Data Sources (ERP, MDM, CRM, DW, Data Marts)
• Data Access Layer
• Data Profiler and Rules Engine
• Audit Results Repository and Metadata Repository
• Workflow Engine
• Reporting, Analysis, Events and Notifications
Primary users: Analyst, Data Steward, Product Data Manager (PdM)
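Of these components, the profiler is typically the first one a team touches: it summarizes each column before any rules are written against it. The stand-in sketch below shows the kind of per-column statistics a profiler produces (null count, distinct count, min/max); the sample rows are invented, and this is not the Information Steward profiler itself.

```python
# Minimal column-profiling sketch: per-column null count, distinct count, min/max.
def profile(rows: list[dict]) -> dict:
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return report

# Hypothetical sample data standing in for a table behind the data access layer.
rows = [{"city": "Hillsboro", "qty": 5}, {"city": None, "qty": 12}, {"city": "Folsom", "qty": 12}]
for col, stats in profile(rows).items():
    print(col, stats)
```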
DQ Management with Information Steward
Pipeline monitored: Source Systems → ODS → ETL → EDW → Data Marts → BI Platforms
Setup in Information Steward:
• Data Validation Rules
• Data Profiles Setup
• DQ Scorecards
• DQ Monitor Tolerances
• Tasks and Notifications
Dimensions measured: Accuracy, Completeness, Consistency, Integrity, Validity
Outputs in the DQ Metrics Repository: Data Profiles, Data Fallouts, DQ Metrics & Scorecards
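Conceptually, a DQ monitor ties a validation rule to a tolerance, raises a task or notification when the failure rate breaches it, and feeds the passing share to the scorecard. The sketch below mimics that flow in plain Python; the rule, the threshold values, and the notify() stub are assumptions for illustration, not Information Steward APIs.

```python
# Sketch of a DQ monitor: a validation rule, a tolerance, and a notification.
from dataclasses import dataclass
from typing import Callable

def notify(message: str) -> None:
    """Stub for the tasks-and-notifications step (e-mail, ticket, etc.)."""
    print("NOTIFY:", message)

@dataclass
class Monitor:
    name: str
    rule: Callable[[dict], bool]   # returns True when the record passes
    warn_at: float = 0.95          # scorecard thresholds (assumed values)
    fail_at: float = 0.90

    def run(self, records: list[dict]) -> float:
        passed = sum(1 for r in records if self.rule(r))
        score = passed / len(records) if records else 1.0
        if score < self.fail_at:
            notify(f"{self.name}: FAIL, score {score:.1%}")
        elif score < self.warn_at:
            notify(f"{self.name}: WARN, score {score:.1%}")
        return score

# Hypothetical rule: every address record must carry a country code.
monitor = Monitor("addr_country_required", lambda r: bool(r.get("country")))
score = monitor.run([{"country": "US"}, {"country": ""}, {"country": "DE"}])
print(f"scorecard entry: {score:.1%}")
```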
Rolling Out the DQ Management Tool
• Data security
• Standards
• Systems landscape
• Roles & responsibilities
• Development lifecycle (migration)
• Dashboards, alerts and notifications
• Production support
• Training
• Upgrades
Data Organization and Security
• Need to protect data from unauthorized use: Master Data, Sales, Procurement
• Projects organized by subject areas (Data Taxonomy) or business process/function
• Separate projects for business users and operations
• Data stewards approve access to data
Naming Standards
• Naming standards:
  • Connections: SAPCRM_ChnlMgmt
  • Views: VW0012_ADRC join Channel_Mgmt.v_addr
  • Rules: LR0012 SAP ADRC PK not in EDW addr
  • Tasks: CHNL RT0012 VW0012 ADRC compare addr
• Emphasis on names and detailed descriptions that resonate with business or operations users
• Documentation within the tool
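Because names like these tend to drift as more teams onboard, the convention can be linted automatically. Below is a hypothetical sketch that checks rule and task names against patterns in the spirit of the examples above; the exact regular expressions are an assumption, not the team's published standard.

```python
# Hypothetical linter for object names, loosely modeled on the examples above.
import re

PATTERNS = {
    "view": re.compile(r"^VW\d{4}_\w+"),        # e.g. VW0012_ADRC ...
    "rule": re.compile(r"^LR\d{4} .+"),         # e.g. LR0012 SAP ADRC PK not in EDW addr
    "task": re.compile(r"^[A-Z]+ RT\d{4} .+"),  # e.g. CHNL RT0012 VW0012 ADRC compare addr
}

def check_name(kind: str, name: str) -> bool:
    """Return True if the name matches the assumed pattern for its object type."""
    return bool(PATTERNS[kind].match(name))

assert check_name("rule", "LR0012 SAP ADRC PK not in EDW addr")
assert check_name("task", "CHNL RT0012 VW0012 ADRC compare addr")
assert not check_name("view", "address_view")  # would be flagged for renaming
print("naming checks passed")
```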
Systems Landscape
Environment tiers: Pathfinding (PF) → Development (DEV) → Test (QA) → Benchmark (BM, if necessary) → Production (PROD)
The DQ tool and its data sources are provisioned across these tiers.
Roles across the landscape: Biz or IT Developer (non-production tiers) and Support Analyst (benchmark and production)
Roles & Responsibilities
• Separation of duties in support of audit requirements
• Those who write monitors cannot deploy to production
• Monitors are developed and tested using non-production systems
• Migrations to production, though manual, are handled by a separate role (support analyst)
• Support analyst and tool administrator are separate roles and individuals
• Changes migrate from non-production to production
• Changes are made directly in production only on an exception basis
Development Methodology
Initial Engagement
• Meet with prospects to understand requirements
• Assess if the tool is the right fit for the requirements
• Tool limitations can be a show stopper for some scenarios
Development
• Assign a mentor to the project team; share best practices and standards
• Document requirements for data, views, rules, bindings, schedules, thresholds
• Build the DQ monitor: data sources, views, rules, bindings, tasks, dashboards
Deployment
• Conduct a design review (quality assurance check)
• Migrate to production (support analyst), configure notifications
• Schedule tasks per the agreed schedule (support analyst)
Improvements
• Project team makes changes and tests in development and test environments
• Project team requests changes to be migrated to production
• Support analyst migrates changes to production
Typical timeline: 3 - 4 weeks through development, then 1 - 3 days for deployment (* green period)
Getting to Historical DQ Issue Records
• Need a history of records with DQ issues
• Required a custom solution to report historical records
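The slides do not describe the custom solution itself, but the underlying idea is to append each run's fallout records to a history table keyed by run date so trends can be reported later. A minimal sketch of that idea using SQLite follows; the table name, columns, and sample fallout are assumptions.

```python
# Sketch: append each run's failed records to a history table for trend reporting.
import json
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")  # stand-in for a DQ metrics repository
conn.execute("""
    CREATE TABLE IF NOT EXISTS dq_issue_history (
        run_date TEXT, rule_name TEXT, record_key TEXT, record_payload TEXT
    )
""")

def archive_fallouts(rule_name: str, failed_records: list[dict], key_field: str) -> None:
    """Store this run's failed records alongside the run date."""
    rows = [(date.today().isoformat(), rule_name, str(r[key_field]), json.dumps(r))
            for r in failed_records]
    conn.executemany("INSERT INTO dq_issue_history VALUES (?, ?, ?, ?)", rows)
    conn.commit()

# Hypothetical fallouts from one monitor run.
archive_fallouts("addr_country_required", [{"addr_id": 42, "country": None}], "addr_id")
for row in conn.execute("SELECT run_date, rule_name, record_key FROM dq_issue_history"):
    print(row)
```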
Reality of Platform Upgrades
• The information pipeline comprises multiple platforms
• One or more platforms get a software or hardware upgrade at least once per year
• Each platform upgrade requires end-to-end testing of the information pipeline
• Was the flow of data complete and consistent after the upgrade?
Pipeline: Source Systems → Operational Data Store (ODS) → Extract, Transform & Load (ETL) → Enterprise Data Warehouse (EDW) → Data Marts → BI Platforms
DQ Monitors – Regression Test Suite
• DQ monitors validate that data is complete and consistent within and across data repositories
• DQ monitors are early detectors of data issues for critical business processes
• A platform upgrade requires regression testing; a DQ monitor can do that job
• Validate that data is complete and consistent between the source and target repository after a platform upgrade
• Eliminate the need to maintain test data sets and test scripts for individual platforms
• Test parts of the pipeline, or the entire pipeline, using existing DQ monitors
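Used as a regression suite, a monitor's completeness/consistency check largely reduces to comparing key sets between the source and target repositories before and after the upgrade. Here is a sketch of that comparison with stand-in data; the fetch_keys() stub, system names, and tables are assumptions.

```python
# Sketch: post-upgrade regression check that source and target hold the same keys.
def fetch_keys(system: str, table: str) -> set[str]:
    """Stub: in practice this would query the source and target repositories."""
    sample = {
        ("SAP", "ADRC"): {"A1", "A2", "A3"},
        ("EDW", "addr"): {"A1", "A2"},
    }
    return sample[(system, table)]

def compare(source: tuple[str, str], target: tuple[str, str]) -> dict:
    """Report keys missing from or unexpected in the target repository."""
    src, tgt = fetch_keys(*source), fetch_keys(*target)
    return {"missing_in_target": sorted(src - tgt),
            "unexpected_in_target": sorted(tgt - src),
            "consistent": src == tgt}

result = compare(("SAP", "ADRC"), ("EDW", "addr"))
print(result)  # {'missing_in_target': ['A3'], 'unexpected_in_target': [], 'consistent': False}
```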
Return On Investment
• Trust in the data has significantly improved, and more focus can be directed to value-added activities
• In one scenario, DQ improved from 73% to 93% in one quarter
• Enabled streamlining of the metrics process
• Reduction in recovery time and activities during data excursions
• Monitors take the guesswork out of what the issue is and what the resolution needs to be
• Recover in 2-4 hours instead of 2-3 days
Best Practices
• The BI information pipeline is a good place to start with DQ monitors
• Showcase value to those in the business responsible for data
• Design for data security and separation of roles, and enforce standards
• Use data profiling to troubleshoot data issues within the data pipeline, especially in production
• Use DQ monitors as a regression test suite for the information pipeline
Key Learnings
• Monitors with no ownership for action are a waste of resources
• Having metrics helps get the necessary focus on data quality
• Partner with those championing data governance to derive value from IT investments
• A major data crisis will get the right attention; grab the moment
• Monitors are valuable during platform upgrades
• Set development lifecycle expectations early in the engagement
STAY INFORMED
Follow the ASUGNews team:
Tom Wailgum: @twailgum
Chris Kanaracus: @chriskanaracus
Craig Powers: @Powers_ASUG
THANK YOU FOR PARTICIPATING
Please provide feedback on this session by completing
a short survey via the event mobile application.
SESSION CODE: BI593
For ongoing education on this area of focus,
visit www.ASUG.com