Five Steps to Mastering Master Data Management
                                     Ron Lewis
                              November 19, 2009
Presentation Overview

• Introduction
• What is Master Data Management?
• The 5 Steps for Master Data Management:
    • Discovery – finding all of the data sources, who they are used by and how they are used
    • Analysis – identifying authoritative sources, discrepancies, and candidates for consolidation
    • Design – designing the metadata repository
    • Implementation – implementing a metadata repository
    • Establish data governance

• Leveraging Technology to facilitate:
    • Business Process and Data Modeling
    • Data Governance and Discovery
    • Metadata Repository Implementation
    • Metadata Management

• Presentation Focus: The Discovery and Analysis Phases
19/11/2009                                                                                            2
Master Data Management

• Master Data Management
    • Master Data is: Principal business data essential for conducting business
    • MDM provides an enterprise perspective on the critical Business Processes and the Data necessary to support them
    • Bottom line: Improve decision making



• Core Tasks
    • Building the Business Process Models
    • Data Governance (Standardizing data - nomenclature, domains, data quality and consumption rules)
    • Synchronizing related operational systems using the data
    • Integrating/reconciling disparate data silos to provide single enterprise view
    • Building and managing an enterprise metadata repository



• Challenge: Must Shift Thinking to the Enterprise Perspective

Discovery Phase

• Step 1 – Discovery
    • Capturing and modeling the essential business processes
    • Mapping processes to the data necessary to complete each process successfully
    • Identifying data sources and gathering appropriate metadata

• Primary Challenges-
    • Cost - It’s Expensive and Disruptive
    • Gaining Executive Leadership Support – (“You mean we don’t have this already?”)

• Solution-
    • Start with what’s most important
    • What’s important should be obvious




Discovery Phase

• Involve your infrastructure and/or security personnel
• Iteration I: Capture existing data and schemas
    • Find your database servers, their respective owners, and access
    • Reverse engineer your physical data models
    • Build a master data dictionary and catalog

• Iteration II: Profile existing applications to help with business
    • Database Centric: ETL, Stored Procedures, and Triggers
    • Application Source Code and User Behavior

• Tools You’ll Need
    • Infrastructure/security tools (Nessus)
    • Data Modeling and Profiling tools (ER/Studio Data Architect/DBOptimizer)
    • Application Profiling tools (NitroSecurity APM)
    • Repository to manage the metadata byproducts
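
Iteration I can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database stands in for a discovered server; the source name crm_db, the sample tables, and the build_catalog helper are all hypothetical, and a modeling tool such as ER/Studio would do this at scale.

```python
import sqlite3

def build_catalog(conn, source_name):
    """Return one catalog row per column found in the connected database."""
    catalog = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            catalog.append({
                "source": source_name,
                "table": table,
                "column": column,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            })
    return catalog

# Hypothetical sample schema standing in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT NOT NULL)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, total REAL)")

catalog = build_catalog(conn, "crm_db")
for row in catalog:
    print(row)
```

Each row records where a data element lives, which is the raw material for the master data dictionary and catalog.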



Infrastructure / Security Tooling
[image-only slide]

Use ER/Studio to Reverse Engineer
[image-only slide]

Reverse Engineer Physical Schemas
[image-only slide]

Example Reverse Engineered Model
[image-only slide]

Start Building Master Data Catalog
[image-only slide]

Exporting Catalog for Sharing
[image-only slide]

Discovery – Profiling Data Use

• Biggest Challenges We’re Solving:
    • Reconciling and integrating disparate “Data Silos” into a central location
    • Identifying duplicative data elements (or attributes)
    • Laying the foundation for identifying which of the data sources contain the actual “source data”

• A high percentage of business logic is encapsulated as programming logic
    • Stored Procedures and Trigger code stored in the database
    • Application Source Code
    • Extract, Transform, and Load (ETL) Scripts
    • We need visibility into this logic, and we need to be able to store it somewhere

• Tools necessary for this:
    • DSAuditor and DB Optimizer or Performance Center (to capture live data use)
    • Source Code Analyzers (I like Fortify SCA, and Embarcadero JBuilder)
    • Profile ETL using Embarcadero’s MetaWizard (usually by converting the ETL to XML)
    • Store metadata in ER/Studio Data Architect’s Data Lineage and Transform Rules Support
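
As a rough sketch of getting visibility into logic stored in the database, the snippet below pulls trigger definitions out of a database catalog so they can be kept as metadata. It assumes SQLite as a stand-in; the tables, the trigger, and the collect_embedded_logic helper are hypothetical, and other database engines expose this through their own system catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema with one piece of business logic hidden in a trigger.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total REAL, status TEXT);
CREATE TABLE order_audit (order_id INTEGER, old_status TEXT, new_status TEXT);
CREATE TRIGGER trg_order_status AFTER UPDATE OF status ON orders
BEGIN
    INSERT INTO order_audit VALUES (OLD.order_id, OLD.status, NEW.status);
END;
""")

def collect_embedded_logic(conn):
    """Return stored trigger definitions so they can be kept as metadata."""
    rows = conn.execute(
        "SELECT name, tbl_name, sql FROM sqlite_master WHERE type = 'trigger'"
    ).fetchall()
    return [{"trigger": n, "table": t, "definition": s} for n, t, s in rows]

logic = collect_embedded_logic(conn)
for item in logic:
    print(item["trigger"], "on", item["table"])
    print(item["definition"])
```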


Profiling Data Use with DBOptimizer
[image-only slide]

Analysis Phase

• Step 2 – Analysis
    • Identifying authoritative sources, discrepancies, and candidates for consolidation
    • Evaluating Data Flow and Transform Rules
    • Capturing/Defining Synonyms and Assigning Aliases
    • Setting the Foundation for Data Governance

• Primary Challenges-
    • Cost – It’s time-consuming and is a “team effort”
    • Getting ancillary information that teams don’t want to share

• Solution-
    • Start with what’s most important
    • What’s important should be obvious




Analysis Phase

• Iteration I: Evaluate ETL for data lineage and transform rules
    • Start by reverse engineering the ETL, converting it to XML
    • Incorporate it into the repository

• Iteration II: Identify synonymous elements and build alias list
    • Evaluate data domains and transform rules for issues such as state and use
    • Enlist database and development staff to identify aliases and tag the data elements in the master catalog

• Tools You’ll Need
    • Data Modeling tools (ER/Studio and MetaWizard)
    • Repository to manage the metadata byproducts (ER/Studio)




Analysis Phase – Evaluating ETL

• Biggest Challenges We’re Solving:
    • Finding which data source is feeding what other data sources
    • Collecting Data Lineage metadata
    • Making it accessible to the right team members

• Convert the ETL to a form that allows manipulation (such as XML)
• Import the metadata into the data modeling tool
• Build, publish and control access to your master data repository
• Start gathering and applying metadata tags
• Tools necessary for this:
    • MetaWizard
    • ER/Studio Data Architect (or the like)
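
Once the ETL is in XML, finding which source feeds which becomes a small graph problem. The sketch below assumes a made-up XML layout (mapping elements with source, target, and rule attributes); it is not the format MetaWizard or any other tool emits, and it assumes the lineage contains no cycles.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical ETL export: each mapping copies one element into another.
ETL_XML = """
<etl>
  <mapping source="crm_db.customer.cust_name" target="warehouse.dim_customer.name" rule="TRIM"/>
  <mapping source="billing_db.client.client_nm" target="warehouse.dim_customer.name" rule="UPPER"/>
  <mapping source="warehouse.dim_customer.name" target="reports.top_customers.name" rule="COPY"/>
</etl>
"""

def build_lineage(xml_text):
    """Map each target element to the (source, transform rule) pairs feeding it."""
    feeds = defaultdict(list)
    for m in ET.fromstring(xml_text).iter("mapping"):
        feeds[m.get("target")].append((m.get("source"), m.get("rule")))
    return feeds

def origins(feeds, element):
    """Follow lineage upstream until reaching elements that nothing feeds."""
    upstream = feeds.get(element)
    if not upstream:
        return {element}
    found = set()
    for source, _rule in upstream:
        found |= origins(feeds, source)
    return found

feeds = build_lineage(ETL_XML)
print(sorted(origins(feeds, "reports.top_customers.name")))
```

Elements with no upstream feed are the natural starting point when looking for authoritative sources.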




Data Lineage and Transform Rules
[image-only slide]

Setting the Foundation for Governance
[image-only slide]

Analysis Phase – Identifying Synonyms


• Biggest Challenges We’re Solving:
    • Identifying like data elements and candidates for consolidation
    • Building Aliases
    • Establishing the foundation for Data Governance

• Evaluate data nomenclature using tool functions such as Merge and Compare to identify the obvious overlaps
• Compare descriptors from database staff
• Compare data use and consumption rules derived from tools such as DB Optimizer
• Tools necessary for this:
    • ER/Studio Data Architect (or the like)
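
A simple way to surface the obvious overlaps is to normalize element names and group the ones that collide. This is only a sketch: the abbreviation table, the sample elements, and both helper functions are hypothetical, and a real alias list still needs review by database and development staff.

```python
from collections import defaultdict

# Hypothetical abbreviation table; extend it with your own naming conventions.
ABBREVIATIONS = {"cust": "customer", "nm": "name", "num": "number", "no": "number"}

def normalize(column):
    """Lowercase a column name and expand known abbreviations token by token."""
    tokens = column.lower().replace("-", "_").split("_")
    return "_".join(ABBREVIATIONS.get(t, t) for t in tokens if t)

def alias_candidates(elements):
    """Group elements whose normalized names match; keep groups of two or more."""
    groups = defaultdict(list)
    for source, column in elements:
        groups[normalize(column)].append(f"{source}.{column}")
    return {name: members for name, members in groups.items() if len(members) > 1}

# Hypothetical (source table, column) pairs taken from a master catalog.
elements = [
    ("crm_db.customer", "cust_name"),
    ("billing_db.client", "CUSTOMER_NM"),
    ("orders_db.orders", "cust_no"),
    ("billing_db.invoice", "customer_num"),
    ("orders_db.orders", "ship_date"),
]

aliases = alias_candidates(elements)
for name, members in aliases.items():
    print(name, "<-", members)
```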




Performing Analysis With Compare Utility
[image-only slide]

Exporting to Excel for Input into Database
[image-only slide]

Candidates for Consolidation
[image-only slide]

Step 3 Building the Repository

• Step 3 – Building the Metadata Repository
    • Populating the Repository with the right metadata
    • Establishing and Controlling Access to the metadata
    • Performing metadata management

• Primary Challenges-
    • Defining who needs access to what metadata
    • Establishing the rules of use

• Suggestions-
    • Implement a change control and auditing tool
    • What’s important should be obvious
    • Understand the value of the metadata on profitability
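
To make "who needs access to what" and auditing concrete, here is a toy in-memory repository. The roles, permissions, users, and the MetadataRepository class are all hypothetical; a real repository product supplies its own security and change-control features.

```python
from datetime import datetime, timezone

# Hypothetical roles and rights; real rules of use come from your governance team.
PERMISSIONS = {"data_steward": {"read", "write"}, "analyst": {"read"}}

class MetadataRepository:
    """In-memory stand-in for a repository with access control and auditing."""

    def __init__(self):
        self.entries = {}
        self.audit_log = []

    def _check(self, user, role, action, key):
        allowed = action in PERMISSIONS.get(role, set())
        # Record every attempt, allowed or not, so changes can be audited later.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user, "role": role, "action": action,
            "key": key, "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{role} may not {action} {key}")

    def write(self, user, role, key, value):
        self._check(user, role, "write", key)
        self.entries[key] = value

    def read(self, user, role, key):
        self._check(user, role, "read", key)
        return self.entries[key]

repo = MetadataRepository()
repo.write("alice", "data_steward", "customer_name", {"authoritative_source": "crm_db"})
print(repo.read("bob", "analyst", "customer_name"))
try:
    repo.write("bob", "analyst", "customer_name", {"authoritative_source": "billing_db"})
except PermissionError as err:
    print("denied:", err)
```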




Step 4 Implementing the repository

• Step 4 - Implementing the repository
    • Mapping the metadata to the requisite business processes
    • Leveraging the metadata to determine candidates for business process re-engineering

• Primary Challenges-
    • Getting the processes down in modeled form
    • Obtaining Middle Level Management and Senior Leadership buy-in to changes identified by metadata

• Suggestions-
    • Leverage a modeling tool that facilitates data to process mapping (integrated metadata)
    • Focus on what’s most important to the business—try not to focus on EVERYTHING
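
Data-to-process mapping can be sketched with plain dictionaries. The process names, data elements, and authoritative-source table below are hypothetical; the point is only that once the mapping exists, gaps that suggest re-engineering candidates fall out of a simple lookup.

```python
from collections import defaultdict

# Hypothetical mapping of business processes to the data elements they need.
PROCESS_DATA = {
    "order_entry": {"customer_name", "customer_number", "ship_date"},
    "invoicing": {"customer_name", "customer_number", "invoice_total"},
    "marketing_campaign": {"customer_name", "email_opt_in"},
}

# Hypothetical authoritative source identified for each element during analysis.
AUTHORITATIVE = {
    "customer_name": "crm_db",
    "customer_number": "crm_db",
    "invoice_total": "billing_db",
    "ship_date": "orders_db",
}

def elements_to_processes(process_data):
    """Invert the mapping: which processes depend on each data element?"""
    usage = defaultdict(list)
    for process, elements in process_data.items():
        for element in elements:
            usage[element].append(process)
    return {element: sorted(procs) for element, procs in usage.items()}

def unsupported_elements(process_data, authoritative):
    """List, per process, the elements that have no authoritative source yet."""
    gaps = {}
    for process, elements in process_data.items():
        missing = sorted(e for e in elements if e not in authoritative)
        if missing:
            gaps[process] = missing
    return gaps

usage = elements_to_processes(PROCESS_DATA)
gaps = unsupported_elements(PROCESS_DATA, AUTHORITATIVE)
print(usage["customer_name"])
print(gaps)
```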




Step 5 Establishing Data Governance

• Step 5 – Establishing Data Governance
    • All of the above steps lay the foundation for good data governance
    • Get Senior Leadership to stipulate policy enforcing the rules you’ve derived
    • Build a Plan and Standardize Iteratively – (don’t try to fix everything all at once)

• Primary Challenges-
    • Fundamental Opposition to Change
    • Maintaining Momentum

• Suggestions-
    • Find a quick kill – tackle the biggest organizational problem you can handle
    • Focus on what’s most important to the business—and what drives easily visible ROI
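
Once policy backs the rules you've derived, they can be checked mechanically. The sketch below assumes two made-up domain rules (a short list of state codes and a customer number pattern) and hypothetical sample records; it shows the idea of rule enforcement, not any particular governance tool.

```python
import re

# Hypothetical domain rules; each maps a field name to a validity check.
RULES = {
    "state": lambda v: v in {"CA", "NY", "TX"},
    "customer_number": lambda v: bool(re.fullmatch(r"C\d{6}", v)),
}

def violations(records, rules):
    """Return (record index, field, value) for every missing or invalid value."""
    found = []
    for i, record in enumerate(records):
        for field, check in rules.items():
            value = record.get(field)
            if value is None or not check(value):
                found.append((i, field, value))
    return found

# Hypothetical sample records drawn from different source systems.
records = [
    {"customer_number": "C000123", "state": "CA"},
    {"customer_number": "123", "state": "ca"},
    {"state": "NY"},
]

problems = violations(records, RULES)
for index, field, value in problems:
    print(f"record {index}: {field}={value!r} breaks the rule")
```

Starting with one or two rules like these fits the advice to standardize iteratively rather than fix everything at once.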




Summary

• What We Covered:
    • Defined Master Data and Master Data Management
    • The 5 Steps for Master Data Management:
         • Discovery – finding all of the data sources, who they are used by and how they are used
         • Analysis – identifying authoritative sources, discrepancies, and candidates for consolidation
         • Design – designing the metadata repository
         • Implementation – implementing a metadata repository
         • Establish data governance
    • Demonstrated how to leverage specific technology to facilitate:
         • Business Process and Data Modeling
         • Data Governance and Discovery
         • Metadata Repository Implementation
         • Metadata Management




Questions and Answers

• Tools Discussed:
     • Nessus
     • ER/Studio Data Architect / Business Architect and ER/Studio Repository
     • DBOptimizer
     • Change Manager



• Technologies Discussed:
     • Building the Data Catalog
     • Capturing and Storing Metadata
     • Metadata Analysis



• Contact Info:
     • Ron Lewis, Ron.Lewis@cdotech.com



