Metadata Administration – A Case Study

Alex Friedgan
alex.friedgan@gmail.com

© Alex Friedgan, 2009

Overview
• Value Proposition
• Challenges & Solutions
• Quality Scorecard & Results

Importance of Documentation
• Development effort is subdivided into a series of projects
• Resources are geographically dispersed
• ETL and BI teams report to different managers
• Data delivery (ETL) and Reporting (BI) are handled as separate projects
• IMPORTANCE OF DOCUMENTATION!!

DW Development Workflow
[Diagram: a time axis showing the Data Delivery (ETL) Project and the Reporting (BI) Project with a gap between them. Labels: Data Analysis; ETL Design; Org and/or Geography Boundary; Source-to-Target Document (Data Analysis Hand-off).]

Data Analysis Hand-off Document
• Designed to be comprehensive and sufficient
• Serves multiple purposes
• Hundreds of documents
    - Versioning Info
        - Project, change description
    - General Info
        - Subject area, entity, data flow
    - Column Level Maps
        - Target column, business name and definition, source column, transformation rules
    - Supporting material

Documentation “Neighborhood”
[Diagram: the documentation sources involved: Project Folders, Databases, File Record Layouts, Data Models, and Source-to-Target Documents (Data Analysis Hand-off).]

Metadata Framework
• Distributed metadata
    - Information is spread across numerous documents
    - Each document serves as a source of metadata
• Semi-structured format (semantically fuzzy data)
    - Documentation is produced by flexible authoring tools
• Overlapping documentation
    - Information is copied, re-created and redundantly stored
    - No authoritative system of record when documents contradict
• Integration without a centralized repository!!

Value Proposition
• PROPOSITION
    - Basic MME (managed metadata environment)
    - Consistent and comprehensive documentation achieved within the distributed metadata framework
• Benefits are obvious but difficult to quantify
    - Less confusion during development process
    - Less re-work
    - Shorter test cycles
    - Improved quality of BI deliverables
• Implementation cost was low, so only a confirmation of savings was needed, not a comprehensive estimate

Data Lineage Analysis – The “Before”
• Data Architect was receiving frequent requests to find documents
    - Developer performing unsuccessful search
    - Developer contacting Project Manager
    - Project Manager contacting Data Architect
    - Data Architect searching and replying
    - Developer waiting for information requested
• Finding the correct document is worth XX man-hours per month.

Challenges & Solutions




Distributed Metadata Challenges
• The number of synchronization processes grows as the square of the number of sources
• Multiple types of documents
• Multiple individual documents
[Diagram: example sources to be synchronized: File Record Layout DEF, Project Folder VUW, Spreadsheet ABC, Model XYZ.]

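The quadratic growth mentioned on the Distributed Metadata Challenges slide can be illustrated with a small sketch. It assumes that every pair of sources needs its own synchronization process and that a two-way synchronization counts once; the function name `sync_pair_count` is hypothetical, and the source names are taken from the slide's diagram.

```python
from itertools import combinations


def sync_pair_count(n_sources: int) -> int:
    """Number of unordered source pairs: n * (n - 1) / 2 (assumes one process per pair)."""
    return n_sources * (n_sources - 1) // 2


# Source names taken from the slide's diagram.
sources = [
    "File Record Layout DEF",
    "Project Folder VUW",
    "Spreadsheet ABC",
    "Model XYZ",
]

# Enumerate the pairs explicitly and check them against the closed formula.
pairs = list(combinations(sources, 2))
assert len(pairs) == sync_pair_count(len(sources))

for n in (4, 10, 100):
    print(n, "sources ->", sync_pair_count(n), "pairwise synchronizations")
```

Four sources already need 6 pairwise processes, and 100 documents would need 4,950, which is why the deck goes on to limit integration to selected links.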
Challenges Mitigated #1
• Mitigating multiple document type integration
    - Follow metadata model
    - Concentrate on “pain points” only
[Diagram: status of the links among documentation sources, as annotated on the slide.]
    - Project Folders: "Project # used. OK"
    - Databases: "F/E and R/E. OK"
    - File Record Layouts: "No complaints. OK"
    - Source-to-target Spreadsheets / Data Models: "Broken Link"

Challenges Mitigated #2
[Diagram: Spreadsheet ABC and Model XYZ.]
• Two partial methods
    - Full synchronization, BUT only for individual documents
    - Full inventory, BUT with limited functionality
• The methods complement each other, providing a workable metadata integration solution

Full Synchronization
• Round-trip engineering (forward- and reverse-engineering) bridge
• (+) Compares each column/attribute to make sure physical and logical properties match
• (+) Modifies spreadsheet (or data model)
• (-) Compares one table at a time
• Run as needed during development effort

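The column-by-column comparison described on the Full Synchronization slide might look roughly like the sketch below. The deck does not show the bridge's implementation, so everything here is an assumption: the `Column` record, the two properties compared (data type and nullability), and the function name `compare_table` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column record; the real bridge may compare more properties."""
    name: str
    data_type: str
    nullable: bool


def compare_table(doc_cols, model_cols):
    """Return human-readable differences between spreadsheet and model for one table."""
    doc = {c.name.lower(): c for c in doc_cols}
    model = {c.name.lower(): c for c in model_cols}
    diffs = []
    for name in sorted(doc.keys() | model.keys()):
        d, m = doc.get(name), model.get(name)
        if d is None:
            diffs.append(f"{name}: missing from spreadsheet")
        elif m is None:
            diffs.append(f"{name}: missing from data model")
        else:
            if d.data_type.lower() != m.data_type.lower():
                diffs.append(f"{name}: data type {d.data_type} vs {m.data_type}")
            if d.nullable != m.nullable:
                diffs.append(f"{name}: nullable {d.nullable} vs {m.nullable}")
    return diffs


# Illustrative data only; table and column names are made up.
spreadsheet = [
    Column("cust_id", "INTEGER", False),
    Column("cust_name", "VARCHAR(50)", True),
]
data_model = [
    Column("cust_id", "INTEGER", False),
    Column("cust_name", "VARCHAR(100)", True),
    Column("load_dt", "DATE", False),
]

diffs = compare_table(spreadsheet, data_model)
for line in diffs:
    print(line)
```

The sketch only reports differences. The bridge on the slide also modifies the spreadsheet or the data model, which is left out here.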
Full Synchronization – cont.
[Diagram: two uses of the bridge.]
    - Verify documentation: Model XYZ table_a and Spreadsheet ABC table_a
    - Design new table: Spreadsheet DEF table_b, F/E into data model, Model XYZ table_b

Full Inventory
• Creates an inventory of documents cross-referenced to data models
• (+) Handles full documentation portfolio
• (-) But in reporting mode and at table level only
• Run on a regular basis
[Diagram labels: Source-to-Target Spreadsheets, Data Models, "?", Run Inventory, Inventory, Quality Scorecard.]

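A table-level inventory of the kind described on the Full Inventory slide could be sketched as a cross-reference from table names to the documents that mention them. This is an illustration under stated assumptions, not the author's tool: the function name `build_inventory`, the input shapes, and the file paths are hypothetical.

```python
from collections import defaultdict


def build_inventory(documents, model_tables):
    """Map every table name to the documents that reference it.

    documents: dict of document path -> table names found in that document.
    model_tables: table names defined in the data models.
    """
    inventory = defaultdict(list)
    for path, tables in documents.items():
        for table in tables:
            inventory[table.lower()].append(path)
    # Tables that exist in the model but have no document still get an entry.
    for table in model_tables:
        inventory.setdefault(table.lower(), [])
    return {table: sorted(paths) for table, paths in sorted(inventory.items())}


# Illustrative data only; paths are made up.
documents = {
    "maps/spreadsheet_abc.xls": ["table_a"],
    "maps/spreadsheet_def.xls": ["table_b", "table_a"],
}
model_tables = ["table_a", "table_b", "table_c"]

inventory = build_inventory(documents, model_tables)
for table, paths in inventory.items():
    print(table, "->", paths or "no document found")
```

Because the cross-reference is keyed by table name only, it reports at table level, matching the limitation noted on the slide.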
Semi-Structured Format Challenges
• Flexible authoring tool!!
• Issues with standard layouts
    - Designed for humans, not machines
    - Evolved through several generations (legacy documents)
    - Vary by table type
• Deviations from standards
    - By author, by project, by table
    - Spreadsheet columns are added; columns are moved around; headers change, etc.

One Atomic Datum Per Element Violations
• A single spreadsheet cell gets overloaded with multiple pieces of data
    - Old value is crossed out but kept together with the new value
    - Comments get embedded in the cell text
    - The value and the name of an artifact are combined
    - Multiple values get inserted, separated by commas, spaces, or new lines
    - The owner, table, and column names are combined with a dot notation

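Two of the violations above (several values in one cell, and dot notation) lend themselves to a small parsing sketch. The helper names `split_values` and `parse_qualified_name` are hypothetical, and the rule that a two-part name means table.column is an assumption. Crossed-out values and embedded comments are not handled, since they cannot be recognized from plain cell text alone.

```python
import re


def split_values(cell: str) -> list:
    """Split an overloaded cell on commas, spaces, or new lines."""
    return [value for value in re.split(r"[,\s]+", cell.strip()) if value]


def parse_qualified_name(value: str) -> tuple:
    """Split dot notation into (owner, table, column); absent parts become None.

    Assumption: three parts mean owner.table.column and two parts mean
    table.column. Anything else is returned unchanged as the column part.
    """
    parts = [part.strip() or None for part in value.strip().split(".")]
    if len(parts) == 3:
        return tuple(parts)
    if len(parts) == 2:
        return (None, parts[0], parts[1])
    return (None, None, value.strip() or None)


print(split_values("table_a, table_b"))
print(split_values("table_a\ntable_b table_c"))
print(parse_qualified_name("owner.table_a.col_1"))
print(parse_qualified_name("owner.table_a."))
```

Splitting on spaces breaks values that legitimately contain a space, so a real tool would need per-column rules for when to split.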
Artifact Name Variations
• Transformation Rules (over 40 variations)
    - Derivation
    - Load Rules
    - Rules / Algorithms
    - …
• Source Column Name (over 50 variations)
    - Source Field
    - Datapoint Name
    - Xmit Field Name
    - …

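Table Name Patterns
[Diagram: examples of how table names appear in spreadsheets: table_a and table_b in separate cells around a "Table Name" header, a single cell holding "table_a, table_b", and dot notation ("owner.table_a.") next to the "Table Name" and "Column Name" headers.]
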
Document Structure Recognition
[Diagram: "Reading" and "Analyzing Document Structure"; example elements: Article, Title, Column, Photo, Caption.]

Algorithm
• Document structure recognition
• Semantic match
    - Canonical form
    - Synonyms
• Exception rules to eliminate false positives, etc.
• If nothing works, change the offending document
• Measure progress via metrics

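The semantic-match step (canonical form, synonyms, exception rules) can be sketched as follows. The synonym lists come from the Artifact Name Variations slide; the normalization rule, the loose containment match, the example exception, and the names `canonical_form` and `match_header` are assumptions made for illustration, not the author's algorithm.

```python
import re

# Canonical artifact name -> known header variations (from the slides; lists are partial).
SYNONYMS = {
    "Transformation Rules": ["Derivation", "Load Rules", "Rules / Algorithms"],
    "Source Column Name": ["Source Field", "Datapoint Name", "Xmit Field Name"],
}


def canonical_form(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


# Lookup from canonical variant text to canonical artifact name.
LOOKUP = {
    canonical_form(variant): name
    for name, variants in SYNONYMS.items()
    for variant in [name, *variants]
}

# Hypothetical exception rule: a header that would otherwise be a false positive.
EXCEPTIONS = {canonical_form("Source Field Length")}


def match_header(header: str):
    """Return the canonical artifact name for a spreadsheet header, or None."""
    key = canonical_form(header)
    if key in EXCEPTIONS:
        return None
    if key in LOOKUP:
        return LOOKUP[key]
    # Loose match: the header contains a known variant somewhere in its text.
    for variant, name in LOOKUP.items():
        if variant in key:
            return name
    return None


for header in ("Xmit Field Name", "LOAD_RULES", "Source Field (legacy)",
               "Source Field Length", "Comments"):
    print(repr(header), "->", match_header(header))
```

The loose containment match is what makes exception rules necessary: without the exception, "Source Field Length" would be taken for a source column name.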
Quality Scorecard & Results




Quality Metrics: Consistency & Completeness
[Diagram: Maps compared with Tables. Categories: Missing Maps (shown as "??" on the Maps side), Matched Maps and Matched Tables, Duplicate Maps, Extraneous Maps (shown as "??" on the Tables side).]

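The four categories on the Quality Metrics slide can be computed from two lists, as in the sketch below. The definitions are assumptions inferred from the category names: a table is matched when exactly one map targets it, a map is missing when none does, maps are duplicates when more than one does, and a map is extraneous when its target table is not in the data model. The function name `classify` and the sample data are hypothetical.

```python
from collections import Counter


def classify(map_targets, model_tables):
    """Classify tables and maps into the scorecard categories.

    map_targets: one target table name per source-to-target map found.
    model_tables: table names defined in the data models.
    """
    maps_per_table = Counter(name.lower() for name in map_targets)
    tables = {name.lower() for name in model_tables}
    return {
        "matched": sorted(t for t in tables if maps_per_table[t] == 1),
        "missing_maps": sorted(t for t in tables if maps_per_table[t] == 0),
        "duplicate_maps": sorted(t for t in tables if maps_per_table[t] > 1),
        "extraneous_maps": sorted(t for t in maps_per_table if t not in tables),
    }


# Illustrative data only.
map_targets = ["table_a", "table_b", "table_b", "table_x"]
model_tables = ["table_a", "table_b", "table_c"]

scorecard = classify(map_targets, model_tables)
for category, names in scorecard.items():
    print(f"{category}: {len(names)} {names}")
```

Running the same classification per schema or application would give the breakdown that the scorecard slide describes.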
Quality Scorecard
• Always up-to-date
• Unflattering measurements provide motivation for improvements
• Broken down by schema/application
• Laggards have additional motivation to catch up with others

Initial Efforts
• Fine-tune the tool
    - Expanding synonyms, exception rules, etc.
• Clean up existing documents
    - Correcting misspellings, errors, etc.
    - Archiving old versions
• Search for missing documentation
    - NOTE: re-creating missing documents was not pursued
• Entire data modeling team was involved

Long-Term Approach
• Ongoing monitoring
    - Run on a regular (monthly) basis
    - Target new documentation only
    - Identify incremental changes and verify each
    - Fix issue or notify document author
• “Campaigns”
    - Whenever project work permits
    - Target full documentation portfolio
        - Started with extraneous maps
        - Next, handled missing maps
        - Last, worked with duplicates

Data Lineage Analysis – The “After”
[Diagram: find the table in the Inventory, click a hyperlink to open its Source-To-Target Map, find the sources and rules, then return to the Inventory with the next table name.]

Lessons Learned
• A can-do attitude is essential
• Distributed and “semantically fuzzy” metadata was tamed
• Initial effort successfully targeted urgent needs
• Low-cost metadata administration was established
• Round-trip bridge increased productivity
• Metrics effectively promoted quality
• Inventory became a hub of metadata

