Metadata Administration – A Case Study
1.
Metadata Administration – A Case Study
Alex Friedgan
alex.friedgan@gmail.com
© Alex Friedgan, 2009
2.
Overview
Value Proposition
Challenges & Solutions
Quality Scorecard & Results
3.
Importance of Documentation
Development effort is subdivided into a series of projects
Resources are geographically dispersed
ETL and BI teams report to different managers
Data delivery (ETL) and Reporting (BI) are handled as separate projects
IMPORTANCE OF DOCUMENTATION !!
4.
DW Development Workflow
[Diagram labels: Time; Data Delivery (ETL) Project; Reporting (BI) Project; Gap; Data Analysis; ETL Design; Org and/or Geography Boundary; Source-to-Target Document (Data Analysis Hand-off)]
5.
Data Analysis Hand-off Document
Designed to be comprehensive and sufficient
Serves multiple purposes
Hundreds of documents
Versioning Info
  Project, change description
General Info
  Subject area, entity, data flow
Column Level Maps
  Target column, business name and definition, source column, transformation rules
Supporting material
6.
Documentation “Neighborhood”
[Diagram labels: Project Folders; Databases; File Record Layouts; Data Models; Source-to-Target Documents (Data Analysis Hand-off)]
7.
Metadata Framework
Distributed metadata
  Information is spread across numerous documents
  Each document serves as a source of metadata
Semi-structured format (semantically fuzzy data)
  Documentation is produced by flexible authoring tools
Overlapping documentation
  Information is copied, re-created and redundantly stored
  No authoritative system of record when documents contradict
Integration without a centralized repository !!
8.
Value Proposition
PROPOSITION
Basic MME (managed metadata environment)
  Consistent and comprehensive documentation achieved within the distributed metadata framework
Benefits are obvious but difficult to quantify
  Less confusion during development process
  Less re-work
  Shorter test cycles
  Improved quality of BI deliverables
Low implementation cost, so only a confirmation of savings was needed, not a comprehensive estimate
9.
Data Lineage Analysis – The “Before”
Data Architect was receiving frequent requests to find documents
  Developer performing unsuccessful search
  Developer contacting Project Manager
  Project Manager contacting Data Architect
  Data Architect searching and replying
  Developer waiting for information requested
Finding the correct document is worth XX man-hours per month.
10.
Challenges & Solutions
11.
Distributed Metadata Challenges
Number of synchronization processes grows as the square of the number of sources
[Diagram labels: Multiple types of documents; Multiple individual documents; File Record Layout; Project Folder; Spreadsheet; Model; ABC; DEF; VUW; XYZ]
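The quadratic growth can be illustrated with a minimal sketch. It assumes every pair of metadata sources needs its own point-to-point synchronization process, which is an interpretation of the slide rather than something it states. The source names and the `sync_pairs` helper are hypothetical.

```python
from itertools import combinations

def sync_pairs(sources):
    """Return every unordered pair of sources that would need its own sync process."""
    return list(combinations(sources, 2))

# Hypothetical source names, loosely based on the document types named in the slides.
sources = ["project_folder", "file_record_layout", "spreadsheet", "data_model"]

for n in range(2, len(sources) + 1):
    pairs = sync_pairs(sources[:n])
    # The count equals n * (n - 1) / 2, which grows quadratically in n.
    print(n, "sources ->", len(pairs), "pairwise synchronizations")
```

Under this assumption, each added source needs a link to every existing source, which is why the deck concentrates on a few "pain point" links instead of synchronizing everything.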
12.
Challenges Mitigated #1
Mitigating multiple document type integration
  Follow metadata model
  Concentrate on “pain points” only
[Diagram labels: Project Folders; Databases; File Record Layouts; Source-to-target Spreadsheets; Data Models. Link annotations: "Project # used. OK"; "F/E and R/E. OK"; "No complaints. OK"; "Broken Link"]
13.
Challenges Mitigated #2
[Diagram labels: Spreadsheet ABC; Model XYZ]
Two partial methods
  Full synchronization, BUT individual documents
  Full inventory, BUT limited functionality
Methods complement each other, providing a workable metadata integration solution
14.
Full Synchronization
Round-trip engineering (forward and reverse-engineering) bridge
  (+) Compares each column/attribute to make sure physical and logical properties match
  (+) Modifies spreadsheet (or data model)
  (-) Compares one table at a time
Run as needed during development effort
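The column-by-column comparison for a single table might look like the sketch below. It is not the bridge described in the deck: the property names (`datatype`, `business_name`), the sample columns, and the `compare_table` function are all assumptions chosen for illustration.

```python
def compare_table(spreadsheet_cols, model_cols):
    """Compare one table's columns, property by property.

    Both arguments map a column name to a dict of properties.
    Returns a list of human-readable mismatch descriptions.
    """
    issues = []
    for name in sorted(set(spreadsheet_cols) | set(model_cols)):
        if name not in model_cols:
            issues.append(f"{name}: in spreadsheet only")
        elif name not in spreadsheet_cols:
            issues.append(f"{name}: in data model only")
        else:
            doc, mdl = spreadsheet_cols[name], model_cols[name]
            for prop in sorted(set(doc) | set(mdl)):
                if doc.get(prop) != mdl.get(prop):
                    issues.append(
                        f"{name}.{prop}: spreadsheet={doc.get(prop)!r}, model={mdl.get(prop)!r}"
                    )
    return issues

# Hypothetical sample data for one table.
spreadsheet = {
    "cust_id": {"datatype": "INTEGER", "business_name": "Customer ID"},
    "cust_nm": {"datatype": "VARCHAR(50)", "business_name": "Customer Name"},
}
model = {
    "cust_id": {"datatype": "INTEGER", "business_name": "Customer ID"},
    "cust_nm": {"datatype": "VARCHAR(80)", "business_name": "Customer Name"},
    "load_dt": {"datatype": "DATE", "business_name": "Load Date"},
}
for issue in compare_table(spreadsheet, model):
    print(issue)
```

The sketch only reports differences. Writing corrections back to the spreadsheet or the data model, as the (+) item describes, would depend on the authoring and modeling tools in use.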
15.
Full Synchronization – cont.
[Diagram: two use cases. Verify documentation: Spreadsheet ABC (table_a) and Model XYZ (table_a). Design new table, F/E into data model: Spreadsheet DEF (table_b) and Model XYZ (table_b).]
16.
Full Inventory
Creates inventory of documents cross-referenced to data models
  (+) Handles full documentation portfolio
  (-) But in reporting mode and table level only
Run on a regular basis
[Diagram labels: Source-to-Target Spreadsheets; Data Models; Run Inventory; Inventory; Quality Scorecard]
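A table-level cross-reference of this kind can be sketched as follows. The input shapes, the case-insensitive matching, and the `build_inventory` name are assumptions. The deck does not describe how its inventory tool reads documents or models.

```python
from collections import defaultdict

def build_inventory(map_documents, model_tables):
    """Cross-reference documents to data model tables at table level.

    map_documents: iterable of (document_name, target_table) pairs.
    model_tables: dict of table name -> data model name.
    Returns rows of (table, model or None, sorted document names).
    """
    docs_by_table = defaultdict(list)
    for document, table in map_documents:
        # Assumption: table names are compared case-insensitively.
        docs_by_table[table.lower()].append(document)
    models = {table.lower(): model for table, model in model_tables.items()}
    rows = []
    for table in sorted(set(docs_by_table) | set(models)):
        rows.append((table, models.get(table), sorted(docs_by_table.get(table, []))))
    return rows

# Hypothetical portfolio: three spreadsheets and one data model.
documents = [
    ("Spreadsheet ABC", "table_a"),
    ("Spreadsheet DEF", "table_b"),
    ("Spreadsheet GHI", "TABLE_A"),
]
tables_in_models = {"table_a": "Model XYZ", "table_c": "Model XYZ"}

for row in build_inventory(documents, tables_in_models):
    print(row)
```

Each row is a reporting record only. Nothing is modified, which matches the "reporting mode and table level only" limitation noted above.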
17.
Semi-Structured Format Challenges
Flexible Authoring Tool !!
Issues with standard layouts
  Designed for humans not machines
  Evolved through several generations (legacy documents)
  Vary by table type
Deviations from standards
  By author, by project, by table
  Spreadsheet columns are added; columns are moved around; headers change, etc.
18.
One Atomic Datum Per Element Violations
A single spreadsheet cell gets overloaded with multiple pieces of data
  Old value is crossed-out but kept together with the new value
  Comments get embedded in the cell text
  The value and the name of an artifact are combined
  Multiple values get inserted separated by commas, spaces, or new lines
  The owner, table, and column names are combined with a dot notation
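Two of these violations, multi-value cells and dot notation, can be untangled with a short sketch. Both helper names are hypothetical. The code assumes that individual values contain no spaces and that a dotted name ends with the column, neither of which the deck confirms.

```python
import re

def split_values(cell_text):
    """Split a cell holding several values separated by commas, spaces, or new lines."""
    return [part for part in re.split(r"[,\s]+", cell_text.strip()) if part]

def split_dotted(name):
    """Split owner.table.column dot notation into named parts.

    Parts are right-aligned, so 'table.column' leaves owner as None.
    Assumption: the last part is always the column name.
    """
    parts = [p for p in name.split(".") if p]
    parts = [None] * (3 - len(parts)) + parts[-3:]
    return dict(zip(("owner", "table", "column"), parts))

print(split_values("table_a, table_b\ntable_c"))
print(split_dotted("owner.table_a.col_1"))
print(split_dotted("table_a.col_1"))
```

Crossed-out old values and embedded comments are harder to separate, because they depend on cell formatting rather than on the text alone.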
19.
Artifact Name Variations
Derivation
Load Rules
Transformation Rules
Rules / Algorithms
… (over 40 variations)

Source Field
Datapoint Name
Source Column Name
Xmit Field Name
… (over 50 variations)
20.
Table Name Patterns
[Figure: sample spreadsheet layouts. Recoverable labels: Table Name; Column Name; table_a; table_b; owner.table_a.; table_a, table_b]
21.
Document Structure Recognition
[Figure labels: Analyzing; Reading; Document Structure; Article, Title, Column, Photo, Caption]
22.
Algorithm
Document structure recognition
Semantic match
  Canonical form
  Synonyms
  Exception rules to eliminate false positives, etc.
If nothing works, change the offending document
Measure progress via metrics
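The semantic-match step could be sketched as below, applied to spreadsheet column headers. The synonym groups reuse the header variations listed earlier in the deck, but the canonical names (`transformation_rules`, `source_column`), the substring fallback, and the sample exception rule are assumptions made for illustration, not the author's tool.

```python
import re

# Hypothetical synonym table: canonical artifact name -> known header variations.
SYNONYMS = {
    "transformation_rules": ["Derivation", "Load Rules", "Transformation Rules", "Rules / Algorithms"],
    "source_column": ["Source Field", "Datapoint Name", "Source Column Name", "Xmit Field Name"],
}

# Hypothetical exception rule: a header that would otherwise be a false positive.
EXCEPTIONS = {"source field length"}

def canonical_form(header):
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()

LOOKUP = {
    canonical_form(variant): name
    for name, variants in SYNONYMS.items()
    for variant in variants
}

def match_header(header):
    """Return the canonical artifact name for a spreadsheet header, or None."""
    key = canonical_form(header)
    if key in EXCEPTIONS:
        return None  # exception rules run first to suppress false positives
    if key in LOOKUP:
        return LOOKUP[key]  # exact synonym match on the canonical form
    for variant, name in LOOKUP.items():
        if variant in key:  # loose fallback: a known variant embedded in the header
            return name
    return None

for header in ["  LOAD-RULES ", "Rules/Algorithms", "Xmit Field Name (legacy)",
               "Source Field Length", "Comments"]:
    print(repr(header), "->", match_header(header))
```

A header that returns `None` is a candidate for a new synonym, a new exception rule, or a fix to the offending document.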
23.
Quality Scorecard & Results
24.
Quality Metrics: Consistency & Completeness
[Diagram labels: Maps; Tables; Missing Maps; Matched Maps; Matched Tables; Duplicate Maps; Extraneous Maps; "??" (shown twice)]
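The four categories can be computed with set operations, as in this sketch. The definitions are assumptions inferred from the labels: a missing map is a model table with no map, an extraneous map targets a table absent from the models, and a duplicate is a table with more than one map. The `completeness_pct` figure and the `score_maps` name are additions for illustration.

```python
from collections import Counter

def score_maps(map_targets, model_tables):
    """Classify source-to-target maps against data model tables.

    map_targets: list with one target table name per map document.
    model_tables: iterable of table names found in the data models.
    """
    maps_per_table = Counter(map_targets)
    tables = set(model_tables)
    mapped = set(maps_per_table)
    result = {
        "matched": sorted(tables & mapped),
        "missing_maps": sorted(tables - mapped),
        "extraneous_maps": sorted(mapped - tables),
        "duplicate_maps": sorted(t for t, n in maps_per_table.items() if n > 1),
    }
    # Assumed metric: share of model tables that have at least one map.
    result["completeness_pct"] = (
        round(100 * len(result["matched"]) / len(tables), 1) if tables else 0.0
    )
    return result

# Hypothetical inventory: four maps, four model tables.
scorecard = score_maps(
    ["table_a", "table_a", "table_b", "table_z"],
    ["table_a", "table_b", "table_c", "table_d"],
)
for metric, value in scorecard.items():
    print(metric, value)
```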
25.
Quality Scorecard
Always up-to-date
Unflattering measurements provide motivation for improvements
Broken down by schema/application
Laggards have additional motivation to catch up with others
26.
Initial Efforts
Fine-tune the tool
  Expanding synonyms, exception rules, etc.
Clean-up existing documents
  Correcting misspellings, errors, etc.
  Archiving old versions
Search for missing documentation
  NOTE: re-creating missing documents was not pursued
Entire data modeling team was involved
27.
Long-Term Approach
On-going monitoring
  Run on a regular (monthly) basis
  Target new documentation only
  Identify incremental changes and verify each
  Fix issue or notify document author
“Campaigns”
  Whenever project work permits
  Target full documentation portfolio
  Started with extraneous maps
  Next, handled missing maps
  Last, worked with duplicates
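Identifying incremental changes between two monthly runs can be sketched as a comparison of inventory snapshots. The snapshot format (document name mapped to a version marker such as a modification date) and the `incremental_changes` function are assumptions. The deck does not say how changes were detected.

```python
def incremental_changes(previous, current):
    """Compare two inventory snapshots (document name -> version marker)."""
    before, after = set(previous), set(current)
    return {
        "new": sorted(after - before),
        "changed": sorted(d for d in before & after if previous[d] != current[d]),
        "removed": sorted(before - after),
    }

# Hypothetical snapshots from two consecutive monthly runs.
last_month = {"Spreadsheet ABC": "2009-01-10", "Spreadsheet DEF": "2009-01-12"}
this_month = {"Spreadsheet ABC": "2009-02-03", "Spreadsheet GHI": "2009-02-05"}

for kind, docs in incremental_changes(last_month, this_month).items():
    print(kind, docs)
```

Only the documents listed as new or changed would then need verification, which keeps the monthly run small compared with a full-portfolio campaign.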
28.
Data Lineage Analysis – The “After”
[Diagram: lineage lookup for table ABC. Labels: Find Table ABC; Inventory hyperlinks; Click to Open; ABC Source-To-Target Map; Find ABC Sources And Rules; Return to Inventory with next table name]
29.
Lessons Learned
A can-do attitude is essential
Distributed and “semantically fuzzy” metadata was tamed
Initial effort successfully targeted urgent needs
Low-cost metadata administration was established
Round-trip bridge increased productivity
Metrics effectively promoted quality
Inventory became a hub of metadata
30.
Metadata Administration – A Case Study
Alex Friedgan
alex.friedgan@gmail.com