An automated data catalog is a known enabler of organizational data management. A data catalog gives data citizens the ability to extract both business and technical value from data, supporting better business decisions.
Data lineage must be an integrated capability because it gives data teams real-time knowledge of where data resides and, more importantly, lets them follow every data pipeline.
Our experts will discuss relevant case studies that demonstrate best practices for catalog implementation and adoption.
This webinar will discuss:
- The need for data transparency as a major driver of data catalog initiatives
- The importance of the physical, semantic and presentation layers for different types of data citizens
- How data lineage becomes a key element, providing deep understanding and greater visibility of the data
The Benefits of a Data Catalog with Built-in Data Lineage
1. The Benefits of a Data Catalog with Built-In Data Lineage
Webinar
2. Today’s speakers
John Fry
CEO/Managing Partner
FIntegrity Consulting LLC
Nissim Yves Ohayon
Director, Global Business Development
Octopai
3. Having a lot of diverse, well-maintained data is not enough
The Gap
[Chart: "What percentage of all available information in your organization is actually used for decision-making?" (median, n=710). Groups shown: all companies; <250 employees; 250-5000 employees; >5000 employees; best-in-class; laggards. Reported medians range from 30% to 50%.]
Source: https://bi-survey.com
4. Main Challenges in the Data Ecosystem
• Loss of tribal knowledge
• Inefficient use of & lack of independence in using data
• Arriving at a single source of truth
• Increased pressure on the data team for analytics & reports
• Ever-growing amount of data in the organization
• Data-driven initiatives/strategy
6. Most users don’t know how to properly use the data most of the time
7. Case Study #1:
What can happen when the data team has little visibility into or control of the data flows.
8. Client: South African Clearing Bank
Business Driver
• Implementation of 3rd Party System
• Went live, failed and rolled back after 4 weeks
• Huge ramifications for South African clearing
Business Problem
• Need to ‘re-start’ but had spent $28m
• Failure ‘not an option’
• Clearing system, regulated by S.A. Reserve
• No idea where to start
Strategy - ‘Fix it’
• Re-implement in a ‘no holds sprint’
• Ensure system data was solid
• Track system data into peripheral systems
• Invent tools
Challenges
Business Staff
• Totally burned out
• Did not know the system properly
• Added random data everywhere
IT Systems
• Not enough hardware
• Object/Relational DB – Data Server
IT Staff
• Entire tech staff left after project failure
The Big Save
• Client and vendor desperate, so let us have free rein
• Complete backing by management
• An operations environment, so no complex math or lineage
CASE STUDY #1: One system, completely solution targeted – zero tolerance deadline
Huge Problem – No one knew we needed a Data Dictionary (DD) and Lineage
9. The Solution
Design
• 3 proprietary project management tools
• Data flow model to represent implementation completion
• Front-to-back lineage and governance
Constraints
• Data Dictionary and lineage outside system
• Limited to data that traversed business functional or system boundaries
• Canonical representation at bare minimum – ‘needs must’ approach
Manually
• Fixed old DB using raw SQL
• Pulled then vetted all custom code and data
• Linked tables and data by hand
The Impact
Benefits
• System installed 2 weeks before deadline
• Original attempt took 2.5 years; reimplementation took 6.5 months
• Original cost $28m; re-do cost $3.2m
• Visibility and governance over ‘business critical’ data flowing across and in/out of system
• Cross-system regulatory and compliance reporting
• Saved 25 staff jobs
Project Challenges
• Realizing the need for data control
• Inventing the project management methods
• Making the call on the end date 3 days after hitting the ground
CASE STUDY #1: Completed on time, under budget, saved 25 jobs
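The constraint of limiting the Data Dictionary and lineage to data that crossed system boundaries can be illustrated with a minimal sketch. The element names, the system assignments in `SYSTEM_OF`, and the flows in `FLOWS` are all hypothetical; the slides do not describe how the project actually recorded this information.

```python
# Hypothetical mapping of data elements to the system that owns them.
SYSTEM_OF = {
    "clearing.payment_id": "clearing",
    "clearing.settlement_amt": "clearing",
    "clearing.batch_no": "clearing",
    "ledger.posting_amt": "general_ledger",
    "regrep.settled_total": "regulatory_reporting",
}

# Hypothetical data flows: (source element, target element).
FLOWS = [
    ("clearing.payment_id", "clearing.batch_no"),
    ("clearing.settlement_amt", "ledger.posting_amt"),
    ("ledger.posting_amt", "regrep.settled_total"),
]


def boundary_flows(flows, system_of):
    """Keep only the flows whose source and target belong to different systems."""
    return [(src, dst) for src, dst in flows if system_of[src] != system_of[dst]]


def in_scope_elements(flows, system_of):
    """Return the sorted elements that take part in at least one boundary-crossing flow."""
    elements = set()
    for src, dst in boundary_flows(flows, system_of):
        elements.update((src, dst))
    return sorted(elements)


print(boundary_flows(FLOWS, SYSTEM_OF))
print(in_scope_elements(FLOWS, SYSTEM_OF))
```

Scoping the dictionary this way keeps the manual documentation effort bounded: flows that stay inside one system are left out, and only the elements returned by `in_scope_elements` need a canonical definition.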
11. Key Capabilities
Automation, Democratization, Collaboration, Traceability
• Automate
○ data catalog inventory & lineage mapping
○ an out-of-date inventory is not trusted, and therefore not used
• Democratize
○ enable & encourage data consumers and purveyors to enrich and contribute (not just use the system)
○ expose data flow and visibility to all data citizens
• Collaborate
○ encourage discussion in context with data assets
• Traceability
○ empower data teams to make decisions based on accepted, verifiable facts
○ ensure that data engineering can make changes while knowing where to expect impact
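Knowing where to expect impact amounts to walking the lineage graph downstream from the asset being changed. The following is a minimal sketch, assuming lineage is available as a list of (source, target) pairs; the asset names in `EDGES` are hypothetical and the traversal is a plain breadth-first search, not any particular vendor's method.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source asset, target asset).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "dwh.dim_customer"),
    ("erp.orders", "dwh.fact_orders"),
    ("dwh.dim_customer", "report.churn"),
    ("dwh.fact_orders", "report.revenue"),
    ("dwh.dim_customer", "report.revenue"),
]


def build_graph(edges):
    """Index the edges as source -> list of direct targets."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def downstream(graph, asset):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


impacted = downstream(build_graph(EDGES), "staging.customers")
print(impacted)  # ['dwh.dim_customer', 'report.churn', 'report.revenue']
```

Reversing each edge before building the graph gives the opposite question with the same traversal: which upstream sources feed a given report.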
12. Case Study #2:
How broadly can these capabilities be applied in large transformation projects?
13. CASE STUDY #2
Client: Top 5 U.S. Commercial Bank
Business Driver
• The ‘Libor’ interest rate indices, which benchmark over $300TN in financial contracts [8 times Global GDP], are being replaced
Business Problem
Locate ‘Libor’ use and replace it in all:
• Banking systems
• Reporting
• Documentation
Strategy
Use a combination of:
• Data Dictionary and Data Lineage
• An extended product and index taxonomy
Challenges
Business Staff
• Used different terms for the same objects
• Did not know the meaning of the terms
• Have full-time day jobs
IT Systems
Mix of internal, vendor and ‘ad hoc’ infrastructure
• Old denormalized mainframe DB
• Opaque object databases
• Pure image stores
IT Staff
• Staff turnover 15%-40%, near and offshore
• Did not know the ‘sharp end’ of finance
• Minimal system knowledge
100s of systems, surgical – variable deadline, variable requirements
Huge Problem – We had to build the Data Dictionary (DD) and Lineage
14. The Solution
Design
• Rudimentary Data Dictionary, with Lineage
• Model for the ‘Libor’ index and how to identify it
• Extension of product taxonomy
Develop
• Data Dictionary
• Code to scan and assemble data
• Code to find and create lineage through GUIs, DB, vendor code and client custom code
Manually
• Research and enter canonical terms and links
• Find and correct ‘misunderstandings’
• Patch ‘first time’ errors and test
The Impact
Benefits
• Approach solved the business problem
• Data Dictionary with over 1,000 correct terms
• Framework that will manage future changes
Project Challenges
• Difficult design and concepts
• Significant build effort but excellent ROI
• Managing Agile
Client Challenges
• Need for Dictionary curation
• Effort of accurate lineage in ‘leading edge’ systems
CASE STUDY #2: 95% accurate on systems integrated into model
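The idea of scanning systems for a term that staff refer to by different names can be sketched as follows. This is an illustration only, not the project's code: the variants listed in `DICTIONARY`, the locations in `ARTIFACTS`, and the simple case-insensitive substring matching are all assumptions.

```python
import re

# Hypothetical dictionary entry: one canonical term and the variants used for it.
DICTIONARY = {
    "LIBOR": ["libor", "london interbank offered rate", "lbr_idx"],
}

# Hypothetical artifacts harvested from systems: (location, text).
ARTIFACTS = [
    ("loans.rate_table.column", "LBR_IDX_3M"),
    ("reports/pricing.sql", "SELECT spread + libor_rate FROM curves"),
    ("docs/term_sheet.txt", "Interest accrues at the London Interbank Offered Rate."),
    ("deposits.accounts.column", "customer_id"),
]


def compile_patterns(dictionary):
    """Build one case-insensitive pattern per canonical term from its variants."""
    patterns = {}
    for canonical, variants in dictionary.items():
        alternation = "|".join(re.escape(variant) for variant in variants)
        patterns[canonical] = re.compile(alternation, re.IGNORECASE)
    return patterns


def scan(artifacts, dictionary):
    """Return (location, canonical term) for every artifact that mentions a variant."""
    patterns = compile_patterns(dictionary)
    hits = []
    for location, text in artifacts:
        for canonical, pattern in patterns.items():
            if pattern.search(text):
                hits.append((location, canonical))
    return hits


for location, term in scan(ARTIFACTS, DICTIONARY):
    print(f"{term} referenced in {location}")
```

Substring matching of this kind produces false positives and misses terms nobody has entered yet, which is consistent with the manual steps listed above: researching canonical terms and correcting misunderstandings.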
15. Why Lineage is Critical for the Catalog
Stay Up-To-Date
● automate harvesting of data assets
● aggregate/centralize all metadata
● ensure that the repository is regularly refreshed
Preserve Tribal Knowledge
● discuss assets in context with catalog & lineage
● document tribal knowledge
● enable self service through transparency
Full Visibility Into the Data Journey
● provide provenance of each asset
● empower data teams with the added visibility into the structure and evolution
● vendor-agnostic lineage is critical