The Benefits of a Data Catalog with Built-In Data Lineage
Webinar
Today’s speakers
John Fry, CEO / Managing Partner, FIntegrity Consulting LLC
Nissim Yves Ohayon, Director, Global Business Development, Octopai
Having a lot of diverse, well-maintained data is not enough
The Gap
[Chart: "What percentage of all available information in your organization is actually used for decision-making?" (median, n=710), shown for all companies, by company size (<250, 250-5000 and >5000 employees) and for Best-in-Class vs. laggards. Chart values range from 30% to 50%. Source: https://bi-survey.com]
Main Challenges in the Data Ecosystem
• Loss of tribal knowledge
• Inefficient use of data, and lack of independence in using it
• Arriving at a single source of truth
• Increased pressure on the data team for analytics & reports
• Ever-growing amount of data in the organization
• Data-driven initiatives/strategy
Increased pressure on data teams for analytics & reports
• Traditional reports and analytics
• Data Science projects
• Customer marketing
• Data-driven offerings
• Self-Service BI
• Migration projects
Most of the time, most users don’t know how to use the data properly
Case Study #1:
What can happen when the data team has little visibility into, or control over, the data flows?
Client: South African Clearing Bank
Business Driver
• Implementation of a third-party system
• Went live, failed and was rolled back after 4 weeks
• Huge ramifications for South African clearing
Business Problem
Needed to ‘re-start’ after already spending $28m
• Failure ‘not an option’
• Clearing system regulated by the South African Reserve Bank
• No idea where to start
Strategy: ‘Fix it’
• Re-implement in a ‘no-holds-barred’ sprint
• Ensure system data was solid
• Track system data into peripheral systems
• Invent tools
Challenges
Business Staff
• Totally burned out
• Did not know the system properly
• Added random data everywhere
IT Systems
• Not enough hardware
• Object/Relational DB (Data Server)
IT Staff
• Entire tech staff left after the project failed
The Big Save
• The client and vendor were desperate, so they gave us free rein
• Complete backing by management
• An operations environment, so no complex math or lineage
CASE STUDY #1: One system, completely solution-targeted, zero-tolerance deadline
Huge problem: no one knew we needed a Data Dictionary (DD) and lineage
The Solution
Design
• 3 proprietary project management tools
• Data flow model to represent implementation completion
• Front-to-back lineage and governance
Constraints
• Data Dictionary and lineage kept outside the system
• Limited to data that crossed business-function or system boundaries
• Canonical representation kept to the bare minimum (a ‘needs must’ approach)
Manually
• Fixed the old DB using raw SQL
• Pulled, then vetted, all custom code and data
• Linked tables and data by hand
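The data flow model in the Design bullets (tracking implementation completion, with scope limited to data crossing business-function or system boundaries) can be sketched as follows. This is a minimal illustration under stated assumptions: the `DataFlow` record, its field names, the domain names, and the "share of verified boundary-crossing flows" completion metric are all hypothetical; the deck does not describe how the proprietary tools actually measured completion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """A hypothetical record for one field moving between two domains.

    A "domain" here stands for either a system or a business function.
    """
    source_domain: str
    source_field: str
    target_domain: str
    target_field: str
    verified: bool = False  # True once the flow has been vetted end to end


def boundary_flows(flows):
    """Keep only flows whose source and target domains differ (the scope constraint)."""
    return [f for f in flows if f.source_domain != f.target_domain]


def completion(flows):
    """Share of in-scope flows verified so far; 0.0 when nothing is in scope."""
    in_scope = boundary_flows(flows)
    if not in_scope:
        return 0.0
    return sum(f.verified for f in in_scope) / len(in_scope)


if __name__ == "__main__":
    flows = [
        DataFlow("clearing", "txn_amount", "general_ledger", "posting_amt", verified=True),
        DataFlow("clearing", "txn_amount", "clearing", "txn_amount_fmt"),  # internal: out of scope
        DataFlow("clearing", "settle_date", "regulatory_reporting", "value_date"),
    ]
    print(f"{completion(flows):.0%} of boundary-crossing flows verified")
```

Filtering before measuring keeps the metric honest to the ‘needs must’ constraint: flows that never leave a single system do not inflate or dilute the completion figure.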
The Impact
Benefits
• System installed 2 weeks before the deadline
• Original attempt took 2.5 years; the reimplementation took 6.5 months
• Original cost $28m; the re-do cost $3.2m
• Visibility and governance over ‘business-critical’ data flowing across, into and out of the system
• Cross-system regulatory and compliance reporting
• Saved 25 staff jobs
Project Challenges
• Realizing the need for data control
• Inventing the project management methods
• Making the call on the end date 3 days after hitting the ground
CASE STUDY #1: Completed on time, under budget, saved 25 jobs
The Solution: Data Catalog with Built-In Data Lineage
Key Capabilities: Automation, Democratization, Collaboration, Traceability
• Automate
○ data catalog inventory & lineage mapping
○ an out-of-date inventory is not trusted, and therefore not used
• Democratize
○ enable & encourage data consumers and purveyors to enrich and contribute (not just use the system)
○ give all data citizens visibility into data flows
• Collaborate
○ encourage discussion in the context of the data assets
• Traceability
○ empower data teams to make decisions based on accepted, verifiable facts
○ ensure that data engineering can make changes knowing where to expect impact
Case Study #2:
How broadly can these capabilities be applied in large transformation projects?
CASE STUDY #2
Client: Top 5 U.S. Commercial Bank
Business Driver
The LIBOR interest rate indices, which benchmark over $300 trillion in financial contracts (8 times global GDP), are being replaced
Business Problem
Locate LIBOR use and replace it in all:
• Banking systems
• Reporting
• Documentation
Strategy
Use a combination of:
• Data Dictionary and Data Lineage
• An extended product and index taxonomy
Challenges
Business Staff
• Used different terms for the same objects
• Did not know the meaning of the terms
• Had full-time day jobs
IT Systems
Mix of internal, vendor and ‘ad hoc’ infrastructure:
• Old denormalized mainframe DB
• Opaque object databases
• Pure image stores
IT Staff
• Staff turnover of 15%-40%, nearshore and offshore
• Did not know the ‘sharp end’ of finance
• Minimal system knowledge
Hundreds of systems, surgical: variable deadline, variable requirements
Huge problem: we had to build the Data Dictionary (DD) and lineage
The Solution
Design
• Rudimentary Data Dictionary, with lineage
• Model for the LIBOR index and how to identify it
• Extension of the product taxonomy
Develop
• Data Dictionary
• Code to scan and assemble data
• Code to find and create lineage through GUIs, databases, vendor code and client custom code
Manually
• Research and enter canonical terms and links
• Find and correct ‘misunderstandings’
• Patch ‘first time’ errors and test
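The "model for the LIBOR index and how to identify it" and the "code to scan and assemble data" can be illustrated with a dictionary-driven scanner: each canonical term carries the spellings that teams actually used, and every harvested artifact is matched against them. This is a hedged sketch, not the project's code; the `TERM_PATTERNS` entries, the artifact names, and the `scan_artifacts` helper are hypothetical, and plain case-insensitive regular expressions stand in for whatever identification model was really used:

```python
import re

# Hypothetical Data Dictionary entry: a canonical term mapped to the spellings
# and aliases teams might use for it. Real patterns would come from curated research.
TERM_PATTERNS = {
    "LIBOR": [r"libor", r"london\s+inter-?bank\s+offered\s+rate"],
}


def scan_artifacts(artifacts, term_patterns=TERM_PATTERNS):
    """Return {canonical_term: sorted artifact names} for artifacts that mention the term.

    `artifacts` maps an artifact name (table, column, report, document) to its text;
    both the name and the text are searched.
    """
    compiled = {
        term: [re.compile(p, re.IGNORECASE) for p in patterns]
        for term, patterns in term_patterns.items()
    }
    hits = {term: [] for term in compiled}
    for name, text in artifacts.items():
        for term, regexes in compiled.items():
            if any(rx.search(name) or rx.search(text) for rx in regexes):
                hits[term].append(name)
    return {term: sorted(names) for term, names in hits.items()}


if __name__ == "__main__":
    # Hypothetical artifacts harvested from schemas, code and documents.
    artifacts = {
        "loans.rate_index": "column: USD_LIBOR_3M reference rate",
        "swap_pricer.py": "fwd = curve('London Interbank Offered Rate')",
        "fx_report.sql": "SELECT spot_rate FROM fx_quotes",
    }
    print(scan_artifacts(artifacts))
```

Matching `libor` as a plain substring deliberately errs toward false positives (it also catches identifiers such as `USD_LIBOR_3M`), which suits a remediation inventory where a missed reference costs more than a manual review.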
The Impact
Benefits
• Approach solved the business problem
• Data Dictionary with over 1,000 correct terms
• Framework that will manage future changes
Project Challenges
• Difficult design and concepts
• Significant build effort, but excellent ROI
• Managing Agile
Client Challenges
• Need for Dictionary curation
• Effort required for accurate lineage in ‘leading edge’ systems
CASE STUDY #2: 95% accurate on systems integrated into the model
Why Lineage is Critical for the Catalog
Stay Up-To-Date
● automate harvesting of data assets
● aggregate/centralize all metadata
● ensure that the repository is regularly refreshed
Preserve Tribal Knowledge
● discuss assets in the context of the catalog & lineage
● document tribal knowledge
● enable self-service through transparency
Full Visibility Into the Data Journey
● provide provenance of each asset
● empower data teams with added visibility into the structure and evolution of their data
● vendor-agnostic lineage is critical
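Automated harvesting and per-asset provenance can be sketched together: parse load statements into table-level lineage, then walk upstream from any asset. The sketch assumes every load is a simple `INSERT INTO ... SELECT ... FROM/JOIN ...` statement and uses regular expressions in place of a real SQL parser; the table names and the `harvest_lineage` / `provenance` helpers are hypothetical. Real lineage tools typically parse full SQL dialects plus ETL and BI metadata, which this deliberately does not attempt:

```python
import re
from collections import defaultdict

# Deliberately naive patterns: only "INSERT INTO <target> ... FROM/JOIN <source>" is understood.
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def harvest_lineage(sql_statements):
    """Build {target_table: set(source_tables)} from simple INSERT ... SELECT statements."""
    upstream = defaultdict(set)
    for stmt in sql_statements:
        target = TARGET_RE.search(stmt)
        if target is None:
            continue  # not a load statement this sketch understands
        for source in SOURCE_RE.findall(stmt):
            upstream[target.group(1)].add(source)
    return upstream


def provenance(upstream, asset):
    """All tables the asset ultimately derives from (depth-first, cycle-safe)."""
    seen, stack = set(), [asset]
    while stack:
        for parent in upstream.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Hypothetical load statements harvested from an ETL repository.
    statements = [
        "INSERT INTO staging.customers SELECT * FROM crm.customers",
        "INSERT INTO dw.dim_customer SELECT c.id, r.region "
        "FROM staging.customers c JOIN ref.regions r ON c.region_id = r.id",
        "INSERT INTO bi.churn_report SELECT * FROM dw.dim_customer",
    ]
    lineage = harvest_lineage(statements)
    print(sorted(provenance(lineage, "bi.churn_report")))
```

Because `harvest_lineage` can simply be re-run on every refresh, the lineage never depends on someone remembering to update it by hand, which is the point of the "Stay Up-To-Date" column.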
Q&A
For more information:
nissimo@octopai.com
octopai.com
Thank you.

