IDMA 2021 Fall/Winter Conference
October 13th-14th, 2021
Data Catalog as a Business Enabler
Presented by Srinivasan Sankar
Disclaimer
Please note that the views expressed by our speakers are
their own and may not necessarily reflect those of their
respective employers.
This material is for general informational purposes only and
is not legal advice. It is not designed to be comprehensive,
and it may not apply to your particular facts and
circumstances.
TOPICS
• Improve insights by extracting value from unstructured data utilizing a machine
learning augmented data catalog
• Practical steps to deal with the onslaught of data and learn how to implement an
effective data catalog
• Overcoming data silos using intelligent tools
• Let the insights come to you with AI-augmentation
• Multi-source data to increase the potential of data value
• Data Catalog – key enabler of a Data Mesh
NEW DATA, NEW INSIGHTS:
MAXIMIZING THE VALUE OF
YOUR STRUCTURED AND
UNSTRUCTURED DATA
Definition
A data catalog creates and maintains an inventory of data assets through the
discovery, description and organization of distributed datasets. The data catalog
provides context to enable data stewards, data/business analysts, data engineers, data
scientists and other line of business (LOB) data consumers to find and understand
relevant datasets for the purpose of extracting business value.
In a nutshell, a data catalog is a place that shows what data assets you have and where they are
located. You might be asking, what is a data asset? It is any entity (e.g., reports, databases,
websites) that contains data.
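As a minimal sketch of the definition above, a catalog can be modeled as an inventory that answers "what do we have?" and "where is it?". The class names, fields, and locations here are hypothetical illustrations, not part of any catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One catalog entry: what the asset is and where it lives."""
    name: str
    kind: str          # e.g. "report", "database", "website"
    location: str      # hypothetical URI or path
    description: str = ""
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """An in-memory inventory of data assets, keyed by name."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        self._assets[asset.name] = asset

    def locate(self, name: str) -> str:
        """Answer 'where is it?' for a known asset."""
        return self._assets[name].location

    def inventory(self) -> list[str]:
        """Answer 'what do we have?' as a sorted list of asset names."""
        return sorted(self._assets)


# Hypothetical assets, for illustration only.
catalog = DataCatalog()
catalog.register(DataAsset("claims_db", "database", "postgres://warehouse/claims"))
catalog.register(DataAsset("loss_ratio_report", "report", "s3://reports/loss_ratio.pdf"))
```

A real catalog would discover assets automatically and persist them; the sketch only shows the shape of the inventory.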
Data Catalogs Are the New Black in Data Management and Analytics
• Leverage an ML-augmented data catalog as a first step in metadata management
• Deploy data catalogs with the capability to scale beyond narrow (or tactical) use-case
requirements (such as cataloging data only within a Hadoop distribution).
AI-POWERED PROCESS FOR CURATING,
VERIFYING, AND CLASSIFYING DATA THAT
ENHANCES SPEED AND USABILITY
What is it?
Use algorithms (advanced statistics and deep learning) to learn from large-scale data to identify:
• Patterns
• Quality issues and anomalies across massive, complex and often streaming data sets
• Business rules
Applicable to large, complex and often streaming data sets: 3rd party data, sensor data, customer
data, transactions
How does it work?
• Algorithmic sampling of data to identify key patterns and business rules
• Continuous monitoring to alert Data Stewards of exceptions for timely resolution
• Correlation of data concepts across domains and data sources to track usage and establish
lineage
• Ability to ingest and apply quality rules to third party and unstructured data sources
• Establishes a feedback loop that refines the machine learning models to improve data quality
over time
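The sampling and monitoring steps can be illustrated with a deliberately simple stand-in for the machine learning described in the slide: sample a column, take the most common character pattern as a candidate rule, and flag values that break it for a Data Steward. The function names and the policy IDs are hypothetical, and frequency counting is a substitute here, not the technique any particular tool uses.

```python
import random
import re
from collections import Counter


def shape(value: str) -> str:
    """Reduce a value to its character-class pattern, e.g. 'AB-12' -> 'AA-99'."""
    value = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", value)


def infer_rule(values: list[str], sample_size: int = 100, seed: int = 0) -> str:
    """Sample the column and return the most common shape as the candidate rule."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    return Counter(shape(v) for v in sample).most_common(1)[0][0]


def exceptions(values: list[str], rule: str) -> list[str]:
    """Return values whose shape deviates from the inferred rule."""
    return [v for v in values if shape(v) != rule]


# Hypothetical column of policy identifiers with one malformed entry.
policy_ids = ["PL-1001", "PL-1002", "PL-1003", "1004", "PL-1005"]
rule = infer_rule(policy_ids)
flagged = exceptions(policy_ids, rule)
```

Confirmed or rejected exceptions could then be fed back to refine the rule, which is the feedback loop the last bullet describes.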
THE CASE FOR DATA CATALOGS
Analyze Data, Don't Chase Data – Many data scientists spend over two-thirds of their time finding and
understanding the data. The main reason for this problem in an organization is a poor mechanism for handling
and tracking all the data. A good Catalog helps the Data Scientist or Business Analyst understand the
data and answer the questions they have.
Efficient Access Control – As an organization grows, role-based policies are needed so that not
everybody can modify the data. Access Control should be implemented while building the Data Lake.
Roles are assigned to the users, and Data Access is controlled according to those roles.
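A minimal sketch of the role-based check described above, assuming two hypothetical roles (analyst, steward) and two actions; a real deployment would load roles and policies from the governance process rather than hard-code them.

```python
from enum import Enum, auto


class Action(Enum):
    READ = auto()
    WRITE = auto()


# Hypothetical role-to-permission policy, for illustration only.
ROLE_PERMISSIONS: dict[str, set[Action]] = {
    "analyst": {Action.READ},
    "steward": {Action.READ, Action.WRITE},
}

# Hypothetical user-to-role assignments.
USER_ROLES: dict[str, str] = {"ana": "analyst", "sam": "steward"}


def is_allowed(user: str, action: Action) -> bool:
    """Allow an action only if the user's role grants it; unknown users get nothing."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())
```

Denying by default for unknown users and roles keeps the policy safe when assignments are incomplete.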
Eliminate Data Redundancies – A good Catalog tool helps us find data redundancies and
eliminate them. This can save storage costs and data management costs.
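One simple way a tool might surface exact duplicates is to fingerprint each dataset's contents and group the names that share a fingerprint. This is a sketch under that assumption; the dataset names and rows are made up, and near-duplicates would need fuzzier matching than a hash.

```python
import hashlib
from collections import defaultdict


def fingerprint(rows: list[tuple]) -> str:
    """Hash a dataset's rows independent of row order."""
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode("utf-8") + b"\n")
    return digest.hexdigest()


def find_redundant(datasets: dict[str, list[tuple]]) -> list[list[str]]:
    """Group dataset names that share an identical fingerprint."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, rows in datasets.items():
        groups[fingerprint(rows)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical datasets: the copy holds the same rows in a different order.
datasets = {
    "sales_2021": [(1, "a"), (2, "b")],
    "sales_copy": [(2, "b"), (1, "a")],
    "customers": [(1, "x")],
}
redundant = find_redundant(datasets)
```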
Follow Laws – Different protection laws apply depending on the data, such as GDPR, BASEL,
GDSN, HIPAA, and many more. These laws must be followed when dealing with data. But these laws
cover different use cases and do not apply to every data set; to understand which ones apply, we need to know about
the data set. A good Catalog helps us make sure that Data Compliance is followed by giving a view of
Data Lineage and using Access Control.
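To show why a lineage view helps with compliance, the sketch below walks a lineage graph to find every dataset derived from a regulated source, since those datasets may inherit the same obligations. The graph and dataset names are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each key feeds the datasets listed as its value.
LINEAGE: dict[str, list[str]] = {
    "patient_records": ["claims_enriched"],
    "claims_enriched": ["loss_dashboard"],
    "weather_feed": ["loss_dashboard"],
}


def downstream(source: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk returning every dataset derived from `source`."""
    seen: set[str] = set()
    queue = deque([source])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Datasets that may need the same handling as the regulated source.
affected = downstream("patient_records", LINEAGE)
```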
HOW TO ADOPT DATA CATALOGS
Phased approach: data cataloging is a journey...
Phase 1: Catalog and Lineage
• Infrastructure and installation of Catalog tool
• Data Architects to initiate the collection of data assets, catalog and identify lineage
Phase 2: Data Stewardship, Business Glossary
• Appoint part-time Governance Lead role (cross-functional, business facing)
• Supporting Analyst
• Manage Governance activities
Phase 3: Operationalize Governance activities
• Accountability, Ownership of Data
• Operationalize Data Governance activities
• Report Metrics
• Iterate activities for all information / data projects
[Diagram labels: Establish Data Governance; Sustain Data Governance; Improve / Enhance Data Governance; Manage Data Lifecycle; Communicate; Manage Return On Investment; Maintain Organization & Sponsorship; Review/Update Processes; Review/Update Scope (Quarterly Workshop); Business Change Management; Review & Approve New Projects; Maintain Data Definitions; Maintain Metrics; Identify Data Stewards; Conflict Resolution, Escalation; Plan; Organize; Define; Deploy; Core Foundation; Augmented Data Catalog*]
* Machine learning powered process for curating, verifying, and classifying data that enhances speed and usability
DATA CATALOG BEST PRACTICES
Assigning Ownership for the data set – Ownership of
each data set must be defined. There must be a person
whom users can contact in case of an issue. A good
Catalog must also identify the owner of every data set.
Human Touch – After a Catalog is built, users must
verify the data set entries to make them more accurate.
Searchability – The Catalog should support search.
Searchability enables Data Asset Discovery: data
consumers can easily find assets that meet their needs.
Data Protection – Define access policies to prevent
unauthorized data access.
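The ownership and searchability practices can be combined in a small sketch: each entry carries an owner contact, and a keyword search returns matching entries so a consumer finds both the asset and whom to ask. Entry names, descriptions, and contact addresses are hypothetical, and plain substring matching stands in for a real search index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    description: str
    owner: str  # contact for questions and issues


def search(entries: list[Entry], query: str) -> list[Entry]:
    """Return entries whose name or description contains every query term."""
    terms = query.lower().split()
    return [
        e for e in entries
        if all(t in f"{e.name} {e.description}".lower() for t in terms)
    ]


# Hypothetical catalog entries, for illustration only.
entries = [
    Entry("claims_db", "Auto insurance claims by policy", "claims-team@example.com"),
    Entry("fleet_gps", "GPS fleet tracking telemetry", "telematics@example.com"),
]
hits = search(entries, "fleet gps")
```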
HIGH ROI FOR MULTI-SOURCE DATA
WITH DATA CATALOG
Single source data has value in relation to other data in the organization, and the ability
to search and analyze across multiple information sources provides tremendous insight.
[Graphic (source: CEB analysis). Labels: Enterprise Data Integration and Data Lake; Traditional DW; Industry (Weather, Highway safety); Driving Tracker, Nest Protect, GPS Fleet Tracking; Data Catalog]
DATA CATALOG: THE NUCLEUS OF A DATA MESH*
• A data product must be easily discoverable, ideally
through a data catalog that holds its meta
information such as owners, source of origin,
lineage, sample datasets, etc. This centralized
discoverability service allows data consumers,
engineers and scientists in an organization to find a
dataset of interest easily. Each domain data
product must register itself with this centralized
data catalog for easy discoverability.
• Note the perspective shift here: from a single
platform extracting and owning the data for its use,
to each domain providing its data as a product in a
discoverable fashion.
• Data catalog platforms provide central
discoverability, access control and governance of
distributed domain datasets.
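The registration step in the bullets above might look like the following sketch, in which each domain registers its data product with a central catalog and must supply the meta information the slide lists. The class, the required-field list, and the domain and product names are hypothetical.

```python
# Meta information each product must supply (an assumed minimum, taken from the slide).
REQUIRED_METADATA = ("owner", "source", "lineage")


class MeshCatalog:
    """Central discoverability service that domain data products register with."""

    def __init__(self) -> None:
        self._products: dict[str, dict] = {}

    def register(self, domain: str, product: str, metadata: dict) -> str:
        """Register a product under its domain, rejecting incomplete metadata."""
        missing = [k for k in REQUIRED_METADATA if k not in metadata]
        if missing:
            raise ValueError(f"missing metadata: {', '.join(missing)}")
        key = f"{domain}.{product}"
        self._products[key] = dict(metadata)
        return key

    def discover(self, domain: str) -> list[str]:
        """List the registered products for one domain."""
        prefix = f"{domain}."
        return sorted(k for k in self._products if k.startswith(prefix))


# Hypothetical domain and product, for illustration only.
mesh = MeshCatalog()
key = mesh.register(
    "claims",
    "settled_claims",
    {"owner": "claims-team", "source": "policy_admin", "lineage": ["raw_claims"]},
)
```

Rejecting registrations with missing metadata is one way to keep every product discoverable with its owner and lineage attached.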
*Data Mesh (a concept introduced by Zhamak Dehghani) is a sociotechnical approach to sharing, accessing and managing analytical data in complex and large-scale environments, within or across organizations.
QUESTIONS?
http://www.linkedin.com/in/srinisankar
https://twitter.com/srinisankar
