Solving Data Discovery in the Enterprise:
Building an Enterprise Data Catalog
Contents
Overview
The Business Challenges of Data Discovery in the Enterprise
Why MDM is not the Answer
Introducing the Enterprise Data Catalog
Getting Technical: Building an Enterprise Data Catalog
Catalog Portal
Catalog Mobile
Catalog Store
Data Source Publishing API
Data Source Discovery API
Data Source Notifications API
Data Source Search API
Data Governance API
Metadata Connectors
Data Collaboration System and APIs
Putting It All Together
Summary
Overview
Data discovery, understanding and governance are becoming key elements of data
architectures in the enterprise. The explosion in the volume of data produced and
consumed by organizations has dramatically increased the complexity of
discovering and understanding that data efficiently.
Despite their relevance, data discovery and governance are often overlooked in
enterprise big data solutions, which tend to focus on more glamorous areas such as
analytics and machine learning. However, more and more organizations are
realizing that data discovery is an essential component for enabling analytics,
visualization and general data consumption in the enterprise. Yet the road to
enabling data discovery in the enterprise is plagued with challenges, as we will
explore in the next section.
The Business Challenges of Data Discovery in the Enterprise
As data grows in the enterprise, so do the initiatives to gather intelligence about
that data. Efforts around big data, analytics, visualization and related areas
have grown rapidly over the last few years, and data discovery has become a
foundational building block of any enterprise data initiative. However, to enable
efficient data discovery, enterprises need to address challenges such as the
following:
• Increasing Data Volume: The growing volume of data produced in the
enterprise has drastically degraded the ability of information workers to
quickly find and consume data sources from enterprise applications.
• Lack of Metadata Management: Even when data can be found,
information workers struggle to understand the specific semantics of
enterprise data sources. This is largely because few enterprise environments
have implemented metadata management solutions.
• Different Data Access Interfaces: One of the biggest challenges for
accessing data in the enterprise is the proliferation of heterogeneous data
access protocols and APIs introduced by new line-of-business solutions. As a
result, organizations struggle with the lack of consistent protocols and
models for accessing data from different business applications.
• Lack of Established Data Stewardship: Complementing the previous
point, the lack of mainstream data stewardship models makes it challenging
for applications to access enterprise data sources.
• Limited Collaboration Interfaces: Top-down data stewardship is only one
mechanism for establishing contextual information about enterprise data
sources. Much of the knowledge about business data lives with the business
users who actively interact with it. However, enterprises rarely implement
collaboration interfaces that capture the knowledge of those domain experts
and use it to add context to corporate data sources.
Why MDM is not the Answer
Master data management (MDM) platforms have traditionally been seen as a
mechanism to keep a record of data sources in an enterprise environment.
However, over the years MDM solutions have become extremely heavy and
complicated, and they are poorly suited to many mainstream data discovery
scenarios in the enterprise. Additionally, MDM solutions struggle to integrate
quickly with modern SaaS, cloud and mobile platforms, which are becoming a
significant source of data in the enterprise.
As a result of these limitations, organizations have started to adopt lighter,
simpler and more modern data discovery models that are optimized for today’s
technology ecosystem. Of the different models used to enable data discovery in
the enterprise, one has proven remarkably successful in organizations of all
sizes: the enterprise data catalog.
Introducing the Enterprise Data Catalog
A data catalog is a simple but effective and robust model for enabling data
discovery in the enterprise. From a functional standpoint, an enterprise data catalog
should provide a global repository that registers data sources from different
line-of-business systems, together with their metadata and contextual
information.
Conceptually, a data catalog borrows elements from popular repositories such as
mobile app stores or e-commerce marketplaces. In that sense, an enterprise data
catalog goes beyond classifying and organizing enterprise data sources and enables
capabilities such as search, collaboration and alerting that can be combined to
provide a fresh, modern experience for discovering data sources in an enterprise
environment.
From the functional standpoint, an enterprise data catalog should enable some of
the following capabilities:
• Data Source Discovery: An enterprise data catalog should allow
information workers to browse and discover business data sources linked to
line-of-business systems. Additionally, the catalog should make it simple to
register new data sources.
• Data Source Publishing: Complementing the previous point, an enterprise
data catalog should allow information workers to register new data sources
through simple interfaces, both visually and programmatically.
• Metadata Management: Enterprise data catalog solutions should allow data
stewards to provide adequate metadata about business data sources.
Simple metadata such as field descriptions or other contextual information
can be incredibly valuable for correctly understanding business data sources.
• Tagging and Classification: An enterprise data catalog should allow users
to classify data sources using tags or simple hierarchical categories.
• Search: Finding data using simple keyword and faceted search should be one
of the key capabilities of an enterprise data catalog solution.
• Testability: An enterprise data catalog should allow users to test and
validate the data sources exposed in the catalog.
• Collaboration: An enterprise data catalog should facilitate collaboration
between information workers working on specific data sources.
• Governance: Access control, SLAs and exception management are just some of
the key governance and data stewardship capabilities that enterprise data
catalogs should enable.
• Alerts: Throughout the lifetime of a data source, information workers might
want to receive alerts about relevant events such as schema changes or
performance degradations. An enterprise data catalog should provide a
simple interface for power users to configure alert conditions on specific data
sources.
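The alert capability above can be illustrated with a schema-change check. The following Python sketch is one possible approach, not a prescribed design. It assumes a schema is captured as a plain mapping of column names to type names. The names `Schema`, `SchemaChange`, `diff_schemas` and `should_alert` are hypothetical. The sketch compares two snapshots of a data source's schema and evaluates a simple user-configured alert condition against the detected changes.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Assumed representation of a schema snapshot: column name -> declared type name.
Schema = Dict[str, str]


@dataclass(frozen=True)
class SchemaChange:
    kind: str    # "removed", "added" or "type_changed"
    column: str
    detail: str


def diff_schemas(old: Schema, new: Schema) -> List[SchemaChange]:
    """Report columns removed, added, or whose type changed between two snapshots."""
    changes: List[SchemaChange] = []
    for column in sorted(old.keys() - new.keys()):
        changes.append(SchemaChange("removed", column, old[column]))
    for column in sorted(new.keys() - old.keys()):
        changes.append(SchemaChange("added", column, new[column]))
    for column in sorted(old.keys() & new.keys()):
        if old[column] != new[column]:
            changes.append(
                SchemaChange("type_changed", column, f"{old[column]} -> {new[column]}")
            )
    return changes


def should_alert(changes: List[SchemaChange], watched_kinds: Set[str]) -> bool:
    """A user-configured alert condition: fire if any change is of a watched kind."""
    return any(change.kind in watched_kinds for change in changes)
```

In a real catalog, the snapshots would come from periodic metadata harvesting. Performance alerts could follow the same pattern with thresholds in place of schema diffs.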
Getting Technical: Building an Enterprise Data Catalog
As explained in the previous sections, enterprise data catalogs have become one of
the most popular solutions for enabling data discovery in the enterprise. Over the
last couple of years, we have implemented enterprise data catalogs for dozens of
organizations, and a few reference architectures have emerged from that work that
you can implement with today’s technology. The following diagram illustrates a
reference architecture model for an enterprise data catalog solution.
The diagram highlights the following functional components:
Catalog Portal
The catalog portal is the main user interface for registering, browsing and discovering
data sources in an enterprise environment. From an architecture standpoint, the catalog
portal interacts with the different APIs of the solution to perform operations on
data sources. The portal could be implemented using any web development
platform, such as Node.js with Express, ASP.NET or Python with Django.
Catalog Mobile
Similar to the portal, the catalog mobile interface lets users interact with data
sources from smartphones or tablets. This component of the platform provides simple,
mobile-first functionality for data discovery on mobile devices.
Catalog Store
The catalog store is the main data repository for maintaining the metadata
associated with different data sources. Because the information associated with
business data sources is heterogeneous and does not follow a fixed schema, we have
typically preferred NoSQL databases such as MongoDB or Couchbase when
implementing this type of solution.
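To make the schema-flexibility argument concrete, here is a sketch of how a catalog entry might be shaped as a JSON document before it is written to a document store. All field names are assumptions for illustration. The `_id` key follows MongoDB's primary-key convention, and other stores may handle document keys differently. The free-form `extra` mapping is the part that varies between sources and that a fixed relational schema would struggle to hold.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class FieldMetadata:
    name: str
    type: str
    description: str = ""


@dataclass
class CatalogEntry:
    """One registered data source (hypothetical document shape)."""
    source_id: str
    name: str
    system: str            # line-of-business system hosting the data
    owner: str
    tags: List[str] = field(default_factory=list)
    fields: List[FieldMetadata] = field(default_factory=list)
    # Source-specific attributes that differ from one data source to another.
    extra: Dict[str, Any] = field(default_factory=dict)

    def to_document(self) -> Dict[str, Any]:
        """Convert to a JSON-serializable dict, nesting field metadata."""
        doc = asdict(self)  # recursively converts the nested FieldMetadata items
        doc["_id"] = doc.pop("source_id")  # MongoDB-style primary key
        return doc


entry = CatalogEntry(
    "crm.accounts", "CRM Accounts", "crm", "sales-ops",
    tags=["customer", "pii"],
    fields=[FieldMetadata("account_id", "string", "Unique account key")],
    extra={"refresh": "hourly"},
)
print(json.dumps(entry.to_document(), indent=2))
```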
Data Source Publishing API
The data source publishing API provides the interfaces for publishing and managing
business data sources from different applications including the catalog portal. This
API should handle all aspects related to data source management such as
categorization, tagging, metadata management etc.
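The operations listed above map naturally onto a small service interface. The sketch below is an in-memory stand-in, assuming a real deployment would persist to the catalog store and expose these methods over HTTP. The class and method names (`PublishingService`, `register`, `categorize`, `tag`, `set_metadata`, `unpublish`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DataSource:
    source_id: str
    name: str
    category: Optional[str] = None
    tags: Set[str] = field(default_factory=set)
    metadata: Dict[str, str] = field(default_factory=dict)


class PublishingService:
    """In-memory stand-in for a data source publishing API."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, source_id: str, name: str) -> DataSource:
        if source_id in self._sources:
            raise ValueError(f"data source {source_id!r} is already registered")
        source = DataSource(source_id, name)
        self._sources[source_id] = source
        return source

    def get(self, source_id: str) -> DataSource:
        try:
            return self._sources[source_id]
        except KeyError:
            raise KeyError(f"unknown data source {source_id!r}") from None

    def categorize(self, source_id: str, category: str) -> None:
        self.get(source_id).category = category

    def tag(self, source_id: str, *tags: str) -> None:
        # Normalize tags so "Finance" and " finance " count as the same tag.
        self.get(source_id).tags.update(t.strip().lower() for t in tags)

    def set_metadata(self, source_id: str, key: str, value: str) -> None:
        self.get(source_id).metadata[key] = value

    def unpublish(self, source_id: str) -> None:
        self.get(source_id)  # raises KeyError for unknown sources
        del self._sources[source_id]
```

Normalizing tags at publish time keeps faceted search and tag browsing consistent without having to clean up duplicates later.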
Data Source Discovery API
The data source discovery API provides the interfaces required to dynamically query
and discover data sources registered on the platform. Typically, we have leveraged
industry standards such as OData or GraphQL as the main protocol for these
interfaces.
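As an illustration of the OData option, the sketch below builds a discovery query URL using the OData v4 system query options `$filter`, `$select` and `$top`, the `contains` function and the `any` lambda operator. The entity set `DataSources`, its properties `Name` and `Tags`, and the base URL are assumptions about how a catalog might model its sources.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import quote, urlencode


def odata_string(value: str) -> str:
    """Format an OData string literal: single-quoted, embedded quotes doubled."""
    return "'" + value.replace("'", "''") + "'"


def build_discovery_url(
    base_url: str,
    entity_set: str = "DataSources",
    name_contains: Optional[str] = None,
    tag: Optional[str] = None,
    select: Iterable[str] = (),
    top: Optional[int] = None,
) -> str:
    """Build an OData v4 query URL for discovering registered data sources."""
    filters = []
    if name_contains:
        filters.append(f"contains(Name,{odata_string(name_contains)})")
    if tag:
        # Lambda operator: match sources whose Tags collection contains the tag.
        filters.append(f"Tags/any(t: t eq {odata_string(tag)})")
    options: Dict[str, str] = {}
    if filters:
        options["$filter"] = " and ".join(filters)
    columns = list(select)
    if columns:
        options["$select"] = ",".join(columns)
    if top is not None:
        options["$top"] = str(top)
    url = f"{base_url.rstrip('/')}/{entity_set}"
    if options:
        # quote (not quote_plus) encodes spaces as %20; OData punctuation stays readable.
        url += "?" + urlencode(options, safe="$,'()/:", quote_via=quote)
    return url
```

A GraphQL-based discovery API would express the same filters as arguments on a query field instead of URL options.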
Data Source Notifications API
The data source notifications API provides the mechanisms for third-party
applications to dynamically subscribe to changes on specific data sources. The API
should be able to deliver notifications through traditional channels such as email, SMS
or push notifications, as well as through programmatic interfaces.
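The subscription model described above can be sketched as a small publish/subscribe hub. In this sketch a channel is any callable that accepts an event, so an email or SMS adapter, a push client, or a webhook caller could all plug in the same way. `NotificationHub`, `Event` and the event kinds are hypothetical names, and delivery here is synchronous for simplicity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    source_id: str
    kind: str      # hypothetical kinds, e.g. "schema_changed", "sla_breached"
    message: str


# A channel delivers an event: email/SMS adapter, push client, webhook caller, etc.
Channel = Callable[[Event], None]


class NotificationHub:
    def __init__(self) -> None:
        self._subscriptions: DefaultDict[
            str, List[Tuple[FrozenSet[str], Channel]]
        ] = defaultdict(list)

    def subscribe(self, source_id: str, kinds: Iterable[str], channel: Channel) -> None:
        """Subscribe a channel to events on one data source; no kinds means all events."""
        self._subscriptions[source_id].append((frozenset(kinds), channel))

    def publish(self, event: Event) -> int:
        """Deliver an event to every matching subscriber and return the delivery count."""
        delivered = 0
        for kinds, channel in self._subscriptions.get(event.source_id, []):
            if not kinds or event.kind in kinds:
                channel(event)
                delivered += 1
        return delivered
```

A production implementation would typically put a message queue between `publish` and the channels, so that a slow SMS gateway does not block other subscribers.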
Data Source Search API
The data source search API is responsible for providing traditional search
capabilities over the enterprise data sources registered in the catalog. The search
capabilities should focus on the data source metadata, not on the data itself.
Search techniques such as faceted search and proximity matching are very relevant
for this API. We typically rely on search platforms such as Elasticsearch to implement
this capability.
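With Elasticsearch, faceted and proximity search over metadata can be expressed in a single request body sent to an index's `_search` endpoint. The sketch below uses the `bool`, `multi_match`, `match_phrase` (with `slop` for proximity) and `term` queries and `terms` aggregations from the Query DSL. The field names (`name`, `description`, `field_descriptions`, and the facet fields) are assumptions about the index mapping, and the facet fields are presumed to be mapped as `keyword`.

```python
from typing import Any, Dict, Optional

# Hypothetical keyword fields in the catalog index mapping, used as facets.
FACET_FIELDS = ("tags", "system", "owner")


def build_search_body(
    text: str, facet_filters: Optional[Dict[str, str]] = None, size: int = 20
) -> Dict[str, Any]:
    """Build an Elasticsearch request body that searches catalog metadata."""
    facet_filters = facet_filters or {}
    unknown = set(facet_filters) - set(FACET_FIELDS)
    if unknown:
        raise ValueError(f"unsupported facet(s): {sorted(unknown)}")
    return {
        "size": size,
        "query": {
            "bool": {
                # Keyword relevance over metadata fields only; "^3" boosts name matches.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["name^3", "description", "field_descriptions"],
                        }
                    }
                ],
                # Proximity: boost descriptions where the terms appear near each other
                # (slop = number of positional moves allowed between terms).
                "should": [{"match_phrase": {"description": {"query": text, "slop": 3}}}],
                # Selected facet values narrow the results without affecting scoring.
                "filter": [{"term": {f: v}} for f, v in sorted(facet_filters.items())],
            }
        },
        # Facet counts returned alongside the hits, for display in the portal.
        "aggs": {f: {"terms": {"field": f}} for f in FACET_FIELDS},
    }
```

Placing facet selections in `filter` rather than `must` keeps them out of relevance scoring and lets Elasticsearch cache them.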
Data Governance API
The data governance API is responsible for enabling data governance and
stewardship capabilities such as access control, data privacy, data ownership, SLA
monitoring etc. These APIs can be integrated with existing security and access
control platforms in the enterprise.
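As a simplified illustration of the access-control piece, the sketch below combines data ownership, sensitivity classification and group membership into one read check. The four sensitivity levels and the rules (owners always have access, clearance must meet the source's sensitivity, an empty group list means any group) are assumptions. In practice these decisions would usually be delegated to the enterprise's existing security platform, as noted above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class User:
    name: str
    clearance: Sensitivity
    groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class SourcePolicy:
    owner: str
    sensitivity: Sensitivity
    allowed_groups: FrozenSet[str] = frozenset()  # empty means "any group"


def can_read(user: User, policy: SourcePolicy) -> bool:
    """Hypothetical read check combining ownership, classification and groups."""
    if user.name == policy.owner:
        return True  # data owners always retain access to their sources
    if user.clearance < policy.sensitivity:
        return False
    return not policy.allowed_groups or bool(user.groups & policy.allowed_groups)
```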
Metadata Connectors
The connectors are responsible for abstracting the integration with the different
line-of-business systems hosting the data sources that will be discovered via the
catalog. From a functional standpoint, the connectors should provide the authentication
and data querying capabilities required to register a data source in the enterprise
data catalog.
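One way to structure connectors is a common abstract contract with one implementation per source system. The sketch below assumes a hypothetical three-method contract (`connect`, `list_sources`, `describe`). It uses a SQLite database, through Python's standard `sqlite3` module, as a stand-in for a line-of-business system, reading table metadata from `sqlite_master` and `PRAGMA table_info`.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class MetadataConnector(ABC):
    """Hypothetical connector contract for one line-of-business system."""

    @abstractmethod
    def connect(self) -> None:
        """Authenticate against and open a session with the source system."""

    @abstractmethod
    def list_sources(self) -> List[str]:
        """Return the identifiers of the data sources the system exposes."""

    @abstractmethod
    def describe(self, source: str) -> Dict[str, str]:
        """Return column name -> declared type for one data source."""


class SQLiteConnector(MetadataConnector):
    """Example connector that reads table metadata from a SQLite database."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._conn: Optional[sqlite3.Connection] = None

    def connect(self) -> None:
        # SQLite has no credentials; a real connector would authenticate here.
        self._conn = sqlite3.connect(self._path)

    def _require_conn(self) -> sqlite3.Connection:
        if self._conn is None:
            raise RuntimeError("call connect() first")
        return self._conn

    def list_sources(self) -> List[str]:
        rows = self._require_conn().execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]

    def describe(self, source: str) -> Dict[str, str]:
        # Identifiers cannot be bound as parameters, so quote the table name.
        quoted = '"' + source.replace('"', '""') + '"'
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        rows = self._require_conn().execute(f"PRAGMA table_info({quoted})").fetchall()
        return {row[1]: row[2] for row in rows}
```

Because the catalog depends only on the abstract contract, adding a new system means writing one new connector class rather than changing the catalog itself.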
Data Collaboration System and APIs
The data collaboration system and APIs provide the interfaces for teams to
collaborate around specific data sources stored in the data catalog. This interface
can be the main gateway for capturing contextual information related to data sources,
such as comments and documents.
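A minimal form of this gateway is a comment log keyed by data source. The sketch below optionally scopes each comment to a single field, so that domain knowledge such as units or filtering caveats lands next to the metadata it explains. The `CollaborationLog` and `Comment` names and their behavior are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Comment:
    author: str
    text: str
    field_name: Optional[str] = None  # None = comment on the whole data source
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class CollaborationLog:
    """In-memory stand-in for a collaboration API over catalog data sources."""

    def __init__(self) -> None:
        self._comments: Dict[str, List[Comment]] = {}

    def add_comment(
        self, source_id: str, author: str, text: str, field_name: Optional[str] = None
    ) -> Comment:
        if not text.strip():
            raise ValueError("comment text must not be empty")
        comment = Comment(author, text.strip(), field_name)
        self._comments.setdefault(source_id, []).append(comment)
        return comment

    def comments_for(self, source_id: str, field_name: Optional[str] = None) -> List[Comment]:
        """All comments on a source, or only those attached to one field."""
        comments = self._comments.get(source_id, [])
        if field_name is None:
            return list(comments)
        return [c for c in comments if c.field_name == field_name]

    def contributors(self, source_id: str) -> List[str]:
        # Users who annotate a source are natural contacts for questions about it.
        return sorted({c.author for c in self._comments.get(source_id, [])})
```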
Putting It All Together
As simple as the previous architecture model seems, it contains the fundamental
building blocks for enabling robust data discovery scenarios in enterprise
environments. This architecture model is based on our experience implementing
dozens of similar solutions and can be easily extended with other relevant aspects,
such as data quality rules and data access optimization.
Summary
Data discovery is one of the most important elements of enterprise data solutions,
and one that is frequently ignored. This paper has provided a reference architecture
for enabling data discovery in enterprise environments. The reference architecture
covers relevant aspects of data discovery solutions such as metadata management,
governance, alerting and discovery. The reference architecture described in this
paper has been implemented dozens of times using commodity technology stacks
available to any organization in the world.

 
Demystifying Centralized Crypto Exchanges using Data Science
Demystifying Centralized Crypto Exchanges using Data ScienceDemystifying Centralized Crypto Exchanges using Data Science
Demystifying Centralized Crypto Exchanges using Data Science
 
Crypto assets are a data science heaven rev
Crypto assets are a data science heaven revCrypto assets are a data science heaven rev
Crypto assets are a data science heaven rev
 
Implementing Machine Learning in the Real World
Implementing Machine Learning in the Real WorldImplementing Machine Learning in the Real World
Implementing Machine Learning in the Real World
 

Recently uploaded

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 

Recently uploaded (20)

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 

Solving Data Discovery in the Enterprise: Building an Enterprise Data Catalog
Contents
Overview
The Business Challenges of Data Discovery in the Enterprise
Why MDM is not the Answer
Introducing the Enterprise Data Catalog
Getting Technical: Building an Enterprise Data Catalog
  Catalog Portal
  Catalog Mobile
  Catalog Store
  Data Source Publishing API
  Data Source Discovery API
  Data Source Notifications API
  Data Source Search API
  Data Governance API
  Metadata Connectors
  Data Collaboration System and APIs
Putting It All Together
Summary
Overview

Data discovery, understanding and governance are becoming key elements of enterprise data architectures. The explosion in the volume of data that organizations produce and consume has greatly increased the complexity of discovering and understanding that data efficiently.

Despite its relevance, data discovery and governance is often an overlooked aspect of enterprise big data solutions. These solutions tend to focus on more glamorous areas such as analytics and machine learning. However, more and more organizations are realizing that data discovery is an essential foundation for analytics, visualization and general data consumption in the enterprise. The road to enabling data discovery in the enterprise is nevertheless plagued with challenges, as we will explore in the next section.

The Business Challenges of Data Discovery in the Enterprise

As data grows in the enterprise, so do the initiatives to gather intelligence about that data. Efforts around big data, analytics and visualization have increased dramatically over the last few years. As a result, data discovery has become a foundational building block of any enterprise data initiative. However, to enable efficient data discovery, enterprises need to address challenges such as the following:

• Increasing Data Volume: The growing volume of data produced in the enterprise has drastically degraded information workers' ability to quickly find and consume data sources from enterprise applications.
• Lack of Metadata Management: Even when data can be found, information workers struggle to understand the specific semantics of enterprise data sources. This is because few enterprise environments implement metadata management solutions.
• Different Data Access Interfaces: One of the biggest challenges in accessing enterprise data is the proliferation of heterogeneous data access protocols and APIs introduced by new line-of-business solutions. As a result, organizations lack consistent protocols and models for accessing data across business applications.
• Lack of Established Data Stewardship: Complementing the previous point, the absence of mainstream data stewardship models makes it difficult for applications to access enterprise data sources.
• Limited Collaboration Interfaces: Top-down data stewardship is only one mechanism for establishing contextual information about enterprise data sources. Much of the knowledge about business data lives with the business users who actively work with it. However, enterprises rarely implement collaboration interfaces that capture the knowledge of those domain experts and attach it as context to corporate data sources.

Why MDM is not the Answer

Master data management (MDM) platforms have traditionally been seen as a mechanism for keeping a record of data sources in an enterprise environment. However, over the years MDM solutions have become extremely heavy and complicated. They are also poorly suited to many mainstream enterprise data discovery scenarios. Additionally, MDM solutions struggle to integrate quickly with modern SaaS, cloud and mobile platforms, which are becoming a significant source of enterprise data.

As a result of these limitations, organizations have started to adopt lighter, simpler and more modern data discovery models that are optimized for today's technology ecosystem. Of the different models used to enable data discovery in the enterprise, one has been incredibly successful in organizations of all sizes: the enterprise data catalog.

Introducing the Enterprise Data Catalog

A data catalog is a simple but effective and robust model for enabling data discovery in the enterprise.
From a functional standpoint, an enterprise data catalog should provide a global repository that registers data sources from different line-of-business systems. It should also store the metadata and contextual information associated with those data sources.

Conceptually, a data catalog borrows elements from popular repositories such as mobile app stores and e-commerce marketplaces. In that sense, an enterprise data catalog goes beyond classifying and organizing enterprise data sources. It enables capabilities such as search, collaboration and alerting, which can be combined to provide a fresh, modern experience for discovering data sources in an enterprise environment. More specifically, an enterprise data catalog should enable capabilities such as the following:

• Data Source Discovery: An enterprise data catalog should allow information workers to browse and discover the business data sources linked to line-of-business systems. Additionally, the catalog should make it simple to register new data sources.
• Data Source Publishing: Complementing the previous point, an enterprise data catalog should allow information workers to register new data sources through simple interfaces, both visual and programmatic.
• Metadata Management: Enterprise data catalog solutions should allow data stewards to provide adequate metadata about business data sources. Even simple metadata, such as field descriptions or other contextual information, can be essential for correctly understanding business data sources.
• Tagging and Classification: An enterprise data catalog should allow users to classify data sources using tags or simple hierarchical categories.
• Search: Finding data using simple keyword and faceted search should be one of the key capabilities of an enterprise data catalog solution.
• Testability: An enterprise data catalog should allow users to test and validate the data sources exposed in the catalog.
• Collaboration: An enterprise data catalog should facilitate collaboration between information workers working on specific data sources.
• Governance: Access control, SLAs and exception management are just some of the key governance and data stewardship capabilities that enterprise data catalogs should enable.
• Alerts: Throughout the lifetime of a data source, information workers might want to receive alerts about relevant events such as schema changes or performance degradations. An enterprise data catalog should provide a simple interface for power users to configure alert conditions on specific data sources.

Getting Technical: Building an Enterprise Data Catalog

As explained in the previous sections, enterprise data catalogs have become one of the most popular solutions for enabling data discovery in the enterprise. Over the last couple of years, we have implemented enterprise data catalogs for dozens of organizations. From that experience, a few reference architectures have emerged that can be implemented with today's technology. The following diagram illustrates a reference architecture model for an enterprise data catalog solution.
The previous diagram highlights the following functional components:

Catalog Portal

The catalog portal is the main user interface for registering, browsing and discovering data sources in an enterprise environment. From an architectural standpoint, the catalog portal interacts with the solution's APIs to perform operations on data sources. The portal could be implemented using any web development platform, such as Node.js with Express, ASP.NET or Python's Django.
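The following Python sketch makes the portal's interaction with the catalog APIs more concrete. It models the core operations a portal would invoke: publishing a data source with its metadata, tagging it, and running a keyword search narrowed by facets.

Everything here is a hypothetical illustration. The `DataSource` fields, `InMemoryCatalog` and its methods are not defined by this paper. A real deployment would back these operations with the catalog store and a search platform. Consistent with the search guidance in this paper, the search runs over metadata only, never over the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Catalog entry describing a data source (hypothetical schema); it holds metadata, not data."""
    name: str
    system: str      # line-of-business system hosting the data
    owner: str       # data steward responsible for the entry
    category: str    # simple hierarchical category, e.g. "Sales/Orders"
    fields: dict[str, str] = field(default_factory=dict)  # field name -> description
    tags: set[str] = field(default_factory=set)


class InMemoryCatalog:
    """Hypothetical stand-in for the catalog store plus the publishing and search APIs."""

    def __init__(self) -> None:
        self._sources: dict[str, DataSource] = {}

    def publish(self, source: DataSource) -> None:
        # Publishing a name that already exists replaces the previous entry.
        self._sources[source.name] = source

    def tag(self, name: str, *tags: str) -> None:
        # Tags are normalized to lower case so keyword search is case-insensitive.
        self._sources[name].tags.update(t.lower() for t in tags)

    def search(self, keyword: str, **facets: str) -> list[str]:
        """Case-insensitive keyword match over metadata, then exact-match facet filters."""
        kw = keyword.lower()
        hits = []
        for src in self._sources.values():
            # Searchable text: name, category, field names, field descriptions and tags.
            text = " ".join(
                [src.name, src.category, *src.fields, *src.fields.values(), *src.tags]
            ).lower()
            if kw not in text:
                continue
            # Each facet (e.g. system="SAP") must equal the entry's attribute exactly.
            if all(getattr(src, facet) == value for facet, value in facets.items()):
                hits.append(src.name)
        return sorted(hits)
```

For example, suppose an `orders` source is published from a system named `SAP` under the category `Sales/Orders`. Then `catalog.search("sales")` matches it, while `catalog.search("sales", system="Salesforce")` does not. Faceting by exact attribute equality keeps the sketch small. A search platform would typically compute facet counts as well.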
Catalog Mobile

Similar to the portal, the catalog mobile interface lets users interact with data sources from smartphones or tablets. This component of the platform provides simple, mobile-first functionality for data discovery on mobile devices.

Catalog Store

The catalog store is the main repository for the metadata associated with the different data sources. Given the arbitrary, loosely structured nature of the information related to business data sources, we have typically preferred NoSQL databases such as MongoDB or Couchbase when implementing this type of solution.

Data Source Publishing API

The data source publishing API provides the interfaces for publishing and managing business data sources from different applications, including the catalog portal. This API should handle all aspects of data source management, such as categorization, tagging and metadata management.

Data Source Discovery API

The data source discovery API provides the interfaces required to dynamically query and discover data sources registered on the platform. Typically, we have used industry standards such as OData or GraphQL as the main protocol for these interfaces.

Data Source Notifications API

The data source notifications API provides the mechanisms for third-party applications to dynamically subscribe to changes on specific data sources. The API should be able to deliver notifications through traditional channels such as email, SMS or push notifications. It should also support delivery through programmatic interfaces.

Data Source Search API

The data source search API provides traditional search capabilities over the enterprise data sources registered in the catalog. The search capabilities should focus on data source metadata rather than on the data itself. Search techniques such as faceted search and proximity algorithms are very relevant for this API. We typically rely on search platforms such as Elastic to implement this capability.

Data Governance API

The data governance API enables data governance and stewardship capabilities such as access control, data privacy, data ownership and SLA monitoring. These APIs can be integrated with existing security and access control platforms in the enterprise.

Metadata Connectors

The connectors abstract the integration with the different line-of-business systems that host the data sources to be discovered via the catalog. From a functional standpoint, the connectors should provide the authentication and data querying capabilities required to register a data source in the enterprise data catalog.

Data Collaboration System and APIs

The data collaboration system and APIs provide the interfaces for teams to collaborate around specific data sources stored in the data catalog. This interface can be the main gateway for capturing contextual information related to data sources, such as comments and documents.

Putting It All Together

As simple as the previous architecture model seems, it contains the fundamental building blocks for robust data discovery scenarios in enterprise environments. The model is based on our experience implementing dozens of similar solutions. It can easily be extended with other relevant aspects, such as data quality rules and data access optimization.

Summary

Data discovery is one of the most important elements of enterprise data solutions, and one that is frequently ignored. This paper has presented a reference architecture for enabling data discovery in enterprise environments. The reference architecture covers relevant aspects of data discovery solutions such as metadata management, governance, alerting and discovery. It has been implemented dozens of times using commodity technology stacks available to any organization.
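The following Python sketch illustrates the data source notifications API and the alerting capability described earlier. It shows a minimal in-process publish/subscribe hub keyed by data source and event type. It also includes a helper that raises a schema-change event by comparing two field lists.

The names (`NotificationHub`, `schema_diff`, `schema_changed` and the `"schema_changed"` event type) are hypothetical. Plain callbacks stand in for the programmatic delivery channel. Email, SMS or push delivery would simply be additional handlers.

```python
from collections import defaultdict
from typing import Callable

# A handler receives (data source name, event type, event details).
Handler = Callable[[str, str, dict], None]


class NotificationHub:
    """Hypothetical in-process sketch of the data source notifications API."""

    def __init__(self) -> None:
        # (data source name, event type) -> subscribed handlers
        self._subs: dict[tuple[str, str], list[Handler]] = defaultdict(list)

    def subscribe(self, source: str, event: str, handler: Handler) -> None:
        self._subs[(source, event)].append(handler)

    def emit(self, source: str, event: str, details: dict) -> int:
        """Deliver an event to every matching subscriber; return how many were notified."""
        # .get() avoids creating empty entries in the defaultdict for unknown keys.
        handlers = self._subs.get((source, event), [])
        for handler in handlers:
            handler(source, event, details)
        return len(handlers)


def schema_diff(old: list[str], new: list[str]) -> dict:
    """Report fields added and removed; reordering fields is not treated as a change."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
    }


def schema_changed(hub: NotificationHub, source: str, old: list[str], new: list[str]) -> int:
    """Emit a 'schema_changed' event only when the set of fields actually changed."""
    diff = schema_diff(old, new)
    if not diff["added"] and not diff["removed"]:
        return 0
    return hub.emit(source, "schema_changed", diff)
```

Keying subscriptions by both data source and event type lets a power user subscribe to, say, schema changes on one source without receiving every event in the catalog. A production implementation would persist subscriptions and deliver events asynchronously rather than calling handlers inline.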