Data Governance with Databricks Unity Catalog
Presenter: Kundan Kumar
KnolX Etiquette
Lack of etiquette and manners is a huge turn-off.
▪ Punctuality: Join the session 5 minutes before the start time. We start on time and conclude on time!
▪ Feedback: Submit constructive feedback for every session; it is very helpful for the presenter.
▪ Silent Mode: Keep your mobile devices in silent mode, and feel free to step out of the session if you need to take an urgent call.
▪ Avoid Disturbance: Avoid unnecessary chit-chat during the session.
1. Introduction to Data Governance
2. Challenges in Data Governance
3. Introduction to Databricks Unity Catalog
4. Key Components of Databricks Unity Catalog
5. Databricks Unity Catalog: Features and capabilities
6. Benefits of Using Databricks Unity Catalog
Introduction to Data Governance
▪ Data governance is the process of managing the availability, usability, integrity, and security of the data in enterprise systems.
▪ Data governance is a set of processes, policies, and standards that organizations use to manage their data assets effectively.
▪ The goal of data governance is to ensure that data is of high quality, accessible, secure, and compliant with regulations and standards.
▪ Data governance is a holistic approach to data management that encompasses people, processes, policies, and technology.
Challenges in Data Governance
Introduction to Databricks and Lakehouse
▪ Databricks is a unified, open analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale.
▪ Databricks provides tools that help you connect your sources of data to one platform to process, store, share, analyze, model, and monetize datasets with solutions from BI to generative AI.
▪ Databricks can be used as a powerful component within a Data Lakehouse architecture to streamline data processing, analytics, and machine learning tasks.
Databricks Unity Catalog
▪ Databricks Unity Catalog offers a unified governance layer for data and AI within the Databricks Data Intelligence Platform.
▪ With Unity Catalog, organizations can seamlessly govern their structured and unstructured data, machine learning models, notebooks, dashboards, and files on any cloud or platform.
▪ Data scientists, analysts, and engineers can use Unity Catalog to securely discover, access, and collaborate on trusted data and AI assets, leveraging AI to boost productivity and unlock the full potential of the Lakehouse architecture.
▪ This unified approach to governance accelerates data and AI initiatives while simplifying regulatory compliance.
▪ Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.
Databricks Unity Catalog's Architecture
[Figure: Unity Catalog's architecture]
Key Components of Databricks Unity Catalog
The Databricks Lakehouse architecture combines data stored with the Delta Lake protocol in cloud object storage with metadata registered to a metastore.
There are five primary objects in the Databricks Lakehouse:
▪ Catalog: a grouping of databases.
▪ Database or schema: a grouping of objects in a catalog. Databases contain tables, views, and functions.
▪ Table: a collection of rows and columns stored as data files in object storage.
▪ View: a saved query, typically against one or more tables or data sources.
▪ Function: saved logic that returns a scalar value or a set of rows.
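The object hierarchy above can be sketched as a small in-memory model. This is only an illustration of how catalogs, schemas, and tables nest and how a three-part name of the form catalog.schema.object identifies one object. The class names, the `split_name` helper, and the `sales.retail.orders` example are hypothetical and are not part of any Databricks API.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """A schema (database): groups tables, views, and functions."""
    name: str
    tables: set = field(default_factory=set)
    views: set = field(default_factory=set)
    functions: set = field(default_factory=set)


@dataclass
class Catalog:
    """A catalog: the top-level grouping of schemas."""
    name: str
    schemas: dict = field(default_factory=dict)

    def add_table(self, schema_name: str, table_name: str) -> str:
        """Register a table and return its three-part name."""
        schema = self.schemas.setdefault(schema_name, Schema(schema_name))
        schema.tables.add(table_name)
        return f"{self.name}.{schema_name}.{table_name}"


def split_name(full_name: str) -> tuple:
    """Split 'catalog.schema.object' into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.object, got {full_name!r}")
    return tuple(parts)


# Hypothetical names, for illustration only.
sales = Catalog("sales")
orders_name = sales.add_table("retail", "orders")
catalog_name, schema_name, table_name = split_name(orders_name)
```

Keeping the schema as an explicit level between catalog and table is what lets the same table name exist in several schemas without ambiguity.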
Databricks Unity Catalog: Features and capabilities
Unity Catalog's metastore combines several data catalog features, each designed to simplify data management.
Data Discovery and Exploration: Unified visibility into data and AI
▪ Discover and classify structured and unstructured data, ML models, notebooks, dashboards, and arbitrary files on any cloud.
▪ Users can easily discover and explore available datasets and data assets using Unity Catalog's intuitive interface.
▪ It supports searching, filtering, and browsing metadata based on attributes such as dataset name, description, tags, schema, and lineage.
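Searching and filtering on metadata attributes can be pictured with a toy example. The sketch below is not how Unity Catalog implements discovery; it only assumes that each asset carries a name, a description, and a set of tags, and that a search matches free text against the first two and optionally filters by tag. `AssetMetadata`, `search`, and the sample assets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetMetadata:
    """Minimal searchable metadata for one data or AI asset."""
    name: str
    description: str = ""
    tags: frozenset = frozenset()


def search(assets, text="", tag=None):
    """Return sorted names of assets whose name or description contains
    `text` (case-insensitive) and that carry `tag`, if a tag is given."""
    needle = text.lower()
    results = []
    for asset in assets:
        matches_text = (needle in asset.name.lower()
                        or needle in asset.description.lower())
        matches_tag = tag is None or tag in asset.tags
        if matches_text and matches_tag:
            results.append(asset.name)
    return sorted(results)


# Hypothetical assets, for illustration only.
assets = [
    AssetMetadata("sales.retail.orders", "Customer orders by day",
                  frozenset({"gold"})),
    AssetMetadata("sales.retail.customers", "Customer master data",
                  frozenset({"gold", "pii"})),
    AssetMetadata("ml.models.churn", "Churn prediction model",
                  frozenset({"ml"})),
]

customer_hits = search(assets, text="customer")
pii_hits = search(assets, tag="pii")
```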
Data Lineage: Know your data's journey
▪ Unity Catalog offers comprehensive data lineage capabilities that enable users to track the flow of data from its source to consumption.
▪ It provides visibility into data transformations, ETL processes, and data dependencies, helping users understand data provenance and perform impact analysis.
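Provenance and impact analysis both come down to walking a dependency graph in opposite directions. The following is a minimal sketch under the assumption that lineage is recorded as source-to-target edges between assets. It is not the Unity Catalog lineage API; `LineageGraph` and all table names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage store: a directed edge means 'source feeds target'."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first traversal collecting every reachable asset."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact(self, asset):
        """Everything that depends on `asset`, directly or indirectly."""
        return self._walk(asset, self._downstream)

    def provenance(self, asset):
        """Everything `asset` was derived from, directly or indirectly."""
        return self._walk(asset, self._upstream)


# Hypothetical pipeline, for illustration only.
graph = LineageGraph()
graph.add_edge("raw.orders", "clean.orders")
graph.add_edge("clean.orders", "gold.daily_revenue")
graph.add_edge("clean.orders", "gold.customer_ltv")
graph.add_edge("raw.customers", "gold.customer_ltv")

affected = graph.impact("raw.orders")
sources = graph.provenance("gold.customer_ltv")
```

Here `impact` answers "what breaks if this table changes?", while `provenance` answers "where did this table's data come from?".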
Access Control: Single permission model for data and AI
▪ Simplify access management with a unified interface to define access policies on data and AI assets and consistently apply and audit these policies on any cloud or data platform.
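One way to picture a single permission model is a grant table in which a privilege given on a parent object also applies to the objects beneath it. The sketch below assumes that simple downward inheritance over catalog.schema.table names. The real system has further rules that this toy ignores, and `PermissionModel`, the group name, and the object names are hypothetical.

```python
class PermissionModel:
    """Toy grant store with downward inheritance along dotted names."""

    def __init__(self):
        # Each grant is a (principal, privilege, securable) triple.
        self._grants = set()

    def grant(self, principal, privilege, securable):
        self._grants.add((principal, privilege, securable))

    def is_allowed(self, principal, privilege, securable):
        """Check the object itself, then its schema, then its catalog."""
        parts = securable.split(".")
        for depth in range(len(parts), 0, -1):
            scope = ".".join(parts[:depth])
            if (principal, privilege, scope) in self._grants:
                return True
        return False


# Hypothetical group and objects, for illustration only.
model = PermissionModel()
model.grant("analysts", "SELECT", "sales.retail")

can_read_orders = model.is_allowed("analysts", "SELECT", "sales.retail.orders")
can_read_ledger = model.is_allowed("analysts", "SELECT", "sales.finance.ledger")
can_modify_orders = model.is_allowed("analysts", "MODIFY", "sales.retail.orders")
```

Granting at the schema level means new tables in that schema are covered without any additional grant, which is the main appeal of a hierarchical model.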
Data Sharing: Open data sharing
▪ With Unity Catalog, you can easily share data and AI assets across clouds, regions, and platforms using open-source Delta Sharing, which is natively integrated within Unity Catalog.
▪ Securely collaborate with anyone, anywhere to unlock new revenue streams and drive business value, without relying on proprietary formats, complex ETL processes, or costly data replication.
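On the recipient side, the open-source Delta Sharing Python connector addresses a shared table with a URL of the form `<profile-file>#<share>.<schema>.<table>`. The helper below only builds that string. `sharing_table_url`, the profile file name, and the share, schema, and table names are hypothetical, and actually reading the table would require the separately installed `delta-sharing` package plus a real profile file issued by the data provider.

```python
def sharing_table_url(profile_path, share, schema, table):
    """Build a Delta Sharing table URL: <profile-file>#<share>.<schema>.<table>."""
    for label, value in (("share", share), ("schema", schema), ("table", table)):
        if not value:
            raise ValueError(f"{label} must not be empty")
    return f"{profile_path}#{share}.{schema}.{table}"


# Hypothetical profile file and names, for illustration only.
table_url = sharing_table_url("config.share", "retail_share", "retail", "orders")

# With the delta-sharing package installed and a valid profile file,
# a recipient could then load the table, for example:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(table_url)
```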
AI-powered Monitoring and Observability
▪ Harness the power of AI to automate monitoring, diagnose errors, and uphold data and ML model quality.
▪ Benefit from proactive alerts that automatically detect personally identifiable information (PII), track model drift, and help resolve issues within your data and AI pipelines to maintain accuracy and integrity.
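To make the idea of automatic PII detection concrete, here is a deliberately simple rule-based stand-in: it scans sampled rows and flags columns whose values look like email addresses. This is not the AI-based detection described above and not a Databricks feature; the regular expression, `columns_with_emails`, and the sample rows are assumptions made for illustration.

```python
import re

# A simple email-like pattern; real PII detection covers far more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def columns_with_emails(sample_rows):
    """Return sorted column names in which any sampled string value
    contains something that looks like an email address."""
    flagged = set()
    for row in sample_rows:
        for column, value in row.items():
            if isinstance(value, str) and EMAIL.search(value):
                flagged.add(column)
    return sorted(flagged)


# Hypothetical sample rows, for illustration only.
rows = [
    {"id": 1, "contact": "ana@example.com", "note": "called twice"},
    {"id": 2, "contact": "n/a", "note": "reach at bo@example.org"},
]

flagged_columns = columns_with_emails(rows)
```

A monitoring job could raise an alert whenever a flagged column is not already tagged as containing PII.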
Benefits of Using Databricks Unity Catalog
▪ Enhanced Data Visibility and Transparency: A centralized metadata repository provides a single source of truth for all data assets.
▪ Improved Data Quality and Consistency: Data lineage tracking helps identify data quality issues and ensure consistency.
▪ Accelerated Data Discovery and Analysis: The data catalog simplifies data discovery, leading to faster insights and analysis.
▪ Simplified Compliance and Regulatory Reporting: Policy management supports adherence to regulatory requirements and simplifies compliance reporting.
Q/A
References
https://www.databricks.com/
https://www.databricks.com/product/unity-catalog
https://learn.microsoft.com/en-IN/azure/databricks/
https://delta.io/