This document discusses governance software systems that help organizations manage and govern their data assets. It provides overviews of four open-source governance software systems: Apache Atlas, DataHub, Egeria, and Magda. Apache Atlas provides a unified view of data and supports various data sources. DataHub allows users to discover, understand and collaborate on data assets. Egeria provides a unified view of metadata across different systems. Magda allows organizations to discover and manage data assets through a central catalog. Open-source governance systems help organizations improve data management, security, and compliance.
2. INTRODUCTION
Definition of governance software systems: Governance software systems are tools that help organizations manage and regulate their data assets by providing a centralized platform for storing and managing metadata.
3. Importance of managing and governing data assets: Data is a critical asset for many organizations, but it can also be complex and difficult to manage effectively. Governance software systems can help organizations maintain a consistent view of their data assets, ensure data quality and accuracy, and comply with regulatory requirements.
4. Overview of open-source governance software systems: There are several open-source governance software systems available, including Apache Atlas, DataHub, Egeria, and Magda. These tools provide a flexible and cost-effective way for organizations to manage their data assets.
5. APACHE ATLAS
Apache Atlas is an open-source governance and metadata management system designed to
provide a unified view of data assets within an organization. It provides a flexible data model
that can be customized to meet specific organizational needs.
Supported data sources: Apache Atlas supports a wide range of data sources, including Hadoop,
Kafka, Hive, and relational databases.
6. Atlas Architecture
Flexible data model: The Apache Atlas data model can be customized to meet specific organizational needs. It covers business glossaries, technical metadata, and lineage information.
Metadata types supported: Apache Atlas describes metadata in terms of entities, classifications (called traits in earlier releases), and the relationships between entities.
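The entity, classification, and relationship concepts above can be illustrated with a small sketch. This is plain Python, not the Apache Atlas API: the class names, the "lineage" relationship type, the GUIDs, and the sample assets are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A metadata object such as a table or dashboard (hypothetical model)."""
    guid: str
    type_name: str
    attributes: dict = field(default_factory=dict)
    classifications: set = field(default_factory=set)  # labels such as "PII"


@dataclass(frozen=True)
class Relationship:
    """A typed, directed link between two entities, identified by GUID."""
    type_name: str
    from_guid: str
    to_guid: str


def downstream(guid, relationships, rel_type="lineage"):
    """Return the GUIDs reachable from `guid` by following `rel_type` links."""
    seen, stack = set(), [guid]
    while stack:
        current = stack.pop()
        for rel in relationships:
            if (rel.type_name == rel_type
                    and rel.from_guid == current
                    and rel.to_guid not in seen):
                seen.add(rel.to_guid)
                stack.append(rel.to_guid)
    return seen


# Illustrative data only: a classified table feeding a derived table and a dashboard.
orders = Entity("e1", "table", {"name": "orders"}, {"PII"})
daily = Entity("e2", "table", {"name": "orders_daily"})
sales = Entity("e3", "dashboard", {"name": "sales"})
links = [
    Relationship("lineage", "e1", "e2"),
    Relationship("lineage", "e2", "e3"),
]

print(sorted(downstream("e1", links)))
```

Storing lineage as ordinary relationships means one traversal answers questions such as "which assets are derived from a table classified as PII?".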
7. DataHub
DataHub is an open-source data catalog platform that allows users to discover, understand, and collaborate on their data assets.
The system provides a central catalog of data assets, where users can search and filter for specific data assets based on their metadata, such as schema, tags, and owners. DataHub also provides a collaborative platform where users can share and document their knowledge about data assets.
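Searching and filtering a catalog by schema, tags, and owners can be sketched as follows. This is a minimal in-memory illustration, not DataHub's API or data model: the Asset class, the search function, and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One catalog entry (hypothetical shape, not DataHub's data model)."""
    name: str
    schema: list = field(default_factory=list)  # column names
    tags: set = field(default_factory=set)
    owners: set = field(default_factory=set)


def search(catalog, text="", tag=None, owner=None, column=None):
    """Return assets whose name contains `text` and that match every filter given."""
    matches = []
    for asset in catalog:
        if text.lower() not in asset.name.lower():
            continue
        if tag is not None and tag not in asset.tags:
            continue
        if owner is not None and owner not in asset.owners:
            continue
        if column is not None and column not in asset.schema:
            continue
        matches.append(asset)
    return matches


# Illustrative catalog contents only.
catalog = [
    Asset("sales.orders", ["order_id", "customer_id"], {"pii"}, {"alice"}),
    Asset("sales.refunds", ["order_id", "amount"], set(), {"bob"}),
    Asset("web.clicks", ["session_id", "url"], {"pii"}, {"alice"}),
]

pii_sales = search(catalog, text="sales", tag="pii")
print([asset.name for asset in pii_sales])
```

Each filter is optional, so the same function covers free-text search, a single filter, or a combination of filters.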
9. EGERIA
Egeria is an open-source metadata management platform that provides a unified view of an organization's data assets. It provides a standardized way of describing and managing metadata across different systems and applications. Egeria integrates with a range of technologies, such as Apache Atlas, Apache Ranger, and Apache Hive, and provides a common API for accessing and managing metadata. The system also includes data lineage and data quality monitoring features.
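The idea of a common API over several metadata repositories can be sketched as follows. This is not Egeria's API: the MetadataRepository interface, the FederatedCatalog class, and the repository names and records are hypothetical, and the sketch only shows the general pattern of querying several sources through one call.

```python
from abc import ABC, abstractmethod


class MetadataRepository(ABC):
    """Common interface that every repository adapter implements (hypothetical)."""

    @abstractmethod
    def find(self, name):
        """Return a list of metadata records whose name equals `name`."""


class InMemoryRepository(MetadataRepository):
    """Stand-in for an adapter that would normally call a real system."""

    def __init__(self, source, records):
        self.source = source
        self.records = records

    def find(self, name):
        # Tag each hit with the repository it came from.
        return [dict(record, source=self.source)
                for record in self.records
                if record["name"] == name]


class FederatedCatalog:
    """Sends one query to every registered repository and merges the results."""

    def __init__(self, repositories):
        self.repositories = list(repositories)

    def find(self, name):
        results = []
        for repository in self.repositories:
            results.extend(repository.find(name))
        return results


# Illustrative repositories and records only.
catalog = FederatedCatalog([
    InMemoryRepository("repo-a", [{"name": "orders", "type": "table"}]),
    InMemoryRepository("repo-b", [{"name": "orders", "type": "topic"},
                                  {"name": "clicks", "type": "topic"}]),
])

for record in catalog.find("orders"):
    print(record)
```

Callers depend only on the shared interface, so a new repository can be added by writing another adapter without changing the code that queries the catalog.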
10. MAGDA
Magda is an open-source data cataloging and discovery platform that allows organizations to discover and manage their data assets. It provides a central catalog of data assets, where users can search and filter for specific data assets based on their metadata, such as schema, tags, and owners. Magda also includes data governance features such as access control, data sharing, and data quality monitoring.
12. CONCLUSION
Open-source governance software systems provide a range of
functionalities that help organizations manage their data more
effectively. Apache Atlas, DataHub, Egeria, and Magda are just a few
examples of these systems, and each provides unique features to
meet different data governance needs. By adopting one of these
systems, organizations can improve their data management
practices, enhance data security, and support compliance with data
privacy regulations.
13. REFERENCES
Here are some references for further reading on the governance software
systems mentioned:
1. Apache Atlas: https://atlas.apache.org/
2. DataHub: https://datahubproject.io/
3. Egeria: https://egeria.odpi.org/
4. Magda: https://magda.io/