Data Governance
(with links to DAMA-DMBOK)
2
Week 2: Agenda
Overview of DAMA-DMBOK principles relevant to data governance.
DAMA-DMBOK's metadata management.
Data acquisition.
Data quality: challenges, standards, and techniques to ensure it.
Data Lineage and Traceability.
Importance of data lineage and traceability.
Relevant metadata.
Definition and importance of data management
Data Security and Privacy
3
Data Management
Data Management Professional
• Any person who works in any facet of data management (from technical management of data throughout its
lifecycle to ensuring that data is properly utilized and leveraged) to meet strategic organizational goals.
• Data management professionals fill numerous roles, from the highly technical (e.g., database administrators,
network administrators, programmers) to strategic business roles (e.g., Data Stewards, Data Strategists, Chief Data Officers).
Data management activities
• These activities include everything from the ability to make consistent decisions about how to get strategic
value from data to the technical deployment and performance of databases.
• Thus, data management requires both technical and non-technical (i.e., ‘business’) skills. Responsibility for
managing data must be shared between business and information technology roles, and people in both areas
must be able to collaborate to ensure an organization has high quality data that meets its strategic needs.
Data management is the development, execution, and supervision of plans, policies, programs, and practices that
deliver, control, protect, and enhance the value of data and information assets throughout their lifecycles.
4
“Shifting the way data is used to derive value”
 It’s about deriving real value from data,
and not just storing data for data’s sake
“A set of principles to derive maximum value from an
organisation’s data, whilst protecting it as a key corporate
asset”
 Changing thinking to recognise that data is an asset
“The execution of a set of principles and processes to
derive maximum value from an organisation’s data, whilst
protecting it as a key corporate asset”
 The actual doing of it – it’s not theoretical!
 “The business management of data”
What is Data Management?
5
Data Management Principles
 Data is an asset with unique properties
 The value of data can and should be expressed in economic
terms
 Managing data means managing the quality of data
 It takes Metadata to manage data
 It takes planning to manage data
 Data management is cross-functional; it requires a range of
skills and expertise
 Data management requires an enterprise perspective
 Data management must account for a range of perspectives
 Data management is lifecycle management
 Different types of data have different lifecycle characteristics
 Managing data includes managing the risks associated with
data
 Data management requirements must drive Information
Technology decisions
 Effective data management requires leadership commitment
6
​
The goal of data security practices is to protect information assets in alignment with privacy
and confidentiality regulations, contractual agreements, and business requirements.
These requirements come from:
• Stakeholders: Organizations must recognize the privacy and confidentiality needs of
their stakeholders, including clients, patients, students, citizens, suppliers, or business
partners. Everyone in an organization must be a responsible trustee of data about
stakeholders.
​
• Government regulations: Government regulations are in place to protect the
interests of some stakeholders. Regulations have different goals. Some restrict access
to information, while others ensure openness, transparency, and accountability.
​
• Proprietary business concerns: Each organization has proprietary data to protect.
An organization’s data provides insight into its customers and, when leveraged
effectively, can provide a competitive advantage. If confidential data is stolen or
breached, an organization can lose competitive advantage.
​
• Legitimate access needs: When securing data, organizations must also enable
legitimate access. Business processes require that individuals in certain roles be able to
access, use, and maintain data.
​
• Contractual obligations: Contractual and non-disclosure agreements also influence
data security requirements. For example, the PCI Standard, an agreement among
credit card companies and individual business enterprises, demands that certain types
of data be protected in defined ways (e.g., mandatory encryption for customer
passwords).
Data Security and Privacy
Data Security ensures that data privacy and confidentiality are maintained, that data is not breached, and that
data is accessed appropriately
7
​
Understanding and complying with
the privacy and confidentiality
interests and needs of all
stakeholders is in the best interest of
every organization. Client, supplier,
and constituent relationships all trust
in, and depend on, the responsible
use of data.
Data Security and Privacy
Effective data security policies and procedures ensure that the right people can use and update data in
the right way, and that all inappropriate access and update is restricted
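The idea that the right people can use and update data in the right way, while everything else is restricted, can be sketched as a deny-by-default permission check. This is a minimal illustration only: the role names, the dataset name, and the `ROLE_PERMISSIONS` mapping are hypothetical and do not come from DAMA-DMBOK.

```python
from enum import Enum, auto


class Action(Enum):
    READ = auto()
    UPDATE = auto()


# Hypothetical grants: each role lists the (dataset, action) pairs it may perform.
ROLE_PERMISSIONS = {
    "data_steward": {("customer", Action.READ), ("customer", Action.UPDATE)},
    "analyst": {("customer", Action.READ)},
}


def is_allowed(role: str, dataset: str, action: Action) -> bool:
    """Deny by default: allow only what was explicitly granted to the role."""
    return (dataset, action) in ROLE_PERMISSIONS.get(role, set())


# Legitimate access is enabled, inappropriate access and update are refused.
print(is_allowed("analyst", "customer", Action.READ))
print(is_allowed("analyst", "customer", Action.UPDATE))
```

Unknown roles fall through to an empty grant set, so access is refused unless someone has deliberately enabled it.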
8
Data acquisition activities involve:
 Receiving and responding to new data source acquisition requests
 Performing rapid, ad-hoc, match and high-level data quality assessments using data cleansing
and data profiling tools
 Assessing and communicating complexity of data integration to the requesters to help them with
their cost-benefit analysis
 Piloting acquisition of data and its impact on match rules
 Finalizing data quality metrics for the new data source
 Determining who will be responsible for monitoring and maintaining the quality of a new source’s
data
 Completing integration into the overall data management environment
Data acquisition
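A rapid, high-level data quality assessment of a candidate source can be as simple as profiling each column. The sketch below assumes the source arrives as a list of dictionaries; `profile_column` and the sample records are illustrative names and data, not part of any specific profiling tool.

```python
from collections import Counter


def profile_column(rows, column):
    """Return basic profile statistics for one column of a candidate source."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v not in (None, "")]
    total = len(values)
    return {
        "rows": total,
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct": len(set(non_null)),
        "most_common": Counter(non_null).most_common(1),
    }


# Hypothetical sample from a new data source under evaluation.
sample = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C2", "email": "b@example.com"},
    {"customer_id": "C3", "email": None},
]

email_profile = profile_column(sample, "email")
id_profile = profile_column(sample, "customer_id")
print(email_profile)
print(id_profile)
```

A high null rate or an unexpected duplicate in an identifier column is the kind of finding that helps requesters with their cost-benefit analysis before a full integration is attempted.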
9
 Metadata describes the data itself (e.g., databases, data
elements, data models), the concepts the data represents
(e.g., business processes, application systems, software code,
technology infrastructure), and the connections
(relationships) between the data and concepts.
 Metadata helps an organization understand its data, its
systems, and its workflows. It enables data quality
assessment and is integral to the management of databases
and other applications. It contributes to the ability to process,
maintain, integrate, secure, audit, and govern other data.
 Metadata is essential to data management as well as data
usage (multiple references to Metadata throughout the
DAMA-DMBOK).
DAMA-DMBOK's metadata management
Metadata includes information about technical and business processes, data rules and constraints, and
logical and physical data structures
10
 As technology has evolved, the speed at which data is
generated has also increased. Technical Metadata has
become integral to the way in which data is moved and
integrated. ISO’s Metadata Registry Standard,
ISO/IEC 11179, is intended to enable Metadata-driven
exchange of data in a heterogeneous environment,
based on exact definitions of data. Metadata present in
XML and other formats enables use of the data. Other
types of Metadata tagging allow data to be exchanged
while retaining signifiers of ownership, security
requirements, etc.
 Like other data, Metadata requires management. As
the capacity of organizations to collect and store data
increases, the role of Metadata in data management
grows in importance. To be data-driven, an
organization must be Metadata-driven.
DAMA-DMBOK's metadata management
Metadata includes information about technical and business processes, data rules and constraints, and
logical and physical data structures
11
 Effective data management involves a set of complex, interrelated
processes that enable an organization to use its data to achieve
strategic goals.
 Data management includes the ability to design data for applications,
store and access it securely, share it appropriately, learn from it, and
ensure it meets business needs.
 An assumption underlying assertions about the value of data is that
the data itself is reliable and trustworthy; in other words, that it is
of high quality.
Data quality
12
 Definition: The term data quality refers both to the characteristics
associated with high quality data and to the processes used to
measure or improve the quality of data. These dual usages can be
confusing, so it helps to separate them and clarify what constitutes
high quality data.
Data Quality programs focus on these general goals:
 Developing a governed approach to make data fit for purpose based
on data consumers’ requirements
 Defining standards and specifications for data quality controls as part
of the data lifecycle
 Defining and implementing processes to measure, monitor, and
report on data quality levels
 Identifying and advocating for opportunities to improve the quality of
data, through changes to processes and systems and engaging in
activities that measurably improve the quality of data based on data
consumer requirements
Data quality – Definition, and Goals
13
Data Quality programs should be guided by the following principles:
 Criticality: A Data Quality program should focus on the data most
critical to the enterprise and its customers.
 Lifecycle management: The quality of data should be managed
across the data lifecycle, from creation or procurement through
disposal.
 Prevention: The focus of a Data Quality program should be on
preventing data errors and conditions that reduce the usability of
data; it should not be focused on simply correcting records.
 Root cause remediation: Improving the quality of data goes
beyond correcting errors. Problems with the quality of data should
be understood and addressed at their root causes, rather than just
their symptoms. Because these causes are often related to process or
system design, improving data quality often requires changes to
processes and the systems that support them.
 Governance: Data Governance activities must support the
development of high quality data and Data Quality program
activities must support and sustain a governed data environment.
Data quality – Principles
14
Data Quality programs should be guided by the following principles:
 Standards-driven: All stakeholders in the data lifecycle have data
quality requirements. To the degree possible, these requirements
should be defined in the form of measurable standards and
expectations against which the quality of data can be measured.
 Objective measurement and transparency: Data quality levels
need to be measured objectively and consistently. Measurements
and measurement methodology should be shared with stakeholders
since they are the arbiters of quality.
 Embedded in business processes: Business process owners are
responsible for the quality of data produced through their processes.
They must enforce data quality standards in their processes.
 Systematically enforced: System owners must systematically
enforce data quality requirements.
 Connected to service levels: Data quality reporting and issues
management should be incorporated into Service Level Agreements
(SLA).
Data quality – Principles
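The principles of measurable standards, objective measurement, and a connection to service levels can be illustrated with a small rule-based check. This is a sketch under stated assumptions: the `QualityStandard` class, the two example rules, and the threshold values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityStandard:
    name: str
    check: Callable[[dict], bool]  # True when a record meets the expectation
    threshold: float               # minimum acceptable pass rate (e.g. agreed in an SLA)


def measure(records, standard):
    """Objectively measure the share of records that satisfy one standard."""
    if not records:
        return 1.0
    passed = sum(1 for record in records if standard.check(record))
    return passed / len(records)


def report(records, standards):
    """Produce a report that can be shared with stakeholders."""
    results = {}
    for standard in standards:
        rate = measure(records, standard)
        results[standard.name] = {
            "pass_rate": rate,
            "meets_standard": rate >= standard.threshold,
        }
    return results


# Hypothetical records and standards.
records = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 51},
    {"email": "c@example.com", "age": -2},
    {"email": "d@example.com", "age": 28},
]
standards = [
    QualityStandard("email_completeness", lambda r: bool(r.get("email")), 0.9),
    QualityStandard(
        "age_validity",
        lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
        0.7,
    ),
]

quality_report = report(records, standards)
print(quality_report)
```

Keeping the rule, the measurement, and the threshold together makes the methodology itself easy to publish alongside the results.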
15
Data Lineage and Traceability.
In the past, capturing and maintaining
manual data flow lineage has been:
 Not only a regulatory and internal audit
requirement,
 But also very time and resource intensive,
 And refreshed very slowly: for
complex reports, even high-level data
flow lineage is often outdated as soon
as it’s manually captured
The process of data discovery will also uncover information about how data flows through an
organization. This information can be used to document high-level data lineage: how the data under
analysis is acquired or created by the organization, where it moves and is changed within the
organization, and how the data is used by the organization for analytics, decision-making, or event
triggering. Detailed lineage can include the rules according to which data is changed, and the
frequency of changes.
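High-level lineage of this kind can be represented as a directed graph of data flows, with the transformation rule recorded on each edge. The sketch below is a minimal illustration; the `LineageGraph` class and the dataset names are hypothetical, and real lineage tools capture far more detail.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of data flows: source dataset -> target dataset."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)
        self.rules = {}  # (source, target) -> description of the change rule

    def add_flow(self, source, target, rule=""):
        self._downstream[source].add(target)
        self._upstream[target].add(source)
        self.rules[(source, target)] = rule

    @staticmethod
    def _walk(start, edges):
        # Breadth-first traversal collecting every reachable dataset.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen

    def upstream_of(self, dataset):
        """Where does this data come from?"""
        return self._walk(dataset, self._upstream)

    def downstream_of(self, dataset):
        """What is affected if this data changes?"""
        return self._walk(dataset, self._downstream)


# Hypothetical flows for a churn report.
lineage = LineageGraph()
lineage.add_flow("crm.customers", "staging.customers", "nightly full extract")
lineage.add_flow("staging.customers", "warehouse.dim_customer", "deduplicate on email")
lineage.add_flow("billing.invoices", "warehouse.fact_revenue", "sum by month")
lineage.add_flow("warehouse.dim_customer", "reports.churn", "join on customer key")
lineage.add_flow("warehouse.fact_revenue", "reports.churn", "join on customer key")

print(sorted(lineage.upstream_of("reports.churn")))
print(sorted(lineage.downstream_of("crm.customers")))
```

The upstream walk answers traceability questions for audit, while the downstream walk supports change impact analysis.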
16
Data Lineage and Traceability.
17
Data Lineage and Traceability.
To maintain data integrity and traceability throughout the data lifecycle, DBAs communicate the changes to physical
database attributes to modelers, developers, and Metadata managers.
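One way to make such change communication concrete is to compare two versions of a table definition and list what was added, removed, or altered. The sketch below assumes each version is a simple mapping of column name to data type; `diff_columns` and the example columns are hypothetical.

```python
def diff_columns(old, new):
    """Summarise changes between two column-name -> data-type mappings."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical physical definitions before and after a release.
before = {"customer_id": "INTEGER", "email": "VARCHAR(100)", "fax": "VARCHAR(20)"}
after = {"customer_id": "BIGINT", "email": "VARCHAR(100)", "signup_date": "DATE"}

change_notice = diff_columns(before, after)
print(change_notice)
```

A summary like this can be sent to modelers, developers, and Metadata managers so that models and lineage documentation stay aligned with the physical database.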
18
Data Lineage and Traceability.
19
Preparing Data
20
​
A Metadata repository refers to
the physical tables in which the
Metadata is stored. Often these
are built into modelling tools, BI
tools, and other applications.
As an organization matures, it
will want to integrate Metadata
from repositories in these
applications to enable data
consumers to look across the
breadth of information.
Metadata Repositories
​
Technical Metadata
 Physical database table and column names
 Column properties
 Database object properties
 Access permissions
 Data CRUD (create, read, update and delete) rules
 Physical data models, including data table names, keys, and indexes
 Documented relationships between the data models and the physical assets
 ETL job details
 File format schema definitions
 Source-to-target mapping documentation
 Data lineage documentation, including upstream and downstream change
impact information
 Program and application names and descriptions
 Content update cycle job schedules and dependencies
 Recovery and backup rules
 Data access rights, groups, roles
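Several of the Technical Metadata items listed above can be captured in a small structured record and held in a simple repository. This is a minimal sketch, not a real repository product: the class names, field names, and the example table are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    name: str
    data_type: str
    nullable: bool = True


@dataclass
class TableMetadata:
    table_name: str
    columns: list[ColumnMetadata]
    primary_key: list[str] = field(default_factory=list)
    read_roles: set[str] = field(default_factory=set)       # access permissions
    source_tables: list[str] = field(default_factory=list)  # source-to-target mapping
    refresh_schedule: str = ""                               # content update cycle

    def column_names(self):
        return [column.name for column in self.columns]


# A dictionary stands in for the physical tables of a Metadata repository.
repository: dict[str, TableMetadata] = {}


def register(table: TableMetadata) -> None:
    repository[table.table_name] = table


def tables_fed_by(source: str) -> list[str]:
    """Find every registered table that lists the given source in its mapping."""
    return sorted(
        t.table_name for t in repository.values() if source in t.source_tables
    )


register(
    TableMetadata(
        table_name="warehouse.dim_customer",
        columns=[
            ColumnMetadata("customer_key", "BIGINT", nullable=False),
            ColumnMetadata("email", "VARCHAR(100)"),
        ],
        primary_key=["customer_key"],
        read_roles={"analyst", "data_steward"},
        source_tables=["staging.customers"],
        refresh_schedule="daily 02:00",
    )
)

print(repository["warehouse.dim_customer"].column_names())
print(tables_fed_by("staging.customers"))
```

Storing these attributes in one place is what lets data consumers look across tools rather than consulting each application's own Metadata separately.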

Data Governance without AI Course Week 2.pptx
