Data Governance: Description, Design, Delivery


Presented at the OKHIMSS Medical Technology Forum on November 3rd at InnoTech Oklahoma. All rights reserved.



  1. TEKsystems Data Services – Data Governance – 10/27/2011
  2. Today’s Agenda › What is Governance? › Why Data Governance? › Governance Infrastructure › Program versus Project Management › Single Version of Truth versus Fit for Purpose › Metadata – The Heart of Governance › Master Data Management › Data Profiling – Enforcing and Developing Business Rules › Q&A
  3. What is Governance? • Governance is a single authoritative group responsible for creating and enforcing standards and policies around processes and data. • Governance isn’t a short-term project; it is an ongoing program, and it is enterprise focused. • The Governance Board is the heart of a governance program. It is a cross-functional organization with executive-level membership.
  4. What is Governance? • Governance provides standards and policies for processes and data in the following areas: Software Products, Infrastructure, Quality, Security, Adjudicative Dispute Resolution, Lifecycle, Best Practices, Architecture and Future Roadmaps, Project Prioritization, Asset Management, Version Control, Evangelizing and Communication, Vendor Relationship Management, and Legal and Corporate Compliance.
  5. Why Data Governance? • A strong governance program is vital to the success of any enterprise architecture. • Compliance – A governance program allows for compliance with regulatory requirements. We have all heard “I am too pretty to go to jail”; without governance, there is no formalized process for proving compliance with regulations such as HIPAA and privacy laws. • Data governance initiatives may be aimed at a number of objectives, including offering better visibility to internal and external customers, such as in supply chain management.
  6. Why Data Governance? • Harmonizing – Governance provides standard definitions, so that developers, database administrators, end users, and data stewards all work with the same values. • Consistent Analysis – Allows the business to roll up (consolidate) values consistently, comparing apples to apples. • Faster Development – By providing standard definitions and models, we provide the infrastructure for rapid development. The hard part is adding data; if we do not need to add data, we can develop applications in days or hours.
  7. Why Data Governance? • Conflict Resolution – Resolve disputes between business groups. • Asset Management – Harvesting and management of assets to maximize business returns, ensure prioritization with an enterprise view, and reduce costs through elimination of duplicate efforts. • Security – Operational metadata records who accessed what, when, and how. This not only allows for regulatory compliance but also supports better data warehouse design. • Better Data and Process Quality – Achieved through clearly defined business rules; exposing those rules empowers the business to move at speed.
  8. Governance Infrastructure • The Governing Committee should be a group of key BI senior sponsors, project sponsors, and IT personnel from each of the business units (or at least those that currently or potentially utilize the BI COE services). IT should serve the governance board as trusted advisors, not as its primary drivers; the lead of the governance board should come from the business side. • This governing committee is not expected to be involved in day-to-day management of the BI COE. • The Governing Committee should meet regularly (e.g., every other month or quarterly).
  9. Program versus Project Management • Program management is what forces data governance to be enterprise focused. It is easy to implement the database components without regard to the enterprise if the program-level functions are absent. With them firmly in place, the focus must shift to serving the entire enterprise: ensuring that silo applications are not created, that data is sharable and reusable, that corporate standards are adhered to, and that best practices are used throughout.
  10. Program versus Project Management • Program: ongoing; broad perspective; requires architecture; long-term focus; realizes the benefits of standards and reuse; strategy is essential. • Project: one time; specific perspective; may not require architecture; short-term focus; may not realize the benefits of standards and reuse; strategy is not essential.
  11. Single Source of Truth versus Fit for Purpose • Single source of truth means storing one authoritative value, relying upon data governance rules to determine which source is authoritative. • Fit for purpose means storing any number of data values and relying upon data governance rules to determine the appropriate one for each use.
  12. Modeling – Fit for Purpose • For example, consider the clinical discharge date versus the billing discharge date. With fit for purpose we could keep both in the EDW: Enc1, source s1, 12/1/2010, Clinical; Enc2, source s2, 12/2/2010, Billing; Enc3, SSOT, 12/1/2010, Clinical. • Additional modeling options for fit for purpose: peer tables and flex columns.
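The fit-for-purpose idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a real EDW design: the record values, rule names, and purposes are hypothetical, and the governance rule is reduced to a simple lookup table.

```python
# Fit-for-purpose sketch: keep every source's value for an encounter,
# and let a governance rule pick the appropriate one per use case.
# All names and dates here are hypothetical examples.
records = {
    "Enc1": {"clinical": "2010-12-01", "billing": "2010-12-02"},
}

# Governance rules: which source is authoritative for each purpose.
purpose_rules = {
    "care_reporting": "clinical",
    "claims": "billing",
}

def discharge_date(encounter_id, purpose):
    """Return the discharge date that fits the stated purpose."""
    source = purpose_rules[purpose]
    return records[encounter_id][source]

print(discharge_date("Enc1", "claims"))          # 2010-12-02
print(discharge_date("Enc1", "care_reporting"))  # 2010-12-01
```

A single-source-of-truth design would instead collapse `records["Enc1"]` to one value chosen once by the governance board.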
  13. Metadata is the Heart of Governance • Metadata is very important for the management and operation of a data warehouse. • Metadata provides context for data by describing it: metadata is data about data. For example, every food product carries a list of ingredients and its calorie and vitamin content. • Metadata can be classified into four areas: business metadata, operational metadata, technical metadata, and process metadata.
  14. Business Metadata • Business metadata describes the business meaning of data. It includes business definitions of objects and metrics, hierarchies, business rules, and aggregation rules. • Business metadata must be searchable and easy to access.
  15. Business Metadata – Searchable • Business metadata must be searchable so we can determine how many occurrences of a term exist. This searchability is important for supporting a fit-for-purpose architecture versus a single version of the truth. • For example, we must be able to determine how many different definitions of “patient stay” exist. Without a business metadata repository, this is a manual effort.
  16. Business Metadata – Ease of Access • Business metadata must be integrated into the application so that it is easily accessible by the end user. For example, an end user should be able to move the cursor over a field on the screen and click for a help screen that gives a clear business definition of the selected data element.
  17. Business Metadata – Ease of Access
  18. Business Metadata – Ease of Access
  19. Operational Metadata • Operational metadata stores information about who accessed what and when. This metadata is important not only for legal requirements but also for the design of the data warehouse itself. • For example, we can identify that a particular data mart is not being utilized and then develop a plan: Should we eliminate the data mart? Should we provide better education for the end users? Should we redesign the application or the data mart?
  20. Technical Metadata • Technical metadata describes data structures and formats, such as table types, data types, indexes, and partitioning method. It also describes the location of data elements. • Technical metadata is critical in a federated architecture, where a data model is implemented across many different servers in different locations.
  21. Technical Metadata • For example, on Server A the patient’s first name might be a 5-character field, while on Server B it may be a 20-character field. The patient’s first name would not be consistent across the two servers because the name could be truncated on Server A: Michael would appear as Micha on Server A and Michael on Server B. The governance board needs to understand these technical data anomalies. Failure to enforce consistent technical metadata across servers is the primary reason federated architectures fail.
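The truncation problem above is easy to demonstrate. This toy sketch (the field widths are the slide's example, the function name is made up) simulates two fixed-width columns and shows why a federated join on first name would fail:

```python
# Simulate the slide's example: the same first name stored under two
# different technical-metadata definitions no longer matches.
SERVER_A_WIDTH = 5   # patient first name defined as a 5-character field on Server A
SERVER_B_WIDTH = 20  # ...and a 20-character field on Server B

def store(name, width):
    """Simulate a fixed-width column: values longer than the column truncate."""
    return name[:width]

on_a = store("Michael", SERVER_A_WIDTH)  # "Micha"
on_b = store("Michael", SERVER_B_WIDTH)  # "Michael"
print(on_a == on_b)  # False: a federated match on first name silently fails
```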
  22. Technical Metadata
  23. Process Metadata • Process metadata describes the data input process. It includes data cleansing rules, source-to-target maps, transformation rules, validation rules, and integration rules. Process metadata also covers versioning of processes (services). • Process metadata tools need to support two important governance features: data lineage and impact analysis.
  24. Process Metadata – Data Lineage • The data lineage feature supports graphical visualization of complex data lineage relationships: the ability to track data from an end-user report or dashboard back to the original source elements. Data lineage answers questions such as: Where did the data come from? What business rules and transformations were applied? Which data source is the authoritative source?
  25. Process Metadata – Data Lineage
  26. Process Metadata – Data Lineage • DATASOURCE_NUM_ID (NUMBER(10) NOT NULL) – Primary key to uniquely identify a data source. • DATASOURCE_CODE (VARCHAR2(80)) – Stores the code assigned. • DATASOURCE_NAME (VARCHAR2(80)) – Stores the name of the data source. • DATASOURCE_DESC (VARCHAR2(2000)) – Stores the description of the data source.
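As a sketch, the data-source reference table above can be built and queried like this. Note the slide shows Oracle types (NUMBER, VARCHAR2); this example substitutes SQLite types via Python's standard `sqlite3` module, and the sample row values are invented:

```python
import sqlite3

# Build the lineage reference table from the slide (SQLite types substituted
# for the slide's Oracle types; the inserted row is a hypothetical example).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE datasource (
        datasource_num_id INTEGER PRIMARY KEY,  -- uniquely identifies a data source
        datasource_code   TEXT,                 -- code assigned
        datasource_name   TEXT,                 -- name of the data source
        datasource_desc   TEXT                  -- description of the data source
    )
""")
conn.execute(
    "INSERT INTO datasource VALUES (?, ?, ?, ?)",
    (1, "CLIN", "Clinical System", "Hypothetical clinical source system"),
)
row = conn.execute(
    "SELECT datasource_name FROM datasource WHERE datasource_num_id = 1"
).fetchone()
print(row[0])  # Clinical System
```

Fact rows elsewhere in the warehouse would carry `datasource_num_id` as a foreign key, which is what lets a lineage tool trace a reported value back to its origin.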
  27. Process Metadata – Impact Analysis • The impact analysis feature shows what the impact would be if we changed a process. • For example, the governance board asks: what effort would it take to reduce the number of different definitions of patient stay from five to two? The business metadata would confirm that there are five different versions of patient stay. The process metadata tool, with impact analysis, would tell us that 50 ETL programs would have to be modified: 15 easy, 20 medium, and 15 hard. With this information, we can estimate the resources, the timeframe, and the total cost.
  28. Master Data Management • Master Data Management (MDM) provides special data quality processes for a limited number of objects. MDM utilizes metadata to assist in cleansing MDM data, but it is a separate concept from metadata. Not all objects can be given MDM treatment; usually, MDM scope is limited to between 5 and 10 objects. These objects are critical and need special treatment to ensure a higher data quality standard. The MDM repository contains one true version of each master entity, created from multiple source systems, and this “golden copy” is used by downstream systems.
  29. Master Data Management • MDM tools are designed to identify duplicates, handle variations of a key entity across source systems, and standardize the data. • The following are three objects to consider for MDM in healthcare: Master Patient Index, Units of Measure, and Enterprise Terminology Services.
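The duplicate-identification capability mentioned above can be sketched with simple string similarity. Real MDM matching engines use far more sophisticated techniques (phonetic matching, probabilistic record linkage across many attributes); this toy uses Python's standard `difflib`, and the names and the 0.85 threshold are assumptions for illustration:

```python
from difflib import SequenceMatcher

# Toy MDM matching sketch: flag likely-duplicate patient names arriving
# from two source systems. Names and threshold are hypothetical.
def similarity(a, b):
    """Case-insensitive string similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

source_a = "Michael Smith"
source_b = "Micheal Smith"   # common misspelling from another source system

MATCH_THRESHOLD = 0.85       # assumption; tuned per object in practice
if similarity(source_a, source_b) > MATCH_THRESHOLD:
    print("probable duplicate -> merge into the golden record")
```

In a Master Patient Index, a pair flagged this way would typically go to a data steward for confirmation before the golden record is updated.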
  30. Data Profiling • Data profiling is a data investigation and quality monitoring process. It allows the business to assess the quality of its data through metrics, to discover or infer rules based on the data, and to monitor historical metrics about data quality such as range of values, frequency, patterns/formats, and sparseness. • Data profiling is a key enforcement mechanism of data governance: it examines the data to validate it, and the process often leads to the discovery of new business rules.
  31. Data Profiling Patterns • Conversion and Translation Rules Validation – Conversion and translation mean conforming data to a single value for a data element. For example, the state code for Colorado might be 23, C, or CO in three different source systems. We need to agree on one value, say CO, and then convert or translate 23 and C to CO. This allows us to run queries in which CO represents all of the Colorado data. • Format Validation – Some data elements need a standard format, such as phone number or Social Security Number (SSN). For example, SSN has the format 999-99-9999, and the format carries intelligence: the first three numbers identify the region of the country where the SSN was first obtained. • Range Checking – Range checking verifies that data values fit within a boundary of values. For example, a birth date may be checked to verify that the person is no more than 200 years old. Often people leave off the first two digits of a birth year; a person born in 1959 might say they were born in 59.
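The three rule types above translate directly into small validation functions. This is a sketch only: the state-code mapping, the reference year, and the function names are assumptions, not part of any particular profiling tool:

```python
import re

# Sketch of the three profiling rule types from the slide.

# Conversion/translation: conform variant codes to a single standard value.
STATE_CONFORM = {"23": "CO", "C": "CO", "CO": "CO"}  # hypothetical mapping

def conform_state(raw):
    return STATE_CONFORM.get(raw, "UNKNOWN")

# Format validation: SSN must match the 999-99-9999 pattern.
def valid_ssn_format(ssn):
    return re.fullmatch(r"\d{3}-\d{2}-\d{4}", ssn) is not None

# Range checking: birth year must fall within a plausible window.
def valid_birth_year(year, current_year=2011, max_age=200):
    return current_year - max_age <= year <= current_year

print(conform_state("23"))          # CO
print(valid_ssn_format("999-99-9999"))  # True
print(valid_birth_year(59))         # False: likely meant 1959
```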
  32. Data Profiling Patterns • Sparseness – A sparseness check evaluates the percentage of data elements that are actually populated with meaningful values. We also check that the system correctly uses null values; a null value is a missing value, and some systems mistakenly insert a zero or blank in its place. • Uniqueness Checking – In a relational database, every table normalized to at least first normal form needs a primary key, which enforces uniqueness. • Frequency Distribution – A count of the distinct values in a column. If a value occurs only once in a table with millions of rows, it may be an indication that the value should be merged with other values. • Overloaded Columns – A check to see whether a column holds multiple kinds of values, for example because different developers stored different values in the same column. This may indicate that the column should be redesigned into two or more columns. • Best Sources – A check that a column is being populated by the best source. A column could often be populated from multiple sources, but one source is usually better than the others; this best source is often referred to as a gold source. • Derivation – If a data value is derived, check that the calculation is correct.
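Three of these checks (sparseness, frequency distribution, singleton detection) fit in a few lines of standard-library Python. The sample column data is invented for the sketch:

```python
from collections import Counter

# Toy profiling run on a single state-code column (data is hypothetical).
column = ["CO", "CO", "TX", None, "", "CO", "XZ"]

# Sparseness: share of values that carry meaningful data
meaningful = [v for v in column if v not in (None, "")]
sparseness = len(meaningful) / len(column)

# Frequency distribution: counts of each distinct value
freq = Counter(meaningful)

# Singleton values may need review (possible typo to merge with another value)
singletons = [v for v, n in freq.items() if n == 1]

print(round(sparseness, 2))  # 0.71
print(freq.most_common(1))   # [('CO', 3)]
print(singletons)            # ['TX', 'XZ']
```

Here "XZ" appearing once would prompt exactly the governance question the slide describes: is it a valid rare value, or a typo to be merged under a conversion rule?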
  33. Data Profiling Patterns • Data Enrichment – Adding additional information to a data element; check that the enrichment process is correct. • Aggregation Hierarchies Validation – Validate the hierarchies utilized to roll up or aggregate data values. • Matching – Validate that the matching process is correct. Matching data values is very complex and is covered further under MDM (Master Data Management). • Dependency Checking – Validate dependencies between columns.
  35. Thank you
  36. Appendix A – Hybrid Architecture › PSA – Persistent Staging Area › EDW – Enterprise Data Warehouse › Data Marts › Cubes – OLAP
  37. Hybrid Architecture Diagram – Source systems (Clinical, Financial, HR, Supply Chain) feed, via clinical messaging, a Persistent Staging Area, then an atomic-level 3NF EDW, which in turn feeds star dimensional data marts and OLAP cubes that serve dashboards and reports.
  38. Persistent Staging Area › Late Arriving Data. › Easier and Conformed ETL. › Supports Incremental Loads. › SORP (System of Original Record Proxy). › Supports Data Lineage. › Flatter 3rd Normal Form; Better Integration than a Star Dimensional Model. › Data Profiling to Develop Data Governance Rules.
  39. Enterprise Data Warehouse › 3rd Normal Form; Better Integration than a Star Dimensional Model. › Supports Both a Single Source of Truth and Fit for Purpose. › Supports Versioning. › Data Profiling to Support Data Governance Rules.
  40. Data Marts and OLAP Cubes › Data Profiling to Support Data Governance Rules. › Star Dimensional Modeling is a Better (Easier) Presentation Format than 3rd Normal Form. › Fast Response with Partitioning and Bitmapped Indexes. › Supports Versioning.