Straight Talk to Demystify Data Lineage

Are you sure you trust the data you just used for that $10 million decision? To trust data's authenticity, we must first understand its lineage. However, the term "Data Lineage" is itself ambiguous because it is used in different contexts. "Business Lineage" links metadata constructs to specific terms in a business glossary. Numerous Data Governance solutions use this approach, but on its own it comes up short because it doesn't trace the actual flow of information through an organization. "Technical Lineage" traces data's journey through different systems and data stores, providing an audit trail of the changes along the way. True "Data Lineage" combines both aspects, providing the context needed to fully understand the data life cycle. Every step in data's journey is a potential point at which errors can be introduced, compromising Data Quality and, in turn, business decisions. In this session, Ron Huizenga offers a comprehensive discussion of data lineage and the associated Data Quality remediation approaches that are essential to building a foundation for Data Governance.

  1. STRAIGHT TALK TO DEMYSTIFY DATA LINEAGE
     © 2019 IDERA, Inc. All rights reserved.
  2. DRIVING ENTERPRISE DATA GOVERNANCE
     ▪ Key drivers for instituting data governance:
       • Improved information utilization
       • Better data quality
       • Improved interoperability
       • Improved technical operationalization
       • Reduced operational costs
       • Streamlined design and development
       • Improved business accountability
       • Compliance with data use agreements
       • Compliance with regulatory demands
       • Improved business results
       • Trustworthy analytics
       • Trustworthy reporting
  3. OBJECTIVES OF A DATA GOVERNANCE PROGRAM
     ▪ Understand and interpret business data dependencies
     ▪ Define and approve data policies
     ▪ Develop procedures for operationalization
     ▪ Continuously monitor compliance
  4. DATA LINEAGE POWERS DATA GOVERNANCE
     ▪ Data lineage methods help to develop a map of the enterprise data landscape
     ▪ Data lineage provides a holistic description of each data object’s:
       • Sources
       • Information pipelines
       • Transformations
       • Methods of access
       • Controls
       • All other fundamental aspects of information utility
  5. ASPECTS OF DATA LINEAGE
     Data lineage combines three different aspects of corporate metadata:
     ▪ Business lineage: the semantic aspects of tracing data meaning and usage
     ▪ Technical lineage: the structural aspects of data element concepts and their use across the enterprise
     ▪ Procedural lineage: a trace of data's journey through different systems and data stores, providing an audit trail of the changes along the way
  6. TECHNIQUES SUPPORTING LINEAGE
     ▪ Policy management
     ▪ Glossary
     ▪ Business Process Model
  7. BUSINESS LINEAGE
     ▪ Inventory and description of the business characteristics of data assets captured within a data catalog, accumulating information such as:
       • Data asset description
       • Business glossary
       • Data asset location
       • Data sensitivity
       • Access rights
  8. TECHNICAL/STRUCTURAL LINEAGE
     ▪ Catalogs which data element concepts are used
     ▪ Notes how data element concepts are manifested as data elements within specific data assets
     ▪ Not limited to static data sets:
       • Data in motion
       • Manifestation of data element concepts in dynamic contexts such as reports and feature sets for analysis
  9. PROCEDURAL LINEAGE
     ▪ Identify the original introduction of data elements
     ▪ Establish the process flow for data elements that are central to data policy compliance
     ▪ Draft a mapping of data element use to the business application touch points
     ▪ Determine where data instances are created, updated, or just read
     ▪ Document transformations applied
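The procedural-lineage tasks above can be sketched as a log of business-application touch points. This is a minimal Python sketch, not a product feature: the `TouchPoint` record, its fields, and the sample applications ("CRM", "Billing", "MDM Hub", "Marketing") are hypothetical and chosen only for illustration. It shows how such a log answers three of the slide's questions: where an element was first introduced, which applications create, update, or just read it, and which transformations were applied along the way.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record of one business-application touch point for a data element.
@dataclass(frozen=True)
class TouchPoint:
    step: int                              # position in the process flow
    application: str                       # business application touching the element
    element: str                           # data element name
    operation: str                         # "create", "update", or "read"
    transformation: Optional[str] = None   # documented transformation, if any

def original_introduction(log: list[TouchPoint], element: str) -> Optional[TouchPoint]:
    """Return the earliest step that creates the element, or None if none does."""
    creates = [t for t in log if t.element == element and t.operation == "create"]
    return min(creates, key=lambda t: t.step, default=None)

def usage_map(log: list[TouchPoint], element: str) -> dict[str, set[str]]:
    """Map each application to the set of operations it performs on the element."""
    usage: dict[str, set[str]] = {}
    for t in log:
        if t.element == element:
            usage.setdefault(t.application, set()).add(t.operation)
    return usage

def transformations(log: list[TouchPoint], element: str) -> list[str]:
    """List the documented transformations for the element in process-flow order."""
    steps = sorted((t for t in log if t.element == element), key=lambda t: t.step)
    return [t.transformation for t in steps if t.transformation]

log = [
    TouchPoint(1, "CRM", "customer_email", "create"),
    TouchPoint(2, "Billing", "customer_email", "read"),
    TouchPoint(3, "MDM Hub", "customer_email", "update", "lower-case and trim"),
    TouchPoint(4, "Marketing", "customer_email", "read"),
]
print(original_introduction(log, "customer_email").application)  # CRM
print(usage_map(log, "customer_email")["MDM Hub"])                # {'update'}
print(transformations(log, "customer_email"))                     # ['lower-case and trim']
```

In practice such a log would be harvested from process models or integration tooling rather than typed by hand; the sketch only shows the shape of the questions procedural lineage should be able to answer.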
  10. BENEFITS OF DATA LINEAGE
      ▪ Analyzing data dependencies
      ▪ Validating semantic consistency
      ▪ Impact analysis
      ▪ Data quality root cause analysis
      ▪ Integrating data controls
      ▪ Enforcing regulatory compliance
      ▪ Protecting sensitive data
      Resulting in:
      ▪ Better data quality
      ▪ Better business decisions
  11. ANALYZING DATA DEPENDENCIES
      ▪ Unexposed data dependencies make it harder to ensure high-quality, usable data:
        • Reports, dashboards, and analyses may appear to be derived from data sets in isolated systems, but in many cases a chain of processing ultimately originates with data taken from a shared data source
        • Multiple data sets may be populated using data from distinct yet structurally and semantically equivalent sources
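To make the first point concrete, hidden shared sources can be exposed by comparing the full upstream ancestry of two outputs that look independent. This is a minimal Python sketch under stated assumptions: the lineage is held as a hypothetical in-memory map from each data set to the data sets it is derived from, and all data set names are invented for illustration.

```python
# Hypothetical lineage map: each data set lists the data sets it is derived from.
UPSTREAM: dict[str, list[str]] = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_staging"],
    "finance_report": ["ledger_extract"],
    "ledger_extract": ["orders_staging"],
    "orders_staging": ["erp_orders"],
    "erp_orders": [],
}

def ancestors(node: str, upstream: dict[str, list[str]]) -> set[str]:
    """Collect every data set that feeds `node`, directly or indirectly."""
    seen: set[str] = set()
    stack = list(upstream.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:          # skip data sets already reached by another path
            seen.add(current)
            stack.extend(upstream.get(current, []))
    return seen

def shared_sources(a: str, b: str, upstream: dict[str, list[str]]) -> set[str]:
    """Return the upstream data sets that two outputs have in common."""
    return ancestors(a, upstream) & ancestors(b, upstream)

print(sorted(shared_sources("sales_dashboard", "finance_report", UPSTREAM)))
# ['erp_orders', 'orders_staging']
```

Here two reports that appear to come from separate systems turn out to share `orders_staging` and `erp_orders`, so an error in either would surface in both.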
  12. VALIDATING SEMANTIC CONSISTENCY
      ▪ Social Security Number
        • Identifier: unique number assigned by the Social Security Administration
        • Authentication: last four digits of the number assigned by the Social Security Administration
      ▪ Customer ID
        • Identifier: unique number assigned by the company
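The Social Security Number example above shows one element name carrying two meanings (identifier versus authentication value). One simple consistency check is to group catalog entries by element name and flag names bound to more than one glossary meaning. This Python sketch assumes a hypothetical catalog of `(system, element, meaning)` tuples; the system names are invented for illustration, and a real check would compare linked glossary terms rather than free-text definitions.

```python
from collections import defaultdict

# Hypothetical catalog entries: (system, element name, meaning recorded in the glossary).
CATALOG: list[tuple[str, str, str]] = [
    ("HR", "ssn", "Identifier: unique number assigned by the Social Security Administration"),
    ("Payroll", "ssn", "Identifier: unique number assigned by the Social Security Administration"),
    ("CallCenter", "ssn", "Authentication: last four digits of the number assigned by the Social Security Administration"),
    ("CRM", "customer_id", "Identifier: unique number assigned by the company"),
]

def inconsistent_elements(catalog: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return element names used with more than one meaning, and the systems using each meaning."""
    usage: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for system, element, meaning in catalog:
        usage[element][meaning].append(system)
    return {name: dict(meanings) for name, meanings in usage.items() if len(meanings) > 1}

for name, meanings in inconsistent_elements(CATALOG).items():
    print(name)
    for meaning, systems in meanings.items():
        print("  ", systems, meaning)
```

With this sample catalog only `ssn` is flagged, because the call-center system uses the name for an authentication value rather than an identifier.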
  13. IMPACT ANALYSIS
      ▪ External drivers and directives may demand changes to organizational information systems
      ▪ Data lineage allows forward-dependency tracing to identify the downstream systems impacted by:
        • Changes to business term definitions
        • Changes to data element specifications
        • Augmentation of data element semantics
        • Changes in business process flow
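Forward-dependency tracing amounts to a breadth-first walk along the lineage graph in the downstream direction. The Python sketch below is a minimal illustration, assuming a hypothetical in-memory map from each node (business term, data element, ETL job, table, report) to its direct consumers; the node names are invented. It reports every impacted node with its distance in hops from the change.

```python
from collections import deque

# Hypothetical forward lineage: each node lists its direct downstream consumers.
DOWNSTREAM: dict[str, list[str]] = {
    "term:Active Customer": ["element:crm.customer_status"],
    "element:crm.customer_status": ["etl:load_customer_dim"],
    "etl:load_customer_dim": ["table:dw.dim_customer"],
    "table:dw.dim_customer": ["report:churn_dashboard", "model:retention_features"],
    "report:churn_dashboard": [],
    "model:retention_features": [],
}

def impacted(changed: str, downstream: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first forward trace: map each impacted node to its hop distance."""
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in downstream.get(node, []):
            if consumer not in distance:          # guards against cycles and revisits
                distance[consumer] = distance[node] + 1
                queue.append(consumer)
    del distance[changed]                         # the changed item is not its own impact
    return distance

for node, hops in impacted("term:Active Customer", DOWNSTREAM).items():
    print(hops, node)
```

Breadth-first order lists the nearest dependents first, which is a convenient order for planning who to notify about a change to a business term definition.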
  14. ISSUE ROOT CAUSE ANALYSIS
      ▪ Data lineage maps the information production flow
      ▪ A data steward can use the lineage maps to trace back through the data production flow
      ▪ Enables identification of the point at which a data error was introduced
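Reverse tracing can be combined with data quality check results to locate where an error entered the flow: starting from the failing output, follow only failing inputs upstream, and stop at any node whose output fails even though all of its inputs pass. This Python sketch assumes a hypothetical reverse-lineage map and a hypothetical table of check results; nodes with no recorded result are treated as clean, which is a simplifying assumption.

```python
# Hypothetical reverse lineage: each node lists its direct upstream inputs.
UPSTREAM: dict[str, list[str]] = {
    "report:revenue": ["table:fact_sales"],
    "table:fact_sales": ["etl:currency_convert"],
    "etl:currency_convert": ["staging:orders", "ref:fx_rates"],
    "staging:orders": ["source:erp"],
    "ref:fx_rates": [],
    "source:erp": [],
}

# Hypothetical data quality results: True means the node's output passed its checks.
CHECK_PASSED: dict[str, bool] = {
    "report:revenue": False,
    "table:fact_sales": False,
    "etl:currency_convert": False,
    "staging:orders": True,
    "ref:fx_rates": True,
    "source:erp": True,
}

def points_of_introduction(start: str, upstream: dict[str, list[str]],
                           passed: dict[str, bool]) -> list[str]:
    """Trace upstream from a failing node to failing nodes whose inputs all pass."""
    found: list[str] = []
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        # Skip nodes already visited or clean (unknown results are assumed clean).
        if node in seen or passed.get(node, True):
            continue
        seen.add(node)
        failing_inputs = [i for i in upstream.get(node, []) if not passed.get(i, True)]
        if not failing_inputs:
            found.append(node)   # bad output from clean inputs: error introduced here
        stack.extend(failing_inputs)
    return found

print(points_of_introduction("report:revenue", UPSTREAM, CHECK_PASSED))
# ['etl:currency_convert']
```

In this sample the revenue report and fact table are symptoms; the currency conversion step is flagged because its inputs pass while its output fails.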
  15. INTEGRATED DATA CONTROLS
      ▪ Identifying “problem spots” and key phases in business information flows highlights opportunities for integrated data controls
      ▪ Data controls validate data flowing through selected processing phases
      ▪ Alerts are generated when invalid data values are identified
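A data control at a processing phase can be sketched as a set of validation rules applied to each record, with an alert raised for every failed rule. In this minimal Python sketch the rules, field names, phase name, and the `Alert` record are all hypothetical; holding invalid records back from the next phase is one design choice among several (a control could instead pass them through with the alert attached).

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Alert:
    phase: str
    record_id: str
    field: str
    value: Any
    rule: str

# Hypothetical rules for one processing phase: field -> (description, check).
RULES: dict[str, tuple[str, Callable[[Any], bool]]] = {
    "email": ("must look like an e-mail address",
              lambda v: isinstance(v, str)
              and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None),
    "age": ("must be an integer between 0 and 130",
            lambda v: isinstance(v, int) and 0 <= v <= 130),
}

def apply_controls(phase: str, records: list[dict], rules=RULES) -> tuple[list[dict], list[Alert]]:
    """Validate records in a phase; return (records that passed, alerts for failures)."""
    valid: list[dict] = []
    alerts: list[Alert] = []
    for rec in records:
        failures = [Alert(phase, rec["id"], field, rec.get(field), desc)
                    for field, (desc, check) in rules.items()
                    if not check(rec.get(field))]
        if failures:
            alerts.extend(failures)   # invalid record is held back and reported
        else:
            valid.append(rec)
    return valid, alerts

records = [
    {"id": "r1", "email": "ana@example.com", "age": 34},
    {"id": "r2", "email": "not-an-email", "age": 34},
    {"id": "r3", "email": "li@example.org", "age": 212},
]
valid, alerts = apply_controls("customer_load", records)
for a in alerts:
    print(f"ALERT [{a.phase}] {a.record_id}.{a.field}={a.value!r}: {a.rule}")
```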
  16. ENFORCING DATA POLICIES
      ▪ Data policies can be formulated to reflect externally imposed data compliance requirements
      ▪ Business lineage is used to:
        • Capture external policy definitions
        • Standardize semantics across different applications’ usage of shared data element concepts
      ▪ Technical lineage allows for:
        • Standardized specifications for data element validation
        • Institution of audit controls for demonstrating compliance
      ▪ Examples:
        • GDPR
        • CCPA
        • 12 CFR Part 11
        • HIPAA Privacy Rule
  17. PROTECTING SENSITIVE DATA
      ▪ Business lineage traces the origin and levels of data sensitivity
      ▪ Coupled with procedural lineage, it allows for the insertion of data protection techniques:
        • Encryption at rest
        • Encryption in motion
        • Data masking
        • Access controls
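Data masking and access controls can be combined at the point where lineage shows sensitive data being read. This Python sketch masks a Social Security Number so that only the last four digits remain visible, and returns the full value only to an authorized role. The role names and the `AUTHORIZED_ROLES` set are hypothetical; a production system would rely on its platform's own masking and access-control features rather than ad hoc code.

```python
# Hypothetical set of roles allowed to see unmasked Social Security Numbers.
AUTHORIZED_ROLES = {"payroll_admin"}

def mask_ssn(ssn: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` digits, keeping separators in place."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit Social Security Number")
    to_hide = len(digits) - visible
    out = []
    for c in ssn:
        if c.isdigit() and to_hide > 0:
            out.append(mask_char)   # hide leading digits
            to_hide -= 1
        else:
            out.append(c)           # keep separators and the trailing visible digits
    return "".join(out)

def view_ssn(ssn: str, role: str) -> str:
    """Access control: full value for authorized roles, masked value for everyone else."""
    return ssn if role in AUTHORIZED_ROLES else mask_ssn(ssn)

print(mask_ssn("123-45-6789"))             # ***-**-6789
print(view_ssn("123-45-6789", "analyst"))  # ***-**-6789
```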
  18. SOME QUESTIONS DATA LINEAGE CAN ANSWER
      ▪ To understand organizational data:
        • What’s important?
        • Where is it? (It can be in many places.)
        • Where did it come from?
        • How is it used (business processes)?
        • What is the chain of custody?
        • What are the business rules?
      ▪ To support governance:
        • How do I identify private information?
        • How long should I keep the information?
        • Master Data Management classification
        • Data quality
        • Is it fit for purpose?
        • What changed and why?
  19. CONSIDERATIONS
      ▪ Data lineage augments the corporate toolkit for deploying data governance
      ▪ Look for products that simplify the data steward’s consumption of data lineage mappings, and have:
        • The ability to show users the flow of data through the data production lifecycle
        • A mechanism for enumerating the data sources for the different data pipelines
        • The ability to identify data elements and link them to data models and to metadata for data element concepts and business glossaries
        • A method of documenting data transformations and allowing data professionals to review those transformations across a variety of data pipelines
        • The capability of interoperating with existing ETL/data integration tools to import data pipelines along with their collected transformations
        • A means for collaboration around data pipelines and associated metadata
        • The ability to display a visual presentation allowing data stewards to review the data lineage
  20. THANKS!
      Any questions? Learn more at: www.idera.com
