You Need a Data Catalog. Do You Know Why?

Aug. 25, 2022
Technology

The data catalog has become a popular discussion topic within data management and data governance circles. A data catalog is a central repository that contains metadata for describing data sets, how they are defined, and where to find them. TDWI research indicates that implementing a data catalog is a top priority among organizations we survey.


The data catalog can also play an important part in the governance process. It provides features that help ensure data quality and compliance, and that only trusted data is used for analysis. Without in-depth knowledge of their data and its associated metadata, organizations cannot truly safeguard and govern that data.


Join this webinar to learn more about the data catalog and its role in data governance efforts.


During this webinar, industry experts will cover:
• Data management challenges and priorities
• The modern data catalog: what it is and why it is important
• The role of the modern data catalog in your data quality and governance programs
• The kinds of information that should be in your data catalog and why


The webinar will be led by industry experts including:
• Chris Reed, Manager, Sales Engineering, Precisely
• Matthew Vandevere, VP, Strategic Services, Precisely
• Colin Gibson, Moderator, Senior Advisor of DCAM & CDMC, EDM Council

  1. 1. You need a data catalog. Do you know why? A conversation with Christopher Reed (Manager, Sales Engineering, Precisely) and Matthew Vandevere (VP, Strategic Services, Precisely)
  2. 2. Today’s speakers: Moderator Mike Meriton (Co-Founder & COO, EDM Council); Matthew Vandevere (VP, Strategic Services, Precisely); Christopher Reed (Manager, Sales Engineering, Precisely). © 2022 EDM Council Inc.
  3. 3. You need a data catalog. Do you know why? Christopher Reed | Manager, Sales Engineering; Matthew Vandevere | VP, Strategic Services
  4. 4. Poll #1
  5. 5. Agenda • What is a data catalog? • The relationship between governance and the data catalog • Data catalog governance use cases
  6. 6. Data catalog drivers: Today’s challenges • We stood up a data lake, it’s too much, now what? • Profusion of valued analytics, driving the need for shareable data • Regulatory compliance • You’ve been told you need a data catalog • My spreadsheet and email repositories are not keeping up
  7. 7. Data catalogs and data governance: Data Internet Webpages Marketplace Appliances Cosmetics Groceries Business Technical Positional Fixed length record Delimited XML JSON Unstructured AVRO, PARQUET, ORC MS Excel Files Databases & data lakes Applications Books eBooks Magazines Library Reports Analytics ETLs and technologies Cloud
  8. 8. Foundations of a catalog: consumer ready 8 Where can I find it? How can I find something? What are the rules? How is it maintained? Who is responsible? What is the outcome? What is it? Why are you there? Asset path (system -> path -> element) by data domain Find a piece of data to solve a problem Research what my peers are using Monitor data and identify data impacts Repository of data available to me with attribution, associations, relationships and lineage structured in a consistent metamodel Request access and start working with it Data steward The organization (everybody has a part to play) Data stewardship (certification, scoring, quality, standards) Catalog itself (AI/ML learning, auto curation, workflow) Stakeholders / users / crowdsourced Find a piece of data to solve a problem Research what my peers are using Monitor data and identify data impacts Data is available to users in the organization according to data governance policy Data adheres to corporate policy (e.g. PII, standards, redundancy) Research a topic or read book Organized collection of books Research a topic or read book Librarian Inventory control, shelving and shelf work Books are available to the publish set by rules of behavior Search by subject or peruse Classification system. (subject-> shelf -> row) Library catalog Data catalog
  9. 9. A data catalog enables business-ready data: data that is trusted; data that is ready to deliver outcomes; data that is easy to find and understand • Business objectives framework • Enterprise governance & ownership • Metrics & scoring • Balancing, reconciliation & controls • Stewardship & workflow • Case management • Data catalog & smart glossary • Data lineage & impact analysis • Data acquisition & analysis
  10. 10. What are the components of a “modern” data catalog? • Contains extensive information about your data • Has business context • Intuitive user interface • Turnkey automation, administration, and integration • Promotes governance and stewardship
  11. 11. What is governance? Governance is often influenced by perspective
  12. 12. What is governance? Governance = “The data catalog stores what good looks like and who is responsible”
  13. 13. Governance activities: Data governance, like other kinds of governance, is focused on ensuring that activities are performed in accordance with strategy and accepted best practices in the organization • In a well-managed data environment, data governance is well integrated with the business strategy • Modern data governance does this proactively and passively • Best-in-class solutions incorporate AI and ML to facilitate recommendation and automation engines. Diagram labels: Data strategy Data governance Data management / operations Strategy drives actions The data “W”s drive awareness (metadata) How well the data strategy is working (metric metadata) Identifies data that is important to strategy Business strategy Impact of data strategy on KPIs Explicit alignment to business goals & objectives Business alignment & impact metrics
  14. 14. Data governance is the keystone for your data catalog: Outcome Business objectives Measures & metrics Processes & stages People Data Governance Catalog Reporting & compliance Analytics & insights Operational excellence
  15. 15. Governance follows a methodology • Factors include: capabilities, culture, maturity • All approaches assume automation and integration • All should be supported by your data catalog • Governance is integral to each one • Bottom up: Discover the critical data assets that have operational, compliance and analytic business impact • Top down: Find the critical information needed to drive business goals, objectives, KPIs, KRIs, and other metrics • Middle out: Target the key critical data to drive business process improvement and stage gate efficiencies
  16. 16. Governance focuses on critical data • All available data: 100% of data • Data we use: 40% of data • Data we should govern: 10% of data • Data of high value: 100-200 data elements (CRITICAL DATA) • Data and metadata: Selection of data at the system and source level (tables and fields) • Information: As required to develop a common language for important data • Business process excellence: To monitor the effectiveness of our process design and execution • Business goals & objectives: Focus on critical data elements required to support value drivers and key initiatives • The evolution has been towards greater reliance on rich metadata as it captures the data’s findability, usability and appropriateness within a particular context – implies well-cataloged data
  17. 17. Death by Excel!
  18. 18. Getting off Excel
  19. 19. Data has too many relationships to be handled in spreadsheets. Answers need to be a few clicks away – not a few pivot tables away!
  20. 20. Catalog is a window to your data with many views! A traditional enterprise data model can (maybe) handle the “single version of truth” – but not the many views of truth! It is very hard to catalog both the data and the metadata around multiple use cases in Excel. How do I write my query to the EDW using the lens of: • Risk • ISO 27001 • HIPAA • Privacy regulations (GDPR; CCPA) Diagram labels: Goods issue Process order Production order Sample Test Inspection CAPA Change request Specification Parameter list Sample Study Notification Manufacturing order Manufacturing plan order Material movement Recipe Goods exception Artwork request Operation Project Event organism role Event configuration item role Event material role Event comment Event type hierarchy Event Event party role Event activity Batch Individual Manufacturer Organization Party identifier Party address Party role Party hierarchy Party Party to party role Location Plant Customer service center Distribution center Manufacturing line Sample point Work center Demand forecast region hierarchy Calendar Configuration item type hierarchy Configuration item Organism type hierarchy Organism Material supplier Material location Material BOM Material BOM Inventory Complaint Low inventory Stock out Criticality Escalation Field Action Supply chain deviation Outage Quality control plan Protocol Goods receipt Quality event Event to event role Event role Supply chain event
  21. 21. Poll #2
  22. 22. Compliance
  23. 23. Catalogs for compliance… If it is not cataloged – it is not governed! Contained in catalog: 1. How and where used 2. Data is labelled to show where it is in the lifecycle 3. Data is linked to a data package 4. Data has security classification 5. Personal information “type” label flags this data as falling under privacy regulations Do we know enough about the data to know it is being managed correctly?
  24. 24. Audit control model has transparency and accountability • Catalog contains the audit “view” of data: A. Task owner B. Task detail C. The data required D. The standard that guides the execution E. The control rule that enforces the standard F. The metric that measures compliance • Accountability is defined for each control point… which provides data level accountability. Diagram labels: Task Owner (A), Tasks (B), Data (C), Standard (D), Control Rule (E), Metric (F)
  25. 25. Optimizing operations
  26. 26. Data exchange with external manufacturing MDM Finance Source Plan Make Quality Deliver SAP P01 JDE 7.3 PC4 PC4 PC4 MBox PIP Quality Prisym 360 Wm Mbox Mbox MDS SAP P02 Laser etch SQLDB at SAP forms DocuSphere ATP Bank Sterling NRP Excel LIDO/ PAGO Neptune SAP GRC BODS SOLMAN SMI purchase order Basic data and classification Data load script Data load script Basic data and classification A/P check details (end state) Inventory Open purchase orders Sales orders Open purchase orders Advanced shipping Notification SMI Delivery note/ Sale order, ASN Doc. Print Requests Label data Material, serial code, batch data Production order – Lot master data Manual load email Production order – Lot master data Picking, manufacturing logistics transactions Manual interface with Sap P02 Lot master data Agile client (JnJ network) Company For data Conversion Solman LPPF Print Print Supply chain lifecycle Company External manufacturer Exchange
  27. 27. Data exchange with external manufacturing MDM Finance Source Plan Make Quality Deliver SAP P01 JDE 7.3 PC4 PC4 PC4 MBox PIP Quality Prisym 360 Wm Mbox Mbox MDS SAP P02 Laser etch SQLDB at SAP forms DocuSphere ATP Bank Sterling NRP Excel LIDO/ PAGO Neptune SAP GRC BODS SOLMAN SMI purchase order Basic data and classification Data load script Data load script Basic data and classification A/P check details (end state) Inventory Open purchase orders Sales orders Open purchase orders Advanced shipping Notification SMI Delivery note/ Sale order, ASN Doc. Print Requests Label data Material, serial code, batch data Production order – Lot master data Manual load email Production order – Lot master data Picking, manufacturing logistics transactions Manual interface with Sap P02 Lot master data Agile client (JnJ network) Company For data Conversion Solman LPPF Print Print Set up Order request Shipping notice Master data reference data Transaction data Billing notice shipping detail
  28. 28. Cataloging captures issue PRIOR to production • Data quality rule: the shipping notice cannot contain any master data or reference data that was not contained in the set-up file • Objects contain data • Interface has 3 objects
  29. 29. In closing… it is not just about the data! How do you manage everything that surrounds the data? (metadata) • Technical metadata: Platforms; applications; tables; relationships • Control metadata: Rules; standards; ownership; RACI • The data • Semantic metadata / meaning & context: Hierarchies; classification; allowed values…
  30. 30. Questions?
  31. 31. Let’s continue the conversation… Contact us Set up a 30-minute personalized demo Precisely.com/contact www.precisely.com Demos White Papers Case Studies
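
The data quality rule on slide 28 (a shipping notice may not carry master or reference data that was absent from the set-up file) can be sketched as a small check. This is a minimal illustration only, not the presenters' implementation or any Precisely product API: the function name `find_unknown_values`, the field names `material_id` and `plant_code`, and the sample records are all hypothetical.

```python
def find_unknown_values(setup_file, shipping_notice, governed_fields):
    """Return (record_index, field, value) for every governed value in the
    shipping notice that never appeared in the set-up file."""
    # Collect the values the set-up file declared for each governed field.
    allowed = {
        field: {record[field] for record in setup_file if field in record}
        for field in governed_fields
    }
    violations = []
    for index, record in enumerate(shipping_notice):
        for field in governed_fields:
            if field in record and record[field] not in allowed[field]:
                violations.append((index, field, record[field]))
    return violations


# Hypothetical sample data: field names and values are assumptions.
setup_file = [
    {"material_id": "M-100", "plant_code": "P01"},
    {"material_id": "M-200", "plant_code": "P02"},
]
shipping_notice = [
    {"material_id": "M-100", "plant_code": "P01", "quantity": 40},
    {"material_id": "M-300", "plant_code": "P01", "quantity": 5},
]

violations = find_unknown_values(
    setup_file, shipping_notice, ["material_id", "plant_code"]
)
print(violations)  # [(1, 'material_id', 'M-300')]
```

Transaction fields such as `quantity` are deliberately left out of `governed_fields`, since the rule as stated covers only master and reference data.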

