Paraskevi Zerva
Cognition & Knowledge Representation Lead
Supporting GDPR Compliance through effectively governing
Data Lineage & Data Provenance
❖ Introductions
❖ Definitions
❖ EDG Metadata Governance Platform & Use Cases
❖ EDG Showcase Data Lineage
❖ GDPR in a Nutshell
❖ How effectively governing Data Lineage supports GDPR Compliance
❖ GDPR Use Case for Time Limits of Personal Data Erasure – Data Retention
❖ GDPR Policies and Compliance
❖ GDPR Compliance Use Case
❖ WhoAmI?
▪ Paraskevi Zerva
▪ Cognition & Knowledge Representation Lead (Entellect, Elsevier)
▪ Previously worked as an Information Architect for Enterprise Data Governance at
JPMorgan Chase.
▪ PhD in “Provenance of Data for Compositions of Services”.
❖ What’s my focus?
▪ Work on the data governance strategy for Elsevier Entellect to support effective data
governance across Entellect’s software development life-cycle.
▪ Build a common representation for analysis & validation of Elsevier Entellect’s data.
▪ Consolidate data lineage & provenance information with other data assets to provide
a unified data governance ecosystem.
❖ What am I going to talk about?
▪ How effectively governing data lineage/provenance supports GDPR compliance within
the Enterprise Data Governance Platform.
❖ Data governance:
▪ is a set of processes that ensures data assets are efficiently managed and
gives you control over, and a better understanding of, your data,
▪ ensures that data can be trusted and organizations can show accountability about
their data assets with regards to data quality, retention, data lineage etc.,
▪ describes an evolutionary process for a company setting up the processes to handle
information so that it may be utilized by the entire organization,
▪ encompasses data/metadata collection, analysis and validation of rules involving
data (e.g., business (domain) rules, standards, data quality, entitlements, SOR, etc.)
❖ Data lineage refers to capturing the sequence of data flows involving a data element. It
can be represented visually to trace the movement of a data artefact from its source to
its destination and so understand where it originates.
❖ Data provenance refers to recording the processing activities applied to data (e.g.,
through provenance loggers).
❖ GDPR is the General Data Protection Regulation.
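The lineage definition above can be made concrete with a small sketch. It assumes data flows are stored as (source, destination) pairs; the element names and the `upstream_sources` helper are hypothetical illustrations, not part of EDG.

```python
from collections import defaultdict

# Hypothetical data flows: each pair means "data moves from left to right".
FLOWS = [
    ("hr_source_system", "payroll_staging"),
    ("payroll_staging", "payroll_warehouse"),
    ("finance_feed", "payroll_warehouse"),
    ("payroll_warehouse", "regulatory_report"),
]


def upstream_sources(flows, element):
    """Return the original sources (elements with no incoming flow) of `element`."""
    parents = defaultdict(set)
    for src, dst in flows:
        parents[dst].add(src)

    origins, stack, seen = set(), [element], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if not parents[node]:
            origins.add(node)          # nothing flows into it: an origin
        else:
            stack.extend(parents[node])  # keep walking upstream
    return origins


print(sorted(upstream_sources(FLOWS, "regulatory_report")))
```

Walking the flows backwards from a destination is one simple way to answer "where does this originate from"; a provenance log would additionally record what each step did to the data.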
Enterprise Data Governance
❖ Unified platform for Corporate Technology to support the efficient data governance and
metadata management.
❖ Team’s mission:
✓ Integrate CT metadata from various sources in one place in a common way (RDF), regardless
of the input format
✓ Consolidate lineage/provenance information together with other metadata.
❖ EDG ingests different formats like XML, JSON, CSV (Collect)
❖ EDG translates the data/metadata into a common language format (RDF) (Standardize)
✓ Schemas are expressed as OWL ontologies.
✓ SHACL (Shapes Constraint Language) is used for interface building
and for different users’ representations over the same underlying core schema.
✓ SPIN is used for transformation.
❖ We form a connected graph data structure queryable
across all internal and external reference datasets (Connect)
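The Collect / Standardize / Connect steps above can be sketched in a few lines. This is a minimal illustration that uses plain tuples in place of RDF triples; the `app:` subject prefix, the helper names and the two sample feeds are assumptions made for the example, not the EDG implementation.

```python
import csv
import io
import json


def json_to_triples(text, id_key):
    """Collect + Standardize: JSON array of flat objects -> (s, p, o) triples."""
    triples = set()
    for record in json.loads(text):
        subject = f"app:{record[id_key]}"
        for key, value in record.items():
            if key != id_key:
                triples.add((subject, key, str(value)))
    return triples


def csv_to_triples(text, id_key):
    """Collect + Standardize: CSV rows -> triples with the same subject scheme."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"app:{row[id_key]}"
        for key, value in row.items():
            if key != id_key:
                triples.add((subject, key, value))
    return triples


JSON_FEED = '[{"id": "35632", "name": "Payroll Application"}]'
CSV_FEED = "id,inScopeForGDPR\n35632,YES\n"

# Connect: because both feeds share one subject scheme, a set union is
# enough to merge them into a single queryable graph.
graph = json_to_triples(JSON_FEED, "id") | csv_to_triples(CSV_FEED, "id")
for triple in sorted(graph):
    print(triple)
```

Once both inputs share a common representation, facts about the same application from different sources end up attached to the same node, regardless of the input format.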
Enterprise Data Governance Ecosystem
✓ Enterprise
✓ Req Reports
✓ Glossaries
✓ Taxonomies
✓ Codelists
✓ External
✓ Data Models
✓ Provenance
✓ Movement
✓ Feedback
Data Hub
People, Processes, Tools, Services,
Conformed Data
✓ APIs
✓ Discovery
✓ Reporting
Data Governance Business Cases
❖ Capture/manage governance requirements for the complete portfolio of CT applications.
❖ Support the software development lifecycle, compliance/regulatory requirements (GDPR).
❖ Demonstrate Data Lineage*: show where the different data sources originated, to showcase
accountability for control and understanding of the data for regulatory purposes.
❖ Exhibit Data Provenance** of how the data is processed/transformed across the platform.
❖ Track data movement/data transfers between applications (Traceability***).
❖ Provide contextual alignment with firm-wide standards, taxonomies and glossaries.
❖ Provide validation capabilities for data quality and data accuracy.
❖ Exhibit accountability with regards to entitlements (by effectively governing data provenance).
❖ Ontologies are extended to provide crosswalks between models and ecosystems so we can
answer questions such as :
✓ Which applications contain (S)PI data affected by GDPR? (Regulatory Reporting)
✓ S. Arabia has changed its retention policy – what applications are impacted? (Reporting)
✓ Who are the owners of particular data requirements documentation? (RACI)
* Data lineage refers to capturing the sequence of data flows involving a data element. It can be represented visually to
discover the data flow/movement from its source to its destination.
** Data provenance refers to the recording activity (through provenance loggers).
*** Traceability indicates the ability to track a data construct back to the construct it was derived from, e.g., the original
system where it was created.
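The crosswalk questions above amount to following links across connected models. The sketch below shows the idea on a toy in-memory triple set; every identifier and predicate name in it is hypothetical and only loosely follows the vocabulary used in this deck.

```python
# Hypothetical triples linking applications, record classes, policies and countries.
TRIPLES = {
    ("app:A", "containsPI", "YES"),
    ("app:A", "recordClass", "rcc:PAY"),
    ("app:B", "containsPI", "NO"),
    ("app:B", "recordClass", "rcc:TAX"),
    ("rcc:PAY", "policy", "policy:PAY-SA"),
    ("rcc:TAX", "policy", "policy:TAX-UK"),
    ("policy:PAY-SA", "country", "SA"),
    ("policy:TAX-UK", "country", "UK"),
}


def subjects(triples, predicate, obj):
    """All subjects that have the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def apps_impacted_by_country(triples, country):
    """Follow country -> policy -> record class -> application."""
    policies = subjects(triples, "country", country)
    classes = {s for s, p, o in triples if p == "policy" and o in policies}
    return {s for s, p, o in triples if p == "recordClass" and o in classes}


print(sorted(subjects(TRIPLES, "containsPI", "YES")))      # apps holding PI
print(sorted(apps_impacted_by_country(TRIPLES, "SA")))     # apps hit by a policy change
```

In a real graph store the same traversal would be a single query over the connected ontologies rather than hand-written set comprehensions.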
EDG – Metadata Governance Model
❖ The diagram depicts how conceptually metadata from various sources is connected in EDG.
❖ Data Lineage flow connects the following artefacts:
➢ Business Terms (Data Dictionary Metadata)
➢ Data Requirements
➢ Logical Data Model Artefacts
➢ Physical Data Model Artefacts
➢ Application/Deployment (Technical) Metadata
❖ Data Traceability indicates the ability to
track the links from one artefact to another.
❖ Data Provenance allows tracking of:
➢ Ownership/Entitlements/Access Control Metadata
➢ Business Capability/Process Metadata
Data Lineage
Data Traceability
Data Provenance
EDG – Showcase Data Lineage
[Screenshots. Recoverable labels: Data Assets; Logical Data Model; Physical Data Model; Physical Database Realization; Link to Technical Metadata; Logical Entity; Logical Attribute; Link to Data Requirement; Link to Standard Glossary; Physical Table; Physical Column.]
Data Lineage Diagram Logical/Physical Data Elements
[Diagram. Recoverable labels: LDM Artefacts; PDM Artefacts; Mapping to Technical Asset; Mapping to Data Requirement; Mapping to Standard Glossary.]
Data Lineage Logical Data Elements
[Diagram. Recoverable labels: LDM Artefacts; Mapping to Data Requirement; Mapping to Standard Glossary.]
Data Lineage Physical Data Elements
[Diagram. Recoverable labels: PDM Artefacts; Mapping to Technical Asset.]
GDPR in a Nutshell
❖ GDPR is an EU regulation governing how organizations handle personal data. It replaced
the 1995 Data Protection Directive (implemented in the UK as the Data Protection Act 1998)
and has applied since 25 May 2018.
❖ According to GDPR personal data should be:
➢ Processed fairly/lawfully
➢ Accurate and kept up to date
➢ Kept no longer than is necessary (retention period)
➢ Processed in a secure way
❖ The terms controller and processor are used in GDPR to describe the parties involved in
processing personal information (PI).
❖ Controller: the party that decides what data is extracted, the purpose it is used for, and who is
involved in the processing.
✓ should be able to demonstrate compliance (accountability metrics).
✓ should be able to report on the purposes of processing and the categories of PI it controls.
❖ Processor: the party responsible for processing the data on behalf of the controller.
✓ should maintain records of the categories of processing activities of PI and the means
by which it is processed.
✓ should be able to report on the data transfers of personal data to a third country or an
international organization and can be held responsible for a data breach (requirement
for breach notification).
How governing Data Lineage Supports GDPR
❖ GDPR Challenge:
➢ Records of personal data processing are required to evidence/demonstrate compliance.
➢ Organizations are required to record every point where processing of personal data
takes place and to showcase accountability.
❖ Solution:
➢ GDPR makes data governance even more critical on the lineage aspect.
➢ Governance of data lineage lets you understand your data-flow activities and
identify and document the legal justification for each type of activity.
➢ When data lineage is represented visually it allows discovery of the data flow/movement
from its source to its destination, including the changes and transformations along the way.
➢ On top of that, GDPR requires evidence of records of personal data processing, which
implies the need for Data Provenance.
➢ Data Provenance refers to recording how the data was derived/generated
and processed. It allows verifying that the process and steps used to obtain a result comply
with a set of given requirements.
➢ In our business case the given requirements are GDPR regulatory requirements; therefore
data lineage and provenance become the tools to showcase accountability with regards to
GDPR compliance.
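As a rough illustration of how provenance records can be checked against a requirement, the sketch below logs each processing activity and flags those without a documented legal justification. The `ProvenanceRecord` fields, activity names and legal-basis values are assumptions for the example; they are not the schema used in EDG.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    activity: str                # processing step that ran
    input_data: str              # what it read
    output_data: str             # what it produced
    legal_basis: Optional[str]   # documented justification, if any


def undocumented_activities(log):
    """Return activities that processed data without a recorded legal basis."""
    return [record.activity for record in log if not record.legal_basis]


# Hypothetical provenance log for two processing steps.
LOG = [
    ProvenanceRecord("extract_payroll", "hr_db", "staging", "contract"),
    ProvenanceRecord("enrich_travel", "staging", "travel_mart", None),
]

print(undocumented_activities(LOG))
```

The same pattern extends to other requirements: each one becomes a predicate evaluated over the recorded steps.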
GDPR Governance Use Case
❖ GDPR Article 30 Data Requirement
➢ Provide time limits for erasure of the different categories of data required per record
retention policy.
❖ Regulatory requirement translated to the creation of a report and accountability
metrics that:
➢ Returns applications in scope of GDPR for Corporate Technology.
➢ Returns the record class codes of in-scope applications based on the record retention
policies per country.
➢ Notifies application owners when record retention policies change and
verifies that the changes comply with GDPR regulatory requirements.
* Record class codes are used to determine how long to keep each record for each jurisdiction.
** A record class code (RCC) is a category used to group similar types of records in JPMC’s master record retention schedule.
*** Record retention requirements are categorized by record class code, by country and in some cases by the business function
of the record.
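To make the "time limits for erasure" requirement concrete, here is a minimal sketch that derives an erasure due date from a retention event date plus a retention period and unit, and lists records past their limit. The function names, supported units and sample values are assumptions; actual periods come from the record retention schedule.

```python
import calendar
from datetime import date, timedelta


def add_period(start, amount, unit):
    """Add a retention period to a date; unit is 'days', 'months' or 'years'."""
    if unit == "days":
        return start + timedelta(days=amount)
    if unit == "years":
        amount, unit = amount * 12, "months"
    if unit != "months":
        raise ValueError(f"unsupported unit: {unit}")
    month_index = start.month - 1 + amount
    year, month = start.year + month_index // 12, month_index % 12 + 1
    # Clamp the day so that e.g. 29 Feb + 1 year becomes 28 Feb.
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def overdue(records, today):
    """records: iterable of (record_id, retention_event_date, period, unit)."""
    return [rid for rid, event_date, period, unit in records
            if add_period(event_date, period, unit) < today]


# Hypothetical records; the 7-year and 90-day periods are made up.
RECORDS = [
    ("rec-1", date(2015, 1, 31), 7, "years"),
    ("rec-2", date(2024, 1, 1), 90, "days"),
]
print(overdue(RECORDS, today=date(2024, 2, 1)))
```

A report built on this calculation can be re-run whenever a country's retention policy changes, which is the trigger for notifying application owners.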
Retention Conceptual Model
[Diagram. Recoverable labels: ia.jpmc:SEAL_103249 a edg:BusinessApplication; ia.jpmc:dataRetentionPolicy; SOR: Retention Manager; Record Retention Policy Ontology; Technical Standard Ontology; GDPR In Scope Business Application; GDPR compliance information; Provenance information; GDPR in Scope – contains PI; record retention classes OBK1060 | Payroll Services (GRM) and PAY100 | Employee Compensation Contribution (GRM); Record Retention Class; Record Retention Code; Data Retention Record Class; data retention; Record Retention Policy; period unit.]
EDG – RCC Code Diagram
EDG – RCC Policy Diagram
[Diagram. Recoverable label: period unit.]
Query RCC Data for GDPR in scope Apps
# Namespace IRIs are elided in the source.
prefix rdfs: <>
prefix edg: <>
prefix ia.jpmc: <>
SELECT DISTINCT ?appId ?appName ?gdprScope ?lob ?rccClassCode ?rccLabel ?rccPolicy
FROM <urn:x-evn-master:seal>
FROM <urn:x-evn-master:grm>
WHERE {
  ?app a edg:BusinessApplication .
  ?app edg:name ?appName .
  ?app edg:identifier ?appId .
  ?app ia.jpmc:inScopeForGDPR ?gdprScope .
  ?app ia.jpmc:lineOfBusiness ?lob .
  # The patterns binding ?rccClassCode, ?rccLabel and ?rccPolicy are not shown on the slide.
  FILTER regex(?gdprScope, "YES")
  FILTER regex(?lob, "CT")
}
appId | appName | gdprScope | lob | rccClassCode | rccPolicy
35632 | Payroll Application | YES | CT | GRM_AUD_PAY_1060 | Payroll Services
38537 | KPMG Link – Global Business Travel | YES | CT | GRM_AUD_PAY_1080 | Payroll Accounting
SPIN Rules (1) - Inferencing
#STEP 401: Create Record Classes
CONSTRUCT {
  ?recordClassCodeU a ia.jpmc:DataRetentionRecordClass .
  ?recordClassCodeU rdfs:label ?rccLabel .
  ?recordClassCodeU edg:identifier ?rccClassCode .
  ?recordClassCodeU edg:name ?rccName .
  ?recordClassCodeU edg:description ?rccDescription .
  ?recordClassCodeU ia.jpmc.go:dataRetentionPolicy ?rccPolicyU .
}
WHERE {
  ?this a RetentionExport:RetentionExport .
  BIND (spl:object(?this, RetentionExport:recordClassCode) AS ?rccClassCode) .
  BIND (spl:object(?this, RetentionExport:country) AS ?country) .
  BIND (spl:object(?this, RetentionExport:countryCode) AS ?countryCode) .
  BIND (spl:object(?this, RetentionExport:recordClassName) AS ?rccName) .
  BIND (spl:object(?this, RetentionExport:recordClassDescription) AS ?rccDescription) .
  BIND (ia.jpmc:BuildDataRetentionPolicyClassURI(?rccClassCode) AS ?recordClassCodeU) .
  ?countryU country:countryId ?RDICountryCode .
  BIND (str(?RDICountryCode) AS ?cntryCodeLabel) .
  FILTER (?countryCode = ?cntryCodeLabel) .
  BIND (fn:concat(?rccClassCode, "|", ?rccName, "(GRM)") AS ?rccLabel) .
  BIND (ia.jpmc:BuildDataRetentionPolicyRecordURI(?rccClassCode, ?RDICountryCode) AS ?rccPolicyU) .
}
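To show what a rule like STEP 401 produces, the sketch below derives the record-class triples from one export row. It is an approximation under stated assumptions: the two URI builders are hypothetical stand-ins for the `ia.jpmc:Build...URI` functions (whose logic is not shown on the slide), the `urn:example:` namespace, description and country code are made up, and the country-code join against reference data is skipped.

```python
from urllib.parse import quote

BASE = "urn:example:retention/"  # hypothetical namespace, not from the deck


def build_record_class_uri(record_class_code):
    # Stand-in for ia.jpmc:BuildDataRetentionPolicyClassURI (real logic unknown).
    return f"{BASE}class/{quote(record_class_code)}"


def build_policy_record_uri(record_class_code, country_code):
    # Stand-in for ia.jpmc:BuildDataRetentionPolicyRecordURI (real logic unknown).
    return f"{BASE}policy/{quote(record_class_code)}/{quote(country_code)}"


def infer_record_class(row):
    """Map one retention export row to the triples the rule would construct."""
    code, name = row["recordClassCode"], row["recordClassName"]
    subject = build_record_class_uri(code)
    policy = build_policy_record_uri(code, row["countryCode"])
    return [
        (subject, "rdf:type", "ia.jpmc:DataRetentionRecordClass"),
        (subject, "rdfs:label", f"{code}|{name}(GRM)"),
        (subject, "edg:identifier", code),
        (subject, "edg:name", name),
        (subject, "edg:description", row["recordClassDescription"]),
        (subject, "ia.jpmc.go:dataRetentionPolicy", policy),
    ]


ROW = {
    "recordClassCode": "OBK1060",
    "recordClassName": "Payroll Services",
    "recordClassDescription": "Example description",  # made up
    "countryCode": "GB",                               # made up
}
for triple in infer_record_class(ROW):
    print(triple)
```

Each export row yields one record-class node whose label concatenates the code, the name and the "(GRM)" suffix, mirroring the `fn:concat` call in the rule.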
SPIN Rules (2) - Inferencing
#STEP 402: Create Record Retention Policy
CONSTRUCT {
  ?rccPolicyU a edg:DataRetentionPolicy .
  ?rccPolicyU rdfs:label ?policyLabel .
  ?rccPolicyU edg:identifier ?policyIdentifier .
  ?rccPolicyU edg:name ?policyName .
  ?rccPolicyU ia.jpmc.go:retentionDisposition ?disposition .
  ?rccPolicyU ia.jpmc.go:retentionEvent ?retentionEvent .
  ?rccPolicyU ia.jpmc.go:retentionPeriod ?retentionPeriod .
  ?rccPolicyU ia.jpmc.go:retentionPeriodUnit ?retentionPeriodUnit .
  ?rccPolicyU edg:country ?country .
}
WHERE {
  ?this a RetentionExport:RetentionExport .
  BIND (spl:object(?this, RetentionExport:policyIdentifier) AS ?policyIdentifier) .
  BIND (spl:object(?this, RetentionExport:retentionDisposition) AS ?disposition) .
  BIND (spl:object(?this, RetentionExport:retentionEvent) AS ?retentionEvent) .
  BIND (spl:object(?this, RetentionExport:retentionPeriod) AS ?retentionPeriod) .
  BIND (spl:object(?this, RetentionExport:retentionPeriodUnit) AS ?retentionPeriodUnit) .
  BIND (spl:object(?this, RetentionExport:country) AS ?country) .
  # BIND (ia.jpmc:BuildDataRetentionPolicyClassURI (?recordClassCode) AS   [truncated on the slide]
  # Note: ?policyName is not bound anywhere on the slide.
  BIND (fn:concat(?policyIdentifier, "|", ?policyName, "(GRM)") AS ?policyLabel) .
  BIND (ia.jpmc:BuildDataRetentionPolicyURI(?policyIdentifier, ?country, ?policyName) AS ?rccPolicyU) .
}
SHACL Property Constraint
❖ Policies define guidelines for handling and implementing specific security or
regulatory issues.
❖ With a focus on the policy requirements for data protection we have built a
policy/compliance model:
➢ Aimed at validating GDPR compliance for the compliance objects under policy target.
➢ Showcasing accountability with regards to GDPR policy requirements.
❖ Bridge the gap between GDPR legislative obligations and operational-level
technology controls, using semantic modelling to model the critical policy and
compliance aspects.
❖ Use inferencing to preserve accountability of processing activities that
handle PI data subject to regulatory compliance.
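The idea of validating compliance objects against policy requirements can be sketched as a minimum-cardinality check, similar in spirit to a SHACL property constraint with `sh:minCount`. The shape, property names and application data below are hypothetical and do not reproduce the model shown in the talk.

```python
# Hypothetical shape: property name -> minimum number of values required.
REQUIRED = {
    "inScopeForGDPR": 1,
    "dataRetentionPolicy": 1,
    "owner": 1,
}


def validate(node, properties, shape=REQUIRED):
    """Return one violation message per property with too few values."""
    violations = []
    for prop, min_count in shape.items():
        count = len(properties.get(prop, []))
        if count < min_count:
            violations.append(
                f"{node}: {prop} has {count} value(s), needs at least {min_count}"
            )
    return violations


# Hypothetical compliance object missing its retention policy.
APP = {"inScopeForGDPR": ["YES"], "owner": ["team-payroll"]}
for message in validate("app:35632", APP):
    print(message)
```

An empty result means the object satisfies the shape; any message is a compliance gap that can be reported to the object's owner.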
ia.jpmc:Country ia.jpmc:Business
Thanks for attending ☺
Q & A
Name: Paraskevi Zerva
Extending Data Lineage to Data Provenance
[Diagram. Recoverable labels: Non-Linear Activity; Linear Activity; rdfs:subclass; Software Program; subproperty of.]

More Related Content

What's hot

GraphQL and its schema as a universal layer for database access
GraphQL and its schema as a universal layer for database accessGraphQL and its schema as a universal layer for database access
GraphQL and its schema as a universal layer for database accessConnected Data World
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricCambridge Semantics
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...Cambridge Semantics
GraphDB Cloud: Enterprise Ready RDF Database on Demand
GraphDB Cloud: Enterprise Ready RDF Database on DemandGraphDB Cloud: Enterprise Ready RDF Database on Demand
GraphDB Cloud: Enterprise Ready RDF Database on DemandOntotext
Solving the Disconnected Data Problem in Healthcare Using MongoDB
Solving the Disconnected Data Problem in Healthcare Using MongoDBSolving the Disconnected Data Problem in Healthcare Using MongoDB
Solving the Disconnected Data Problem in Healthcare Using MongoDBMongoDB
Modern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail BankingModern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail BankingCambridge Semantics
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...Connected Data World
Building Knowledge Graphs in 10 steps
Building Knowledge Graphs in 10 stepsBuilding Knowledge Graphs in 10 steps
Building Knowledge Graphs in 10 stepsOntotext
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceCambridge Semantics
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...Connected Data World
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...Cambridge Semantics
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data ModelingVital.AI
Using Semantic Technology to Drive Agile Analytics - SLIDES
Using Semantic Technology to Drive Agile Analytics - SLIDESUsing Semantic Technology to Drive Agile Analytics - SLIDES
Using Semantic Technology to Drive Agile Analytics - SLIDESDATAVERSITY
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleCambridge Semantics
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run GraphVaticle
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data SmarterMatheus Mota
Optimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data ScienceOptimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data ScienceVital.AI
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4jScalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4jNeo4j

What's hot (20)

GraphQL and its schema as a universal layer for database access
GraphQL and its schema as a universal layer for database accessGraphQL and its schema as a universal layer for database access
GraphQL and its schema as a universal layer for database access
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
GraphDB Cloud: Enterprise Ready RDF Database on Demand
GraphDB Cloud: Enterprise Ready RDF Database on DemandGraphDB Cloud: Enterprise Ready RDF Database on Demand
GraphDB Cloud: Enterprise Ready RDF Database on Demand
Solving the Disconnected Data Problem in Healthcare Using MongoDB
Solving the Disconnected Data Problem in Healthcare Using MongoDBSolving the Disconnected Data Problem in Healthcare Using MongoDB
Solving the Disconnected Data Problem in Healthcare Using MongoDB
Tara Raafat
Tara RaafatTara Raafat
Tara Raafat
Modern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail BankingModern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail Banking
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...
Building Knowledge Graphs in 10 steps
Building Knowledge Graphs in 10 stepsBuilding Knowledge Graphs in 10 steps
Building Knowledge Graphs in 10 steps
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data Science
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Applying Data Engineering and Semantic Standards to Tame the "Perfect Storm" ...
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data Modeling
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
Using Semantic Technology to Drive Agile Analytics - SLIDES
Using Semantic Technology to Drive Agile Analytics - SLIDESUsing Semantic Technology to Drive Agile Analytics - SLIDES
Using Semantic Technology to Drive Agile Analytics - SLIDES
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise Scale
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data Smarter
Optimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data ScienceOptimizing the
 Data Supply Chain
 for Data Science
Optimizing the
 Data Supply Chain
 for Data Science
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4jScalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j

Similar to Supporting GDPR Compliance through effectively governing Data Lineage and Data Provenance

Henninger_MakingReferenceDataMoreMeaningful-FinalScott Henninger
RFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategyRFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategySustainableEnergyAut
Intro to big data and applications -day 3
Intro to big data and applications -day 3Intro to big data and applications -day 3
Intro to big data and applications -day 3Parviz Vakili
Credit Suisse, Reference Data Management on a Global Scale
Credit Suisse, Reference Data Management on a Global ScaleCredit Suisse, Reference Data Management on a Global Scale
Credit Suisse, Reference Data Management on a Global ScaleOrchestra Networks
Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...DataWorks Summit
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Denodo
GDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data VirtualizationGDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data VirtualizationDenodo
Information architecture overview
Information architecture overviewInformation architecture overview
Information architecture overviewJames M. Dey
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?Denodo
Introduction to data interoperability across the data value chain.pdf
Introduction to data interoperability across the data value chain.pdfIntroduction to data interoperability across the data value chain.pdf
Introduction to data interoperability across the data value chain.pdfAhmedHany Sayed
Michael Josephs
Michael JosephsMichael Josephs
Michael JosephsdaveGBE
Big Data Analytics Architecture PowerPoint Presentation Slides
Big Data Analytics Architecture PowerPoint Presentation SlidesBig Data Analytics Architecture PowerPoint Presentation Slides
Big Data Analytics Architecture PowerPoint Presentation SlidesSlideTeam
data collection, data integration, data management, data modeling.pptx
data collection, data integration, data management, data modeling.pptxdata collection, data integration, data management, data modeling.pptx
data collection, data integration, data management, data modeling.pptxSourabhkumar729579
General Data Protection Regulation (GDPR) Implications for Canadian Firms
General Data Protection Regulation (GDPR) Implications for Canadian FirmsGeneral Data Protection Regulation (GDPR) Implications for Canadian Firms
General Data Protection Regulation (GDPR) Implications for Canadian Firmsaccenture
CXAIR for Data Migration
CXAIR for Data MigrationCXAIR for Data Migration
CXAIR for Data MigrationConnexica
Building the enterprise data architecture
Building the enterprise data architectureBuilding the enterprise data architecture
Building the enterprise data architectureCosta Pissaris
Data privacy and security in uae
Data privacy and security in uaeData privacy and security in uae
Data privacy and security in uaeRishalHalid1
Data Science Operationalization: The Journey of Enterprise AI
Data Science Operationalization: The Journey of Enterprise AIData Science Operationalization: The Journey of Enterprise AI
Data Science Operationalization: The Journey of Enterprise AIDenodo

Similar to Supporting GDPR Compliance through effectively governing Data Lineage and Data Provenance (20)

RFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategyRFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data Strategy
Intro to big data and applications -day 3
Intro to big data and applications -day 3Intro to big data and applications -day 3
Intro to big data and applications -day 3
Credit Suisse, Reference Data Management on a Global Scale
Credit Suisse, Reference Data Management on a Global ScaleCredit Suisse, Reference Data Management on a Global Scale
Credit Suisse, Reference Data Management on a Global Scale
Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
GDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data VirtualizationGDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data Virtualization
Information architecture overview
Information architecture overviewInformation architecture overview
Information architecture overview
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
Introduction to data interoperability across the data value chain.pdf
Introduction to data interoperability across the data value chain.pdfIntroduction to data interoperability across the data value chain.pdf
Introduction to data interoperability across the data value chain.pdf
Michael Josephs
Michael JosephsMichael Josephs
Michael Josephs
Big Data Analytics Architecture PowerPoint Presentation Slides
Big Data Analytics Architecture PowerPoint Presentation SlidesBig Data Analytics Architecture PowerPoint Presentation Slides
Big Data Analytics Architecture PowerPoint Presentation Slides
data collection, data integration, data management, data modeling.pptx
data collection, data integration, data management, data modeling.pptxdata collection, data integration, data management, data modeling.pptx
data collection, data integration, data management, data modeling.pptx
General Data Protection Regulation (GDPR) Implications for Canadian Firms
General Data Protection Regulation (GDPR) Implications for Canadian FirmsGeneral Data Protection Regulation (GDPR) Implications for Canadian Firms
General Data Protection Regulation (GDPR) Implications for Canadian Firms
CXAIR for Data Migration
CXAIR for Data MigrationCXAIR for Data Migration
CXAIR for Data Migration
Big data governance
Big data governanceBig data governance
Big data governance
Building the enterprise data architecture
Building the enterprise data architectureBuilding the enterprise data architecture
Building the enterprise data architecture
Data privacy and security in uae
Data privacy and security in uaeData privacy and security in uae
Data privacy and security in uae
Data Science Operationalization: The Journey of Enterprise AI
Data Science Operationalization: The Journey of Enterprise AIData Science Operationalization: The Journey of Enterprise AI
Data Science Operationalization: The Journey of Enterprise AI





Supporting GDPR Compliance through effectively governing Data Lineage and Data Provenance

  • 1. Paraskevi Zerva Cognition & Knowledge Representation Lead Supporting GDPR Compliance through effectively governing Data Lineage & Data Provenance
  • 2. Context
      ❖ Introductions
      ❖ Definitions
      ❖ EDG Metadata Governance Platform & Use Cases
      ❖ EDG Showcase Data Lineage
      ❖ GDPR in a Nutshell
      ❖ How governing effectively Data Lineage supports GDPR Compliance
      ❖ GDPR Use Case for Time Limits of Personal Data Erasure – Data Retention
      ❖ GDPR Policies and Compliance
      ❖ GDPR Compliance Use Case
  • 3. Introductions
      ❖ WhoAmI?
        ▪ Paraskevi Zerva
        ▪ Cognition & Knowledge Representation Lead (Entellect, Elsevier)
        ▪ Previously worked as an Information Architect for Enterprise Data Governance at JPMorgan Chase.
        ▪ PhD in "Provenance of Data for Compositions of Services".
      ❖ What’s my focus?
        ▪ Work on the data governance strategy for Elsevier Entellect to support effective data governance across Entellect’s software development life-cycle.
        ▪ Build a common representation for analysis & validation of Elsevier Entellect’s data.
        ▪ Consolidate data lineage & provenance information with other data assets to provide a unified data governance ecosystem.
      ❖ What am I going to talk about?
        ▪ How effectively governing data lineage/provenance supports GDPR compliance within the Enterprise Data Governance Platform.
  • 4. Definitions
      ❖ Data governance:
        ▪ is a set of processes that ensures data assets are efficiently managed, giving you control over and a better understanding of your data,
        ▪ ensures that data can be trusted and that organizations can show accountability for their data assets with regard to data quality, retention, data lineage, etc.,
        ▪ describes an evolutionary process in which a company sets up the processes to handle information so that it can be used by the entire organization,
        ▪ encompasses data/metadata collection, analysis and validation of rules involving data (e.g., business (domain) rules, standards, data quality, entitlements, SOR, etc.)
      ❖ Data lineage refers to capturing the sequence of data flows involving a data element. It can be represented visually to follow the movement of a data artefact from its source to its destination and to understand where it originates.
      ❖ Data provenance refers to recording the processing activities applied to data (e.g., through provenance loggers).
      ❖ GDPR is the General Data Protection Regulation.
  • 5. Enterprise Data Governance
      ❖ Unified platform for Corporate Technology (CT) to support efficient data governance and metadata management.
      ❖ Team’s mission:
        ✓ Integrate CT metadata from various sources in one place in a common way (RDF), regardless of the input format.
        ✓ Consolidate lineage/provenance information together with other metadata.
      ❖ EDG ingests different formats like XML, JSON, CSV (Collect)
      ❖ EDG translates the data/metadata into a common language format (RDF) (Standardize)
        ✓ Schemas are expressed as OWL ontologies.
        ✓ SHACL (Shapes Constraint Language) is used for interface building and for presenting different user representations over the same underlying core schema.
        ✓ SPIN is used for transformation.
      ❖ We form a connected graph data structure queryable across all internal and external reference datasets (Connect)
      Pipeline stages (diagram): Collect, Standardize, Connect, Refine
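The Collect / Standardize / Connect steps on this slide can be illustrated with a minimal Python sketch. It is not EDG's implementation: plain `(subject, predicate, object)` tuples stand in for RDF, and the function names, the `app:` identifier scheme and the two sample sources are assumptions made purely for illustration.

```python
import csv
import io
import json


def json_to_triples(text, id_key):
    """Standardize: turn each JSON object into (subject, predicate, object) triples."""
    triples = set()
    for record in json.loads(text):
        subject = f"app:{record[id_key]}"  # hypothetical identifier scheme
        for key, value in record.items():
            if key != id_key:
                triples.add((subject, key, str(value)))
    return triples


def csv_to_triples(text, id_key):
    """Standardize: turn each CSV row into the same triple shape."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"app:{row[id_key]}"
        for key, value in row.items():
            if key != id_key:
                triples.add((subject, key, value))
    return triples


def describe(graph, subject):
    """Connect: collect everything known about one subject, whatever its source."""
    return {p: o for s, p, o in graph if s == subject}


# Collect: two hypothetical sources in different formats describing the same application.
json_source = '[{"id": "35632", "name": "Payroll Application", "inScopeForGDPR": "YES"}]'
csv_source = "id,lineOfBusiness\n35632,CT\n"

# Connect: a single graph that can be queried across both sources.
graph = json_to_triples(json_source, "id") | csv_to_triples(csv_source, "id")
print(describe(graph, "app:35632"))
```

Once both sources share one shape, a single lookup answers questions that neither source could answer alone, which is the point of the Connect step.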
  • 6. Enterprise Data Governance Ecosystem (diagram labels: ✓ Enterprise Metadata ✓ LDMs/PDMs ✓ Req Reports ✓ Glossaries ✓ Taxonomies ✓ Codelists ✓ External Standards; Ingestion ✓ Data Models ✓ Provenance Logging ✓ Movement ✓ Feedback; Unified Data/Metadata Hub (EDG); Sources; People, Processes, Tools, Services, Conformed Data; ✓ RACI (roles) ✓ APIs ✓ Discovery ✓ Reporting; Uses)
  • 7. Data Governance Business Cases
      ❖ Capture/manage governance requirements for the complete portfolio of CT applications.
      ❖ Support the software development lifecycle and compliance/regulatory requirements (GDPR).
      ❖ Demonstrate Data Lineage*, i.e. where the different data sources originated from, to showcase accountability for control & understanding of the data for regulatory purposes.
      ❖ Exhibit Data Provenance**, i.e. how the data is processed/transformed across the platform.
      ❖ Track data movement/data transfers between applications (Traceability***).
      ❖ Provide contextual alignment with firm-wide standards, taxonomies and glossaries.
      ❖ Provide validation capabilities for data quality and data accuracy.
      ❖ Exhibit accountability with regard to entitlements (by effectively governing data provenance).
      ❖ Ontologies are extended to provide crosswalks between models and ecosystems so we can answer questions such as:
        ✓ Which applications contain (S)PI data affected by GDPR? (Regulatory Reporting)
        ✓ S. Arabia has changed its retention policy. Which applications are impacted? (Reporting)
        ✓ Who are the owners of particular data requirements documentation? (RACI)
      * Data lineage refers to capturing the sequence of data flows involving a data element. It can be represented visually to discover the data flow/movement from its source to its destination.
      ** Data provenance refers to the recording activity (through provenance loggers).
      *** Traceability indicates the ability to track a data construct back to the construct it was derived from, e.g., the original system where it was created.
  • 8. EDG – Metadata Governance Model
      ❖ The diagram depicts how, conceptually, metadata from various sources is connected in EDG.
      ❖ The Data Lineage flow connects the following artefacts:
        ➢ Business Terms (Data Dictionary Metadata)
        ➢ Data Requirements
        ➢ Logical Data Model Artefacts
        ➢ Physical Data Model Artefacts
        ➢ Application/Deployment (Technical) Metadata
      ❖ Data Traceability indicates the ability to track the links from one artefact to another.
      ❖ Data Provenance allows tracking of:
        ➢ Ownership/Entitlements/Access Control Metadata
        ➢ Business Capability/Process Metadata
      Diagram legend: Data Lineage, Data Traceability, Data Provenance
  • 9. EDG – Showcase Data Lineage
  • 14. Link to Technical Metadata
  • 18. Link to Data Requirement
  • 19. Link to Standard Glossary
  • 23. Data Lineage Diagram: Logical/Physical Data Elements (diagram labels: Mapping to Technical Asset; LDM Artefacts; PDM Artefacts; Mapping to Data Requirement; Mapping to Standard Glossary; PDM to LDM mapping)
  • 24. Data Lineage: Logical Data Elements (diagram labels: LDM Artefacts; Mapping to Data Requirement; Mapping to Standard Glossary; PDM to LDM mapping)
  • 25. Data Lineage: Physical Data Elements (diagram labels: Mapping to Technical Asset; PDM Artefacts)
  • 26. GDPR in a Nutshell
      ❖ GDPR is an EU regulation governing how organizations handle personal data. It replaced the 1995 Data Protection Directive (implemented in the UK as the Data Protection Act) and has been enforced since 25 May 2018.
      ❖ According to GDPR, personal data should be:
        ➢ Processed fairly and lawfully
        ➢ Accurate and kept up to date
        ➢ Kept no longer than is necessary (retention period)
        ➢ Processed in a secure way
      ❖ GDPR uses the terms controller and processor to describe the parties involved in processing personal data (PI).
      ❖ Controller: the party that decides what data is extracted, the purpose it is used for, and who is involved in the processing. The controller:
        ✓ should be able to demonstrate compliance (accountability metrics).
        ✓ should be able to report on the purposes of processing and the categories of PI it controls.
      ❖ Processor: the party responsible for processing the data on behalf of the controller. The processor:
        ✓ should maintain records of the categories of PI processing activities and the means by which the data is processed.
        ✓ should be able to report on transfers of personal data to a third country or an international organization, and can be held responsible for a data breach (requirement for breach notification).
  • 27. How governing Data Lineage Supports GDPR Compliance
      ❖ GDPR Challenge:
        ➢ Records of personal data processing are required as evidence of compliance.
        ➢ Organizations are required to record every point where personal data is processed and to showcase accountability.
      ❖ Solution:
        ➢ GDPR makes data governance even more critical, particularly its lineage aspect.
        ➢ Governing data lineage lets you understand your data-flow activities and identify and document the legal justification for each type of activity.
        ➢ When data lineage is represented visually, it allows discovery of the data flow/movement from source to destination, including the changes along the way and how the data is transformed.
        ➢ On top of that, GDPR requires evidence in the form of records of personal data processing, which implies the need for Data Provenance.
        ➢ Data Provenance refers to recording how the data was derived/generated and processed. It allows you to verify that the process and steps used to obtain a result comply with a set of given requirements.
        ➢ In our business case the given requirements are GDPR regulatory requirements; data lineage and provenance therefore become the tools for showcasing accountability with regard to GDPR compliance.
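As a rough illustration of the "recording activity" described on this slide, the Python sketch below logs each processing step together with the data it used and generated. The class names, record fields (including `legal_basis`) and the sample data element names are assumptions for illustration only, not the provenance loggers used in EDG.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """One recorded processing activity (all field names are illustrative)."""
    activity: str
    agent: str
    used: tuple
    generated: tuple
    legal_basis: str
    recorded_at: datetime


@dataclass
class ProvenanceLogger:
    records: list = field(default_factory=list)

    def log(self, activity, agent, used, generated, legal_basis):
        """Record who ran which activity, on which inputs, producing which outputs."""
        record = ProvenanceRecord(
            activity, agent, tuple(used), tuple(generated),
            legal_basis, datetime.now(timezone.utc),
        )
        self.records.append(record)
        return record

    def activities_touching(self, data_element):
        """List, in recording order, every activity that read or wrote a data element."""
        return [
            r.activity for r in self.records
            if data_element in r.used or data_element in r.generated
        ]


logger = ProvenanceLogger()
logger.log("extract_payroll", "payroll-etl",
           ["hr.employee.salary"], ["staging.salary"], "contract")
logger.log("aggregate_costs", "finance-batch",
           ["staging.salary"], ["report.cost_by_dept"], "legitimate interest")
print(logger.activities_touching("staging.salary"))
```

Keeping the legal justification on the record itself means the same log can answer both "where was this data processed?" and "on what basis?".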
  • 28. GDPR Governance Use Case
      ❖ GDPR Article 30 Data Requirement
        ➢ Provide time limits for erasure of the different categories of data, as required by the record retention policy.
      ❖ The regulatory requirement translates into a report and accountability metrics that:
        ➢ Return the applications in scope of GDPR for Corporate Technology.
        ➢ Return the record class codes of in-scope applications, based on the record retention policies per country.
        ➢ Notify application owners when record retention policies change, and verify that the changes comply with GDPR regulatory requirements.
      Notes:
      * Record class codes are used to determine how long to keep each record in each jurisdiction.
      ** A record class code (RCC) is a category used to group similar types of records in JPMC’s master record retention schedule.
      *** Record retention requirements are categorized by record class code, by country, and in some cases by the business function of the record.
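The report described on this slide can be approximated with a short Python sketch. Everything here is assumed for illustration: the `RetentionPolicy` fields mirror the policy attributes named in the deck (country, retention period, retention period unit), while the seven-year period, the "SA" country code and the application-to-RCC mapping are made-up sample values.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    record_class_code: str
    country: str
    period: int
    unit: str  # "YEARS" or "MONTHS" (assumed vocabulary)


def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def erasure_deadline(policy, retention_event_date):
    """Time limit for erasure: the retention event date plus the retention period."""
    if policy.unit == "YEARS":
        months = policy.period * 12
    elif policy.unit == "MONTHS":
        months = policy.period
    else:
        raise ValueError(f"unsupported retention period unit: {policy.unit}")
    return add_months(retention_event_date, months)


def impacted_applications(app_record_classes, changed_policy):
    """Applications whose record classes are covered by a changed policy."""
    return sorted(
        app for app, codes in app_record_classes.items()
        if changed_policy.record_class_code in codes
    )


# Hypothetical sample data.
policy = RetentionPolicy("GRM_AUD_PAY_1060", "SA", 7, "YEARS")
apps = {"35632": {"GRM_AUD_PAY_1060"}, "38537": {"GRM_AUD_PAY_1080"}}

print(erasure_deadline(policy, date(2018, 5, 25)))
print(impacted_applications(apps, policy))
```

The second function is the piece that would feed owner notifications: when a country changes a policy, only the applications holding that record class need to be reviewed.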
  • 29. Retention Conceptual Model (diagram labels: ia.jpmc:SEAL_103249 a edg:BusinessApplication; a ia.jpmc:RecordRetentionClass; ia.jpmc:applicationRecordRetentionClass; a ia.jpmc:dataRetentionPolicy; ia.jpmc:dataRetentionPolicy; SOR: SEAL; SOR: Retention Manager; Record Retention Policy Ontology; Technical Standard Ontology)
  • 30. GDPR In Scope Business Application (screenshot labels: GDPR compliance information; Provenance information; GDPR in Scope – contains pi; OBK1060 | Payroll Services (GRM); PAY100 | Employee Compensation Contribution (GRM); record retention class; Record Retention Class)
  • 31. Record Retention Code (screenshot labels: Data Retention Record Class; data retention policy)
  • 33. EDG – RCC Code Diagram (diagram label: data retention policy)
  • 34. EDG – RCC Policy Diagram (diagram fields: country; disposition; retention period unit; retention period; retention event)
  • 35. Query RCC Data for GDPR in scope Apps
      prefix rdfs: <>
      prefix edg: <>
      prefix ia.jpmc: <>
      SELECT DISTINCT ?appId ?appName ?gdprScope ?lob ?rccClassCode ?rccLabel ?rccPolicy
      FROM <urn:x-evn-master:seal>
      FROM <urn:x-evn-master:grm>
      WHERE {
        # Business applications that are in scope for GDPR and belong to the CT line of business.
        # The patterns binding ?rccClassCode, ?rccLabel and ?rccPolicy are not shown on the slide.
        ?app a edg:BusinessApplication .
        ?app edg:name ?appName .
        ?app edg:identifier ?appId .
        ?app ia.jpmc:inScopeForGDPR ?gdprScope .
        ?app ia.jpmc:lineOfBusiness ?lob .
        FILTER regex(?gdprScope, "YES")
        FILTER regex(?lob, "CT")
      }

      Sample result:
      appId | appName                            | gdprScope | lob | rccClassCode     | rccPolicy
      35632 | Payroll Application                | YES       | CT  | GRM_AUD_PAY_1060 | Payroll Services
      38537 | KPMG Link – Global Business Travel | YES       | CT  | GRM_AUD_PAY_1080 | Payroll Accounting
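The slide's query shows the filters but not the join that reaches the record class code and its policy. The Python sketch below shows one way such a join could work over in-memory triples. It is an assumption-laden illustration: the predicate `ia.jpmc:applicationRecordRetentionClass` is read off the conceptual-model slide, the `app:` and `rcc:` identifiers and the out-of-scope application are invented, and the policy is simplified to a plain label.

```python
# Hypothetical triples; only the first application mirrors a row from the slide.
TRIPLES = {
    ("app:35632", "rdf:type", "edg:BusinessApplication"),
    ("app:35632", "edg:name", "Payroll Application"),
    ("app:35632", "ia.jpmc:inScopeForGDPR", "YES"),
    ("app:35632", "ia.jpmc:lineOfBusiness", "CT"),
    ("app:35632", "ia.jpmc:applicationRecordRetentionClass", "rcc:GRM_AUD_PAY_1060"),
    ("rcc:GRM_AUD_PAY_1060", "ia.jpmc:dataRetentionPolicy", "Payroll Services"),
    ("app:99999", "rdf:type", "edg:BusinessApplication"),
    ("app:99999", "edg:name", "Out-of-scope App"),
    ("app:99999", "ia.jpmc:inScopeForGDPR", "NO"),
    ("app:99999", "ia.jpmc:lineOfBusiness", "CT"),
}


def objects(subject, predicate):
    """All objects of (subject, predicate, ?o), sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def gdpr_retention_report(line_of_business):
    """Join in-scope applications to their record classes and retention policies."""
    rows = []
    apps = sorted(
        s for s, p, o in TRIPLES
        if p == "rdf:type" and o == "edg:BusinessApplication"
    )
    for app in apps:
        if "YES" not in objects(app, "ia.jpmc:inScopeForGDPR"):
            continue
        if line_of_business not in objects(app, "ia.jpmc:lineOfBusiness"):
            continue
        for rcc in objects(app, "ia.jpmc:applicationRecordRetentionClass"):
            for policy in objects(rcc, "ia.jpmc:dataRetentionPolicy"):
                for name in objects(app, "edg:name"):
                    rows.append((app, name, rcc, policy))
    return rows


print(gdpr_retention_report("CT"))
```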
  • 36. SPIN Rules (1) - Inferencing
      # STEP 401: Create Record Classes
      CONSTRUCT {
        ?recordClassCodeU a ia.jpmc:DataRetentionRecordClass .
        ?recordClassCodeU rdfs:label ?rccLabel .
        ?recordClassCodeU edg:identifier ?rccClassCode .
        ?recordClassCodeU edg:name ?rccName .
        ?recordClassCodeU edg:description ?rccDescription .
        ?recordClassCodeU ia.jpmc.go:dataRetentionPolicy ?rccPolicyU .
      }
      WHERE {
        # Read the fields of each retention export row.
        ?this a RetentionExport:RetentionExport .
        BIND (spl:object(?this, RetentionExport:recordClassCode) AS ?rccClassCode) .
        BIND (spl:object(?this, RetentionExport:country) AS ?country) .
        BIND (spl:object(?this, RetentionExport:countryCode) AS ?countryCode) .
        BIND (spl:object(?this, RetentionExport:recordClassName) AS ?rccName) .
        BIND (spl:object(?this, RetentionExport:recordClassDescription) AS ?rccDescription) .
        # Mint the record class URI from the record class code.
        BIND (ia.jpmc:BuildDataRetentionPolicyClassURI(?rccClassCode) AS ?recordClassCodeU) .
        # Match the exported country code against the reference country data.
        ?countryU country:countryId ?RDICountryCode .
        BIND (str(?RDICountryCode) AS ?cntryCodeLabel) .
        FILTER (?countryCode = ?cntryCodeLabel) .
        BIND (fn:concat(?rccClassCode, "|", ?rccName, "(GRM)") AS ?rccLabel) .
        BIND (ia.jpmc:BuildDataRetentionPolicyRecordURI(?rccClassCode, ?RDICountryCode) AS ?rccPolicyU) .
      }
  • 37. SPIN Rules (2) - Inferencing
      # STEP 402: Create Record Retention Policy
      CONSTRUCT {
        ?rccPolicyU a edg:DataRetentionPolicy .
        ?rccPolicyU rdfs:label ?policyLabel .
        ?rccPolicyU edg:identifier ?policyIdentifier .
        ?rccPolicyU edg:name ?policyName .
        ?rccPolicyU ia.jpmc.go:retentionDisposition ?disposition .
        ?rccPolicyU ia.jpmc.go:retentionEvent ?retentionEvent .
        ?rccPolicyU ia.jpmc.go:retentionPeriod ?retentionPeriod .
        ?rccPolicyU ia.jpmc.go:retentionPeriodUnit ?retentionPeriodUnit .
        ?rccPolicyU edg:country ?country .
      }
      WHERE {
        # Read the policy fields of each retention export row.
        ?this a RetentionExport:RetentionExport .
        BIND (spl:object(?this, RetentionExport:policyIdentifier) AS ?policyIdentifier) .
        BIND (spl:object(?this, RetentionExport:retentionDisposition) AS ?disposition) .
        BIND (spl:object(?this, RetentionExport:retentionEvent) AS ?retentionEvent) .
        BIND (spl:object(?this, RetentionExport:retentionPeriod) AS ?retentionPeriod) .
        BIND (spl:object(?this, RetentionExport:retentionPeriodUnit) AS ?retentionPeriodUnit) .
        BIND (spl:object(?this, RetentionExport:country) AS ?country) .
        # Incomplete on the slide (no target variable), so left commented out:
        # BIND (ia.jpmc:BuildDataRetentionPolicyClassURI(?recordClassCode) AS
        # ?policyName is not bound by any pattern shown on the slide.
        BIND (fn:concat(?policyIdentifier, "|", ?policyName, "(GRM)") AS ?policyLabel) .
        BIND (ia.jpmc:BuildDataRetentionPolicyURI(?policyIdentifier, ?country, ?policyName) AS ?rccPolicyU) .
      }
  • 40. GDPR POLICIES & COMPLIANCE
      ❖ Policies define guidelines for handling and implementing specific security or regulatory issues.
      ❖ Focusing on the policy requirements for data protection, we have built a policy/compliance model that:
        ➢ Aims to validate GDPR compliance for the compliance objects targeted by a policy.
        ➢ Showcases accountability with regard to GDPR policy requirements.
      Objectives
      ❖ Bridge the gap between GDPR legislative obligations and operational-level technology controls, using semantic modelling to capture the critical policy and compliance aspects.
      ❖ Use inferencing to preserve accountability for processing activities that handle PI data subject to regulatory compliance.
  • 41. GDPR RETENTION COMPLIANCE (diagram labels: edg:Policy; edg:DataPolicy; rdfs:subclass; rdfs:subclass; edg:ComplianceAspect; edg:PolicyRequirement; ia.jpmc:DataRetentionRecordClass; ia.jpmc:categorizedByCountry; ia.jpmc:categorizedByBusinessFunction; ia.jpmc:Country; ia.jpmc:BusinessFunction; edg:compliesWith; edg:DataRetentionPolicy; rdf:type; ia.jpmc:dataRetentionPolicy; edg:RequirementAsset; rdf:type; edg:hasRequirement; edg:GDPRRegulatoryRequirement; rdfs:subclass)
  • 43. Thanks for attending ☺ Q & A Name: Paraskevi Zerva Email: Linkedin:
  • 45. Extending Data Lineage to Data Provenance (diagram labels: prov:Activity; Non-Linear Activity; Linear Activity; Workflow; Pipeline; rdfs:subclass; rdfs:subclass; rdfs:subclass; composedOf/contains; Software Program; Software Program Execution; computationOf; executionOf; Software Program Computation; runsOn; prov:Agent; prov:Server; rdfs:subclass; prov:Entity; rdfs:subclass; Processable; input (subproperty of prov:used); output (subproperty of prov:wasGeneratedBy); prov:wasDerivedFrom)
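The `prov:wasDerivedFrom` relation in this model is what makes traceability queries possible. As a minimal sketch (the entity names and the dictionary representation are assumptions for illustration, not the EDG model), walking derivation links back to their roots looks like this in Python:

```python
# Hypothetical derivation links: each entity maps to the entities it was derived from.
DERIVED_FROM = {
    "report.cost_by_dept": ["staging.salary", "staging.department"],
    "staging.salary": ["hr.employee.salary"],
    "staging.department": ["hr.department"],
}


def trace_sources(entity, derived_from):
    """Follow wasDerivedFrom links back to entities with no recorded derivation."""
    sources, stack, seen = set(), [entity], set()
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles and repeated paths
            continue
        seen.add(current)
        parents = derived_from.get(current, [])
        if not parents:
            sources.add(current)
        stack.extend(parents)
    return sorted(sources)


print(trace_sources("report.cost_by_dept", DERIVED_FROM))
```

An entity with no recorded derivation is treated as its own source, so the function also works when called on an original system of record.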