The main aim of Data-Centric Architecture is to reduce the complexity of information systems by using shared data with a clear meaning. But how can you trust your data? How do you know whether it is accurate and reliable?
Data Provenance and PROV Ontology
1. Data Provenance and PROV Ontology
Trusting Shared Data in the Data-Centric Architecture
Semantic Web London
Eugene Morozov
@eugenemorozov
https://www.linkedin.com/in/emorozov/
2. About me
● Software engineer
● Lead the European engineering practice at Lab49, a technology, design and business strategy consulting firm specialising in financial services
● Run the Semantic Web London Meetup
3. Data-Centric Architecture
[Diagram: Ingest, Processes, Analysis, External and Behaviour, all connected to Shared Data]
Credits: Dave McComb. The Data-Centric Revolution. 2019.
4. Data-Centric Architecture
[Diagram: Ingest, Processes, Analysis, External and Behaviour, all connected to Shared Data]
Is your data accurate and reliable? Can you trust it?
Credits: Dave McComb. The Data-Centric Revolution. 2019.
5. Data-Centric Architecture
[Diagram: Ingest, Processes, Analysis, External and Behaviour, all connected to Shared Data]
Is your data accurate and reliable? Can you trust it?
Provenance is one way to evaluate trust.
Credits: Dave McComb. The Data-Centric Revolution. 2019.
6. Data Provenance
“Originally from the French provenir, meaning to come from. It represents the origin or source of something, the history of ownership, the location of an object.”
The DAMA Dictionary of Data Management, 2nd Edition. 2011
7. Multiple Facets to Provenance
● How it is captured
● How granular it is
● What layer in the stack it is captured at
● How it is integrated
● etc.
Lucian Carata et al. A Primer on Provenance. ACM Queue. March 2014.
8. Multiple Facets to Provenance
● How it is captured
● How granular it is
● What layer in the stack it is captured at
● How it is integrated
● etc.
This is where PROV comes in
Lucian Carata et al. A Primer on Provenance. ACM Queue. March 2014.
9. PROV-O
● A family of W3C specifications for data provenance
● PROV-O is the OWL2 encoding of the PROV data model
● W3C Recommendation since 2013
● Used by financial services firms
Paul Groth, Luc Moreau. Provenance: An Introduction to PROV. 2013.
10. PROV-O - Entire Class Hierarchy
PROV-O is very compact, with just 30 classes
11. PROV-O - Starting Point Terms
Entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.
Agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.
Activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
Credits: https://www.w3.org/TR/prov-o/
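As a minimal sketch of how the three starting-point classes fit together, the Python below builds PROV-O triples as plain tuples and prints them as N-Triples, without any RDF library. Only the `prov:` terms come from PROV-O; the `http://example.org/` namespace and the names `report`, `raw-data`, `compile-report` and `alice` are hypothetical.

```python
# Sketch: the three PROV-O starting-point classes linked by core relations.
# Plain tuples stand in for an RDF library; example.org names are hypothetical.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace


def starting_point_triples(entity, source, activity, agent):
    """Return (subject, predicate, object) IRI triples for one generation step."""
    e, src, act, ag = (EX + n for n in (entity, source, activity, agent))
    return [
        (e, RDF_TYPE, PROV + "Entity"),
        (src, RDF_TYPE, PROV + "Entity"),
        (act, RDF_TYPE, PROV + "Activity"),
        (ag, RDF_TYPE, PROV + "Agent"),
        (act, PROV + "used", src),              # the activity consumed the source entity
        (e, PROV + "wasGeneratedBy", act),      # ...and produced the new entity
        (act, PROV + "wasAssociatedWith", ag),  # the agent is responsible for the activity
        (e, PROV + "wasAttributedTo", ag),      # the new entity is ascribed to the agent
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


doc = starting_point_triples("report", "raw-data", "compile-report", "alice")
print(to_ntriples(doc))
```

In practice a library such as rdflib or Jena would manage the IRIs; tuples keep the shape of the data visible.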
12. PROV-O - Expanded Terms
Credits: https://www.w3.org/TR/prov-o/
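A few expanded terms are easiest to understand together. The sketch below renders a dataset revision as Turtle text using mostly expanded terms (`prov:wasRevisionOf`, `prov:hadPrimarySource`, `prov:generatedAtTime`, `prov:SoftwareAgent`, `prov:Organization`), tied together with starting-point relations. The `ex:` names and the `revision_turtle` helper are hypothetical.

```python
from datetime import datetime, timezone


def revision_turtle(new, old, source, agent, org, generated_at):
    """Render a dataset revision with PROV-O expanded terms as Turtle.
    `generated_at` must be timezone-aware; all ex: names are hypothetical."""
    if generated_at.tzinfo is None:
        raise ValueError("generated_at needs a timezone for xsd:dateTime")
    return "\n".join([
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/> .",
        "",
        f"ex:{new} a prov:Entity ;",
        f"    prov:wasRevisionOf ex:{old} ;",       # a new version of an earlier entity
        f"    prov:hadPrimarySource ex:{source} ;",  # where the content originally came from
        f'    prov:generatedAtTime "{generated_at.isoformat()}"^^xsd:dateTime ;',
        f"    prov:wasAttributedTo ex:{agent} .",
        f"ex:{agent} a prov:SoftwareAgent ;",
        f"    prov:actedOnBehalfOf ex:{org} .",      # delegation: software acting for an organization
        f"ex:{org} a prov:Organization .",
    ])


ttl = revision_turtle(
    "prices-v2", "prices-v1", "exchange-feed", "etl-job", "data-team",
    generated_at=datetime(2024, 1, 2, 9, 5, tzinfo=timezone.utc),
)
print(ttl)
```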
13. PROV-O - Qualified Terms
Derivation example to show the general pattern
Credits: https://www.w3.org/TR/prov-o/
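The qualified pattern keeps the direct relation (here `prov:wasDerivedFrom`) and adds an intermediate node (a `prov:Derivation`) that can carry extra detail, such as the activity involved. The sketch below builds that shape as CURIE-string triples and then recovers the direct relation from the qualification nodes alone. The `ex:` names and helper functions are hypothetical; the `prov:` terms follow the PROV-O derivation example.

```python
import itertools

# CURIE strings (prov:, ex:) keep the sketch readable; ex: names are hypothetical.
_blank_ids = itertools.count(1)


def qualified_derivation(derived, source, activity):
    """Triples for the qualified pattern: keep the direct prov:wasDerivedFrom edge and
    add a prov:Derivation node that carries extra detail (here, the activity involved)."""
    node = f"_:derivation{next(_blank_ids)}"  # blank node for the influence itself
    return [
        (derived, "prov:wasDerivedFrom", source),
        (derived, "prov:qualifiedDerivation", node),
        (node, "rdf:type", "prov:Derivation"),
        (node, "prov:entity", source),
        (node, "prov:hadActivity", activity),
    ]


def derivations_from_qualifications(triples):
    """Recover (derived, source) pairs using only the qualification nodes."""
    entity_of = {s: o for s, p, o in triples if p == "prov:entity"}
    return {(s, entity_of[o]) for s, p, o in triples
            if p == "prov:qualifiedDerivation" and o in entity_of}


graph = qualified_derivation("ex:report-v2", "ex:report-v1", "ex:edit-report")
print(derivations_from_qualifications(graph))
```

The same shape (`qualifiedX` pointing to a node with `prov:entity` or `prov:agent` plus extra properties) repeats for usage, generation, association and the other qualified terms.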
14. Data-Centric Architecture
[Diagram: Ingest, Processes, Analysis, External and Behaviour, all connected to Shared Data, with Provenance added]
Capture provenance for Shared Data in the common terms of PROV-O
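One way a developer could capture provenance at the point where Shared Data is produced is to wrap the producing function. The sketch below is a hypothetical Python decorator, not an existing library: `Entity`, `PROVENANCE`, `captures_provenance` and the `ex:` identifiers are made up for illustration, and a real system would publish the triples to a provenance store rather than append them to a list.

```python
import functools
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entity:
    iri: str      # identifier of a piece of Shared Data (hypothetical ex: names)
    value: object


PROVENANCE = []  # hypothetical stand-in for the provenance store: (s, p, o) triples
_run_ids = itertools.count(1)


def captures_provenance(agent):
    """Wrap a function so every call is recorded as a prov:Activity that used
    its input entities, generated a new entity and was associated with `agent`."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*inputs):
            run = next(_run_ids)
            activity = f"ex:{func.__name__}-run-{run}"
            started = datetime.now(timezone.utc).isoformat()
            value = func(*(e.value for e in inputs))  # call with the plain values
            ended = datetime.now(timezone.utc).isoformat()
            output = Entity(f"ex:{func.__name__}-result-{run}", value)
            PROVENANCE.extend([
                (activity, "rdf:type", "prov:Activity"),
                (activity, "prov:startedAtTime", started),
                (activity, "prov:endedAtTime", ended),
                (activity, "prov:wasAssociatedWith", agent),
                (output.iri, "prov:wasGeneratedBy", activity),
            ])
            PROVENANCE.extend((activity, "prov:used", e.iri) for e in inputs)
            return output
        return wrapper
    return decorate


@captures_provenance(agent="ex:risk-service")
def total_exposure(*amounts):
    return sum(amounts)


exposure = total_exposure(Entity("ex:trade-1", 100.0), Entity("ex:trade-2", 250.0))
print(exposure)
```

The point of the design is that business code stays unaware of PROV-O; the mapping to common terms lives in one place.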
15. Data-Centric Architecture
[Diagram: Ingest, Processes, Analysis, External and Behaviour, all connected to Shared Data, with Provenance added]
But how? What tools do I have as a developer to make that happen?
Capture provenance for Shared Data in the common terms of PROV-O
17. What are repo transactions?
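[Diagram: repo lifecycle over time between Borrower and Bank]
● Start: the Borrower receives cash from the Bank and delivers bonds
● End: the Borrower returns the same amount of cash plus interest; the Bank returns the same nominal amount of bonds
● Substitute collateral: during the trade, the Borrower delivers some other bonds in exchange for the original bonds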
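The "same amount of cash + interest" on the end leg is simple arithmetic. The sketch below assumes a simple-interest repo rate quoted on an ACT/360 day-count basis, a common money-market convention that the slide does not state; the function names and figures are hypothetical.

```python
def repurchase_amount(cash, annual_rate, days, day_basis=360):
    """Cash returned at the end leg: principal plus simple interest.
    Assumes a simple-interest repo rate on an ACT/360 basis."""
    if cash < 0 or days < 0:
        raise ValueError("cash and days must be non-negative")
    return cash + cash * annual_rate * days / day_basis


def end_leg(cash, bond_nominal, annual_rate, days):
    """Both flows of the end leg: cash plus interest back to the Bank,
    the same nominal amount of bonds back to the Borrower."""
    return {
        "cash_to_bank": round(repurchase_amount(cash, annual_rate, days), 2),
        "bond_nominal_to_borrower": bond_nominal,
    }


print(end_leg(cash=10_000_000, bond_nominal=10_000_000, annual_rate=0.05, days=30))
```

For 10,000,000 at 5% over 30 days, the interest is 10,000,000 × 0.05 × 30 / 360 ≈ 41,666.67.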
18. What would the system look like?
[Diagram: Trade service, Counterparty service and Risk calculator connected through Kafka topics (trades, counterparties, risk, provenance); a SPARQL sink on Kafka Connect feeds the Provenance view]
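The slide does not show the message format, so the sketch below assumes one: a hypothetical JSON provenance message on the `provenance` topic, which a sink turns into a SPARQL 1.1 `INSERT DATA` update. The field names, the `:` namespace and `:trade-1`/`:counterparty-1` are illustrative, and nothing here describes how the actual Kafka Connect SPARQL sink is configured.

```python
import json

PREFIXES = {"prov": "http://www.w3.org/ns/prov#", "": "http://example.org/"}


def message_to_triples(raw):
    """Map a hypothetical JSON provenance message onto PROV-O triples."""
    msg = json.loads(raw)
    activity = msg["activity"]
    triples = [
        (activity, "a", "prov:Activity"),
        (activity, "prov:wasAssociatedWith", msg["agent"]),
        (msg["generated"], "prov:wasGeneratedBy", activity),
    ]
    triples += [(activity, "prov:used", entity) for entity in msg["used"]]
    return triples


def insert_data(triples, prefixes=PREFIXES):
    """Build a SPARQL 1.1 INSERT DATA update that a sink could send to a triple store."""
    head = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    body = [f"  {s} {p} {o} ." for s, p, o in triples]
    return "\n".join(head + ["INSERT DATA {"] + body + ["}"])


raw = json.dumps({
    "activity": ":calc-risk-1", "agent": ":risk-s-1",
    "used": [":trade-1", ":counterparty-1"], "generated": ":risk-1",
})
print(insert_data(message_to_triples(raw)))
```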
21. Integrating values
[Diagram: provenance graph for :risk-1, shown here in Turtle]
:risk-1 prov:wasGeneratedBy :calc-risk-1 .
:calc-risk-1 prov:used :probability-005 ;
    prov:qualifiedUsage :calc-risk-1-prob ;
    prov:wasStartedBy :risk-s-1 ;
    prov:wasEndedBy :risk-s-1 .
:risk-s-1 a prov:SoftwareAgent .
:probability-005 prov:value 0.05 .
:calc-risk-1-prob a prov:Usage ;
    prov:entity :probability-005 ;
    prov:hadRole :riskParameter .
:riskParameter a prov:Role .
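With provenance in PROV-O terms, answering "which parameter values produced :risk-1, and in what role?" is a short graph walk. The sketch encodes the slide's graph as Python tuples, with edge directions inferred from PROV-O domains and ranges, and follows `prov:wasGeneratedBy`, `prov:qualifiedUsage`, `prov:entity`, `prov:hadRole` and `prov:value`. The helper names are hypothetical; against a triple store the same walk would be a SPARQL SELECT.

```python
# The slide's graph as (subject, predicate, object) tuples.
GRAPH = [
    (":risk-1", "prov:wasGeneratedBy", ":calc-risk-1"),
    (":calc-risk-1", "prov:used", ":probability-005"),
    (":calc-risk-1", "prov:qualifiedUsage", ":calc-risk-1-prob"),
    (":calc-risk-1", "prov:wasStartedBy", ":risk-s-1"),
    (":calc-risk-1", "prov:wasEndedBy", ":risk-s-1"),
    (":risk-s-1", "a", "prov:SoftwareAgent"),
    (":probability-005", "prov:value", 0.05),
    (":calc-risk-1-prob", "a", "prov:Usage"),
    (":calc-risk-1-prob", "prov:entity", ":probability-005"),
    (":calc-risk-1-prob", "prov:hadRole", ":riskParameter"),
    (":riskParameter", "a", "prov:Role"),
]


def objects(graph, subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def inputs_by_role(graph, result):
    """For each activity that generated `result`, list (role, entity, value)
    for every qualified usage that records a role and a value."""
    found = []
    for activity in objects(graph, result, "prov:wasGeneratedBy"):
        for usage in objects(graph, activity, "prov:qualifiedUsage"):
            for entity in objects(graph, usage, "prov:entity"):
                for role in objects(graph, usage, "prov:hadRole"):
                    for value in objects(graph, entity, "prov:value"):
                        found.append((role, entity, value))
    return found


print(inputs_by_role(GRAPH, ":risk-1"))
# [(':riskParameter', ':probability-005', 0.05)]
```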
22. Integrating values
[Diagram: the same provenance graph for :risk-1 as on slide 21, with the prov:Usage node labelled :withProbability]
Similarly for specific calculations, versions of libraries, etc.
23. Tools to create PROV content
A language-specific PROV mapping, such as ProvToolbox for Java, instead of generic RDF tooling such as Jena, RDF4J, rdflib or SPARQL CONSTRUCT
24. Tools to create PROV content
A language-specific PROV mapping, such as ProvToolbox for Java, instead of generic RDF tooling such as Jena, RDF4J, rdflib or SPARQL CONSTRUCT
ProvToolbox has just reached version 0.9.1 but looks too verbose even to a Java developer
26. Tools to visualize data
A specific visualization application is not important and can be replaced, because we captured provenance in common terms
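Because the data uses common PROV terms, any viewer that understands them will do. As one illustration, the sketch below turns a few PROV nodes and relations into Graphviz DOT text, loosely following the usual PROV diagram convention (entities as yellow ellipses, activities as blue rectangles, agents as orange house shapes); the colour names are approximations and `to_dot` is a hypothetical helper.

```python
# Approximate PROV diagram styling; colour names are standard Graphviz X11 names.
STYLES = {
    "prov:Entity": 'shape=ellipse, style=filled, fillcolor="lightyellow"',
    "prov:Activity": 'shape=box, style=filled, fillcolor="lightblue"',
    "prov:Agent": 'shape=house, style=filled, fillcolor="orange"',
}


def to_dot(node_types, edges):
    """Render PROV nodes and relations as Graphviz DOT text.
    node_types maps a node to a key of STYLES; edges are (source, relation, target)."""
    lines = ["digraph provenance {", "  rankdir=BT;"]  # PROV edges point back in time
    for node, cls in node_types.items():
        lines.append(f'  "{node}" [{STYLES[cls]}];')
    for source, relation, target in edges:
        lines.append(f'  "{source}" -> "{target}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    {":risk-1": "prov:Entity", ":calc-risk-1": "prov:Activity", ":risk-s-1": "prov:Agent"},
    [(":risk-1", "prov:wasGeneratedBy", ":calc-risk-1"),
     (":calc-risk-1", "prov:wasStartedBy", ":risk-s-1")],
)
print(dot)
```

Swapping this for another viewer only means writing a different exporter; the captured provenance itself does not change.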
27. Q&A
Eugene Morozov
@eugenemorozov
https://www.linkedin.com/in/emorozov/
https://www.meetup.com/semantic-web-london/