Enrico Daga† Mathieu d’Aquin† Aldo Gangemi‡ Enrico Motta†
† Knowledge Media Institute, The Open University (UK)
‡ Université Paris13 (France) and ISTC-CNR (Italy)
The 8th International Conference on Knowledge Capture (K-CAP 2015)
October 10th, 2015 - Palisades, NY (USA)
http://www.k-cap2015.org/
1. Propagation of Policies in Rich Data Flows
1
Enrico Daga†
Mathieu d’Aquin†
Aldo Gangemi‡
Enrico Motta†
†
Knowledge
Media
Ins2tute,
The
Open
University
(UK)
‡
Université
Paris13
(France)
and
ISTC-‐CNR
(Italy)
The
8th
Interna2onal
Conference
on
Knowledge
Capture
(K-‐CAP
2015)
October
10th,
2015
-‐
Palisades,
NY
(USA)
hRp://www.k-‐cap2015.org/
Feedback
welcome:
@enridaga
#kmiou
2. Motivation
• Governing the life cycle of data on the web is a
challenging issue for organisations and users.
• Assessing what policies on input data propagate to
the output of a process is a crucial problem.
2
3. Policies
3
Foundations
Constraints and permissions set by the data owner regarding the reuse of
the data, ie. mostly licences.
We can describe licences/policies (RDF+ODRL).
RDF License Database
http://datahub.io/dataset/rdflicense
Describes ~140 licenses using RDF
and the Open Digital Rights Language
(ODRL).
Reports 113 policies.
lic:cc-by-nc4.0 a odrl:Policy ;
rdfs:label "CC-BY-NC" ;
odrl:permission [
odrl:action cc:Distribution,
ldr:extraction , ldr:reutilization ,
cc:DerivativeWorks , cc:Reproduction ;
odrl:duty [ odrl:action
cc:Attribution , cc:Notice ] ] ;
odrl:prohibition [ odrl:action
cc:CommercialUse ]
5. Data flows
5
Foundations
What are the semantic relations
between input and output?
Data flows can be thought as graphs of data objects.
Datanode is an ontology for data centric description of applications.
Built as hierarchy of relations.
More then 100 relations so far.
http://purl.org/datanode/ns/
http://purl.org/datanode/docs/
6. Policy Propagation Rule (PPR)
6
Relying on Datanode for an enumeration of the possible relations and
on the RDF License Database for the possible policies, we can setup
rules like the following:
A Horn clauses of the form:
propagates(odrl:duty cc:Attribution, dn:isCopyOf)
For example:
It can be simplified as:
Foundations
7. Problem
A description of policies and data flows implies a huge number of
Policy Propagation Rules to be specified and computed (number
of possible policies times number of possible relations between data
objects).
How to abstract this KB to make the management and
reasoning on policy propagation rules easier?
7
8. Contributions
(1) A methodology to obtain an abstraction that allows to reduce the
number of rules significantly (using an Ontology).
(2) Evaluate how effective this methodology is when using the
Datanode ontology.
(3) Demonstrate how this ontology can evolve in order to better
represent the behaviour of Policy Propagation Rules.
8
9. (A)AAAA Methodology
9
1.Aquire rules
2.Analyse the rules: find clusters using Formal Concept Analysis
3.Abstract the rules: match clusters & ontology hierarchy
4.Assess the compression, and diagnose errors or refinements in the
rules or the ontology
5.Adjust the rules or the ontology
repeat from 2.
10. Preparing the Rules Base
• This phase required a manual supervision of all associations
between policies and relations in order to establish the initial set
of propagation rules.
• We used Contento, it was possible to prepare manually the rule
base with a reasonable effort.
• The initial knowledge base was then composed of 3363 Policy
Propagation Rules
10
Acquisition
11. Formal Concept Analysis (FCA)
• Input is a Formal Context (a binary matrix of objects/attributes)
• Basic unit is a Close Concept:
– (O,A) => (Extension,Intension)
– Closure operator ’ … (O,A) is a concept when O’=A and A’=O
• Classifies concepts hierarchically in a concept lattice
– Top: all objects, no attr, bottom: all attributes, no obj
11
Analysis
12. Applying FCA to the Rule Base
80 concepts
Clusters of rules
Relations that have a common behaviour: they
propagate the same policies.
But why do they do it?
12
Analysis
13. Detect matches with the ontology
13
Abstraction
Search for matches between concepts and branches taken
from the ontology hierarchy.
When found, subtract the rules from the KB accordingly.
16. Compression Factor
16
Assessment
By applying the original Datanode ontology, 1925 rules could be
removed out of 3363, for a compression factor of 0.572.
We calculate the CF as the number of abstracted rules divided by the
total number of rules:
17. Considerations
17
Assessment
2. The Datanode ontology has not been designed for the purpose of
representing a common behaviour of relations in terms of propagation
of policies.
It is possible to refine the ontology in order to make it cover this use
case better (and possibly reduce the number of rules even more).
1. The size of the matrix that was manually supervised is large, and it
is possible that errors have been made at that stage of the process.
18. Observing the measures
18
Assessment
concepts * relations=~9040 (partial) matches. Hard to explore!
Inspecting a partial match with high precision and low recall highlights a
problem that might be easy to fix, as the number of relations and policies to
compare will be low.
19. Operations
19
Adjustment
We try to make adjustments in order to improve the compression
factor. We defined a set of operations, targeted to (a) fix errors in the
initial rule base and (b) refine the ontology.
• Fill: makes a branch be fully in a cluster of concept c, attempting to
push Pre up to 1.
• Group: some relations that share a concept, but belong to different
branches, are abstracted by a new relation.
• Merge: two distinct branches are abstracted by a new relation.
• Wedge: change the top relation of a branch to make it fully match
the concept.
(and we run our process again from the Analysis phase to the Assessment)
20. Example: Fill
20
Adjustment
… dn:isSelectionOf should indeed propagate all the policies listed
in this concept. Fill adds the following rules:
propagates(dn:isSelectionOf, duty cc:Attribution)
propagates(dn:isSelectionOf, duty cc:Copyleft)
propagates(dn:isSelectionOf, duty cc:Notice)
propagates(dn:isSelectionOf, duty cc:SourceCode)
propagates(dn:isSelectionOf, duty odrl:attachPolicy)
propagates(dn:isSelectionOf, duty odrl:attachSource)
propagates(dn:isSelectionOf, duty odrl:attribute)
22. Evaluation
22
As a result we obtained: 3865 rules in total, 78
concepts, 2817 rules abstracted and 1048
rules remaining - for a compression factor of
0.729.
Thanks to this methodology we have been
able to fix many errors in the initial data, and to
refine Datanode by clarifying the semantics of
many properties and adding new ones.
23. Conclusions
23
• We presented a method to abstract Policy Propagation Rules
applying an ontology, the Datanode ontology, a hierarchical
organisation of the possible relations between data objects.
• Datanode allowed us to reduce the number of rules to a factor of 0.5.
• Applying the ontology and the method we were able to find and correct
errors in the rules.
• Moreover we have been able to analyse the ontology in relation to this
task and enhance it having as result not only a further reduction of rules
- 0.7, but also a better ontology.
24. Future work
• Apply the rule base to specific use cases (perform a practical
evaluation), and publish it for reuse.
• Extend the Assessment phase of the methodology to also include
consistency check between the hierarchy of the FCA lattice and the
ontology.
• Define additional operations to support the Adjustment phase.
• Study the rule base evolution via continuous integration of new
policies/actions/rules.
• Apply the methodology to other use cases. Should we integrate the
measures and operations of the methodology in the Contento tool?
24
26. 76Bottom-Up Ontology Construction with CONTENTO
http://bit.ly/contento-tool
Contento supports the user in the generation and curation of concept lattices to
produce Semantic Web Ontologies from data.
In the demo, we show how to use CONTENTO with Linked Data.
Formal
Context
Concept
Lattice
Modeling
(Naming &
Pruning)
SPARQL
Export as
OWL
Ontology
(adver2sement)
@ISWC
2015
Poster
and
Demo
session,
next
week…
27. Operations: Fill
27
Annex
Observation: the branch is close to be fully in the concept (high Pre)
Diagnosis: the branch must be fully in the concept…
Effect: The Fill operation makes a branch be fully in a cluster of concept
c, attempting to push Pre up to 1.
28. Operations: Merge
28
Annex
Observation: two distinct branches match the same concept
Diagnosis: the respective top relations can be abstracted by a new
common relation
Effect: a new relation is added to the ontology
29. Operations: Group
29
Annex
Observation: A set of relations are all together in the extent of a concept,
but belong to different branches.
Diagnosis: they are actually related, and a common parent is missing.
Effect: a new relation is added to the ontology.
30. Operations: Wedge
30
Annex
Observation: the top relation is missing, and it does not propagate (all)
the policies of the concept.
Diagnosis: the actual top relation is not a right abstraction.
Effect: a new relation is added to the ontology.