Propagating Data Policies
A User Study
1
Enrico Daga (The Open University, UK)
Mathieu d’Aquin (Insight Centre, NUI Galway, Ireland)
Enrico Motta (The Open University, UK)
K-­‐CAP2017	
  
The	
  9th	
  Interna4onal	
  Conference	
  on	
  Knowledge	
  Capture	
  
December	
  4th-­‐6th,	
  2017Aus4n,	
  Texas,	
  United	
  States	
  
City Data Hubs
2
Smart	
  Bins	
  to	
  make	
  
garbage	
  collec2on	
  more	
  ecient
Monitor	
  parking	
  spaces	
  to	
  
support	
  ci2zens’	
  mobility
Observe	
  busyness	
  of	
  places	
  to	
  
be=er	
  tune	
  services
Forecast	
  car	
  accidents	
  to	
  improve	
  
drivers’	
  awareness
MK:Smart	
  is	
  an	
  integrated	
  innova4on	
  
and	
  support	
  programme	
  leveraging	
  
large-­‐scale	
  city	
  data	
  to	
  drive	
  growth	
  in	
  
Milton	
  Keynes	
  (UK).
hPps://datahub.mksmart.org
Delivery
Onboarding
Processing
Acquisi4on
Data	
  Hub
It is a loop!
Feedback:	
  @enridaga	
  	
  
Policy Propagation
In City Data Hubs publisher, processor and consumer are different.
3
How to support data managers on making aware developers
about the policies an output might derive from the data sources?
Feedback:	
  @enridaga	
  	
  
4
Policy Propagation
propagates(duty:attribution,processed into)
…
has(dataset1,permission:DerivativeWorks)
has(dataset1,duty:attribution)
…
ODRL	
  -­‐	
  License	
  and	
  Terms	
  of	
  Use
Datanode	
  -­‐	
  Process
Policy	
  Propaga=on	
  Rules
Focus on the knowledge components:

policies and processes
has(output, duty:attribution)
has(output, permission:commercialise)
…
has(X,P) ⋀ propagates(P,R) ⋀
relation(R,X,Y) → has(Y,P)
These are the
policies derived from
the sources!
Feedback:	
  @enridaga	
  	
  
Previous work
5
Data flow descriptions

“Describing semantic web applications through relations between data
nodes”, Technical Report, 2015
Rules acquisition and management 

“Propagation of policies in rich data flows”, K-Cap, 2015.
Policy knowledge acquisition 

“A Bottom-Up Approach for Licences Classification and Selection”, LeDA-
SWAn, 2015.
Process knowledge acquisition 

“An incremental learning method to support the annotation of workflows with
data-to-data relations”, EKAW, 2016.
Applicability in an end-to-end scenario 

“Addressing exploitability of smart city data”, IEEE ISC2, 2016.
Efficiency of the reasoner 

“Reasoning with Data Flows and Policy Propagation Rules”, Semantic Web
Journal, Special Issue to appear 2017?
Objective
To evaluate the system accuracy and utility:
• Are the assumptions made about the needed components the correct
ones?
• To what extent a system supporting such a task can be accurate?
When it is not, are there any fundamental reasons?
• How difficult is the task? Is this support needed?
6Feedback:	
  @enridaga	
  
User Study Design
A Data Hub manager to take decisions about policy propagation
• having the same knowledge used by the system.
• 10 participants selected among researchers and PhD students
having data analysis skills
• No legal expertise: developers and data managers don’t have that.
• Working in pairs: required to develop an agreement.
• 5 teams: MAPI, ILAN, CAAN, ALPA, NIFR
• Real data sources (MK Data Hub), licences (RDF Licence Database),
and processes (Rapid Miner les found on GitHub.com)
• 5 data journeys designed by mapping data sources and processes
in a narrative: SCRAPE, FOOD, CLOUD, AVG, CLEAN
• Distributed in a latin square: a team for 2 journeys and a journey for
2 teams; at most 1 shared journey between teams.
7Feedback:	
  @enridaga	
  
User Study Design
We developed a tool to assist participants in a data journey:
1. Understand the process (on Rapid Miner)
2. Understand the input datasets (from the MK Data Hub)
3. Understand the input policies (from the RDF Licence Database)
4. Decide what policies shall propagate to the output (1-5 Likert scale)
5. Compare with the automatic reasoner and explain differences
8Feedback:	
  @enridaga	
  
Understand a data journey
9
Table 6.1: Data Journeys (a,b).
(a) SCRAPE
SCRAPE Milton Keynes Websites Scraper.
The content of Websites about Milton Keynes is downloaded. Each web page is processed in order
to select only the relevant part of the content. After each iteration the resulting text is appended to a
dataset. The resulting dataset is modied before being saved in the local data storage.
Datasets Milton Keynes Council Website (UK OGL 2.0), MK50 Website (All rights reserved),
Wikipedia pages about Milton Keynes (CC-By-SA 3.0)
Policies Permissions: reutilization, Reproduction, Distribution, DerivativeWorks. Prohi-
bitions: Distribution, IPRRight, Reproduction, DerivativeWorks, databaseRight,
reutilization, extraction, CommercialUse. Duties: Notice, ShareAlike, Attribution.
Process https://github.com/mtrebi/SentimentAnalyzer/tree/
master/process/scraper.rmp
Teams ILAN, MAPI
(b) FOOD
FOOD Models for Food Rating Prediction.
A lift chart graphically represents the improvement that a mining model provides when compared
against a random guess, and measures the change in terms of a lift score. In this task, two techniques
Understand a data journey
10
Process https://github.com/mtrebi/SentimentAnalyzer/tree/
master/process/scraper.rmp
Teams ILAN, MAPI
(b) FOOD
FOOD Models for Food Rating Prediction.
A lift chart graphically represents the improvement that a mining model provides when compared
against a random guess, and measures the change in terms of a lift score. In this task, two techniques
are compared, namely Decision Tree and Naive Bays. The task uses data about Food Ratings in
information about quality of life in Milton Keynes Wards. The result are two pareto charts to be
compared.
Datasets OpenDataCommunities Worthwhile 2011-2012 Average Rating (UK OGL 2.0), Food
Establishments Info and Ratings (Terms of use)
Policies Permissions: DerivativeWorks, Distribution, Reproduction, display, extraction, reuti-
lization. Prohibitions: modify, use. Duties: Attribution, Notice, display.
Process https://github.com/samwar/tree/master/rapid_miner_
training/16_lift_chart.rmp
Teams NIFR, ALPA
Comparing with system
11
Explanation: blocked
12
Explanation: propagated
13
A discussion on ODRL action dependencies and how they
a↵ect the policy semantics is included in [13]. Nonetheless, to
the best of our knowledge, the rst attempt to analyse how
policies can propagate in manipulation processes is the one
presented in one of our earlier papers [6]. In [6] we introduced
Figure 1: Explanation: propagation trace.
Explaining differences
14
Analysis
• Accuracy analysis. To evaluate the system (agreement with users)
• Teams agree the system is right
• Teams agree the system is not right
• Eg: the knowledge base needs to be improved (e.g. rules)
• Teams disagree: we don’t know whether the system is right!
• Thematic analysis. Focusing on the disagreements between users
• From explanations given in the last phase
• From discussions during the study (reaching agreement)
• Grounded Theory Approach
• Questionnaire. To assess the value to users after the study
experience.
15Feedback:	
  @enridaga	
  
Questionnaire
16
Questionnaire
17
produc-
bitions:
y.
miner
d in order
nce score.
license,
med/
ords.
rive, in-
tribute,
nymize.
n sensors
ready for
Weather
her Sta-
erms of
Distri-
ization,
Q.2
Do you think you had enough information to decide?
Yes 2 6 2 0 0 No
Q.3
How di cult was it to reach an agreement?
Easy 1 5 2 2 0 Di cult
Q.4
Somebody with strong technical skills is absolutely required to take this decision.
Do you agree?
Yes 1 8 1 0 0 No
Q.5
Somebody with strong technical skills is absolutely required to take this decision
even with the support of automated reasoning. Do you agree?
Yes 1 5 1 3 No
Q.6
Understanding the details of the process is fundamental to take a decision. Do
you agree?
Yes 6 3 1 0 0 No
Q.7
How enjoyable was it to discuss and decide on policies and how they propagate
in a process?
Very 4 5 0 1 0 Not
Q.8
How feasible/sustainable do you think it is to discuss and decide on policies
and how they propagate in a process?
Feasible 3 1 3 1 2 Unfeasible
Q.9
How sustainable do you think it is to discuss and decide on policies and how
they propagate in a process with the support of our system?
Feasible 5 4 0 1 0 Unfeasible
2
10
6 1
The owner of the input data
The process executor
The consumer of the processed data
They must do it together
Nobody (it cannot be done)
Q.10 Who should decide on what policies propagate to the output of a process?
Accuracy analysis
18
D Number of policies of data sources / Number of decisions
T_avg Agreement with system, average of the T1 and T2
T12 Agreement between T1 and T2
T_12+ Agreement on certain answers
T_1+,T_2+ Amount of certain answers
Permissions
8 20 12
1 5 2
8 15 15
3 13 6
31 56.5
0.6 0.8 0.7 0.6 0.6 0.9 0.5
0.7 1 0.9 0.7 0.2 0.7 0.3
1 0.5 0.8 0.5 1 1 1
0.7 0.5 0.6 0.8 0.2 0.8 0.4
0.8 0.7 0.6 0.7
otals)
T12+ T1+ T2+
0 0 3
6 12 6
0 0 1
0 7 7
2 4 5
8 22.5
(d) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
0.8 0.4 0.6 0.6 0 0 0.6
1 1 1 1 0.5 1 0.5
0 1 0.5 0 N.A. 0 0.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
otals)
T12+ T1+ T2+
8 8 8
2 4 2
0 4 0
7 7 7
0 5 0
17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
als)
T12+ T1+ T2+
3 3 3
0 4 4
1 1 1
1 1 1
1 4 1
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
FOOD 22 14 18 16 14 8 20 12
CLOUD 7 5 7 6 5 1 5 2
AVG 15 15 8 11.5 8 8 15 15
CLEAN 17 12 9 10.5 14 3 13 6
All 77 58 55 31 56.5
0.6 0.8 0.7 0.6 0.6 0.9 0.5
0.7 1 0.9 0.7 0.2 0.7 0.3
1 0.5 0.8 0.5 1 1 1
0.7 0.5 0.6 0.8 0.2 0.8 0.4
0.8 0.7 0.6 0.7
(c) Permissions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 5 4 2 3 3 0 0 3
FOOD 12 12 12 12 12 6 12 6
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
(d) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
0.8 0.4 0.6 0.6 0 0 0.6
1 1 1 1 0.5 1 0.5
0 1 0.5 0 N.A. 0 0.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
(e) Prohibitions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 8 8 8 8 8 8 8 8
FOOD 4 0 2 1 2 2 4 2
CLOUD 4 4 4 4 4 0 4 0
AVG 7 7 7 7 7 7 7 7
CLEAN 5 5 5 5 5 0 5 0
All 28 25 26 17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
(g) Duties (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 3 3 3 3 3 3 3 3
FOOD 6 2 4 3 0 0 4 4
CLOUD 1 1 1 1 1 1 1 1
AVG 1 1 1 1 1 1 1 1
CLEAN 4 4 4 4 4 1 4 1
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
during the study, that are discussed
ANALYSIS
ow the decisions made by the users
The decisions taken by the system
3. For example, the SCRAPE data
16 policies and the system decided
4 of the 5 permissions and all the
Tables 4a-4h summarize the results
tative way. The values are shown
e full numbers and the computed
decisions (Tables 4a and 4b), and
s (Tables 4c and 4d), prohibitions
ties (Tables 4g and 4h). The values
ne of the user study (data journey
ed for each data journey (average
as totals considering the decisions
at the bottom). The data journeys
wenty-two policies to be analysed
ven decisions. Table 4a shows the
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
0
1
0
(e) Prohibitions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 8 8 8 8 8 8 8 8
FOOD 4 0 2 1 2 2 4 2
CLOUD 4 4 4 4 4 0 4 0
AVG 7 7 7 7 7 7 7 7
CLEAN 5 5 5 5 5 0 5 0
All 28 25 26 17 22.5
T
1
0
1
1
1
(g) Duties (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 3 3 3 3 3 3 3 3
FOOD 6 2 4 3 0 0 4 4
CLOUD 1 1 1 1 1 1 1 1
AVG 1 1 1 1 1 1 1 1
CLEAN 4 4 4 4 4 1 4 1
All 15 12 9 6 11.5
T
1
0
1
1
1
altough this aspect will be discussed w
classes of policies.
Tables 4c and 4d only show result
type permission. The average agreeme
discussed
the users
e system
PE data
decided
d all the
e results
e shown
omputed
4b), and
hibitions
he values
journey
(average
decisions
journeys
analysed
hows the
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
0 1 0.5 0 N.A. 0 0.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
(e) Prohibitions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 8 8 8 8 8 8 8 8
FOOD 4 0 2 1 2 2 4 2
CLOUD 4 4 4 4 4 0 4 0
AVG 7 7 7 7 7 7 7 7
CLEAN 5 5 5 5 5 0 5 0
All 28 25 26 17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
(g) Duties (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 3 3 3 3 3 3 3 3
FOOD 6 2 4 3 0 0 4 4
CLOUD 1 1 1 1 1 1 1 1
AVG 1 1 1 1 1 1 1 1
CLEAN 4 4 4 4 4 1 4 1
All 15 12 9 6 11.5
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
0.8 0.6 0.7 0.8
altough this aspect will be discussed when looking at specic
classes of policies.
Tables 4c and 4d only show results involving policies of
type permission. The average agreement between the system
Prohibitions
0 7 7
2 4 5
8 22.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
otals)
T12+ T1+ T2+
8 8 8
2 4 2
0 4 0
7 7 7
0 5 0
17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
als)
T12+ T1+ T2+
3 3 3
0 4 4
1 1 1
1 1 1
1 4 1
6 11.5
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
0.8 0.6 0.7 0.8
be discussed when looking at specic
nly show results involving policies of
verage agreement between the system
ng all the decisions is 0.6. Particularly,
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
(e) Prohibitions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 8 8 8 8 8 8 8 8
FOOD 4 0 2 1 2 2 4 2
CLOUD 4 4 4 4 4 0 4 0
AVG 7 7 7 7 7 7 7 7
CLEAN 5 5 5 5 5 0 5 0
All 28 25 26 17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
(g) Duties (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 3 3 3 3 3 3 3 3
FOOD 6 2 4 3 0 0 4 4
CLOUD 1 1 1 1 1 1 1 1
AVG 1 1 1 1 1 1 1 1
CLEAN 4 4 4 4 4 1 4 1
All 15 12 9 6 11.5
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
0.8 0.6 0.7 0.8
altough this aspect will be discussed when looking at specic
classes of policies.
Tables 4c and 4d only show results involving policies of
type permission. The average agreement between the system
and the users considering all the decisions is 0.6. Particularly,
Duties
ourneys: System decisions
P ermissions P rohibitions Duties
4/5 8/8 3/3
0/12 4/4 4/6
0/2 4/4 1/1
0/7 7/7 1/1
0/8 5/5 4/4
4/34 28/28 13/15
sion (Q.1, Q.9). This feedback shows
n is a di cult problem, although it
ight knowledge models. Therefore, a
k has good value for users. The last
ant to understand whether the Data
ally decide on policy propagation. It
the users think he/she cannot solve
/she should involve the data owner
in this task. This conclusion reflects
during the study, that are discussed
Table 4: Agreement analysis.
D: total number of decisions; T1, T2: ag
tem and each team; Tavg: average agre
and system; T12: agreement between t
between teams (only Certainly Yes/A
T1+, T2+: amount of Certainly Yes/Abso
Tables on the left indicate totals, while
show the related ratios.
(a) All decisions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 16 15 13 14 14 11 11 14
FOOD 22 14 18 16 14 8 20 12
CLOUD 7 5 7 6 5 1 5 2
AVG 15 15 8 11.5 8 8 15 15
CLEAN 17 12 9 10.5 14 3 13 6
All 77 58 55 31 56.5
T1
0.9
0.6
0.7
1
0.7
(c) Permissions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 5 4 2 3 3 0 0 3
FOOD 12 12 12 12 12 6 12 6
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
T1
0.8
1
0
1
0.4
ons
Duties
3/3
4/6
1/1
1/1
4/4
13/15
ack shows
though it
erefore, a
The last
the Data
gation. It
not solve
ata owner
on reflects
discussed
Table 4: Agreement analysis.
D: total number of decisions; T1, T2: agreement between sys-
tem and each team; Tavg: average agreement between teams
and system; T12: agreement between teams; T12+ agreement
between teams (only Certainly Yes/Absolutely No answers);
T1+, T2+: amount of Certainly Yes/Absolutely No answers.
Tables on the left indicate totals, while the ones on the right
show the related ratios.
(a) All decisions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 16 15 13 14 14 11 11 14
FOOD 22 14 18 16 14 8 20 12
CLOUD 7 5 7 6 5 1 5 2
AVG 15 15 8 11.5 8 8 15 15
CLEAN 17 12 9 10.5 14 3 13 6
All 77 58 55 31 56.5
(b) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
0.9 0.8 0.9 0.9 0.8 0.7 0.9
0.6 0.8 0.7 0.6 0.6 0.9 0.5
0.7 1 0.9 0.7 0.2 0.7 0.3
1 0.5 0.8 0.5 1 1 1
0.7 0.5 0.6 0.8 0.2 0.8 0.4
0.8 0.7 0.6 0.7
(c) Permissions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 5 4 2 3 3 0 0 3
FOOD 12 12 12 12 12 6 12 6
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
(d) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
0.8 0.4 0.6 0.6 0 0 0.6
1 1 1 1 0.5 1 0.5
0 1 0.5 0 N.A. 0 0.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
All policies
Feedback:	
  @enridaga	
  
Thematic analysis
• Expected disagreements:
A. system should propagate a policy, it didn’t
B. system should block a policy, it didn’t
C. system cannot decide: information is not enough
• (C) never occurred: the system has enough information to make a
decision!
Example FOOD journey: high disagreement between teams, due to
the different interpretation of the output:
(a) an enhanced version of the input (input included in the output)
(b) an independent dataset (input not included in the output)
(More discussion in the paper)
19
Thematic analysis
• Incomplete knowledge: rules missing or wrong, data flows
inaccurate, …
• Data reverse engineering: recurring theme (e.g. FOOD)
• Content-dependent decisions: current experiments assumed
reasoning at design time, although it is also possible to reason on
execution traces
• Dependant policies: permission:modify implies permission:use - but
we haven’t considered that
• The Legal Knowledge: users commented on the lack of legal
expertise.
– a support tool is useful
– a legal framework on top of the components is theoretically
possible, although unlikely in the short term …
20
Conclusions
• The system is accurate as developers/ data managers can be
• variance between the teams similar to the ones between each
team and system (Cohen’s kappa coefficient)
• The task is perceived as difficult, although not impossible, and the
system is therefore of good value for users
• The assumption behind the system is correct: there is a fundamental
correspondence between the possible data-to-data relations and the
way policies are propagated
• (and reuse does not necessarily imply derivation)
21Feedback:	
  @enridaga	
  
Future work
• How to consider the rights of the stakeholders other then data
publishers (e.g. process designer?)
• How to assess consistency of processes wrt policies
• Policy and process knowledge must be accurate and reflect the
relevant data-to-data relations. How can we assure that?
Challenge: “They must do it together”
• Need for shared knowledge about “data actions”, from jargon to
consensus (remodelling, refactoring, extraction, …)
• Negotiation of policies between data providers, processors and
consumers - also a non-technical challenge
22
Thank you
23
Enrico Daga
enrico.daga@open.ac.uk
tw: @enridaga

Propagating Data Policies - A User Study

  • 1.
    Propagating Data Policies AUser Study 1 Enrico Daga (The Open University, UK) Mathieu d’Aquin (Insight Centre, NUI Galway, Ireland) Enrico Motta (The Open University, UK) K-­‐CAP2017   The  9th  Interna4onal  Conference  on  Knowledge  Capture   December  4th-­‐6th,  2017Aus4n,  Texas,  United  States  
  • 2.
    City Data Hubs 2 Smart  Bins  to  make   garbage  collec2on  more  efficient Monitor  parking  spaces  to   support  ci2zens’  mobility Observe  busyness  of  places  to   be=er  tune  services Forecast  car  accidents  to  improve   drivers’  awareness MK:Smart  is  an  integrated  innova4on   and  support  programme  leveraging   large-­‐scale  city  data  to  drive  growth  in   Milton  Keynes  (UK). hPps://datahub.mksmart.org Delivery Onboarding Processing Acquisi4on Data  Hub It is a loop! Feedback:  @enridaga    
  • 3.
    Policy Propagation In CityData Hubs publisher, processor and consumer are different. 3 How to support data managers on making aware developers about the policies an output might derive from the data sources? Feedback:  @enridaga    
  • 4.
    4 Policy Propagation propagates(duty:attribution,processed into) … has(dataset1,permission:DerivativeWorks) has(dataset1,duty:attribution) … ODRL  -­‐  License  and  Terms  of  Use Datanode  -­‐  Process Policy  Propaga=on  Rules Focus on the knowledge components:
 policies and processes has(output, duty:attribution) has(output, permission:commercialise) … has(X,P) ⋀ propagates(P,R) ⋀ relation(R,X,Y) → has(Y,P) These are the policies derived from the sources! Feedback:  @enridaga    
  • 5.
    Previous work 5 Data flowdescriptions
 “Describing semantic web applications through relations between data nodes”, Technical Report, 2015 Rules acquisition and management 
 “Propagation of policies in rich data flows”, K-Cap, 2015. Policy knowledge acquisition 
 “A Bottom-Up Approach for Licences Classification and Selection”, LeDA- SWAn, 2015. Process knowledge acquisition 
 “An incremental learning method to support the annotation of workflows with data-to-data relations”, EKAW, 2016. Applicability in an end-to-end scenario 
 “Addressing exploitability of smart city data”, IEEE ISC2, 2016. Efficiency of the reasoner 
 “Reasoning with Data Flows and Policy Propagation Rules”, Semantic Web Journal, Special Issue to appear 2017?
  • 6.
    Objective To evaluate thesystem accuracy and utility: • Are the assumptions made about the needed components the correct ones? • To what extent a system supporting such a task can be accurate? When it is not, are there any fundamental reasons? • How difficult is the task? Is this support needed? 6Feedback:  @enridaga  
  • 7.
    User Study Design AData Hub manager to take decisions about policy propagation • having the same knowledge used by the system. • 10 participants selected among researchers and PhD students having data analysis skills • No legal expertise: developers and data managers don’t have that. • Working in pairs: required to develop an agreement. • 5 teams: MAPI, ILAN, CAAN, ALPA, NIFR • Real data sources (MK Data Hub), licences (RDF Licence Database), and processes (Rapid Miner files found on GitHub.com) • 5 data journeys designed by mapping data sources and processes in a narrative: SCRAPE, FOOD, CLOUD, AVG, CLEAN • Distributed in a latin square: a team for 2 journeys and a journey for 2 teams; at most 1 shared journey between teams. 7Feedback:  @enridaga  
  • 8.
    User Study Design Wedeveloped a tool to assist participants in a data journey: 1. Understand the process (on Rapid Miner) 2. Understand the input datasets (from the MK Data Hub) 3. Understand the input policies (from the RDF Licence Database) 4. Decide what policies shall propagate to the output (1-5 Likert scale) 5. Compare with the automatic reasoner and explain differences 8Feedback:  @enridaga  
  • 9.
    Understand a datajourney 9 Table 6.1: Data Journeys (a,b). (a) SCRAPE SCRAPE Milton Keynes Websites Scraper. The content of Websites about Milton Keynes is downloaded. Each web page is processed in order to select only the relevant part of the content. After each iteration the resulting text is appended to a dataset. The resulting dataset is modied before being saved in the local data storage. Datasets Milton Keynes Council Website (UK OGL 2.0), MK50 Website (All rights reserved), Wikipedia pages about Milton Keynes (CC-By-SA 3.0) Policies Permissions: reutilization, Reproduction, Distribution, DerivativeWorks. Prohi- bitions: Distribution, IPRRight, Reproduction, DerivativeWorks, databaseRight, reutilization, extraction, CommercialUse. Duties: Notice, ShareAlike, Attribution. Process https://github.com/mtrebi/SentimentAnalyzer/tree/ master/process/scraper.rmp Teams ILAN, MAPI (b) FOOD FOOD Models for Food Rating Prediction. A lift chart graphically represents the improvement that a mining model provides when compared against a random guess, and measures the change in terms of a lift score. In this task, two techniques
  • 10.
    Understand a datajourney 10 Process https://github.com/mtrebi/SentimentAnalyzer/tree/ master/process/scraper.rmp Teams ILAN, MAPI (b) FOOD FOOD Models for Food Rating Prediction. A lift chart graphically represents the improvement that a mining model provides when compared against a random guess, and measures the change in terms of a lift score. In this task, two techniques are compared, namely Decision Tree and Naive Bays. The task uses data about Food Ratings in information about quality of life in Milton Keynes Wards. The result are two pareto charts to be compared. Datasets OpenDataCommunities Worthwhile 2011-2012 Average Rating (UK OGL 2.0), Food Establishments Info and Ratings (Terms of use) Policies Permissions: DerivativeWorks, Distribution, Reproduction, display, extraction, reuti- lization. Prohibitions: modify, use. Duties: Attribution, Notice, display. Process https://github.com/samwar/tree/master/rapid_miner_ training/16_lift_chart.rmp Teams NIFR, ALPA
  • 11.
  • 12.
  • 13.
    Explanation: propagated 13 A discussionon ODRL action dependencies and how they a↵ect the policy semantics is included in [13]. Nonetheless, to the best of our knowledge, the first attempt to analyse how policies can propagate in manipulation processes is the one presented in one of our earlier papers [6]. In [6] we introduced Figure 1: Explanation: propagation trace.
  • 14.
  • 15.
    Analysis • Accuracy analysis.To evaluate the system (agreement with users) • Teams agree the system is right • Teams agree the system is not right • Eg: the knowledge base needs to be improved (e.g. rules) • Teams disagree: we don’t know whether the system is right! • Thematic analysis. Focusing on the disagreements between users • From explanations given in the last phase • From discussions during the study (reaching agreement) • Grounded Theory Approach • Questionnaire. To assess the value to users after the study experience. 15Feedback:  @enridaga  
  • 16.
  • 17.
    Questionnaire 17 produc- bitions: y. miner d in order ncescore. license, med/ ords. rive, in- tribute, nymize. n sensors ready for Weather her Sta- erms of Distri- ization, Q.2 Do you think you had enough information to decide? Yes 2 6 2 0 0 No Q.3 How di cult was it to reach an agreement? Easy 1 5 2 2 0 Di cult Q.4 Somebody with strong technical skills is absolutely required to take this decision. Do you agree? Yes 1 8 1 0 0 No Q.5 Somebody with strong technical skills is absolutely required to take this decision even with the support of automated reasoning. Do you agree? Yes 1 5 1 3 No Q.6 Understanding the details of the process is fundamental to take a decision. Do you agree? Yes 6 3 1 0 0 No Q.7 How enjoyable was it to discuss and decide on policies and how they propagate in a process? Very 4 5 0 1 0 Not Q.8 How feasible/sustainable do you think it is to discuss and decide on policies and how they propagate in a process? Feasible 3 1 3 1 2 Unfeasible Q.9 How sustainable do you think it is to discuss and decide on policies and how they propagate in a process with the support of our system? Feasible 5 4 0 1 0 Unfeasible 2 10 6 1 The owner of the input data The process executor The consumer of the processed data They must do it together Nobody (it cannot be done) Q.10 Who should decide on what policies propagate to the output of a process?
  • 18.
    Accuracy analysis 18 D Numberof policies of data sources / Number of decisions T_avg Agreement with system, average of the T1 and T2 T12 Agreement between T1 and T2 T_12+ Agreement on certain answers T_1+,T_2+ Amount of certain answers Permissions 8 20 12 1 5 2 8 15 15 3 13 6 31 56.5 0.6 0.8 0.7 0.6 0.6 0.9 0.5 0.7 1 0.9 0.7 0.2 0.7 0.3 1 0.5 0.8 0.5 1 1 1 0.7 0.5 0.6 0.8 0.2 0.8 0.4 0.8 0.7 0.6 0.7 otals) T12+ T1+ T2+ 0 0 3 6 12 6 0 0 1 0 7 7 2 4 5 8 22.5 (d) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 0.8 0.4 0.6 0.6 0 0 0.6 1 1 1 1 0.5 1 0.5 0 1 0.5 0 N.A. 0 0.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 otals) T12+ T1+ T2+ 8 8 8 2 4 2 0 4 0 7 7 7 0 5 0 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 als) T12+ T1+ T2+ 3 3 3 0 4 4 1 1 1 1 1 1 1 4 1 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 FOOD 22 14 18 16 14 8 20 12 CLOUD 7 5 7 6 5 1 5 2 AVG 15 15 8 11.5 8 8 15 15 CLEAN 17 12 9 10.5 14 3 13 6 All 77 58 55 31 56.5 0.6 0.8 0.7 0.6 0.6 0.9 0.5 0.7 1 0.9 0.7 0.2 0.7 0.3 1 0.5 0.8 0.5 1 1 1 0.7 0.5 0.6 0.8 0.2 0.8 0.4 0.8 0.7 0.6 0.7 (c) Permissions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 5 4 2 3 3 0 0 3 FOOD 12 12 12 12 12 6 12 6 CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 (d) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 0.8 0.4 0.6 0.6 0 0 0.6 1 1 1 1 0.5 1 0.5 0 1 0.5 0 N.A. 0 0.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 (e) Prohibitions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 8 8 8 8 8 8 8 8 FOOD 4 0 2 1 2 2 4 2 CLOUD 4 4 4 4 4 0 4 0 AVG 7 7 7 7 7 7 7 7 CLEAN 5 5 5 5 5 0 5 0 All 28 25 26 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 (g) Duties (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 3 3 3 3 3 3 3 3 FOOD 6 2 4 3 0 0 4 4 CLOUD 1 1 1 1 1 1 1 1 AVG 1 1 1 1 1 1 1 1 CLEAN 4 4 4 4 4 1 4 1 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 during the study, that are discussed ANALYSIS ow the decisions made by the users The decisions taken by the system 3. For example, the SCRAPE data 16 policies and the system decided 4 of the 5 permissions and all the Tables 4a-4h summarize the results tative way. The values are shown e full numbers and the computed decisions (Tables 4a and 4b), and s (Tables 4c and 4d), prohibitions ties (Tables 4g and 4h). The values ne of the user study (data journey ed for each data journey (average as totals considering the decisions at the bottom). The data journeys wenty-two policies to be analysed ven decisions. Table 4a shows the CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 0 1 0 (e) Prohibitions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 8 8 8 8 8 8 8 8 FOOD 4 0 2 1 2 2 4 2 CLOUD 4 4 4 4 4 0 4 0 AVG 7 7 7 7 7 7 7 7 CLEAN 5 5 5 5 5 0 5 0 All 28 25 26 17 22.5 T 1 0 1 1 1 (g) Duties (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 3 3 3 3 3 3 3 3 FOOD 6 2 4 3 0 0 4 4 CLOUD 1 1 1 1 1 1 1 1 AVG 1 1 1 1 1 1 1 1 CLEAN 4 4 4 4 4 1 4 1 All 15 12 9 6 11.5 T 1 0 1 1 1 altough this aspect will be discussed w classes of policies. Tables 4c and 4d only show result type permission. The average agreeme discussed the users e system PE data decided d all the e results e shown omputed 4b), and hibitions he values journey (average decisions journeys analysed hows the CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 0 1 0.5 0 N.A. 0 0.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 (e) Prohibitions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 8 8 8 8 8 8 8 8 FOOD 4 0 2 1 2 2 4 2 CLOUD 4 4 4 4 4 0 4 0 AVG 7 7 7 7 7 7 7 7 CLEAN 5 5 5 5 5 0 5 0 All 28 25 26 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 (g) Duties (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 3 3 3 3 3 3 3 3 FOOD 6 2 4 3 0 0 4 4 CLOUD 1 1 1 1 1 1 1 1 AVG 1 1 1 1 1 1 1 1 CLEAN 4 4 4 4 4 1 4 1 All 15 12 9 6 11.5 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 0.8 0.6 0.7 0.8 altough this aspect will be discussed when looking at specific classes of policies. Tables 4c and 4d only show results involving policies of type permission. The average agreement between the system Prohibitions 0 7 7 2 4 5 8 22.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 otals) T12+ T1+ T2+ 8 8 8 2 4 2 0 4 0 7 7 7 0 5 0 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 als) T12+ T1+ T2+ 3 3 3 0 4 4 1 1 1 1 1 1 1 4 1 6 11.5 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 0.8 0.6 0.7 0.8 be discussed when looking at specific nly show results involving policies of verage agreement between the system ng all the decisions is 0.6. Particularly, AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 (e) Prohibitions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 8 8 8 8 8 8 8 8 FOOD 4 0 2 1 2 2 4 2 CLOUD 4 4 4 4 4 0 4 0 AVG 7 7 7 7 7 7 7 7 CLEAN 5 5 5 5 5 0 5 0 All 28 25 26 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 (g) Duties (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 3 3 3 3 3 3 3 3 FOOD 6 2 4 3 0 0 4 4 CLOUD 1 1 1 1 1 1 1 1 AVG 1 1 1 1 1 1 1 1 CLEAN 4 4 4 4 4 1 4 1 All 15 12 9 6 11.5 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 0.8 0.6 0.7 0.8 altough this aspect will be discussed when looking at specific classes of policies. Tables 4c and 4d only show results involving policies of type permission. The average agreement between the system and the users considering all the decisions is 0.6. Particularly, Duties ourneys: System decisions P ermissions P rohibitions Duties 4/5 8/8 3/3 0/12 4/4 4/6 0/2 4/4 1/1 0/7 7/7 1/1 0/8 5/5 4/4 4/34 28/28 13/15 sion (Q.1, Q.9). This feedback shows n is a di cult problem, although it ight knowledge models. Therefore, a k has good value for users. The last ant to understand whether the Data ally decide on policy propagation. It the users think he/she cannot solve /she should involve the data owner in this task. This conclusion reflects during the study, that are discussed Table 4: Agreement analysis. D: total number of decisions; T1, T2: ag tem and each team; Tavg: average agre and system; T12: agreement between t between teams (only Certainly Yes/A T1+, T2+: amount of Certainly Yes/Abso Tables on the left indicate totals, while show the related ratios. (a) All decisions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 16 15 13 14 14 11 11 14 FOOD 22 14 18 16 14 8 20 12 CLOUD 7 5 7 6 5 1 5 2 AVG 15 15 8 11.5 8 8 15 15 CLEAN 17 12 9 10.5 14 3 13 6 All 77 58 55 31 56.5 T1 0.9 0.6 0.7 1 0.7 (c) Permissions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 5 4 2 3 3 0 0 3 FOOD 12 12 12 12 12 6 12 6 CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 T1 0.8 1 0 1 0.4 ons Duties 3/3 4/6 1/1 1/1 4/4 13/15 ack shows though it erefore, a The last the Data gation. It not solve ata owner on reflects discussed Table 4: Agreement analysis. D: total number of decisions; T1, T2: agreement between sys- tem and each team; Tavg: average agreement between teams and system; T12: agreement between teams; T12+ agreement between teams (only Certainly Yes/Absolutely No answers); T1+, T2+: amount of Certainly Yes/Absolutely No answers. Tables on the left indicate totals, while the ones on the right show the related ratios. (a) All decisions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 16 15 13 14 14 11 11 14 FOOD 22 14 18 16 14 8 20 12 CLOUD 7 5 7 6 5 1 5 2 AVG 15 15 8 11.5 8 8 15 15 CLEAN 17 12 9 10.5 14 3 13 6 All 77 58 55 31 56.5 (b) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 0.9 0.8 0.9 0.9 0.8 0.7 0.9 0.6 0.8 0.7 0.6 0.6 0.9 0.5 0.7 1 0.9 0.7 0.2 0.7 0.3 1 0.5 0.8 0.5 1 1 1 0.7 0.5 0.6 0.8 0.2 0.8 0.4 0.8 0.7 0.6 0.7 (c) Permissions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 5 4 2 3 3 0 0 3 FOOD 12 12 12 12 12 6 12 6 CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 (d) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 0.8 0.4 0.6 0.6 0 0 0.6 1 1 1 1 0.5 1 0.5 0 1 0.5 0 N.A. 0 0.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 All policies Feedback:  @enridaga  
  • 19.
    Thematic analysis • Expecteddisagreements: A. system should propagate a policy, it didn’t B. system should block a policy, it didn’t C. system cannot decide: information is not enough • (C) never occurred: the system has enough information to make a decision! Example FOOD journey: high disagreement between teams, due to the different interpretation of the output: (a) an enhanced version of the input (input included in the output) (b) an independent dataset (input not included in the output) (More discussion in the paper) 19
  • 20.
    Thematic analysis • Incompleteknowledge: rules missing or wrong, data flows inaccurate, … • Data reverse engineering: recurring theme (e.g. FOOD) • Content-dependent decisions: current experiments assumed reasoning at design time, although it is also possible to reason on execution traces • Dependant policies: permission:modify implies permission:use - but we haven’t considered that • The Legal Knowledge: users commented on the lack of legal expertise. – a support tool is useful – a legal framework on top of the components is theoretically possible, although unlikely in the short term … 20
  • 21.
    Conclusions • The systemis accurate as developers/ data managers can be • variance between the teams similar to the ones between each team and system (Cohen’s kappa coefficient) • The task is perceived as difficult, although not impossible, and the system is therefore of good value for users • The assumption behind the system is correct: there is a fundamental correspondence between the possible data-to-data relations and the way policies are propagated • (and reuse does not necessarily imply derivation) 21Feedback:  @enridaga  
  • 22.
    Future work • Howto consider the rights of the stakeholders other then data publishers (e.g. process designer?) • How to assess consistency of processes wrt policies • Policy and process knowledge must be accurate and reflect the relevant data-to-data relations. How can we assure that? Challenge: “They must do it together” • Need for shared knowledge about “data actions”, from jargon to consensus (remodelling, refactoring, extraction, …) • Negotiation of policies between data providers, processors and consumers - also a non-technical challenge 22
  • 23.