Central to a number of emerging Smart Cities are online platforms for data sharing and reuse: Data Hubs and Data Catalogues. These systems support developers' use of data by making it discoverable and accessible. The effectiveness of a Data Catalogue can therefore be judged by how well it supports ‘data exploitability’: the ability to assess whether the provided data is appropriate to the given task. Beyond technical compatibility, this also involves validating the policies attached to the data. Here, we present a methodology to enable Smart City Data Hubs to better address exploitability by considering the way policies propagate across the data flows applied in the system.
1. Addressing exploitability of Smart City data
Enrico Daga, Mathieu d’Aquin, Alessandro Adamou, Enrico Motta
Data Science Group, Knowledge Media Institute, The Open University, Milton Keynes (UK)
Feedback: @enridaga @datasciencegr #kmiou
September 13th, 2016 - Trento (Italy)
IEEE International Smart Cities Conference (ISC2)
http://events.unitn.it/en/isc2-2016
2. Smart City data
MK:Smart is an integrated innovation and support programme leveraging large-scale city data to drive growth in Milton Keynes (UK) [1].
• Smart Bins to make garbage collection more efficient
• Monitor parking spaces to support citizens’ mobility
• Observe busyness of places to better tune services
• Forecast car accidents to improve drivers’ awareness
https://datahub.mksmart.org
Data Hub: Onboarding, Acquisition, Processing, Delivery. It is a loop!
3. Top MK!
Top MK is a virtual card-playing game where each card represents a ward in Milton Keynes, with characteristics such as area, population, level of qualifications, etc. Two players, one human and the other automatic, try to win the other’s cards by choosing the characteristic that has the best chance to win against the other card.
https://data.beta.mksmart.org/apps/topmk/
4. The problem of exploitability
• Data come from different owners and have different licenses.
• Data are processed into new data before being reused.
• What are the policies that apply to the output data?
• Can we make use of it in a commercial setting?
Could Top Trumps sell this game?
"Data exploitability" is the assessment of the policies associated with the data
resulting from the computation of diverse datasets in complex data flows.
5. Under the hood - 1/5
Entity-Centric API (ECApi)
The Entity-Centric API (ECApi) offers an entity-based access point to the information offered by the Data Hub [2].
https://data.mksmart.org/entity/ward/newport_pagnell_north
{
  "global:religion": [{
    "global:sikh": ["16"],
    "global:no_religion": ["2323"], ...
  }],
  "global:maritalStatus": [{
    "global:in_a_registered_same-sex_civil_partnership": ["11"],
    "global:married": ["3290"], ...
  }],
  "global:economicActivity": [{
    "global:unemployed:_never_worked": ["15"],
    "global:unemployed:_age_50_to_74": ["33"],
    "global:in_employment": ["3785"],
    "global:unemployed:_age_16_to_24": ["48"],
    "global:long-term_unemployed": ["49"], ...
  }],
  "global:percentInBasicSkills": [{
    "global:literacy_level_1": ["47.41344196"],
    "global:literacy_level_2": ["46.23217923"],
    "global:numeracy_level_1_2.5percentci": ["18.13034623"],
    "global:numeracy_level_1": ["32.38289206"], ...
  }],
  "global:peopleInAgeGroups": [{
    "global:age_85_to_89": ["152"],
    "global:age_20_to_24": ["393"], ...
  }],
  "global:qualifications": [{
    "global:full-time_students:_age_18_to_74:_economically_inactive": ["61"],
    "global:highest_level_of_qualification:_level_4_qualifications_and_above": ["1413"],
    "global:highest_level_of_qualification:_level_1_qualifications": ["1042"],
    "global:highest_level_of_qualification:_level_3_qualifications": ["794"],
    "global:highest_level_of_qualification:_level_2_qualifications": ["1050"],
    "global:full-time_students:_age_18_to_74:_economically_active:_unemployed": ["17"],
    "global:highest_level_of_qualification:_apprenticeship": ["327"],
    "global:highest_level_of_qualification:_other_qualifications": ["271"],
    "global:full-time_students:_age_18_to_74:_economically_active:_in_employment": ["84"],
    "global:no_qualifications": ["1167"],
    "global:schoolchildren_and_full-time_students:_age_18_and_over": ["163"],
    "global:schoolchildren_and_full-time_students:_age_16_to_17": ["165"],
    "global:all_usual_residents_aged_16_and_over": ["6064"]
  }]
}
(Some logic here)
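A ward entity returned by the ECApi can be consumed with a few lines of code. The following is a minimal Python sketch, assuming the response is JSON in which each top-level key maps to a list of objects whose values are lists of strings, as in the slide excerpt. The helper name `attribute_values` and the inline `SAMPLE` are illustrative and not part of the Data Hub API.

```python
import json

# Trimmed copy of the ward entity from the slide (two attribute groups only).
SAMPLE = """
{
  "global:religion": [{"global:sikh": ["16"], "global:no_religion": ["2323"]}],
  "global:maritalStatus": [{"global:married": ["3290"]}]
}
"""


def attribute_values(entity: dict, group: str) -> dict:
    """Flatten one attribute group into {attribute: first value as a float}."""
    flat = {}
    for block in entity.get(group, []):
        for name, values in block.items():
            if values:  # skip attributes with an empty value list
                flat[name] = float(values[0])
    return flat


entity = json.loads(SAMPLE)
religion = attribute_values(entity, "global:religion")
print(religion["global:no_religion"])  # 2323.0
```

A card game such as Top MK could compare two wards by calling the helper on each entity and comparing the value of the chosen characteristic.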
6. Under the hood - 2/5
Provenance
The Data Hub offers a provenance access point that exposes the metadata of the datasets, including ownership and licenses.
https://data.mksmart.org/entity/ward/newport_pagnell_north.prov
{
  "dataset": "urn:census/ks501-qualification",
  "description": {
    "global:owner": ["Milton Keynes Council"],
    "global:title": ["Census 2011 - Qualifications in Milton Keynes' wards"],
    "global:uuid": ["3f6c6107-835c-45ee-b8b4-83c2099b4084"],
    "global:issued": ["2015-10-12 19:18:36"],
    "global:distribution": ["http://data.mksmart.org/entity/thing/www:uri/datahub.mksmart.org/ns/distribution/3527333636"],
    "global:modified": ["2016-09-06 12:03:14"],
    "global:type": ["http://data.mksmart.org/entity/thing/www:uri/www.w3.org/ns/dcat#Dataset"],
    "global:format": ["CSV"],
    "global:landingPage": ["http://data.mksmart.org/entity/thing/www:uri/https://datahub.mksmart.org/dataset/census-2011-qualifications-in-milton-keynes-wards/"],
    "global:homepage": ["https://datahub.mksmart.org/dataset/census-2011-qualifications-in-milton-keynes-wards/"],
    "global:name": ["census-2011-qualifications-in-milton-keynes-wards"],
    "global:attribution": [""],
    "global:policy": ["http://data.mksmart.org/entity/policy/open-government-license"],
    "@id": "urn:census/ks501-qualification",
    "global:api": ["https://datahub.mksmart.org/data-catalogue-api/?action=dataset&name=census-2011-qualifications-in-milton-keynes-wards"]
  },
  "attributes": [
    "global:qualifications/global:all_usual_residents_aged_16_and_over",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_active:_in_employment",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_active:_unemployed",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_inactive", …
  ]
},
The "global:qualifications" attributes come from the "Census 2011 - Qualifications in Milton Keynes' wards" dataset, distributed under the Open Government License.
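The link from an attribute to the policy of its source dataset can be followed programmatically. The sketch below is a minimal Python illustration, assuming the provenance access point returns a list of records shaped like the one on this slide (the trailing comma in the excerpt suggests a list, but this is an assumption). The function name `policies_for` and the trimmed `PROVENANCE` sample are hypothetical.

```python
# Trimmed provenance record in the shape shown on the slide.
PROVENANCE = [
    {
        "dataset": "urn:census/ks501-qualification",
        "description": {
            "global:title": ["Census 2011 - Qualifications in Milton Keynes' wards"],
            "global:policy": [
                "http://data.mksmart.org/entity/policy/open-government-license"
            ],
        },
        "attributes": [
            "global:qualifications/global:all_usual_residents_aged_16_and_over",
        ],
    }
]


def policies_for(attribute_path: str, records: list) -> set:
    """Return the policy URIs of every dataset that contributes the attribute."""
    found = set()
    for record in records:
        if attribute_path in record.get("attributes", []):
            found.update(record["description"].get("global:policy", []))
    return found


path = "global:qualifications/global:all_usual_residents_aged_16_and_over"
print(policies_for(path, PROVENANCE))
```

Returning a set keeps the result well defined when several datasets, possibly under different licenses, contribute the same attribute.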
7. Under the hood - 3/5
License
Licenses are described as machine readable policies: permissions, prohibitions or duties [3].
http://data.mksmart.org/entity/policy/open-government-license
{
  "global:type": ["http://data.mksmart.org/entity/thing/www:uri/datahub.mksmart.org/ns/schema/RedistributionPolicy"],
  "global:landingPage": [
    "http://data.mksmart.org/entity/thing/www:uri/https://datahub.mksmart.org/policy/open-government-license/",
    "http://data.mksmart.org/entity/thing/www:uri/https://datahub.beta.mksmart.org/policy/open-government-license/"
  ],
  "global:description": [""],
  "global:title": ["Open Government License"],
  "global:homepage": [
    "https://datahub.beta.mksmart.org/policy/open-government-license/",
    "https://datahub.mksmart.org/policy/open-government-license/"
  ],
  "global:name": ["open-government-license"],
  "global:api": [
    "https://datahub.mksmart.org/data-catalogue-api/?action=policy&id=open-government-license",
    "https://datahub.beta.mksmart.org/data-catalogue-api/?action=policy&id=open-government-license"
  ],
  "global:permission": [
    "http://data.mksmart.org/entity/thing/www:uri/permission:publish-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:redistribute-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:use-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:copy-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:reproduce-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:combine-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:commercialize-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:adapt-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:transmit-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:extract-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:derive-1441"
  ]
}
Good news: this is OGL, so it can be used in commercial applications.
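Whether a license allows commercial use can be read off its list of permissions. The following minimal Python sketch assumes that permission URIs follow the pattern visible on this slide, ending in `permission:<action>-<numeric id>`; that convention is inferred from the excerpt and is not a documented guarantee. The names `ACTION` and `permitted_actions` are illustrative.

```python
import re

# Subset of the permission URIs listed in the Open Government License policy.
POLICY = {
    "global:name": ["open-government-license"],
    "global:permission": [
        "http://data.mksmart.org/entity/thing/www:uri/permission:use-1441",
        "http://data.mksmart.org/entity/thing/www:uri/permission:commercialize-1441",
        "http://data.mksmart.org/entity/thing/www:uri/permission:derive-1441",
    ],
}

# Assumed URI convention: ".../permission:<action>-<numeric id>".
ACTION = re.compile(r"permission:([a-z]+)-\d+$")


def permitted_actions(policy: dict) -> set:
    """Extract the action names from the policy's permission URIs."""
    actions = set()
    for uri in policy.get("global:permission", []):
        match = ACTION.search(uri)
        if match:
            actions.add(match.group(1))
    return actions


print("commercialize" in permitted_actions(POLICY))  # True
```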
8. Under the hood - 4/5
Data flow
Data flows can be represented with the Datanode ontology [4] as graphs of data “nodes”.
(The logic here)
http://purl.org/datanode/ns/
http://purl.org/datanode/docs/
This is the semantics behind the code!
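To make the idea of a graph of data nodes concrete, here is a minimal Python sketch that stores a data flow as labelled edges and finds everything derived from a given node. It uses plain tuples rather than RDF and the Datanode vocabulary itself; the node names and the single relation label "processed into" are placeholders chosen for illustration.

```python
from collections import defaultdict

# Each edge is (source node, relation, target node). Names are illustrative.
EDGES = [
    ("dataset1", "processed into", "node16"),
    ("node16", "processed into", "output"),
]


def build_graph(edges):
    """Index edges by source node: {source: [(relation, target), ...]}."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return graph


def downstream(graph, start):
    """Return all nodes reachable from start by following any relation."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


flow = build_graph(EDGES)
print(sorted(downstream(flow, "dataset1")))  # ['node16', 'output']
```

Keeping the relation label on every edge matters because, in the approach described here, whether a policy propagates depends on the kind of relation between two nodes.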
9. Under the hood - 5/5
Reasoning on Policy Propagation
Machine readable policies and data flows allow us to reason on policy propagation exploiting Policy Propagation Rules (PPR) [5].
https://github.com/enridaga/pprreasoner/
Provenance and License:
has(dataset1,permission:commercialise)
has(dataset1,duty:attribution)
Data flow:
relation(node23,node16,processed into)
Policy Propagation Rule:
propagates(permission:commercialise,processed into)
Rule engine:
has(X,P) ∧ propagates(P,R) ∧ relation(R,X,Y) → has(Y,P)
Propagated policies:
has(output, duty:attribution)
has(output, permission:commercialise)
These are the policies of the output data!
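The propagation rule has(X,P) ∧ propagates(P,R) ∧ relation(R,X,Y) → has(Y,P) can be applied by simple forward chaining. The Python sketch below is an illustration only and is not the PPR reasoner linked on this slide. It assumes relation facts are written as (relation, source, target), matching the argument order of the rule, and it adds a second `propagates` fact for `duty:attribution`, which the slide does not list but which is needed for that duty to reach the output. The intermediate node name is a placeholder.

```python
def propagate(has, propagates, relations):
    """Apply has(X,P) ∧ propagates(P,R) ∧ relation(R,X,Y) → has(Y,P) to a fixpoint."""
    derived = set(has)
    changed = True
    while changed:
        changed = False
        for relation, source, target in relations:
            for node, policy in list(derived):
                if (
                    node == source
                    and (policy, relation) in propagates
                    and (target, policy) not in derived
                ):
                    derived.add((target, policy))
                    changed = True
    return derived


# Facts from provenance and license metadata.
has = {
    ("dataset1", "permission:commercialise"),
    ("dataset1", "duty:attribution"),
}
# Policy Propagation Rules; the second entry is an assumption (see lead-in).
propagates = {
    ("permission:commercialise", "processed into"),
    ("duty:attribution", "processed into"),
}
# Data flow, listed out of order to show that the fixpoint loop handles chains.
relations = [
    ("processed into", "node16", "output"),
    ("processed into", "dataset1", "node16"),
]

result = propagate(has, propagates, relations)
print(sorted(p for n, p in result if n == "output"))
# ['duty:attribution', 'permission:commercialise']
```

The loop repeats until no new fact is derived, so policies travel along chains of any length. With an empty `propagates` set nothing reaches the output, which is the conservative default.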
10. The problem of exploitability (reprise)
Could Top Trumps sell this game?
Yes (but they must include attribution statements).
How can we make it work at scale?
• Represent the diversity of datasets, licenses and data flows
• Support developers in the assessment of policies associated with the data and how they affect their data flows
11. Metadata Supply Chain - 1/2
Approach
Data cataloguing as the backbone of data governance.
Follow the journey of the data and trace the semantics, respecting the diversity of datasets, licenses and data flows.
• Onboarding: set up a catalogue record of the data source
• Acquisition: extract content metadata (timeliness, validity, …)
• Processing: describe the data flow; reason on policy propagation
• Delivery: provide provenance information
[Diagram: (Meta)data Catalogue, with Record, Content, Data flow and Provenance metadata alongside the Onboarding, Acquisition, Processing and Delivery stages]
12. Metadata Supply Chain - 2/2
Evaluation (can we really do that?)
Considering a given set of assumptions (details in the paper…):
• Data provider specifies a single License
• Same License for any user
• License is described in the catalogue
• License policies are referenced by Policy Propagation Rules
• Data source is accessible
• Acquisition processes respect the data source License
• Data flows can be described with Datanode
• ETL pipelines do not violate the policies
• Process executions do not influence policy propagation
• Data flow descriptions and License policies enable reasoning on policy propagation
• End-user access methods provide provenance information
An end-to-end solution for exploitability assessment can be implemented.
13. Lessons learnt
• Assessing exploitability of smart city data is possible following a holistic
approach to data cataloguing:
• understanding the semantics of data flows;
• understanding the role of policies (licences).
• New open challenges:
• Handle the diversity of policies and consequently the size of Policy
Propagation Rules [3].
• Support Data providers in the selection of the right license [6].
• Support developers in the definition of data flows [7].
• Integrate validation of propagated policies [8].
• Integrate validation of data flows with respect to policies.
• Reasoning with process execution traces (not only at design time).
• We need an end-user evaluation “in the wild”.
15. References
[1] M. d’Aquin, J. Davies, and E. Motta. Smart cities’ data: Challenges and opportunities for semantic technologies. Internet Computing, IEEE, 19(6):66–70, 2015.
[2] A. Adamou and M. d’Aquin. On requirements for federated data integration as a compilation process. In Proceedings of the 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES), pages 75–80, 2015.
[3] Open Digital Rights Language (ODRL) Version 2.1. https://www.w3.org/ns/odrl/2/ODRL21 (accessed 09/09/2016).
[4] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. Describing semantic web applications through relations between data nodes. Technical Report kmi-14-05, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, 2014.
[5] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. Propagation of policies in rich data flows. In Proceedings of the 8th International Conference on Knowledge Capture, page 5. ACM, 2015.
[6] E. Daga, M. d’Aquin, E. Motta, and A. Gangemi. A Bottom-Up Approach for Licences Classification and Selection. In 2015 Workshop on Legal Domain And Semantic Web Applications (LeDA-SWAn 2015), 1 June 2015, Portoroz, Slovenia.
[7] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. An incremental learning method to support the annotation of workflows with data-to-data relations. In 20th International Conference on Knowledge Engineering and Knowledge Management, Bologna, Italy, 19-23 November 2016 (accepted).
[8] H.-P. Lam and G. Governatori. The Making of SPINdle. In A. Paschke, G. Governatori, and J. Hall, editors, Proc. RuleML’09, pages 315–322. Springer-Verlag, 2009.