This document discusses Linked Open Data, including its principles, usage examples, and research challenges. It begins by defining open data and Linked Open Data and then describes the four Linked Data principles: use URIs as names for things, use HTTP URIs so that those names can be looked up, provide useful information using standards such as RDF and SPARQL, and include links to other URIs. Examples show how Linked Data sets can be queried and combined. Two research challenges are identified: dataset discovery, i.e. finding relevant data from a natural language statement of the user's intent, and dataset visualization, i.e. identifying appropriate visualizations for the discovered combinations of datasets. The document concludes with the role of OpenData.cz in advancing open data in the Czech Republic: assisting public institutions, helping establish open data standards and legislation, and educating institutions on open data practices.
Linked Open Data - Masaryk University in Brno 8.11.2016
1. Linked Open Data: Current State and Future Trends
Martin Nečaský
Faculty of Mathematics and Physics
Charles University
2. Agenda
◦ What is Open Data
◦ Linked Open Data
  ◦ principles
  ◦ usage examples
  ◦ research challenges
◦ Open Data activities of OpenData.cz
◦ Our contribution to Czech legislation
3. Open Data Definition
Open data is data that can be freely used, re-used
and redistributed by anyone - subject only, at most,
to the requirement to attribute and sharealike.
4. 5 levels of Open Data
Source: http://5stardata.info
5. Public Sector Open Data?
Figure: Czech public sector data sources, four of which are marked ★★★:
◦ Environment Inspections (http://www.cizp.cz)
◦ Food Inspections (http://www.potravinynapranyri.cz/)
◦ Trade Inspections (http://www.coi.cz)
◦ National Statistics (http://www.czso.cz)
◦ Geopolitical Regions (http://www.cuzk.cz)
◦ Business Registers (http://www.mfcr.cz)
◦ Code of Law (http://portal.gov.cz)
◦ Public Sector Inspections (http://data.nku.cz)
6. What is Linked Data?
Check Actions
ID | SUBJECT | START
2012/33 | Peněžní prostředky určené … (Funds intended for …) | 2012/11
2012/34 | Účetní závěrka a finanční ... (Financial statements and financial ...) | 2012/11
Inspected entities
ENTITY | ID | DISTRICT | ACTION
Ministry of Defence | 60162694 | Prague | 2012/33
Social Security Administration | 6963 | Prague | 2012/34
Figure: how do we get from these ★★★ tables to ★★★★★ data?
Linked Open Data is a set of (technological) principles for publishing data on the Web.
7. 1st Linked Data Principle
Use URIs as names for things.
The check actions, inspected entities, and districts from the Check Actions and Inspected entities tables get URIs:
◦ http://data.nku.cz/action/2012/33 (check action 2012/33)
◦ http://data.nku.cz/action/2012/34 (check action 2012/34)
◦ http://data.nku.cz/entity/60162694 (Ministry of Defence)
◦ http://data.nku.cz/entity/6963 (Social Security Administration)
◦ http://data.nku.cz/district/prague (Prague)
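A minimal Python sketch of the 1st principle, assuming the URI patterns shown above (/action/{id}, /entity/{id}, /district/{name} under http://data.nku.cz) are applied mechanically to the table rows. The helper names and the lower-casing of district names are assumptions for illustration, not NKÚ's actual implementation.

```python
# Hypothetical helpers that mint URIs for the example records, following
# the URI patterns shown on the slide (not NKÚ's actual code).
BASE = "http://data.nku.cz"

def action_uri(action_id: str) -> str:
    # Check action "2012/33" -> http://data.nku.cz/action/2012/33
    return f"{BASE}/action/{action_id}"

def entity_uri(entity_id: str) -> str:
    # Entity ID "60162694" -> http://data.nku.cz/entity/60162694
    return f"{BASE}/entity/{entity_id}"

def district_uri(district: str) -> str:
    # Assumption: the district name is lower-cased to form the URI segment.
    # Names with spaces or diacritics would need a proper slug or percent-encoding.
    return f"{BASE}/district/{district.lower()}"

# Rows of the "Inspected entities" table: (entity, ID, district, action)
inspected_entities = [
    ("Ministry of Defence", "60162694", "Prague", "2012/33"),
    ("Social Security Administration", "6963", "Prague", "2012/34"),
]

for name, entity_id, district, action in inspected_entities:
    print(name, entity_uri(entity_id), district_uri(district), action_uri(action))
```

Because the URIs are derived from stable identifiers (the check action ID, the entity ID), the same thing always gets the same name, which is what later lets other publishers link to it.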
8. 2nd Linked Data Principle
Use HTTP URIs so that people can look up those names.
Figure: anyone on the Web (WWW) can look up the name http://data.nku.cz/action/2012/33 by sending HTTP GET "http://data.nku.cz/action/2012/33".
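The look-up itself is an ordinary HTTP request. The Python sketch below only builds the request; the Accept header (asking for Turtle instead of HTML, i.e. content negotiation) is an assumption about how a client would ask for RDF, and whether data.nku.cz is still online and honours that header is not guaranteed.

```python
import urllib.request

def lookup_request(uri: str, accept: str = "text/turtle") -> urllib.request.Request:
    """Build (but do not send) an HTTP GET request that looks up a Linked Data URI.

    The Accept header asks the server for an RDF serialization (Turtle);
    a server that does not support content negotiation may ignore it.
    """
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")

req = lookup_request("http://data.nku.cz/action/2012/33")

# To actually dereference the URI (requires the server to be online):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.headers.get("Content-Type"))
#     print(resp.read().decode("utf-8"))
```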
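9. 3rd Linked Data Principle
When someone looks up a URI, provide useful information, using the W3C standards (RDF, SPARQL).
Check Actions
ID | SUBJECT | START
2012/33 | Peněžní prostředky určené … | 2012/11
RDF expression (Turtle, with predicates abbreviated to bare names; full Turtle requires URIs or prefixed names for them):
<http://data.nku.cz/action/2012/33> id "2012/33" .
<http://data.nku.cz/action/2012/33> subject "Peněžní prostředky určené …" .
<http://data.nku.cz/action/2012/33> start "2012/11" .
Figure: the same statements drawn as a graph. Each statement is a triple (subject, predicate, object): the subject node http://data.nku.cz/action/2012/33 is connected by the predicate edges id, subject, and start to the object nodes "2012/33", "Peněžní prostředky určené …", and "2012/11".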
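Because every statement is a (subject, predicate, object) triple, RDF data can be handled as a plain set of 3-tuples. This Python sketch stores the three statements above and writes them in N-Triples form; the predicate namespace http://data.nku.cz/vocabulary/ is an assumption standing in for the abbreviated names id, subject, and start, and the IRI test is deliberately crude.

```python
VOCAB = "http://data.nku.cz/vocabulary/"   # assumed namespace for id/subject/start
ACTION = "http://data.nku.cz/action/2012/33"

triples = {
    (ACTION, VOCAB + "id", "2012/33"),
    (ACTION, VOCAB + "subject", "Peněžní prostředky určené …"),
    (ACTION, VOCAB + "start", "2012/11"),
}

def term(t: str) -> str:
    """Write one RDF term in N-Triples syntax."""
    if t.startswith(("http://", "https://")):   # crude IRI test, enough for this sketch
        return f"<{t}>"
    escaped = t.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def serialize(graph) -> str:
    """One line per triple: <subject> <predicate> object ."""
    return "\n".join(" ".join(term(x) for x in triple) + " ." for triple in sorted(graph))

print(serialize(triples))
```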
10. 3rd Linked Data Principle
When someone looks up a URI, provide useful information, using the W3C standards (RDF, SPARQL).
Figure: NKÚ keeps its data in an RDF store behind an HTTP server. Clients either look up a URI directly (HTTP GET "http://data.nku.cz/...") or send a SPARQL query with HTTP GET to the SPARQL API (SPARQL endpoint).
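Sending a SPARQL query to an endpoint is also a plain HTTP GET: under the SPARQL 1.1 Protocol the query text goes URL-encoded into the query parameter. The sketch builds such a request for the http://linked.opendata.cz/sparql endpoint mentioned later in these slides without sending it; the requested result format (application/sparql-results+json) is an assumption, and the endpoint may no longer be available.

```python
from urllib.parse import urlencode, urlparse, parse_qs
import urllib.request

ENDPOINT = "http://linked.opendata.cz/sparql"   # named in the slides; may be offline

def sparql_get_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request (not sent)."""
    url = endpoint + "?" + urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}, method="GET")

query = "SELECT ?p ?o WHERE { <http://data.nku.cz/action/2012/33> ?p ?o }"
req = sparql_get_request(ENDPOINT, query)

# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.read().decode("utf-8"))
```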
11. SPARQL crash course
◦ Similar to SQL.
◦ Query expressed as a graph pattern.
SELECT <result specification>
WHERE <graph pattern>
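To make "query expressed as a graph pattern" concrete, here is a minimal Python sketch of how the WHERE part is matched: every triple pattern may contain variables (written ?name), and a solution is a binding of the variables that is consistent across all patterns. It is a toy evaluator for illustration, not how a real RDF store works internally; the abbreviated predicate names start and entity follow the slides.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; None if impossible.
    Simplification: any term starting with "?" is treated as a variable."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):               # variable: bind it, or check its binding
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                        # constant: must be equal
            return None
    return new

def select(variables: List[str], where: List[Triple], graph: List[Triple]):
    """SELECT <variables> WHERE <graph pattern>, evaluated pattern by pattern."""
    solutions: List[Binding] = [{}]
    for pattern in where:
        solutions = [b2 for b in solutions for t in graph
                     if (b2 := match(pattern, t, b)) is not None]
    return [tuple(b[v] for v in variables) for b in solutions]

N = "http://data.nku.cz/"
graph = [
    (N + "action/2012/33", "start", "2012/11"),
    (N + "action/2012/34", "start", "2012/11"),
    (N + "action/2012/33", "entity", N + "entity/60162694"),
    (N + "action/2012/34", "entity", N + "entity/6963"),
]

# SELECT ?a ?e WHERE { ?a start "2012/11" . ?a entity ?e }
rows = select(["?a", "?e"], [("?a", "start", "2012/11"), ("?a", "entity", "?e")], graph)
print(rows)
```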
15. SPARQL crash course
◦ A query may also return a graph.
CONSTRUCT {
  ?x ?z ?y .
} WHERE {
  ?x start "2012/11" ;
     ?z ?y .
}
Figure: the WHERE graph pattern: ?x has a start edge to "2012/11" and an edge ?z to ?y.
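CONSTRUCT first finds all solutions of the WHERE pattern and then instantiates the template once per solution, so the result is again a set of triples. A self-contained Python sketch of the query above, using the abbreviated predicate names from the slides; it assumes every template variable is bound by the WHERE pattern (real SPARQL skips template triples that contain unbound variables).

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; None if impossible."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def construct(template, where, graph):
    """CONSTRUCT {template} WHERE {where}: returns a new set of triples."""
    solutions = [{}]
    for pattern in where:
        solutions = [b2 for b in solutions for t in graph
                     if (b2 := match(pattern, t, b)) is not None]
    # Replace variables by their bindings; constants are copied unchanged.
    return {tuple(b.get(term, term) for term in tpl)
            for b in solutions for tpl in template}

N = "http://data.nku.cz/"
graph = {
    (N + "action/2012/33", "id", "2012/33"),
    (N + "action/2012/33", "start", "2012/11"),
    (N + "entity/60162694", "title", "Ministry of Defense"),
}

# CONSTRUCT { ?x ?z ?y . } WHERE { ?x start "2012/11" ; ?z ?y . }
result = construct([("?x", "?z", "?y")],
                   [("?x", "start", "2012/11"), ("?x", "?z", "?y")],
                   graph)
print(result)
```

The entity triple is not copied, because its subject has no start value "2012/11"; everything said about the matching check action is.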
16. 4th Linked Data Principle
Include links to other URIs so that others can discover more things.
<http://data.nku.cz/action/2012/33>
  id "2012/33" ;
  subject "Peněžní prostředky určené …" ;
  start "2012/11" ;
  entity <http://data.nku.cz/entity/60162694> .
<http://data.nku.cz/entity/60162694>
  title "Ministry of Defense" ;
  district <http://data.nku.cz/district/prague> .
<http://data.nku.cz/district/prague>
  title "Prague" .
Figure: the check action graph extended with an entity edge to http://data.nku.cz/entity/60162694 ("Ministry of Defense"), which has a district edge to http://data.nku.cz/district/prague ("Prague").
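Links are what let a client discover data it did not know about. The Python sketch below follows them breadth-first from the check action; to stay self-contained, the Web is simulated by a hypothetical dictionary WEB that maps each URI to the triples a look-up would return (in reality an HTTP GET per URI), and any object starting with http:// is treated as a link.

```python
from collections import deque

N = "http://data.nku.cz/"

# Hypothetical stand-in for the Web: URI -> triples returned when it is looked up.
WEB = {
    N + "action/2012/33": [
        (N + "action/2012/33", "id", "2012/33"),
        (N + "action/2012/33", "start", "2012/11"),
        (N + "action/2012/33", "entity", N + "entity/60162694"),
    ],
    N + "entity/60162694": [
        (N + "entity/60162694", "title", "Ministry of Defense"),
        (N + "entity/60162694", "district", N + "district/prague"),
    ],
    N + "district/prague": [
        (N + "district/prague", "title", "Prague"),
    ],
}

def look_up(uri):
    # A real client would send HTTP GET <uri> and parse the returned RDF.
    return WEB.get(uri, [])

def crawl(start, max_lookups=100):
    """Breadth-first link following from `start`; returns all triples found."""
    seen, queue, graph = {start}, deque([start]), []
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in look_up(uri):
            graph.append((s, p, o))
            if o.startswith("http://") and o not in seen:   # a link to follow
                seen.add(o)
                queue.append(o)
    return graph

triples = crawl(N + "action/2012/33")
print(len(triples), "triples discovered")
```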
17. 4th Linked Data Principle
Include links to other URIs so that others can discover more things (including URIs of other publishers).
Figure: http://data.nku.cz/action/2012/33 is linked (entity) to http://data.nku.cz/entity/60162694, which is linked (same as) to http://data.mfcr.cz/ares/entity/60162694 of another publisher; district links point to http://data.nku.cz/district/prague and to http://data.cuzk.cz/ruian/district/3100.
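Once URIs of different publishers are declared to be the same (owl:sameAs), a consumer can merge what each publisher says about the thing. One common way, sketched here in Python, is to compute the equivalence classes of the sameAs links with a union-find structure and group the statements by class. The ARES statement in the example is hypothetical, added only to show the merge.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

class UnionFind:
    """Disjoint sets of URIs; owl:sameAs is symmetric and transitive."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def merge_descriptions(triples):
    """Group (predicate, object) pairs by the sameAs class of their subject."""
    uf = UnionFind()
    for s, p, o in triples:
        if p == SAME_AS:
            uf.union(s, o)
    merged = {}
    for s, p, o in triples:
        if p != SAME_AS:
            merged.setdefault(uf.find(s), set()).add((p, o))
    return merged

NKU = "http://data.nku.cz/entity/60162694"
ARES = "http://data.mfcr.cz/ares/entity/60162694"
triples = [
    (NKU, "title", "Ministry of Defense"),
    (NKU, SAME_AS, ARES),
    (ARES, "id", "60162694"),   # hypothetical statement published by ARES
]
merged = merge_descriptions(triples)
print(merged)
```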
18. 4th Linked Data Principle
Include links to other URIs so that others can discover more things (including URIs of other publishers).
Figure: a cloud of linked datasets: Public Sector Inspection, Business Entities, Geopolitical Regions, Trade Inspection, Gov Off, Science and Research IS, Soc Sec, Statistics, Nat Stats, Demography.
19. Vocabularies
<http://data.nku.cz/action/2012/33>
  id "2012/33" ;
  subject "Peněžní prostředky určené …" ;
  start "2012/11" ;
  entity <http://data.nku.cz/entity/60162694> .
becomes
<http://data.nku.cz/action/2012/33>
  a schema:CheckAction, nku:CheckAction ;
  adms:identifier "2012/33" ;
  schema:object "Peněžní prostředky určené …" ;
  schema:startDate "2012/11" ;
  nku:entity <http://data.nku.cz/entity/60162694> .
where schema:object = <http://schema.org/object>, nku:CheckAction = <http://data.nku.cz/vocabulary/CheckAction>, and "same as" = owl:sameAs.
• things are classified into classes
• LD principles are applied also to properties and classes
• classes and properties are defined in shared vocabularies (sometimes, incorrectly, called ontologies)
  • Dublin Core Vocabulary
  • Schema.org
  • Data Cube Vocabulary
  • http://lov.okfn.org
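Prefixed names such as schema:object are only abbreviations of full URIs. A small Python sketch of the expansion: the schema: and nku: mappings are taken from the slide, the owl:, rdf:, rdfs:, and adms: entries are the standard namespace URIs of those vocabularies, and, as in Turtle, the keyword a stands for rdf:type.

```python
PREFIXES = {
    "schema": "http://schema.org/",               # from the slide
    "nku": "http://data.nku.cz/vocabulary/",      # from the slide
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "adms": "http://www.w3.org/ns/adms#",
}

def expand(name: str, prefixes=PREFIXES) -> str:
    """Expand a prefixed name (or the Turtle keyword `a`) to a full URI."""
    if name == "a":
        return prefixes["rdf"] + "type"
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local

for n in ["a", "schema:object", "nku:CheckAction", "owl:sameAs"]:
    print(n, "->", expand(n))
```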
20. Vocabularies
◦ Vocabulary reuse principle: reuse existing shared vocabularies where possible.
◦ But a custom vocabulary is sometimes necessary:
  ◦ when there is no appropriate vocabulary;
  ◦ when specific metadata about a class or property needs to be provided.
◦ The semantics of new classes and properties need to be defined properly:
  ◦ Semantic Web approach: machine-readable definition of the semantics
  ◦ Linked Data approach: simple semantic links to shared vocabularies
    ◦ RDF Schema: subClassOf, subPropertyOf
    ◦ OWL (Web Ontology Language): equivalentClass, equivalentProperty
nku:CheckAction a owl:Class ;
  rdfs:label "Kontrolní akce NKÚ"@cs ,
    "Check action of Supreme Audit Office of Czech Republic"@en ;
  rdfs:subClassOf schema:CheckAction .
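The point of rdfs:subClassOf is that a consumer can derive facts that were never stated: everything typed nku:CheckAction is also a schema:CheckAction. A minimal Python sketch of that RDFS rule (rdfs9) applied until nothing new appears; terms are kept as prefixed strings for readability, and the schema:CheckAction rdfs:subClassOf schema:FindAction triple reflects the Schema.org class hierarchy.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "a"

def infer_types(triples):
    """Apply rdfs9 (x a C, C subClassOf D => x a D) until a fixed point."""
    graph = set(triples)
    while True:
        sub = [(c, d) for c, p, d in graph if p == SUBCLASS]
        new = {(x, TYPE, d)
               for x, p, c in graph if p == TYPE
               for c2, d in sub if c2 == c} - graph
        if not new:
            return graph
        graph |= new

ACTION = "http://data.nku.cz/action/2012/33"
graph = infer_types([
    (ACTION, TYPE, "nku:CheckAction"),
    ("nku:CheckAction", SUBCLASS, "schema:CheckAction"),
    ("schema:CheckAction", SUBCLASS, "schema:FindAction"),
])
print(sorted(o for s, p, o in graph if s == ACTION and p == TYPE))
```

Publishing only the single subClassOf link in the NKÚ vocabulary is therefore enough for a Schema.org-aware consumer to treat NKÚ check actions as its own CheckAction.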
22. Searching Datasets
Where can I get some data about entities inspected by the Supreme Audit Office (SAO)?
Figure: Entity resources of SAO are linked by owl:sameAs to Organizace (organization) resources on linked.opendata.cz and from there, again by owl:sameAs, to further datasets (…, ?).
SPARQL: https://drive.google.com/open?id=0BwP-TfUUfcFTR0VYd3ZJaTJub3c (try it on the http://linked.opendata.cz/sparql endpoint)
23. Searching Datasets
Dataset | Count
Public Agreements Registry - Agreements | 61961
Agreements Registry - Orders | 27726
Database of Science, Research and Innovations | 14286
Offices of Public Authorities | 763
Public Sector Inspections | 6254
Agendas of Public Institutions | 12112
Identification Numbers of Business Entities | 60520
Trade Register | 167376
Monitor of Public Budgets | 104522
Agreements Registry - Payments | 5516
Trade Inspections | 2576
Integrated Registry of Environmental Pollution | 6658
Public Authorities | 60007
Business Register | 94881
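The slide does not say exactly what the numbers count. One plausible way to produce such a table, sketched in Python under that assumption, is to follow the owl:sameAs links from SAO entities and count, per target dataset, how many distinct SAO entities are linked to it. The dataset_of heuristic (host plus first path segment) and the second ARES URI are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def dataset_of(uri: str) -> str:
    # Hypothetical heuristic: host + first path segment identifies the dataset.
    u = urlparse(uri)
    return u.netloc + "/" + u.path.strip("/").split("/")[0]

def count_linked(same_as_links):
    """same_as_links: (sao_entity, other_uri) pairs taken from owl:sameAs triples.
    Returns {dataset: number of distinct SAO entities linked to it}, largest first."""
    linked = defaultdict(set)
    for sao_entity, other in same_as_links:
        linked[dataset_of(other)].add(sao_entity)
    ranked = sorted(linked.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {ds: len(entities) for ds, entities in ranked}

links = [
    ("http://data.nku.cz/entity/60162694", "http://data.mfcr.cz/ares/entity/60162694"),
    ("http://data.nku.cz/entity/6963", "http://data.mfcr.cz/ares/entity/6963"),  # hypothetical
    ("http://data.nku.cz/entity/6963", "http://data.mfcr.cz/ares/entity/6963"),  # duplicate link
]
print(count_linked(links))
```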
24. Combining datasets
Which public research institutions were inspected by SAO and what is their public research budget?
Figure: datasets and classes involved: SAO (CheckAction, Entity, Premise), linked.opendata.cz (Entity), and Science and Research DB (ResearchOrg, Participant, Project, Budget), connected by owl:sameAs links.
SPARQL: https://drive.google.com/open?id=0BwP-TfUUfcFTS2VjV1puakdIeG8 (try it on the http://linked.opendata.cz/sparql endpoint)
RESULT IN CSV: https://drive.google.com/open?id=0BwP-TfUUfcFTN2NzVlk4Zk1ncGM
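What such a query does can be read as a chain of joins. The Python sketch below walks that chain over toy, hypothetical triples (not SAO data): from a CheckAction to the inspected entity, across owl:sameAs to a ResearchOrg, and via Participant records to the budgets of its projects. For brevity the two sameAs hops through linked.opendata.cz are collapsed into one, and all predicate and class names are assumptions.

```python
from collections import defaultdict

# Toy, hypothetical triples; predicate and class names are illustrative only.
TRIPLES = [
    ("sao:action/1", "a", "CheckAction"),
    ("sao:action/1", "entity", "sao:entity/1"),
    ("sao:entity/1", "sameAs", "rd:org/1"),
    ("rd:org/1", "a", "ResearchOrg"),
    ("rd:participant/1", "organization", "rd:org/1"),
    ("rd:participant/1", "project", "rd:project/1"),
    ("rd:participant/2", "organization", "rd:org/1"),
    ("rd:participant/2", "project", "rd:project/2"),
    ("rd:project/1", "budget", 100),
    ("rd:project/2", "budget", 250),
]

def index(triples):
    """Two lookup tables: (s, p) -> objects and (p, o) -> subjects."""
    sp, po = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        sp[(s, p)].add(o)
        po[(p, o)].add(s)
    return sp, po

def research_budgets(triples):
    """Inspected research organisations -> total budget of their projects."""
    sp, po = index(triples)
    result = {}
    for action in po[("a", "CheckAction")]:
        for entity in sp[(action, "entity")]:
            for org in sp[(entity, "sameAs")]:
                if org not in po[("a", "ResearchOrg")]:
                    continue  # linked, but not a research organisation
                projects = {pr for part in po[("organization", org)]
                            for pr in sp[(part, "project")]}
                result[org] = sum(b for pr in projects for b in sp[(pr, "budget")])
    return result

print(research_budgets(TRIPLES))
```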
25. Combining datasets
Sanctions for unfair trade practices in Czech regions and numbers of pensioners.
Figure: datasets involved: Trade Inspections (Inspection, Sanction, Region), RAMON EU (NUTS), Geopolitical regions (Region), and Social Security (# pensioners, Region), connected by owl:sameAs links.
SPARQL: https://drive.google.com/open?id=0BwP-TfUUfcFTQzBGZzdwYzFuTUE (try it on the http://linked.opendata.cz/sparql endpoint; note: this federated query also asks http://ruian.linked.opendata.cz/sparql and http://data.cssz.cz/sparql)
RESULT IN CSV: https://drive.google.com/open?id=0BwP-TfUUfcFTaEVxNF84NlUwTTg
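The federated query joins several sources on regions, and the owl:sameAs links to the NUTS codes of RAMON EU are what make the regions of different publishers comparable. A Python sketch of that join: sanctions are summed per NUTS region and paired with pensioner counts. All identifiers and numbers are hypothetical toy values; CZ010 and CZ064 are used only as sample NUTS keys.

```python
from collections import Counter

def sanctions_vs_pensioners(sanctions, coi_to_nuts, pensioners, cssz_to_nuts):
    """sanctions: (trade-inspection region, amount) pairs;
    coi_to_nuts / cssz_to_nuts: sameAs alignment of each publisher's regions to NUTS;
    pensioners: Social Security region -> number of pensioners.
    Returns NUTS code -> (total sanctions, pensioners) for regions in both sources."""
    total = Counter()
    for region, amount in sanctions:
        total[coi_to_nuts[region]] += amount
    by_nuts = {cssz_to_nuts[r]: n for r, n in pensioners.items()}
    return {nuts: (total[nuts], by_nuts[nuts]) for nuts in sorted(total) if nuts in by_nuts}

# Hypothetical toy data.
sanctions = [("coi:praha", 1000), ("coi:praha", 500), ("coi:jmk", 300)]
coi_to_nuts = {"coi:praha": "CZ010", "coi:jmk": "CZ064"}
pensioners = {"cssz:region/1": 10, "cssz:region/2": 20}
cssz_to_nuts = {"cssz:region/1": "CZ010", "cssz:region/2": "CZ064"}

print(sanctions_vs_pensioners(sanctions, coi_to_nuts, pensioners, cssz_to_nuts))
```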
26. Building Applications
◦ http://lekovaencyklopedie.cz
Figure: each oval is a data source (MeSH, NDF-RT, NCI, DrugBank) that either already exists as LOD or that we have converted to LOD. Links represent the types of RDF links between the datasets.
LOD made the development much faster for us.
The RDF data are updated periodically thanks to http://etl.linkedpipes.com
28. Knowledge Graphs as LOD
◦ DBpedia
  ◦ Wikipedia as LOD
  ◦ http://dbpedia.org/sparql
  ◦ 402,086,316 triples about 17,315,785 entities
◦ Wikidata
  ◦ Emerging project of the Wikimedia Foundation
  ◦ Structured data source for Wikipedia
  ◦ https://query.wikidata.org
  ◦ 1,373,105,652 triples about 24,437,040 entities
29. Two research challenges for the near future
"A data journalist writes an article about unfair trade practices targeting elderly people in the Czech Republic. He needs to find datasets with evidence for his article (unfair trade inspections, numbers of elderly people, regions of the Czech Republic). He also needs to preview the discovered datasets, create map visualizations, and embed them in his article."
◦ Challenge 1: Dataset discovery
◦ Challenge 2: Dataset visualization
30. Dataset discovery
◦ Input: the user's intent
  ◦ How should the intent be expressed?
  ◦ How can we assist the user in expressing the intent?
  ◦ How should the expressed intent be translated to a formal query language?
◦ Output: combinations of datasets that fulfill the intent
  ◦ How should datasets be indexed?
  ◦ How should the indexes be kept up to date?
  ◦ How should the user's intent be evaluated against the index?
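31. Dataset discovery
Intent in a natural language: "Datasets with demographical observations located in cities of the Czech Republic."
Intent in a formal language (figure): a graph pattern over the variables ?x and ?y that uses the types (a) w3c.org/cube/Observation, w3c.org/cube/DataSet, and cuzk.cz/Okres (Okres = district), the topic dbpedia.org/resource/Demography, and the properties refArea and partOf. A "?" marks the open question of how to get from the first form to the second.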
32. Dataset discovery
◦ We cannot simply send the query expression to a database:
  • we don't know where to send the query;
  • we cannot keep a copy of all data locally;
  • we cannot expect a 100% match between the intent and the structure of real data.
Figure: the formal intent graph pattern (?x, ?y, w3c.org/cube/Observation, w3c.org/cube/DataSet, dbpedia.org/resource/Demography, cuzk.cz/Okres; refArea, partOf, topic).
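One simple way to live with the last point, sketched here in Python as an illustration rather than as the method of the slides, is to index each dataset by the vocabulary terms it uses and rank combinations of datasets by the fraction of the intent's terms they cover, accepting partial matches above a threshold. The dataset names, term sets, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations

def rank_combinations(intent_terms, index, max_size=2, min_score=0.5):
    """index: dataset -> set of vocabulary terms used in it.
    Returns (combination, coverage) pairs, best coverage first."""
    intent = set(intent_terms)
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(index), size):
            covered = set().union(*(index[ds] for ds in combo)) & intent
            score = len(covered) / len(intent)
            if score >= min_score:            # partial matches are accepted
                results.append((combo, score))
    return sorted(results, key=lambda r: (-r[1], len(r[0]), r[0]))

# Terms of the formal intent (abbreviated).
intent = {"cube:Observation", "cube:DataSet", "dbpedia:Demography", "refArea", "cuzk:Okres"}
index = {  # hypothetical dataset index
    "pensioners": {"cube:Observation", "cube:DataSet", "dbpedia:Demography", "refArea"},
    "districts": {"cuzk:Okres"},
    "budgets": {"cube:Observation", "cube:DataSet"},
}
for combo, score in rank_combinations(intent, index):
    print(combo, round(score, 2))
```

No single dataset covers the whole intent here, but the combination of the demographic observations with the district dataset does, which is why the output is a ranked list of combinations rather than one exact answer.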
34. Dataset visualization
◦ Input: a discovered combination of datasets
◦ Output: possible visualizations of the datasets
◦ How should an appropriate visualization be identified for a given combination of datasets?
Figure: the discovered graph pattern (?x, ?y, w3c.org/cube/Observation, w3c.org/cube/DataSet, dbpedia.org/resource/Demography, cuzk.cz/Okres; refArea, partOf, topic).
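The question is open; a naive baseline, sketched in Python as an assumption rather than an answer, maps properties found in the discovered data to chart types: a geographic dimension such as refArea suggests a map (the journalist's use case), a time dimension suggests a line chart, and anything else falls back to a table. The property name refPeriod is assumed from the SDMX dimension vocabulary that is often used together with the Data Cube Vocabulary.

```python
def suggest_visualizations(properties):
    """Hypothetical rule-based baseline: properties used by the discovered
    observations -> list of suggested visualization kinds."""
    rules = [
        ("refArea", "map"),           # geographic dimension
        ("refPeriod", "line chart"),  # time dimension
    ]
    suggestions = [vis for prop, vis in rules if prop in properties]
    return suggestions or ["table"]   # fallback when no rule applies

print(suggest_visualizations({"refArea", "topic"}))
print(suggest_visualizations({"refArea", "refPeriod"}))
print(suggest_visualizations(set()))
```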
36. Back to Open Data
◦ OpenData.cz: a group of academics supporting and boosting (Linked) Open Data in the Czech public sector
◦ We have assisted several public institutions with opening their data
  ◦ http://data.ctu.cz
  ◦ http://data.nku.cz
  ◦ http://data.cssz.cz
  ◦ http://data.gov.cz
◦ Cooperation with ČSÚ, ČOI, MF ČR, MV ČR
37. Back to Open Data
◦ Under the Ministry of Interior of the Czech Republic, we have helped make Open Data one of the major eGovernment topics
  ◦ position of National Coordinator for Open Data
  ◦ National Open Data Catalogue (http://data.gov.cz)
  ◦ Standards for open data publication and cataloging (http://opendata.gov.cz)
  ◦ Open Data in Czech legislation
  ◦ Educating public institutions
  ◦ Plan for National Linked Open Data Infrastructure
38. Our Journey to Czech Open Data Legislation
◦ October 2014: Open Data must become part of Czech legislation
  ◦ Public bodies did not want to, or could not, open their data without legislation.
◦ October 2016: The Czech president signed our amendment to the Public Sector Information Act (106/1999) introducing Open Data
  ◦ Only data published according to given conditions can be called Open Data.
  ◦ The Ministry of Interior must provide the National Open Data Catalogue.
  ◦ The Czech Government will instruct ministries and national authorities to publish given datasets as Open Data mandatorily.
◦ Defending our position
  ◦ Ministry of Interior (Oct 2014 – Aug 2015)
  ◦ Office of the Government (Sep 2015 – Mar 2016)
  ◦ Parliament (Apr 2016 – Aug 2016)
39. How can you help as students / teachers?
◦ Develop applications that use open data
  ◦ bachelor or diploma theses, student software projects
◦ If you need some data, ask for them.
  ◦ You can ask OpenData.cz and we will try to help.