1. Linked (Open) Data
But what does it buy me?
Rinke Hoekstra
VU University Amsterdam/University of Amsterdam
rinke.hoekstra@vu.nl
Linked (Open) Data - But what does it buy me? by Rinke Hoekstra
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Monday 11 March 2013
7. Linked Open Data
Texts taken from http://5stardata.info
11. Why people go “Meh”
• Data needs to be converted to RDF
• Data needs to be published on the Web
• An open license is required even for a single ★
What if people draw incorrect conclusions from my data?
What if journalists draw incorrect conclusions from my data?
What if combining data results in privacy infringement?
Pacific Barreleye, http://imgur.com/gallery/Mzyb5
(can rotate its eyes forwards or upwards to look through the transparent head to prey above)
12. ... but LOD is just asking for more!
13. ... how can I sell this internally?
16. Linked Data: Six Ingredients
• The missing ★
• Repeatable Transformation
• Choose your Grain Size
• Mix ‘n Mash
• Contextualize!
• Lower the Threshold
19. 1 The missing ★
http://give.everything/a/URI
• Guessable
• Version information
• Version agnostic
• HTTPs URIs only please! (or resolver + URN)
20. Messy Data
http://wetten.overheid.nl/BWBIdService/BWBIdList.xml.zip
NB: The problem with the XML processing instruction was reported and fixed, but returned some weeks later
21. Example: Juriconnect
1.0:c:BWBR0005416&artikel=6
vs
http://wetten.overheid.nl/cgi-bin/deeplink/law1/bwbid=BWBR0005416/article=6/date=2005-01-14
vs
http://wetten.overheid.nl/BWBR0005416/TitelII698946/HoofdstukII/Artikel16/geldigheidsdatum_14-01-2005
• Existing identification standard: Juriconnect
• URN-like... but no naming server (cf. Digital Object Identifiers)
• Named elements do not carry an identifier
• No explicit version information, only contextual
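A reference like the first example on this slide can be split into its parts with a few lines of code. The sketch below assumes only the layout visible on the slide (version, reference type, BWB identifier, then `&key=value` locators). The function name `parse_juriconnect` and the dictionary keys are illustrative and are not taken from the Juriconnect standard.

```python
from urllib.parse import parse_qsl


def parse_juriconnect(ref):
    """Split a Juriconnect-style reference into its components.

    Assumed layout (from the slide): <version>:<type>:<BWB id>&<key>=<value>...
    """
    version, ref_type, rest = ref.split(":", 2)
    bwb_id, _, locator = rest.partition("&")
    return {
        "version": version,
        "type": ref_type,
        "bwb_id": bwb_id,
        # Locators such as artikel=6 point at a part of the regulation.
        "locators": dict(parse_qsl(locator)),
    }


parsed = parse_juriconnect("1.0:c:BWBR0005416&artikel=6")
print(parsed)
```

Note that nothing in the parsed result says which version of the article is meant, which is the gap the last bullet points at.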
22. Levels of Identification
IFLA FRBR levels (each a Bibliographic Entity):
• Work: Regulation
• Expression (realizes a Work): Version of regulation
• Manifestation (embodies an Expression): XML version of regulation
• Item (exemplifies a Manifestation): XML version of regulation on my harddisk
23. Transparent = Guessable
• Hierarchical information (work)
http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1
http://doc.metalex.eu/id/BWBR0011823/artikel/1
• Version and language (expression)
http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01
• Format information (manifestation)
http://doc.metalex.eu/doc/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01/data.xml
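The pattern in these URIs is regular enough to generate. The following is a minimal sketch that reproduces the example URIs above from their parts. The function names and the `path` list are assumptions made for illustration, not the actual MetaLex tooling.

```python
BASE = "http://doc.metalex.eu"


def work_uri(bwb_id, path):
    """Work level: the regulation (or a part of it), independent of version."""
    return "/".join([BASE, "id", bwb_id] + list(path))


def expression_uri(bwb_id, path, lang, date):
    """Expression level: a language version valid at a given date."""
    return "/".join([work_uri(bwb_id, path), lang, date])


def manifestation_uri(bwb_id, path, lang, date, filename):
    """Manifestation level: a concrete file format of an expression."""
    return "/".join([BASE, "doc", bwb_id] + list(path) + [lang, date, filename])


path = ["hoofdstuk", "1", "artikel", "1"]
print(work_uri("BWBR0011823", path))
print(expression_uri("BWBR0011823", path, "nl", "2010-09-01"))
print(manifestation_uri("BWBR0011823", path, "nl", "2010-09-01", "data.xml"))
```

Because each level only appends to the previous one, a reader who knows one URI can guess its neighbours, which is the point of a transparent scheme.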
24. Versioning Issues
• URIs don’t carry semantics...
• Detect changes:
• which element versions are the same
• ... and which versions are different?
Art. 44, lid 4 (2011-03-26)
Art. 44, lid 4 (2011-04-05)
From: Besluit prudentiële regels Wft, BWBR0020420
27. Opaque Identifiers
http://doc.metalex.eu/BWBR0011823/hoofdstuk/1/artikel/34b0cee26ee5138c74aa2c62caf2c117d3c616e9
• Content information
• Unique SHA1 Hash of text
Diagram: two versions of SW Hoofdstuk I, Artikel 10 (2011-01-01 and 2011-10-12) are both owl:sameAs the same SHA1 identifier (8738ef273ea4dbc73) and owl:sameAs each other; both carry dcterms:subject "vermogen van de erflater" (assets of the deceased).
28. Opaque Identifiers
http://doc.metalex.eu/BWBR0011823/hoofdstuk/1/artikel/34b0cee26ee5138c74aa2c62caf2c117d3c616e9
Diagram: two versions of SW Hoofdstuk I, Artikel 10 (2011-01-01 and 2011-10-12) are owl:sameAs two different SHA1 identifiers (8738ef273ea4dbc73 and a433f53273c78a56f2); a single dcterms:subject "vermogen van de erflater" (assets of the deceased) is shown.
• Content information
• Unique SHA1 Hash of text
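The idea of a content-based identifier can be sketched with the Python standard library. The example below is an illustration under stated assumptions: the whitespace normalisation step and the sample texts are hypothetical, and the slides do not say how the text is prepared before hashing.

```python
import hashlib


def content_id(text):
    """Return the SHA-1 hex digest of the text.

    Whitespace is normalised first (an assumption of this sketch) so that
    layout-only differences do not produce a new identifier.
    """
    normalised = " ".join(text.split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def same_content(text_a, text_b):
    """Two versions count as the same when their hashes match."""
    return content_id(text_a) == content_id(text_b)


v1 = "Hypothetical text of an article."
v2 = "Hypothetical  text of an article.\n"  # only whitespace differs
v3 = "Hypothetical text of an amended article."

print(same_content(v1, v2))
print(same_content(v1, v3))
```

Versions whose hashes match can share one opaque identifier and be linked with owl:sameAs, while a changed text gets a new hash and therefore a new identifier.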
31. 2 Repeatable Transformation
Linked Data will not be the official source anytime soon
Provenance is key
Transformation should be part of routine ...
... manageable and scalable ...
... repeatable ...
http://www.w3.org/TR/prov-overview/
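One way to make a transformation repeatable and traceable is to record, on every run, what went in, what came out and when. The sketch below is a hedged illustration: the key names are borrowed from the W3C PROV vocabulary, but the record is a plain Python dictionary rather than valid RDF, and `run_with_provenance`, `to_upper` and `script_version` are hypothetical names.

```python
import hashlib
from datetime import datetime, timezone


def run_with_provenance(transform, source_text, script_version):
    """Run a transformation and describe the run with PROV-style keys."""
    started = datetime.now(timezone.utc)
    output = transform(source_text)
    ended = datetime.now(timezone.utc)
    record = {
        # Hashes stand in for the identifiers of the input and output.
        "prov:used": hashlib.sha1(source_text.encode("utf-8")).hexdigest(),
        "prov:generated": hashlib.sha1(output.encode("utf-8")).hexdigest(),
        "prov:startedAtTime": started.isoformat(),
        "prov:endedAtTime": ended.isoformat(),
        "script_version": script_version,  # hypothetical field
    }
    return output, record


def to_upper(text):
    """Stand-in for a real conversion step."""
    return text.upper()


out1, prov1 = run_with_provenance(to_upper, "artikel 1", "v0.1")
out2, prov2 = run_with_provenance(to_upper, "artikel 1", "v0.1")
print(prov1["prov:generated"] == prov2["prov:generated"])
```

If two runs over the same input with the same script version yield different output hashes, the transformation is not repeatable and the provenance record shows where to look.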
35. 40,745,554,078 Triples!
(1.6 Billion)
(I tried to check the latest figures, but http://stats.lod2.eu was down)
36. 3 Choose your Grain Size
• The document is the traditional grain size (Dublin Core)
• Linked data allows for deep links into data
• Cost versus usefulness
• Are you the right party to provide detailed descriptions?
http://creatingandeducating.blogspot.nl/2011/11/blog-post.html
37. Report Card Categories
RDF Report Card (scale: Low Detail to High Detail)
• Structure
• Metadata
• Scope
• Internals
RDF Report Card by Leigh Dodds, talk at Semtech Biz London, 2011, http://slideshare.net/ldodds
42. Example: Provenance
Diagram (RDF graph) with four annotated resources:
• http://doc.metalex.eu/id/BWBR0017869/2009-10-23: the expression (version) URI of a regulation
• http://doc.metalex.eu/id/process/BWBR0017869/2009-10-23: the process that generated the expression
• http://doc.metalex.eu/id/event/BWBR0017869/2009-10-23: the creation event of the regulation
• http://doc.metalex.eu/id/date/2009-10-23: the date at which the expression was created ("2009-10-23"^^xsd:date)
Classes shown: ml:BibliographicExpression, opmv:Artifact, opmv:Process, sem:Event, ml:LegislativeModification, time:Instant, ml:Date, sem:Time
Properties shown: rdf:type, rdf:value, opmv:wasGeneratedBy, opmv:wasGeneratedAt, ml:resultOf, ml:date, sem:hasTime, sem:hasTimeStamp, sem:eventType, sem:timeType, time:hasEnd, time:inXSDDateTime
43. 5 Contextualize!
• Information is not always compatible
• Make explicit in which context the information holds ...
• ... and who stated the information, why and how.
Flat Earth and Square Earth idea courtesy of Szymon Klarman
44. Diagram: two named graphs and the provenance of a correction
• Named graphs: <http://example.com/workbook1/sheet1> and <http://example.com/workbook1/sheet1/corrected>
• d2s:populationSize values: "1"^^xsd:int and "11"^^xsd:int
• Observation _:x: d2s:censusYear "1889"^^xsd:int, d2s:gemeente :Assendelft, d2s:dimension :14--15_1875--1874 (d2s:ageGroup :14-15, d2s:birthYears :1875--1874)
• Curation: :curation20120126 (rdf:type provo:Activity), linked via provo:wasGeneratedBy, with provo:hadAgent :RinkeHoekstra
• provo:startedAt and provo:endedAt point to time:inXSDDateTime values "20120126T09:00:00" and "20120126T08:30:00"
• Namespaces don’t mean anything
• Use named graphs to compartmentalize metadata
• Add provenance information about groups of statements
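The bullets above can be illustrated with plain tuples, where the fourth element names the graph a statement belongs to. This is a sketch under assumptions: it assumes the original sheet holds the value 1 and the corrected sheet holds 11, it uses `_:x` as the subject, and the graph name `urn:example:provenance` is invented for the example. Prefixed names are kept as plain strings.

```python
from collections import defaultdict

ORIGINAL = "http://example.com/workbook1/sheet1"
CORRECTED = "http://example.com/workbook1/sheet1/corrected"
PROVENANCE = "urn:example:provenance"  # hypothetical graph name

# Each quad is (subject, predicate, object, graph).
quads = [
    ("_:x", "d2s:populationSize", 1, ORIGINAL),
    ("_:x", "d2s:populationSize", 11, CORRECTED),
    # Provenance is stated about the corrected graph as a whole.
    (CORRECTED, "provo:wasGeneratedBy", ":curation20120126", PROVENANCE),
    (":curation20120126", "provo:hadAgent", ":RinkeHoekstra", PROVENANCE),
]


def statements_by_graph(quads):
    """Group statements by the named graph they belong to."""
    graphs = defaultdict(list)
    for s, p, o, g in quads:
        graphs[g].append((s, p, o))
    return dict(graphs)


def values_in_context(quads, subject, predicate):
    """Map each graph name to the value it asserts for subject/predicate."""
    return {g: o for s, p, o, g in quads if s == subject and p == predicate}


contexts = values_in_context(quads, "_:x", "d2s:populationSize")
print(contexts)
```

The two conflicting population sizes can coexist because each is scoped to its own graph, and the provenance statements say who produced the corrected one.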
51. Compliance
Diagram: a UML-style state machine (start, State Name with entry/action, do/activity and exit/action, transitions labelled event/action(arguments), end) annotated with versioned sources of law:
• Regulation A (01-01-2011)
• Art 12 (04-02-2011)
• Art 14, lid 3, 2e volzin (11-06-2008)
• Art 14, lid 3, 2e volzin (01-07-2011)
52. Contextual Annotation
Diagram: the subject "vermogen van de erflater" (assets of the deceased) is attached via dcterms:subject at each level of the hierarchy:
• Successiewet (SW)
• SW Hoofdstuk I
• SW Hoofdstuk I, Artikel 10
• SW Hoofdstuk I, Artikel 10, Zin 1
No nice background because Google Image search only returned boring images
54. 6 Lower the Threshold
Linked Data allows you to trace usage!
• Integrate Linked Data production into everyday tools
• Allow tools to do the work for you
• Use a built-in reward model
Image courtesy of http://themaisonette.net
58. http://linkitup.data2semantics.org
• Lightweight Web Application
• Interface to API of existing data repositories
• Enrich metadata by linking to Linked Data resources
• Provide annotation services for data files
• Plugin-based architecture
• Publish RDF metadata as new data publication
59. recoprov
Reconstruct provenance using Dropbox file edit history
Sara Magliacane and Paul Groth
60. plsheet
Automatic analysis of workflow in spreadsheets
How are results calculated (1)?
Analyse dependencies between cells in complex spreadsheets
Martine de Vos, Jan Wielemaker and Willem van Hage
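To give a feel for what analysing cell dependencies involves, here is a small sketch. It is not how plsheet works: the regular expression, the function names and the three sample formulas are assumptions made for illustration, and ranges such as A1:B2 are not expanded.

```python
import re

# Matches simple cell references such as A1 or BC12 (no sheet names, no ranges).
CELL_REF = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")


def direct_dependencies(formulas):
    """Map each formula cell to the cells its formula refers to."""
    return {cell: set(CELL_REF.findall(f)) for cell, f in formulas.items()}


def all_dependencies(cell, deps, seen=None):
    """All cells that `cell` depends on, directly or indirectly."""
    seen = set() if seen is None else seen
    for ref in deps.get(cell, ()):
        if ref not in seen:
            seen.add(ref)
            all_dependencies(ref, deps, seen)
    return seen


formulas = {"C1": "=A1+B1", "D1": "=C1*2", "E1": "=SUM(D1, A1)"}
deps = direct_dependencies(formulas)
print(sorted(all_dependencies("E1", deps)))
```

Following the references backwards from a result cell yields the chain of inputs and intermediate cells that explains how the result was calculated.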
61. plsheet
Reconstruct and explain the workflow of computations
Martine de Vos, Jan Wielemaker and Willem van Hage
62. TabLinker
Semi-automatic RDF converter for eccentric spreadsheets
Albert Merono-Penuela, Rinke Hoekstra, Laurens Rietveld, Christophe Gueret
http://www.cedar-project.nl
64. Linked Data: Six Ingredients
• The missing ★
• Repeatable Transformation
• Choose your Grain Size
• Mix ‘n Mash
• Contextualize!
• Lower the Threshold
65. Linked Open Data ... be sure to use it internally too!
• The missing ★
• Repeatable Transformation
• Choose your Grain Size
• Mix ‘n Mash
• Contextualize!
• Lower the Threshold