Linked Data Publishing
Ruben Taelman - @rubensworks
imec - Ghent University
Publishing as part of any Linked Data life cycle
[Figure: Linked Data life cycle, in which Publish and Query are among the steps]
Linked Data Publishing
Linked Data Interfaces
Storage
Non-technical tasks
Linked Data provides machine-accessible data
Machines can retrieve and discover data through HTTP interfaces
Machines can understand the data
Ways of publishing Linked Data on the Web
Data dump: a single RDF document
Linked Data document: one RDF document per topic
SPARQL endpoint: expressive query interface
Data dump
Simple for data publisher
Data dumps can be large (~gigabytes)
Querying only possible after downloading entire dataset
Linked Data document
Data is available in smaller fragments, according to subject
Linked Data principle of dereferencing
“3. When someone looks up a URI, provide useful information,
using the open Web standards such as RDF, SPARQL” (Hyland 2013)
Querying only possible by traversing links
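As a rough illustration of dereferencing, a client can look up a subject URI and ask for an RDF serialization through the Accept header. The sketch below only builds the request with Python's standard library and does not send it; the URI `http://example.org/resource/Ghent` is a made-up placeholder, and no particular server behaviour is assumed.

```python
from urllib.request import Request


def build_dereference_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # Prefer Turtle, fall back to JSON-LD, then accept anything else.
    accept = "text/turtle, application/ld+json;q=0.9, */*;q=0.1"
    return Request(uri, headers={"Accept": accept})


# Hypothetical subject URI, used only for illustration.
request = build_dereference_request("http://example.org/resource/Ghent")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would fetch the document; answering a query then means repeating this for the URIs found inside it, which is the link traversal mentioned above.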
SPARQL endpoint
Requires higher computational effort from server
Single point to get and expose data
Easily queryable by clients
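In contrast to the other interfaces, a client hands the whole query to a SPARQL endpoint through a single URL. A minimal sketch, assuming a hypothetical endpoint at `http://example.org/sparql` and the `query` parameter used by the GET binding of the SPARQL 1.1 Protocol; the query itself is invented for illustration:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL for the given endpoint."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint and query, used only for illustration.
QUERY = "SELECT ?s WHERE { ?s a <http://example.org/City> } LIMIT 10"
url = build_sparql_get_url("http://example.org/sparql", QUERY)
print(url)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

The client's work ends at building this URL; evaluating the query is entirely up to the server, which is where the higher computational effort comes from.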
How do the different interfaces relate to each other?
Linked Data Fragments (LDF)
A uniform view on Linked Data interfaces
[Figure: LDF axis running from high client effort to high server effort, showing data dump, LD document and SPARQL result] (Verborgh 2016)
A big unexplored area on the LDF axis
[Figure: LDF axis running from high client effort to high server effort, showing data dump, LD document and SPARQL result, with "?" marking the unexplored area] (Verborgh 2016)
Triple Pattern Fragments (TPF),
a trade-off between server and client effort
[Figure: LDF axis running from high client effort to high server effort, showing data dump, LD document and SPARQL result, with TPF added to the axis] (Verborgh 2016)
Triple Pattern Fragments
Low-cost server interface
Fragmentation of a dataset by triple patterns
Client-side SPARQL query evaluation using a TPF interface
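The division of labour can be sketched as follows. This is an illustrative toy, not the TPF specification: the in-memory `DATASET`, the `fragment` function and the `cities_in` helper are all invented names, and paging, metadata and hypermedia controls are left out. The server side only answers single triple patterns; the client side combines them into a query result.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Tiny made-up dataset, standing in for what a server would store.
DATASET: list[Triple] = [
    ("ex:Ghent", "rdf:type", "ex:City"),
    ("ex:Ghent", "ex:country", "ex:Belgium"),
    ("ex:Bruges", "rdf:type", "ex:City"),
    ("ex:Bruges", "ex:country", "ex:Belgium"),
    ("ex:Belgium", "rdf:type", "ex:Country"),
]


def fragment(s: Optional[str], p: Optional[str], o: Optional[str]) -> Iterator[Triple]:
    """Server side: yield all triples matching one triple pattern (None = variable)."""
    for triple in DATASET:
        if all(term is None or term == value for term, value in zip((s, p, o), triple)):
            yield triple


def cities_in(country: str) -> list[str]:
    """Client side: join two triple patterns, one fragment request per pattern."""
    results = []
    # Pattern 1: ?city rdf:type ex:City
    for city, _, _ in fragment(None, "rdf:type", "ex:City"):
        # Pattern 2: <city> ex:country <country>, with ?city already bound
        if any(fragment(city, "ex:country", country)):
            results.append(city)
    return results


print(cities_in("ex:Belgium"))
```

Each call to `fragment` is cheap for the server, while the join logic in `cities_in` runs on the client; that is the trade-off the TPF interface makes.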
Choosing an LD interface is a trade-off between server and client effort
URI policies for interfaces
Linked Data uses URIs as a global identification system
URI design principles also apply to interface URIs:
Persistent URIs and redirection
Domain authority (e.g. government domain)
Machine- and human-readable representations through content negotiation
...
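Content negotiation on the server side can be sketched as below. This is a simplified illustration under stated assumptions: the function names and the `AVAILABLE` list are hypothetical, and partial wildcards such as `text/*` and the specificity rules of HTTP are ignored.

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media type, quality) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if not fields[0]:
            continue
        quality = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        entries.append((fields[0].lower(), quality))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(header: str, available: list[str]) -> Optional[str]:
    """Pick the representation to serve, or None if nothing is acceptable."""
    for media_type, quality in parse_accept(header):
        if quality <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return None


# Hypothetical representations offered for one and the same URI.
AVAILABLE = ["text/html", "text/turtle"]
print(negotiate("text/turtle, text/html;q=0.5", AVAILABLE))
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8", AVAILABLE))
print(negotiate("application/pdf", AVAILABLE))
```

The same URI thus serves an RDF client and a browser alike, which keeps a single persistent identifier per resource.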
Interface and storage solution influence each other
Start with the most restrictive element
Storage is fixed → choose the storage first, then the interface
Machine limitations → choose the interface first, then the storage
Storage solutions for Linked Data interfaces
Data dump: RDF file, HDT, ...
Linked Data document: static or dynamic RDF files
Triple Pattern Fragments: RDF file, HDT, SPARQL engine, ...
SPARQL endpoint: SPARQL engine
Linked Data Publishing
Linked Data Interfaces
Storage
Non-technical tasks
Licensing
Publication announcement
Maintenance
Linked Open Data requires an open license
All published data should have an associated license
Features of openness (Open Knowledge Foundation):
Availability and access
Reuse and redistribution
Universal participation
Popular open license: CC0
Mention the license in dataset listings and in metadata
Confidential data might require a restrictive license and security measures
Announcing to the public
Communication channels: mailing lists, blogs, newsletters, …
Feedback channel: form or contact address for any issues
Centralized repositories (e.g. https://datahub.io)
Automated discovery with metadata (e.g. DCAT, VoID)
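A dataset description for automated discovery might look like the output of the sketch below, which combines DCAT, VoID and Dublin Core terms and also states the license in the metadata. The helper `describe_dataset` and all `example.org` URLs are hypothetical; only the vocabulary terms and the CC0 URI are real.

```python
def describe_dataset(uri: str, title: str, license_uri: str,
                     dump_url: str, sparql_endpoint: str) -> str:
    """Return a Turtle description of a dataset (title is not escaped here)."""
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .

<{uri}> a dcat:Dataset, void:Dataset ;
    dcterms:title "{title}" ;
    dcterms:license <{license_uri}> ;
    void:dataDump <{dump_url}> ;
    void:sparqlEndpoint <{sparql_endpoint}> .
"""


# Hypothetical dataset and interface locations, used only for illustration.
turtle = describe_dataset(
    uri="http://example.org/dataset/cities",
    title="Cities",
    license_uri="http://creativecommons.org/publicdomain/zero/1.0/",
    dump_url="http://example.org/dumps/cities.ttl",
    sparql_endpoint="http://example.org/sparql",
)
print(turtle)
```

Publishing such a description next to the data lets crawlers and repositories find both the interfaces and the license without human help.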
Linked Data Publication is a continuous process
Social contract with data consumers
Avoid dataset / interface removal
Data can change
Movement of dataset to new location → URI persistence!
Responsible entity behind feedback channel
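URI persistence after a move can be approximated with permanent redirects. A minimal sketch, assuming a hypothetical `MOVED` table that maps retired URI prefixes to their new locations; a real deployment would typically configure this in the web server rather than in application code.

```python
from typing import Optional

# Hypothetical mapping from retired URI prefixes to their new locations.
MOVED = {
    "/datasets/cities": "https://data.example.org/cities",
}


def resolve(path: str) -> tuple[int, Optional[str]]:
    """Return (status code, Location header value) for a requested path."""
    for old_prefix, new_base in MOVED.items():
        if path == old_prefix or path.startswith(old_prefix + "/"):
            # 301 Moved Permanently: old URIs keep working after the move.
            return 301, new_base + path[len(old_prefix):]
    # Not a retired path; this sketch simply reports it as not found.
    return 404, None


print(resolve("/datasets/cities/Ghent"))
print(resolve("/datasets/unknown"))
```

Consumers that stored the old URIs are redirected instead of hitting dead links, which keeps the social contract intact.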
Conclusions
Different Linked Data interfaces exist for publishing Linked Data
Trade-off between server and client effort
Interface and storage solution influence each other
Properly license, announce and maintain your data
Sources
Verborgh, R. "Linked Data Publishing". http://rubenverborgh.github.io/WebFundamentals/linked-data-publishing/
Hyland, B., Atemezing, G., Villazón-Terrazas, B. "Best Practices for Publishing Linked Data". https://www.w3.org/TR/ld-bp/
Berners-Lee, T. "Linked Data". 2006. https://www.w3.org/DesignIssues/LinkedData.html
Villazón-Terrazas, B., et al. "Methodological guidelines for publishing government Linked Data". Linking Government Data. Springer New York, 2011. 27-49.

EKAW - Linked Data Publishing
