Providing Linked Data
Presented by:
Barry Norton
Maribel Acosta

Motivation: Music!
[Figure: application architecture diagram. Labels: Visualization Module; Analysis & Mining Module; Application; LD Dataset Access; Integrated Dataset; Interlinking; Cleansing; Vocabulary Mapping; SPARQL Endpoint; Publishing; RDFa; Other content; Data acquisition; R2R Transf.; LD Wrapper; Physical Wrapper; RDF/XML; Metadata; Streaming providers; Downloads; Musical Content; LINKED DATA LIFECYCLE]
EUCLID - Querying Linked Data

Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that users can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that users can discover more things.
EUCLID - Providing Linked Data
CH 1

Linked Data Lifecycle
Source: Sören Auer. "The Semantic Data Web" (slides)
Source: José M. Alvarez. "My Linked Data Lifecycle"
Source: Michael Hausenblas. "Linked Data lifecycle"

Core Tasks for Providing Linked Data
Based on the proposed LD lifecycles and the LD principles, we can identify 3 main tasks for providing LD:
① Creating: includes data extraction, creation of HTTP URIs, and vocabulary selection. (LD principles 1 & 2)
② Interlinking: involves the creation of (RDF) links to external data sets. (LD principle 4)
③ Publishing: consists of creating the metadata and making the data set accessible. (LD principle 3)

Agenda
1. Creating Linked Data
2. Interlinking Linked Data
3. Publishing Linked Data
4. Linked Data publishing checklist

CREATING LINKED DATA

Extracting the Data
• The data of interest may be stored in a wide range of formats: spreadsheets or tabular data, databases, text
• Several tools support the process of mining data from different repositories, for example: R2RML

Using the RDF Data Model
• The RDF data model is used to represent the extracted information
• The nodes represent the concepts/entities within the data. A node corresponds to a URI, a blank node or a literal (literals only in the object position)
• The relationships between the concepts/entities are modeled as arcs
[Diagram: Subject --Predicate--> Object]

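The step from extracted tabular data to RDF triples can be sketched as follows. This is a minimal illustration, not the output of any tool named in these slides: the base URI `http://example.org/artist/` and the two-column row layout are assumptions, and the use of `foaf:name` for the label is just one possible vocabulary choice.

```python
# Hypothetical base URI, chosen only for illustration.
BASE = "http://example.org/artist/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO_MUSIC_ARTIST = "http://purl.org/ontology/mo/MusicArtist"


def uri(value):
    """Wrap an absolute URI in N-Triples angle brackets."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def rows_to_triples(rows):
    """Turn (id, name) rows into (subject, predicate, object) triples."""
    triples = []
    for row_id, name in rows:
        subject = uri(BASE + row_id)
        triples.append((subject, uri(RDF_TYPE), uri(MO_MUSIC_ARTIST)))
        # Literals may only appear in the object position.
        triples.append((subject, uri(FOAF_NAME), literal(name)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


rows = [("1", "The Beatles"), ("2", 'Sam "The Man" Taylor')]
print(to_ntriples(rows_to_triples(rows)))
```

Each row yields one node (the subject URI) and two arcs: one to a class URI and one to a literal.
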
Naming Things: URIs
• All the things or distinct entities within the data must be named
• According to the Linked Data principles, the standard mechanism to name entities is the URI
• Designing Cool URIs:
– Leave out information about the data regarding: author, technologies, status, access mechanisms, …
– Simplicity: short, mnemonic URIs
– Stability: maintain the URIs as long as possible
– Manageability: issue the URIs in a way that you can manage
Source: http://www.w3.org/TR/cooluris/

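One way to apply the Cool URI guidelines when minting identifiers is sketched below. The namespace `http://example.org/id/` and the helper names `slugify` and `mint_uri` are hypothetical; the point is only that the URI is built from the entity kind and label, with nothing about author, technology, status or access mechanism.

```python
import re
import unicodedata

# Hypothetical namespace; replace with a domain you control.
BASE = "http://example.org/id/"


def slugify(label):
    """Reduce a label to a short, lowercase, ASCII, hyphen-separated slug."""
    ascii_label = (
        unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def mint_uri(kind, label):
    """Build a URI from the entity kind and label only.

    No author, technology (such as a file extension), status or access
    mechanism is encoded, so the URI can stay stable when those change.
    """
    return f"{BASE}{slugify(kind)}/{slugify(label)}"


print(mint_uri("Artist", "Sigur Rós"))
```

Two different labels can collapse to the same slug, so a real scheme would likely add a stable identifier (for example a database key) to keep URIs unique.
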
Selecting Vocabularies
• Vocabularies model the concepts and the relationships between them in a knowledge domain
• Terms from well-known vocabularies should be reused wherever possible
• New terms should be defined only if you cannot find the required terms in existing vocabularies
• A large number of vocabularies in RDF are openly available, e.g., Linked Open Vocabularies (LOV)

Selecting Vocabularies (2)
Linked Open Vocabularies: 322 vocabularies classified by domain
Source: http://lov.okfn.org/dataset/lov/

Selecting Vocabularies (3)
Linked Open Vocabularies: Analyzing MusicOntology
Source: http://lov.okfn.org/dataset/lov/details/vocabulary_mo.html

Selecting Vocabularies (4)
Other lists of well-known vocabularies are maintained by:
• W3C SWEO Linking Open Data community project
http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Library Linked Data Incubator Group: Vocabularies in the library domain
http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset-20111025

INTERLINKING LINKED DATA

Interlinking Data Sets
• It’s one of the Linked Data principles!
  4. Include links to other URIs, so that users can discover more things.
• Involves the creation of RDF links between two different RDF data sets:
– Links at instance level (rdfs:seeAlso, owl:sameAs)
– Links at schema level (RDFS subclass/subproperty, OWL equivalent class/property, SKOS mapping properties)
• Appropriate links are detected via link discovery

Interlinking Data Sets (2)
Challenges for link discovery
• Linked Data sets are heterogeneous in terms of vocabularies, formats and data representation
• Large range of knowledge domains
• Scalability: LD is composed of a large number of data sets and RDF triples, hence it is not possible to compare every possible entity pair
Source: Robert Isele. "LOD2 Webinar Series: Silk"

Interlinking Data Sets (3)
Challenges for link discovery
• It corresponds to the entity resolution problem: deciding whether two entities correspond to the same object in the real world
• Name ambiguities: typos, misspellings, different languages, homonyms
• Structural ambiguities: same concepts/entities with different structures. Requires the application of ontology and schema matching techniques

Interlinking Data Sets (4)
RDF data sets can be interlinked:
Manually
• Involves the manual exploration of LD data sets and their RDF resources to identify linking targets
• May not be feasible when the number of entities within the data set is very large
Automatically
• Using tools that perform link discovery based on linkage rules, for example: Silk, Limes and xCurator

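To make the idea of a linkage rule concrete, here is a small sketch of automatic link discovery. It is not how Silk, Limes or xCurator work internally; it assumes a single rule (label similarity above a threshold) and a very simple blocking key (first character of the label) to avoid comparing every possible entity pair. All URIs in the example data are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())


def block_key(label):
    """Blocking key: first character of the normalized label."""
    return normalize(label)[:1]


def discover_links(source, target, threshold=0.9):
    """Return (source_uri, owl:sameAs, target_uri) link candidates.

    source and target map URIs to labels. Only entities sharing a
    blocking key are compared, which avoids the full cross product.
    """
    blocks = defaultdict(list)
    for t_uri, t_label in target.items():
        blocks[block_key(t_label)].append((t_uri, normalize(t_label)))

    links = []
    for s_uri, s_label in source.items():
        s_norm = normalize(s_label)
        for t_uri, t_norm in blocks.get(block_key(s_label), []):
            if SequenceMatcher(None, s_norm, t_norm).ratio() >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


source = {"http://example.org/a/1": "The Beatles", "http://example.org/a/2": "Queen"}
target = {"http://example.org/b/x": "the  beatles", "http://example.org/b/z": "Quiet Riot"}
for link in discover_links(source, target):
    print(link)
```

A first-character blocking key misses matches whose labels start differently (for example "Beatles, The"), so the choice of key trades recall for speed.
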
owl:sameAs & rdfs:seeAlso
• owl:sameAs
– Creates links between individuals
– States that two URIs refer to the same individual
• rdfs:seeAlso
– States that a resource may provide additional information about the subject resource
• Links in MusicBrainz:
– owl:sameAs is used for music artists
– rdfs:seeAlso is used for albums

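Because owl:sameAs states that two URIs refer to the same individual, a consumer can group linked URIs into sets of co-referent identifiers. The sketch below does this with a small union-find structure; the prefixed names in the example are placeholders, not real identifiers.

```python
def same_as_groups(links):
    """Group URIs connected by owl:sameAs links, given as (a, b) pairs."""
    parent = {}

    def find(x):
        """Return the representative of x, registering x on first sight."""
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(group) for group in groups.values())


links = [("mb:1", "db:Beatles"), ("db:Beatles", "fb:x"), ("mb:2", "db:Queen")]
print(same_as_groups(links))
```

rdfs:seeAlso links are deliberately left out: they point to additional information and do not imply identity.
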
SKOS
• Simple Knowledge Organization System
– http://www.w3.org/TR/skos-reference/
• Data model for knowledge organization systems (thesauri, classification schemes, taxonomies)
• SKOS data is expressed as RDF triples
• Allows the creation of RDF links between different data sets with the usage of mapping properties

SKOS: Mapping Properties
These properties are used to link SKOS concepts (particularly instances) in different schemes:
• skos:closeMatch: links two concepts that are sufficiently similar (sometimes can be used interchangeably)
• skos:exactMatch: indicates that the two concepts can be used interchangeably.
– Axiom: It is a transitive property
• skos:relatedMatch: states an associative mapping link between two concepts

SKOS: Mapping Properties (2)
Example of SKOS exact match

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .

mo:MusicArtist skos:exactMatch dbpedia-ont:MusicalArtist .
mo:MusicGroup skos:exactMatch schema:MusicGroup .
mo:MusicGroup skos:exactMatch dbpedia-ont:Band .

SKOS: Mapping Properties (3)
Example of SKOS close match

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .

mo:SignalGroup skos:closeMatch schema:MusicAlbum .
mo:SignalGroup skos:closeMatch dbpedia-ont:Album .

SKOS: Mapping Properties (4)
Integrity conditions
• Guarantee consistency and avoid contradictions in the relationships between SKOS concepts
[Figure: partial mapping relation diagram with integrity conditions. Nodes: skos:mappingRelation, skos:closeMatch, skos:exactMatch, skos:relatedMatch. Labels: "Symmetric & Transitive", "Symmetric", "Disjoint with"]

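An integrity condition of this kind can be checked mechanically. The sketch below assumes the SKOS rule that skos:exactMatch is symmetric, transitive and disjoint with skos:relatedMatch, and reports concept pairs that end up connected by both. Triples are plain tuples of prefixed names; the function names are illustrative, not part of any SKOS tool.

```python
EXACT = "skos:exactMatch"
RELATED = "skos:relatedMatch"


def exact_match_closure(triples):
    """Symmetric and transitive closure of skos:exactMatch as a set of pairs.

    Reflexive pairs (a, a) are left out because they are not useful here.
    """
    pairs = {(s, o) for s, p, o in triples if p == EXACT}
    pairs |= {(o, s) for s, o in pairs}  # symmetric
    changed = True
    while changed:  # transitive: add (a, d) whenever (a, b) and (b, d) exist
        new = {(a, d) for a, b in pairs for c, d in pairs if b == c and a != d}
        changed = not new <= pairs
        pairs |= new
    return pairs


def integrity_violations(triples):
    """Pairs linked by both exactMatch (after closure) and relatedMatch."""
    exact = exact_match_closure(triples)
    related = {(s, o) for s, p, o in triples if p == RELATED}
    related |= {(o, s) for s, o in related}  # relatedMatch is symmetric
    return sorted(exact & related)


triples = [
    ("mo:MusicGroup", EXACT, "schema:MusicGroup"),
    ("mo:MusicGroup", EXACT, "dbpedia-ont:Band"),
    ("schema:MusicGroup", RELATED, "dbpedia-ont:Band"),
]
print(integrity_violations(triples))
```

In this example the violation only appears after the closure: schema:MusicGroup and dbpedia-ont:Band are never directly stated as exact matches, but both are exact matches of mo:MusicGroup.
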
PUBLISHING LINKED DATA

Publishing Linked Data
Once the RDF data set has been created and interlinked, the publishing process involves the following tasks:
1. Metadata creation for describing the data set
2. Making the data set accessible
3. Exposing the data set in Linked Data repositories
4. Validating the data set

Describing RDF Data Sets
• Consists of providing (machine-readable) metadata of RDF data sets which can be processed by engines
• This information allows for:
– Efficient and effective search of data sets
– Selection of appropriate data sets (for consumption or interlinking)
– Getting general statistics of the data sets

Describing RDF Data Sets (2)
• The common language for describing RDF data sets is VoID (Vocabulary of Interlinked Datasets)
• Defines an RDF data set with the class void:Dataset
• Covers 4 types of metadata:
– General metadata
– Structural metadata
– Descriptions of linksets
– Access metadata

VoID: General Metadata
• General metadata is used by users to identify appropriate data sets.
• Specifies information about description of the data set, contact person/organization, the license of the data set, data subject and some technical features.
• VoID (re)uses predicates from the Dublin Core Metadata [1] and FOAF [2] vocabularies.
[1] http://dublincore.org/documents/2010/10/11/dcmi-terms/
[2] http://xmlns.com/foaf/spec/

VoID: General Metadata (2)
General Information
Contains information about the creation of the data set

Predicate | Range | Description
dcterms:title | Literal | Name of the data set.
dcterms:description | Literal | Description of the data set.
dcterms:source | RDF resource | Source from which the data set was derived.
dcterms:creator | RDF resource | Entity primarily responsible for creating the data set.
dcterms:date | xsd:date | Time associated with an event in the life-cycle of the resource.
dcterms:created | xsd:date | Date of creation of the data set.
dcterms:issued | xsd:date | Date of publication of the data set.
dcterms:modified | xsd:date | Date on which the data set was changed.
foaf:homepage | RDF resource | Homepage of the data set.
dcterms:publisher | RDF resource | Entity responsible for making the data set available.
dcterms:contributor | RDF resource | Entity responsible for making contributions to the data set.

Source: http://www.w3.org/TR/void/#metadata

VoID: General Metadata (3)
Other Information
• License of the data set: specifies the usage conditions of the data. The license can be pointed to with the property dcterms:license
• Category of the data set: to specify the topics or domains covered by the data set, the property dcterms:subject can be used
• Technical features: the property void:feature can be used to express technical properties of the data (e.g. RDF serialization formats)

VoID: Structural Metadata
• Provides high-level information about the internal structure of the data set
• This metadata is useful when exploring or querying the data set
• Includes information about resources, vocabularies used in the data set, statistics and examples of resources in the data set

VoID: Structural Metadata (2)
Information about resources
• Example resources: allow users to get an impression of the kind of resources included in the data set. Examples can be shown with the property void:exampleResource

:MusicBrainz a void:Dataset;
    void:exampleResource
        <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d> .

• Pattern for resource URIs: the void:uriSpace property can be used to state that all the entity URIs in a data set start with a given string

:MusicBrainz a void:Dataset;
    void:uriSpace "http://musicbrainz.org/" .

VoID: Structural Metadata (3)
Vocabularies used in the data set
• The void:vocabulary property identifies the vocabulary or ontology that is used in a data set
• Typically, only the most relevant vocabularies are listed
• This property can only be used for entire vocabularies. It cannot be used to express that a subset of the vocabulary occurs in the data set.

:MusicBrainz a void:Dataset;
    void:vocabulary <http://purl.org/ontology/mo/> .

VoID: Structural Metadata (4)
Statistics about a data set
Express numeric statistics about a data set:

Predicate | Range | Description
void:triples | Number | Total number of triples contained in the data set.
void:entities | Number | Total number of entities that are described in the data set. An entity must have a URI, and match the void:uriRegexPattern.
void:classes | Number | Total number of distinct classes in the data set.
void:properties | Number | Total number of distinct properties in the data set.
void:distinctSubjects | Number | Total number of distinct subjects in the data set.
void:distinctObjects | Number | Total number of distinct objects in the data set.
void:documents | Number | Total number of documents, in case that the data set is published as a set of individual documents.

Source: http://www.w3.org/TR/void/#metadata

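Several of these statistics can be derived directly from the triples. The following sketch computes a subset of them for an in-memory list of (subject, predicate, object) tuples. It assumes that classes are counted as the distinct objects of rdf:type triples, and it leaves out void:entities and void:documents, which need information beyond the triples themselves.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_statistics(triples):
    """Compute a few VoID statistics for a list of (s, p, o) triples."""
    unique = set(triples)  # an RDF graph is a set, so duplicates count once
    return {
        "void:triples": len(unique),
        "void:properties": len({p for _, p, _ in unique}),
        "void:distinctSubjects": len({s for s, _, _ in unique}),
        "void:distinctObjects": len({o for _, _, o in unique}),
        # Assumption: classes are the distinct objects of rdf:type triples.
        "void:classes": len({o for _, p, o in unique if p == RDF_TYPE}),
    }


triples = [
    ("ex:a", RDF_TYPE, "mo:MusicArtist"),
    ("ex:a", "foaf:name", '"A"'),
    ("ex:b", RDF_TYPE, "mo:MusicArtist"),
    ("ex:b", "foaf:name", '"B"'),
    ("ex:b", "foaf:name", '"B"'),
]
for predicate, value in void_statistics(triples).items():
    print(predicate, value)
```

For a large data set the same counts would more likely be obtained with SPARQL aggregate queries than by loading all triples into memory.
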
VoID: Structural Metadata (5)
Partitioned data sets
• The void:subset property provides descriptions of parts of a data set

:MusicBrainz a void:Dataset;
    void:subset :MusicBrainzArtists .

• Data sets can be partitioned based on classes or properties:
– void:classPartition contains only instances of a particular class
– void:propertyPartition contains only triples with a particular predicate

:MusicBrainz a void:Dataset;
    void:classPartition [ void:class mo:Release ] ;
    void:propertyPartition [ void:property mo:member ] .

VoID: Describing Linksets
• Linkset: collection of RDF links between two RDF data sets
[Figure: data sets :DS1 and :DS2 with linksets :LS1 and :LS2 of owl:sameAs links. Image based on http://semanticweb.org/wiki/File:Void-linkset-conceptual.png]

@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

:DS1 a void:Dataset .
:DS2 a void:Dataset .
:DS1 void:subset :LS1 .
:LS1 a void:Linkset;
    void:linkPredicate owl:sameAs;
    void:target :DS1, :DS2 .

VoID: Describing Linksets (2)
Example

@prefix void: <http://rdfs.org/ns/void#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .

:MusicBrainz a void:Dataset .
:DBpedia a void:Dataset .

:MusicBrainz void:classPartition :MBArtists .
:MBArtists void:class mo:MusicArtist .

:MBArtists a void:Linkset;
    void:linkPredicate skos:exactMatch;
    void:target :MusicBrainz, :DBpedia .

VoID: Access Metadata
The access metadata describes the methods of accessing the actual RDF data set

Method | Predicate | Description
URI look up endpoint | void:uriLookupEndpoint | Specifies the URI of a service for accessing the data set (different from the SPARQL protocol)
Root resource | void:rootResource | URI of the top concepts (only for data sets structured as trees)
SPARQL endpoint | void:sparqlEndpoint | Provides access to the data set via the SPARQL protocol.*
RDF data dumps | void:dataDump | Specifies the location of the dump file. If the data set is split into multiple files, then several values of this property are provided.

* This assumes that the default graph of the SPARQL endpoint contains the data set. VoID cannot express that a data set is contained in a specific named graph. This can be specified with SPARQL 1.1 Service Description.
CH 5

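A VoID description combining general and access metadata could be generated along the following lines. This is only a sketch: the default namespace `http://example.org/void#` and all example URLs are hypothetical, the title is assumed to contain no characters that need escaping, and at least one dump file is expected.

```python
def void_description(name, title, sparql_endpoint, data_dumps):
    """Render a small VoID description in Turtle."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix : <http://example.org/void#> .",  # hypothetical namespace
        "",
        f":{name} a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
    ]
    # One void:dataDump value per dump file.
    dumps = " ,\n        ".join(f"<{dump}>" for dump in data_dumps)
    lines.append(f"    void:dataDump {dumps} .")
    return "\n".join(lines)


print(
    void_description(
        "MusicBrainz",
        "MusicBrainz",
        "http://example.org/sparql",
        ["http://example.org/dump-1.nt", "http://example.org/dump-2.nt"],
    )
)
```

Listing each dump file as a separate void:dataDump value mirrors the table row above for data sets split into multiple files.
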
Providing Access to the Data Set
The data set can be accessed via different mechanisms:
• RDFa
• RDF dump
• SPARQL endpoint
• Dereferencing HTTP URIs

Dereferencing HTTP URIs
• Allows for easily exploring certain resources contained in the data set
• What to return for a URI?
– Immediate description: triples where the URI is the subject.
– Backlinks: triples where the URI is the object.
– Related descriptions: information of interest in typical usage scenarios.
– Metadata: information such as author and licensing information.
– Syntax: RDF descriptions as RDF/XML and human-readable formats.
• Applications (e.g. LD browsers) render the retrieved information so it can be perceived by a user.
Source: How to Publish Linked Data on the Web - Chris Bizer, Richard Cyganiak, Tom Heath.
CH 1

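The first two parts of the answer to "what to return for a URI?" can be sketched as a selection over the data set, together with a very simplified choice of syntax. The function names are hypothetical, the triples use placeholder prefixed names, and the content negotiation ignores q-values, which a real server would have to honour.

```python
def describe(uri, triples):
    """Select the triples a server might return when `uri` is dereferenced."""
    immediate = [t for t in triples if t[0] == uri]  # URI is the subject
    backlinks = [t for t in triples if t[2] == uri and t[0] != uri]  # URI is the object
    return immediate + backlinks


def choose_representation(accept_header):
    """Pick RDF/XML for clients that list it, HTML otherwise (q-values ignored)."""
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    return "application/rdf+xml" if "application/rdf+xml" in accepted else "text/html"


triples = [
    ("ex:beatles", "foaf:name", '"The Beatles"'),
    ("ex:abbey-road", "foaf:maker", "ex:beatles"),
    ("ex:queen", "foaf:name", '"Queen"'),
]
print(describe("ex:beatles", triples))
print(choose_representation("text/html,application/rdf+xml;q=0.9"))
```

Related descriptions and metadata are left out because what counts as "of interest" depends on the usage scenario.
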
Dereferencing HTTP URIs (2)
Example: Dereferencing

RDFa
• RDFa = "RDF in attributes"
• Extension to HTML5 for embedding RDF within HTML pages:
– The HTML is processed by the browser; the (human) consumer doesn’t see the RDF data
– The RDF triples within the page are consumed by APIs to extract the (semi-)structured data
• It is considered as the bridge between the Web of Data and the Web of Documents
• It is a complete serialization of RDF

RDFa: Attributes

Attribute role | Attribute | Description
Syntax | prefix | List of prefix-name/IRI pairs
Syntax | vocab | IRI that specifies the vocabulary where the concept is defined
Subject | about | Specifies the subject of the relationship
Predicate | property | Expresses the relationship between the subject and the value
Predicate | rel | Defines a relation between the subject and a URL
Predicate | rev | Expresses reverse relationships between two resources
Resource | href | Specifies an object URI for the rel and rev attributes
Resource | resource | Same as href (used when href is not present)
Resource | src | Specifies the subject of a relationship
Literal | datatype | Expresses the datatype of the object of the property attribute
Literal | content | Supplies machine-readable content for a literal
Literal | xml:lang, lang | Specifies the language of the literal
Macro | typeof | Indicates the RDF type(s) to associate with a subject
Macro | inlist | An object is added to the list of a predicate.

RDFa: Example
Extracting RDF from HTML

HTML (+RDFa):
<div class="artistheader"
  about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
  typeof="http://purl.org/ontology/mo/MusicGroup">
  …
</div>

RDF:
<http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://purl.org/ontology/mo/MusicGroup> .

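The extraction step in this example can be approximated with a few lines of code. The sketch below is not a conforming RDFa processor (tools such as pyRdfa handle prefixes, vocab, chaining and literals); it only assumes full IRIs in `about` and `typeof` and emits the corresponding rdf:type triples. The class name `TypeofExtractor` is made up for this illustration.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TypeofExtractor(HTMLParser):
    """Collect rdf:type triples from elements carrying both about and typeof."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        types = attributes.get("typeof")
        if subject and types:
            # typeof may hold several whitespace-separated types.
            for rdf_class in types.split():
                self.triples.append((subject, RDF_TYPE, rdf_class))


html = """
<div class="artistheader"
  about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
  typeof="http://purl.org/ontology/mo/MusicGroup">
</div>
"""

extractor = TypeofExtractor()
extractor.feed(html)
for s, p, o in extractor.triples:
    print(f"<{s}> <{p}> <{o}> .")
```

The browser ignores the `about` and `typeof` attributes when rendering, which is why the human reader never sees the RDF data.
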
RDFa: Example (2)
Extracting RDF from MusicBrainz.org

http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d

Source: http://www.w3.org/2007/08/pyRdfa/

http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d&format=nt

Watch the EUCLID screencast: http://vimeo.com/euclidproject
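The extraction URL above is the distiller address plus two query parameters, `uri` and `format`. A small sketch of how such a URL can be assembled follows; the function name is an assumption for illustration, and whether the service is still reachable at this address is not checked here.

```python
from urllib.parse import urlencode

# Base address taken from the slide; only the query string is built here.
DISTILLER = "http://www.w3.org/2007/08/pyRdfa/extract"


def distiller_url(page_uri, output_format="nt"):
    """Build a distiller URL that asks for the RDF embedded in `page_uri`."""
    return DISTILLER + "?" + urlencode({"uri": page_uri, "format": output_format})


print(distiller_url(
    "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"
))
```

Percent-encoding the page address matters because it contains `:` and `/`, which would otherwise be read as part of the distiller URL itself.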
RDF Dump
•  An RDF dump is a file which contains (part of) a data set serialized in an RDF format (RDF/XML, N-Triples, N-Quads)
•  The data set can be split into several RDF dumps
•  A list of data sets available as RDF dumps can be found at:
   –  http://www.w3.org/wiki/DataSetRDFDumps
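Because N-Triples puts one triple on each line, splitting a data set into several dumps can be done on line boundaries. The sketch below shows the idea; the `example.org` IRIs and the function names are hypothetical.

```python
def to_ntriples_line(subject, predicate, obj):
    """Serialize one triple of IRIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def split_dump(lines, triples_per_file):
    """Split N-Triples lines into consecutive chunks, one chunk per dump file."""
    if triples_per_file < 1:
        raise ValueError("triples_per_file must be at least 1")
    return [lines[i:i + triples_per_file]
            for i in range(0, len(lines), triples_per_file)]


# Hypothetical data set of five triples.
LINES = [
    to_ntriples_line(f"http://example.org/artist/{n}",
                     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                     "http://purl.org/ontology/mo/MusicArtist")
    for n in range(5)
]

for number, chunk in enumerate(split_dump(LINES, 2), start=1):
    print(f"dump-{number}.nt: {len(chunk)} triples")
```

The same approach does not work for RDF/XML, where a file cannot be cut at arbitrary lines.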
SPARQL Endpoint
•  The SPARQL endpoint refers to the URI of the listener of the SPARQL protocol service, which handles requests for SPARQL protocol operations
•  The user submits SPARQL queries to the SPARQL endpoint in order to retrieve only a desired subset of the RDF data set
•  List of available SPARQL endpoints:
   •  http://www.w3.org/wiki/SparqlEndpoints
   •  http://labs.mondeca.com/sparqlEndpointsStatus/
CH 2
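Submitting a query to an endpoint amounts to an HTTP request that carries the query text in a `query` parameter. The sketch below only builds such a request and does not send it. The endpoint address `http://example.org/sparql` is a placeholder, and the choice of JSON results is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request for `query`."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = """PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?group WHERE { ?group a mo:MusicGroup } LIMIT 10"""

# Placeholder endpoint; replace with a real one before calling urlopen().
request = build_sparql_request("http://example.org/sparql", QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the query against a live endpoint.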
Using Linked Data Catalogs
•  Data catalogs, markets or repositories are platforms dedicated to providing access to a wide range of data sets from different domains
•  They allow data consumers to easily find and use the data
•  The catalogs usually offer relevant metadata about the creation of the data set
Using Linked Data Catalogs (2)
How to publish an RDF data set into a catalog?

Create your own data catalog
   Recommended for big organizations/institutions aiming at providing a large number of data sets
   Use a data management system, for example:

Upload your data set into an existing catalog
   Allows data consumers to easily find new data sets
   Common LD catalogs are:
   -
   - The Linking Open Data Cloud
Validating Data Sets
There are different ways to validate the published RDF data set:

Accessibility
•  Vapour - Performs two types of tests: without content negotiation and requesting RDF/XML content
   http://validator.linkeddata.org/vapour
•  URI Debugger - Retrieves the HTTP responses of accessing a URI
   http://linkeddata.informatik.hu-berlin.de/uridbg/

Parsing & Syntax
•  RDF Triple-Checker – Dereferences namespaces associated with the resources used in the document
   http://graphite.ecs.soton.ac.uk/checker/
•  W3C RDF/XML Validation Service – Evaluates the syntax of RDF/XML documents and displays the RDF triples in them
   http://www.w3.org/RDF/Validator/
•  W3C Markup Validation Service – Checks syntactic correctness for web documents with RDFa markup
   http://validator.w3.org/

General validators
•  RDF:ALERTS – Validates syntax, undefined resources, datatype and other types of errors
   http://swse.deri.org/RDFAlerts/
Validating Data Sets (2)
Example: Validating URIs with Vapour
Source: http://idi.fundacionctic.org/vapour

Validating Data Sets (3)
Example: Validating URIs with Vapour
Source: http://idi.fundacionctic.org/vapour

Validating Data Sets (4)
Example: Validating URIs with Vapour
•  http://dbpedia.org/page/The_Beatles: HTML content
•  http://dbpedia.org/data/The_Beatles.xml: RDF document
Source: http://idi.fundacionctic.org/vapour
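The two kinds of test, one plain request and one that asks for RDF/XML, can be sketched as a pair of HTTP requests that differ only in the `Accept` header. This is an illustration of the idea, not how Vapour itself is implemented. The resource URI in the example is an assumption (the slides show only the `/page/` and `/data/` addresses), and nothing is sent over the network.

```python
from urllib.request import Request


def negotiation_requests(uri):
    """Return two unsent requests: one plain, one asking for RDF/XML."""
    plain = Request(uri)
    rdf = Request(uri, headers={"Accept": "application/rdf+xml"})
    return plain, rdf


def is_rdf_xml(content_type):
    """True if a Content-Type header value denotes RDF/XML (parameters ignored)."""
    return content_type.split(";")[0].strip().lower() == "application/rdf+xml"


# Assumed resource URI for The Beatles on DBpedia.
plain, rdf = negotiation_requests("http://dbpedia.org/resource/The_Beatles")
print(plain.get_header("Accept"), rdf.get_header("Accept"))
```

A validator would send both requests, follow the redirects, and check that the second one ends at a document for which `is_rdf_xml` holds.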
PROVIDING LINKED DATA: CHECKLIST

Providing Linked Data: Checklist (1)
Creating Linked Data
o  Were all the relevant entities/concepts effectively extracted from the raw data?
o  Are all the created URIs dereferenceable?
o  Are you reusing terms from widely accepted vocabularies?

Providing Linked Data: Checklist (2)
Interlinking Linked Data
o  Is the data set linked to other RDF data sets?
o  Are the created vocabulary terms linked to other vocabularies?

Providing Linked Data: Checklist (3)
Publishing Linked Data
o  Do you provide data set metadata?
o  Do you provide information about licensing?
o  Do you provide additional access methods?
o  Is the data set available in LD catalogs?
o  Did the data set pass the validation tests?
Summary
In this chapter we studied:
•  The Linked Data lifecycle:
   •  3 core tasks: creating, interlinking and publishing
•  Creation of Linked Data:
   •  Extracting relevant data, using URIs to name entities, selecting vocabularies and expressing the data using the RDF data model
•  Interlinking Linked Data:
   •  Challenges of link discovery, using Silk to create links between two data sets and using SKOS links
•  Publishing Linked Data:
   •  Creation of data set metadata; publishing the data set via RDF dumps, SPARQL endpoints or RDFa; using RDFa and schema.org to enrich search results, and uploading the data set to a LD catalog
The Web & Linked Data
•  Linked Data catalogs
•  Applications
CKAN
•  CKAN is an open source platform for developing data set catalogs
•  Implements useful tools for data publishers to support:
   •  Data harvesting
   •  Creation of metadata
   •  Access mechanisms to the data set
   •  Updating the data set
   •  Monitoring the access to the data set

CKAN (2)
Source: http://ckan.org

CKAN (3)
Source: http://ckan.org
The Data Hub
•  The Data Hub is a community-run data catalog which contains more than 5,000 data sets¹
•  “(…) is an openly editable open data catalogue, in the style of Wikipedia”.²
•  It is implemented on top of the CKAN platform
•  Allows the creation of groups:
   –  The Linking Open Data Cloud group exclusively contains Linked Data sets
¹ According to the information presented in the portal in March 2013
² Source: http://datahub.io/about

The Data Hub (2)
Source: http://datahub.io/

The Data Hub (3)
Source: http://datahub.io/
The Linking Open Data Cloud
September 2011
Source: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
The Linking Open Data Cloud
How to publish an RDF data set in this cloud?
1.  The data set must follow the Linked Data principles
2.  The data set must contain at least 1,000 RDF triples
3.  The data set must contain at least 50 RDF links to a data set that is already in the diagram
4.  Access to the data set must be provided
Once these criteria are met, the data publisher must add the data set to the Data Hub catalog, and contact the administrators of the Linking Open Data Cloud group
Source: http://lod-cloud.net/
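The four criteria can be written down as a simple pre-submission check. The sketch below is only a restatement of the list in code; the `DataSetStats` structure and its field names are hypothetical, and the first and last criteria are taken as yes/no answers supplied by the publisher.

```python
from dataclasses import dataclass


@dataclass
class DataSetStats:
    """Hypothetical summary of a data set, filled in by the publisher."""
    follows_ld_principles: bool
    triple_count: int
    links_to_cloud_data_set: int  # RDF links to one data set already in the diagram
    access_provided: bool


def unmet_criteria(stats):
    """Return the criteria from the list above that the data set does not meet."""
    unmet = []
    if not stats.follows_ld_principles:
        unmet.append("follow the Linked Data principles")
    if stats.triple_count < 1000:
        unmet.append("at least 1,000 RDF triples")
    if stats.links_to_cloud_data_set < 50:
        unmet.append("at least 50 RDF links to a data set already in the diagram")
    if not stats.access_provided:
        unmet.append("access to the data set")
    return unmet


print(unmet_criteria(DataSetStats(True, 1500, 20, True)))
```

An empty result means the data set can be added to the Data Hub and proposed to the group administrators.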
Linked Data & Search Engines
•  Search engines collect information about web resources in order to produce richer search results by improving the display of the results
•  This is only possible if the search engines are able to understand the content within the web pages
•  The HTML pages must be annotated with machine-readable markup that describes their content:
   Mark up format    Vocabulary
RDFa for marking up data
•  RDFa is used to provide (semi-)structured Linked Data embedded in web content
•  Examples:
   –  Some search engines use RDFa, e.g., Google, Yahoo! and Bing
   –  Facebook’s Open Graph is based on RDFa
Google Rich Snippets
•  Embedding semantics via RDFa (or microformats/microdata) enhances search results:

Google Rich Snippets (2)
Schema.org
•  Collection of schemas/vocabularies to mark up HTML pages
•  It is recognized by Bing, Google, Yahoo! and Yandex
•  Covers a wide range of knowledge domains
•  It also offers an extension mechanism in case the publisher is interested in adding new concepts to the vocabularies
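As an illustration of marking up a page with schema.org terms, the sketch below generates a small HTML fragment that uses the RDFa attributes `vocab`, `typeof` and `property`. The helper name and the sample values are assumptions; `MusicGroup`, `name` and `url` are schema.org terms.

```python
from html import escape


def music_group_rdfa(name, url):
    """Return an HTML fragment describing a band with schema.org terms in RDFa."""
    return (
        '<div vocab="http://schema.org/" typeof="MusicGroup">\n'
        f'  <span property="name">{escape(name)}</span>\n'
        f'  <a property="url" href="{escape(url, quote=True)}">website</a>\n'
        '</div>'
    )


# Hypothetical band page.
print(music_group_rdfa("The Beatles", "http://example.org/the-beatles"))
```

Escaping the values keeps the fragment well-formed when a name contains characters such as `&` or `<`.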
Schema.org (2)
The vocabularies cover the following topics:
Source: http://schema.org/docs/schemas.html

“The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to find a balance, by providing a core schema that covers lots of situations, alongside extension mechanisms for extra detail.”
(Dan Brickley, schema.org)

Schema.org (3)
Integrates(/aligns) existing vocabularies where appropriate, e.g. rNews
Source: http://schema.org/Article
Google Knowledge Graph
•  The user is able to find answers to their queries without browsing pages
•  Provides detailed information

Google Knowledge Graph (2)
•  Google Search results include structured data from Freebase
•  Might disambiguate search terms
Freebase
•  Knowledge base of structured data
•  Data is stored as a graph
•  Describes data from different domains
Bing Snapshot
•  Provides structured data related to the search term
•  Includes a significant number of entities from more domains
•  Connects data from LinkedIn
•  It is powered by the graph engine Trinity.RDF

Bing Snapshot (2)
Open Graph Protocol
•  It was originally created by Facebook
•  Allows describing web content as graph objects, establishing connections between people and objects
•  The descriptions are embedded in the web page as RDFa data
•  Supports description of several domains: basic metadata, music, video, articles, books, websites and user profiles
Source: http://ogp.me/
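Open Graph descriptions appear in a page as `<meta>` elements whose `property` starts with `og:`. The sketch below reads them back with Python's standard HTML parser. The sample page, its values and the class name are made up for illustration.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta property="..." content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and "content" in attributes:
            self.properties[name] = attributes["content"]


def read_open_graph(html):
    """Return a dict of Open Graph properties found in the HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Hypothetical page describing an album.
SAMPLE_PAGE = """<html><head>
<meta property="og:title" content="Abbey Road" />
<meta property="og:type" content="music.album" />
<meta property="og:url" content="http://example.org/albums/abbey-road" />
<meta name="description" content="not an Open Graph property" />
</head></html>"""

print(read_open_graph(SAMPLE_PAGE))
```

Only the last value is kept when a property repeats, which is a simplification; properties such as `og:image` may legitimately occur several times.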
Open Graph Protocol (2)
Who is using the Open Graph protocol?
Consumers: Facebook, Google, Mixi
Publishers: IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes, TIME
Source: http://ogp.me/
Open Graph Protocol (3)
•  Facebook expands vocabulary of relationships beyond “friendship” and “like”: more actions!
Source: https://developers.facebook.com/docs/opengraph/
Open Graph Protocol & Facebook
List of domains and actions
General:      Like, Recommend, Follow
Music:        Listen, Create a playlist
Movies & TV:  Watch, Rate, Wants to watch
Book:         Rate, Read, Quote, Wants to read
Games:        Achieve, High score
Fitness:      Bike, Run, Walk
Source: https://developers.facebook.com/docs/opengraph/
How can we exploit these links and relationships?
Facebook Graph Search
•  Focuses on people and their interests, exploiting how everything is related to each other
•  Queries are specified using natural language
•  Takes advantage of context and suggests possible queries
•  Allows for building more complex (expressive) queries that are not possible with normal search:
   –  For example, “music liked by me and friends who live in my city”
Facebook Graph Search (2)
Context (information from profile):
Graph search suggestions:

Facebook Graph Search (3)
Results
Facebook Graph Search (4)
Observations
•  Allows for conjunctive queries (applying a filter over intermediate results = “apply operator”)
•  Disjunctive queries are not supported:
   –  For example: “My friends who like SemanticWeb.com OR ReadWrite”
•  Post search is not supported
   –  It is not possible to search in post content submitted to the timeline
•  User privacy settings affect the results
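A conjunctive query of this kind can be pictured as a chain of filters, each applied to the result of the previous step. The toy sketch below evaluates “music liked by me and friends who live in my city” over hand-made data. All names and data are invented, and the reading that every matching friend must like the music is an assumption, since the slides do not say how Graph Search evaluates the phrase.

```python
# Invented toy data: who likes what, who lives where, who is friends with whom.
LIKES = {
    "me": {"The Beatles", "Queen"},
    "ana": {"Queen", "Madonna"},
    "ben": {"The Beatles", "Queen"},
}
CITY = {"me": "Karlsruhe", "ana": "Karlsruhe", "ben": "London"}
FRIENDS = {"me": {"ana", "ben"}}


def music_liked_by_me_and_local_friends(user):
    """Music liked by `user` AND by each friend living in the same city."""
    # Step 1: filter the friends down to those in the user's city.
    local_friends = {f for f in FRIENDS[user] if CITY[f] == CITY[user]}
    # Step 2: apply one more conjunct per remaining friend.
    result = set(LIKES[user])
    for friend in local_friends:
        result &= LIKES[friend]
    return result


print(music_liked_by_me_and_local_friends("me"))
```

A disjunctive query would need a union (`|`) of intermediate results instead of successive filtering.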
Tools for providing Linked Data
•  Extracting data from spreadsheets: OpenRefine
•  Extracting data from RDBMS: R2RML
•  Extracting data from text: Zemanta, OpenCalais, GATE
•  Interlinking data sets: Silk
EXTRACTING DATA FROM SPREADSHEETS WITH OPENREFINE
Integrate Chart Data
•  Task: Integrate latest chart information into your RDF database.
•  Data may be available in non-RDF formats:
   –  Plain text
   –  CSV, TSV, separator-based files
   –  HTML tables
   –  Spreadsheets (OpenDocument, Excel, …)
   –  XML
   –  JSON
   –  …
[Diagram labels: Data acquisition (CSV/TSV, HTML, Spreadsheets, JSON); Vocabulary Mapping; Interlinking; Cleansing; Integrated Data Set; Publishing; LD Data set Access; SPARQL Endpoint]
Example Data

CSV:
The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million

HTML tables (http://en.wikipedia.org/wiki/List_of_best-selling_music_artists):
Artist                   Country of origin  Period active      Release-year of first charted record  Total certified units (from available markets)[Notes]
The Beatles              United Kingdom     1960–1970[4]       1962[4]                               Total available certified units: 250 million
Elvis Presley            United States      1954–1977[28]      1954[28]                              Total available certified units: 203.3 million
Michael Jackson[Note 2]  United States      1964–2009[32]      1971[32]                              Total available certified units: 157.4 million
Madonna                  United States      1979–present[44]   1982[44]                              Total available certified units: 160.1 million
Led Zeppelin             United Kingdom     1968–1980[50]      1969[50]                              Total available certified units: 135.5 million
Queen                    United Kingdom     1971–present[53]   1973[53]                              Total available certified units: 90.5 million

JSON:
{
  "artist": {
    "class": "artist",
    "name": "The Beatles"
  },
  "rank": 1,
  "value": "250 million"
},
…
OpenRefine
Quick Facts
•  transforms and cleans messy input data sets.
•  is an open-source successor of Google Refine.
•  allows for entity reconciliation against SPARQL endpoints or RDF data.
•  is extended with plugins that enhance its functionality, e.g. for RDF support.
Use of OpenRefine
1.  Messy input data is imported, transformed into a table representation and cleaned.
2.  Entity reconciliation is applied to allow for interlinking with existing data sets.
3.  The structure of the RDF output is defined.
4.  The data is exported into some RDF syntax.

CSV:
The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million

RDF:
musicbrainz:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d :totalSales "250000000"^^xsd:int .
musicbrainz:01809552-4f87-45b0-afff-2c6f0730a3be :totalSales "203300000"^^xsd:int .
musicbrainz:f27ec8db-af05-4f36-916e-3d57f91ecf5e :totalSales "157400000"^^xsd:int .
musicbrainz:79239441-bfd5-4981-a70c-55c3f15c1287 :totalSales "160100000"^^xsd:int .
musicbrainz:678d88b2-87b0-403b-b63d-5da7465aecc3 :totalSales "135500000"^^xsd:int .
musicbrainz:0383dadf-2a4e-4d10-a46a-e9e041da8eb3 :totalSales "90500000"^^xsd:int .
Data Transformation
Typical steps:
•  Group and explore data items
•  Delete columns or rows based on a filter condition
•  Split columns into several columns based on a condition
•  Modify messy data items with GREL, a powerful expression language
•  Replay steps from a previous Refine project
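To make the cleaning step concrete, here is a sketch in Python of the kind of transformation these steps perform on the chart data: the raw lines are read as a table and the textual figure “250 million” is turned into a number. This is a stand-in written for illustration and does not use OpenRefine or GREL; the function names are assumptions.

```python
import csv
import io
from decimal import Decimal

RAW = """The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million"""


def parse_units(text):
    """Turn a value such as '203.3 million' into the integer 203300000."""
    number, unit = text.split()
    if unit != "million":
        raise ValueError(f"unexpected unit: {unit!r}")
    # Decimal avoids floating-point rounding for values like 203.3.
    return int(Decimal(number) * 1_000_000)


def clean_rows(raw):
    """Read the separator-based text and return one cleaned record per line."""
    rows = []
    for row in csv.reader(io.StringIO(raw), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        artist, units = row
        rows.append({"artist": artist.strip(), "total_sales": parse_units(units)})
    return rows


for record in clean_rows(RAW):
    print(record)
```

Failing loudly on an unknown unit is a deliberate choice here: silently guessing would hide exactly the messy values the cleaning step is meant to surface.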
How to Generate RDF?
•  Additional problem: data needs to be interlinked with existing MusicBrainz data
•  This is the point where plugins come into play:
   –  RDF Refine: developed by DERI
   –  An extension of OpenRefine to support RDF
Core Capabilities
•  Interlinking of data by entity reconciliation
   –  Against SPARQL endpoints, RDF dumps
   –  Discovery of relevant RDF data sets
•  RDF export with the help of RDF skeletons
   –  Define the vocabulary and graph structure of the RDF serialization
   –  In Turtle, RDF/XML
Entity Reconciliation
Typical steps:
•  Define a reconciliation service
•  Select specific types to reconcile against
•  Start reconciling a column against the service
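Reconciliation matches the values of a column against identifiers in an existing data set. The sketch below replaces the reconciliation service with a small local lookup table so that the idea can be run offline. The pairing of names to MusicBrainz identifiers is assumed from the order of the CSV rows and RDF lines in the example data, and the function names are made up.

```python
import unicodedata

# Local stand-in for a reconciliation service (assumed name-to-identifier pairs).
KNOWN_ARTISTS = {
    "The Beatles": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d",
    "Elvis Presley": "01809552-4f87-45b0-afff-2c6f0730a3be",
    "Queen": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",
}


def normalize(name):
    """Case-fold and collapse whitespace so that near-identical names match."""
    return " ".join(unicodedata.normalize("NFKC", name).casefold().split())


INDEX = {normalize(name): mbid for name, mbid in KNOWN_ARTISTS.items()}


def reconcile(name):
    """Return the MusicBrainz URI for `name`, or None if there is no match."""
    mbid = INDEX.get(normalize(name))
    if mbid is None:
        return None
    return f"http://musicbrainz.org/artist/{mbid}#_"


for cell in ["The Beatles", "  the  BEATLES ", "Unknown Band"]:
    print(repr(cell), "->", reconcile(cell))
```

A real service would also return candidate matches with scores for ambiguous names, which exact lookup cannot do.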
Define RDF Skeletons
•  An RDF skeleton defines the structure of the RDF triples that are exported

RDF Skeletons
03.09.13
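A skeleton can be thought of as a template that is filled in once per table row. The sketch below shows that idea with a dictionary-based template that produces N-Triples lines. The skeleton format is invented for this example and is not RDF Refine's own, and the `http://example.org/vocab#totalSales` predicate is a placeholder for the `:totalSales` property used earlier.

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"

# Invented skeleton format: a subject template plus (predicate, column, datatype).
SKELETON = {
    "subject": "http://musicbrainz.org/artist/{mbid}#_",
    "properties": [
        ("http://example.org/vocab#totalSales", "total_sales", XSD_INT),
    ],
}

ROWS = [
    {"mbid": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d", "total_sales": 250000000},
    {"mbid": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3", "total_sales": 90500000},
]


def export(rows, skeleton):
    """Fill the skeleton once per row and return N-Triples lines."""
    lines = []
    for row in rows:
        subject = skeleton["subject"].format(**row)
        for predicate, column, datatype in skeleton["properties"]:
            lines.append(
                f'<{subject}> <{predicate}> "{row[column]}"^^<{datatype}> .'
            )
    return lines


print("\n".join(export(ROWS, SKELETON)))
```

Keeping the structure in data rather than in code is what lets the same rows be exported with a different vocabulary by swapping the skeleton.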
EXTRACTING DATA FROM RDBMS WITH R2RML
W3C RDB2RDF
•  Task: Integrate data from relational DBMS with Linked Data
•  Approach: map from relational schema to semantic vocabulary with R2RML
•  Publishing: two alternatives –
   –  Translate SPARQL into SQL on the fly
   –  Batch transform data into RDF, index and provide SPARQL access in a triplestore
[Diagram labels: Data acquisition (Relational DBMS, R2RML Engine); Vocabulary Mapping; Interlinking; Cleansing; Integrated Data in Triplestore; Publishing; LD Data set Access; SPARQL Endpoint]
W3C RDB2RDF
•  The W3C published, in 2012, two Recommendations for mapping between relational databases and RDF:
   –  Direct mapping directly exposes data as RDF
      •  No allowance for vocabulary mapping
      •  No allowance for interlinking (unless URIs are used in the relational data)
      •  Not appropriate for this topic
   –  R2RML, the RDB to RDF mapping language
      •  Allows vocabulary mapping (subject, predicate and object maps with class options)
      •  Allows interlinking – URIs can be constructed
      •  Means to provide MusicBrainz RDF/SPARQL itself
http://www.w3.org/2001/sw/rdb2rdf/
MusicBrainz Next Gen Schema
•  Artist
   As pre-NGS, but further attributes
•  Artist Credit
   Allows joint credit
•  Release Group
   Cf. ‘album’ versus:
•  Release
•  Medium
•  Track
•  Track List
•  Work
•  Recording
Source: https://wiki.musicbrainz.org/Next_Generation_Schema
Music Ontology
•  OWL ontology with the following core concepts (classes) and relationships (properties):
Source: http://musicontology.com
R2RML Class Mapping
•  Mapping tables to classes is ‘easy’:
lb:Artist a rr:TriplesMap ;
  rr:logicalTable [rr:tableName "artist"] ;
  rr:subjectMap
    [rr:class mo:MusicArtist ;
     rr:template
       "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:musicbrainz_guid ;
     rr:objectMap [rr:column "gid" ;
                   rr:datatype xsd:string]] .
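What an R2RML engine does with this mapping can be imitated for a single row: the `{gid}` placeholder in the template is replaced by the column value, a type triple is emitted for `rr:class`, and one more triple is emitted for the predicate-object map. The sketch below is a hand-written imitation for illustration, not an R2RML processor, and the function names are assumptions.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO = "http://purl.org/ontology/mo/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
SUBJECT_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"


def expand_template(template, row):
    """Replace each {column} placeholder with the row's value for that column."""
    return re.sub(r"\{(\w+)\}", lambda match: str(row[match.group(1)]), template)


def map_artist_row(row):
    """Emit the triples the class mapping above would yield for one `artist` row."""
    subject = expand_template(SUBJECT_TEMPLATE, row)
    gid = row["gid"]
    return [
        f"<{subject}> <{RDF_TYPE}> <{MO}MusicArtist> .",
        f'<{subject}> <{MO}musicbrainz_guid> "{gid}"^^<{XSD_STRING}> .',
    ]


# One row of the `artist` table, reduced to the column the mapping uses.
for line in map_artist_row({"gid": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"}):
    print(line)
```

An actual engine also percent-encodes column values inserted into IRI templates; the identifiers used here contain only safe characters, so the sketch skips that step.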
R2RML Property Mapping
•  Mapping columns to properties can be easy:
lb:artist_name a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid, artist_name.name
       FROM artist
         INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap
    [rr:predicate foaf:name ;
     rr:objectMap [rr:column "name"]] .
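The effect of this mapping can be tried out on an in-memory SQLite database: the SQL query from the logical table is run, and each result row becomes one `foaf:name` triple. The two-table layout follows the query. The sample row is made up, and the sketch assumes that `lb:sm_artist` uses the same subject template as the class mapping.

```python
import sqlite3

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

QUERY = """SELECT artist.gid, artist_name.name
FROM artist
  INNER JOIN artist_name ON artist.name = artist_name.id"""


def make_db():
    """Create a tiny in-memory database with the two tables the query joins."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE artist_name (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT, name INTEGER);
        INSERT INTO artist_name VALUES (1, 'The Beatles');
        INSERT INTO artist VALUES (10, 'b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d', 1);
    """)
    return db


def name_triples(db):
    """Run the logical-table query and emit one foaf:name triple per row."""
    triples = []
    for gid, name in db.execute(QUERY):
        # Assumed subject template; literal escaping is omitted in this sketch.
        subject = f"http://musicbrainz.org/artist/{gid}#_"
        triples.append(f'<{subject}> <{FOAF_NAME}> "{name}" .')
    return triples


print("\n".join(name_triples(make_db())))
```

The join is needed because the `artist` table stores a reference to the name rather than the name itself.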
NGS Advanced Relations
•  Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)
•  Each pairing of instances refers to a Link
•  Links have types (cf. RDF properties) and attributes
Source: http://wiki.musicbrainz.org/Advanced_Relationship
R2RML Advanced Mapping

• Mapping advanced relationships (SQL joins):

lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
         INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
         INNER JOIN link ON l_artist_artist.link = link.id
         INNER JOIN link_type ON link.link_type = link_type.id
         INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'
         AND link.ended = FALSE"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap
    [rr:predicate mo:member_of ;
     rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                   rr:termType rr:IRI]] .

EXTRACTING DATA FROM TEXT

OpenCalais

• Not easily customised/extended
• Domain-specific coverage varies

Source: http://viewer.opencalais.com/

DBpedia Spotlight

• Not easily customised/extended
• Is currently only available for English

Source: http://dbpedia-spotlight.github.com/demo/
http://dbpedia.org/page/Slowcore
http://dbpedia.org/page/Dorothy_Parker

Zemanta

• Common problem with general-purpose, open-domain semantic annotation tools
• Best results require bespoke customisation

Source: http://www.zemanta.com/demo/

GATE

• General Architecture for Text Engineering
• Free open-source (LGPL) framework and development environment
• Started 1996, large developer community
• Used worldwide by many organisations to build bespoke solutions; e.g. Press Association and the National Archive
• Information Extraction in many languages

http://www.gate.ac.uk/

GATE Example - LODIE

• Increases recall over DBpedia by deriving new lexicalisations for URIs from link anchor texts, disambiguation pages, and redirect pages

Precision and Recall

• Generic services typically have very low recall
• Combination is one solution
• The other solution is custom extraction

Precision / Recall   PER           LOC           ORG           TOTAL
DB Spotlight         0.97 / 0.40   0.82 / 0.46   0.86 / 0.31   0.85 / 0.39
Zemanta              0.96 / 0.84   0.89 / 0.62   0.82 / 0.57   0.90 / 0.68
LODIE                0.81 / 0.82   0.73 / 0.76   0.56 / 0.59   0.71 / 0.74
Zemanta ∩ LODIE      1.00 / 0.74   0.95 / 0.45   0.97 / 0.42   0.97 / 0.54
Zemanta ∪ LODIE      0.94 / 0.93   0.77 / 0.76   0.72 / 0.71   0.82 / 0.81

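One common way to compare such precision and recall pairs with a single number is the F1 score, the harmonic mean of the two. The sketch below applies it to the TOTAL figures of the five systems; the numbers are copied from the slide, while the choice of F1 as a summary metric is an assumption made here for illustration and is not something the slides report.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the TOTAL column of the slide.
total_scores = {
    "DB Spotlight": (0.85, 0.39),
    "Zemanta": (0.90, 0.68),
    "LODIE": (0.71, 0.74),
    "Zemanta ∩ LODIE": (0.97, 0.54),
    "Zemanta ∪ LODIE": (0.82, 0.81),
}

# Print the systems from best to worst F1.
for system, (p, r) in sorted(
    total_scores.items(), key=lambda item: f1_score(*item[1]), reverse=True
):
    print(f"{system:16s} P={p:.2f} R={r:.2f} F1={f1_score(p, r):.2f}")
```

Under this measure the union of Zemanta and LODIE ranks highest, which matches the point that combining services is one way to recover recall.
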
Custom GATE Gazetteer

• Retrieve MusicBrainz entity/label/class with SPARQL query

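As a rough sketch of this step, the snippet below pairs a SPARQL query with a helper that turns result rows into flat gazetteer lines. Everything specific here is an assumption: the query shape (including the use of `rdfs:label`), the tab-separated output layout, the helper name `to_gazetteer_lines` and the sample rows are illustrative only. A real GATE gazetteer list would follow whatever format and separator its definition file configures.

```python
# Hypothetical query: the slide only says entity, label and class are retrieved.
GAZETTEER_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label ?class
WHERE {
  ?entity a ?class ;
          rdfs:label ?label .
}
"""


def to_gazetteer_lines(rows):
    """Format (entity, label, class) rows as 'label<TAB>entity<TAB>class' lines.

    Rows with an empty label are skipped and duplicates are dropped,
    preserving first-seen order.
    """
    seen = set()
    lines = []
    for row in rows:
        label = row["label"].strip()
        if not label:
            continue
        line = "\t".join([label, row["entity"], row["class"]])
        if line not in seen:
            seen.add(line)
            lines.append(line)
    return lines


# Made-up rows standing in for the SPARQL result bindings.
sample_rows = [
    {"entity": "http://example.org/artist/1", "label": "Example Band",
     "class": "http://purl.org/ontology/mo/MusicGroup"},
    {"entity": "http://example.org/artist/1", "label": "Example Band",
     "class": "http://purl.org/ontology/mo/MusicGroup"},
    {"entity": "http://example.org/artist/2", "label": "  ",
     "class": "http://purl.org/ontology/mo/MusicArtist"},
]
print("\n".join(to_gazetteer_lines(sample_rows)))
```
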
GATECloud

• Custom (e.g. based around custom gazetteer) GATE pipelines can be executed on the cloud:

INTERLINKING DATA SETS WITH SILK

Interlinking with Silk

• Task: Create links between the data set and external Linked Data sources.
• Approach: Creation of specified links by querying the target data sets
• Alternatives:
  – Manual creation of linkage rules by the user
  – Automatic learning of linkage rules by submitting predefined SPARQL queries

[Figure: Linked Data lifecycle. Data acquisition (CSV/TSV, HTML, spreadsheets, JSON), vocabulary mapping, interlinking, cleansing, integrated data set, publishing, SPARQL endpoint, LD data set access]

Link Discovery with Silk

• Open source tool for discovering RDF links between data items within different Linked Data sources
• It is based on the Silk Link Specification Language (Silk-LSL) for expressing linkage rules
• It accesses the target RDF data sets via SPARQL endpoints to generate RDF links

Source: Robert Isele. "LOD2 Webinar Series: Silk"

Silk Variants

• Silk Single Machine
  – Generates RDF links on a single machine
  – Data sets can reside either locally or on remote machines
  – Provides multithreading and caching
• Silk MapReduce
  – Uses a cluster composed of multiple machines
  – Based on Hadoop and designed to scale to big data sets
• Silk Server
  – Used within applications that consume Linked Data from the Web while keeping track of known entities
  – Provides an HTTP API for matching entities from an incoming stream

Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/

Silk Workflow

Select LD data sets
• Identify suitable data sets in LD catalogs*
• Select the two data sets to link

Specify LD data sets
• Specify the access method to the data set (RDF dump, SPARQL endpoint)*
• Specify the entity types to be linked

Write linkage rule
• Specifies how to compare the resources
• Use Silk-LSL
• The rules can also be learnt

Generate RDF links
• Output links can be stored in a file or a triple store
• Can discover SKOS links

[Diagram label: Silk framework]

* See section "Publishing Linked Data"

Source: Silk workflow is partially based on "LOD2 Webinar Series: Silk - (Simplified) Linking Workflow" by Robert Isele.

Linkage Rule Components

• Linkage rules define the conditions to create the links between the data sets. These rules are composed of:

1. RDF Paths
   • Describe the elements to be compared
   • Example: ?a/rdfs:label
2. Transformations
   • Apply transformations to the result set of an RDF path
   • Examples: LowerCase, Concatenate, Replace, …
3. Comparators
   • Compute the similarity of two inputs
   • Examples: String similarity metrics, Date similarity, …
4. Aggregations
   • Compute an aggregated value from multiple comparators
   • Examples: Min, Max, Avg, various means, Euclidean distance, …

Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/

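How the four components fit together can be sketched in a few lines of Python. This is neither Silk-LSL nor Silk's API: the function names, the toy resource dictionaries, the `ex:year` property and the exact-match comparator are hypothetical stand-ins, chosen only to show the data flow from paths through transformations and comparators to an aggregation.

```python
def path(resource, prop):
    """RDF path stand-in: return the values of one property of a resource."""
    return resource.get(prop, [])


def lower_case(values):
    """Transformation: normalise every value to lower case."""
    return [v.lower() for v in values]


def equality(values_a, values_b):
    """Comparator: 1.0 if any pair of values is identical, else 0.0."""
    return 1.0 if set(values_a) & set(values_b) else 0.0


def average(scores):
    """Aggregation: arithmetic mean of the comparator scores."""
    return sum(scores) / len(scores)


# Toy resources; property names are used as plain dictionary keys.
a = {"rdfs:label": ["Example Band"], "ex:year": ["2001"]}
b = {"rdfs:label": ["example band"], "ex:year": ["2002"]}

scores = [
    # Labels match once both sides are lower-cased.
    equality(lower_case(path(a, "rdfs:label")), lower_case(path(b, "rdfs:label"))),
    # Years differ, so this comparator reports no match.
    equality(path(a, "ex:year"), path(b, "ex:year")),
]
print(average(scores))  # mean of one match and one mismatch
print(min(scores))      # a stricter aggregation: every comparator must agree
```

Swapping `average` for `min` or `max` changes how strictly the individual comparisons must agree before a link is proposed.
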
Silk Workbench

• Web application built on top of Silk, which allows the creation of projects to manage the creation of links between RDF data sets
• The data sets can be stored locally or accessed remotely by specifying the SPARQL endpoint
• The user is able to create customized linking tasks:
  – The tool offers a graphical editor to create linkage rules by combining the linkage rule components via drag & drop elements
  – Includes support for (automatic) learning of linkage rules

Silk Workbench (2)

Project configuration

1. Project: name and components (data sources, linking tasks and output tasks)
2. Data sources: specification of the data sets to be interlinked
3. Linking task: specification of the linkage rules and type of links to be created
4. Output task: mechanism to store the results from the linking process

Silk Workbench (3)

Editing a linking task

1. Linkage rule components
2. Graphical editor: the items from (1) are dragged & dropped in this area, and connected to compose the linkage rules
3. Generate links: based on the linkage rules defined in (2), the data sets are accessed to discover possible links
4. Learn: automatic learning of linkage rules

Silk Workbench (4)

Adding a linkage rule

The previous linkage rule states:
1. Retrieve the foaf:name values from MusicBrainz and the rdfs:label from DBpedia
2. Apply lower case transformation to the output of (1)
3. Compare the output from (2) using the metric "Levenshtein distance". If the resulting similarity score is greater than 0.90, then create a link.

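The three steps of this rule can be paraphrased in Python as follows. The sketch assumes that the 0.90 threshold applies to a similarity normalised to the range 0 to 1 (one minus the edit distance divided by the length of the longer string); Silk's own normalisation may differ, and the function names and sample strings are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalised similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def should_link(foaf_name, rdfs_label, threshold=0.90):
    """Lower-case both values, compare them, and apply the threshold."""
    return similarity(foaf_name.lower(), rdfs_label.lower()) > threshold


print(should_link("The Beatles", "the beatles"))    # identical after lower-casing
print(should_link("The Beatles", "The Beach Boys"))
```

A threshold this high only tolerates roughly one edit per ten characters, so the lower-case transformation does most of the work of reconciling spelling variants.
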
Silk Workbench (5)

Generate Links

Silk Workbench (6)

Learn Rules

For exercises, quiz and further material visit our website:
http://www.euclid-project.eu

eBook, Course

Other channels: @euclid_project, euclidproject, euclidproject

ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

  • 1.
    Providing Ā Linked Ā Data Ā  Presented Ā by: Ā  Barry Ā Norton Ā  Maribel Ā Acosta Ā 
  • 2.
    Motivation: Ā Music! Ā  2 Ā  Visualiza3on Ā  Module Ā  Metadata Ā  Streaming Ā providers Ā  Physical Ā Wrapper Ā  Downloads Ā  Data Ā acquisi3on Ā  R2R Ā Transf. Ā LD Ā Wrapper Ā  Musical Ā Content Ā  Applica3on Ā  Analysis Ā & Ā  Mining Ā Module Ā  LD Ā Data Ā set Ā Access Ā  LD Ā Wrapper Ā  RDF/ Ā  XML Ā  Integrated Ā  Dataset Ā  Interlinking Ā  Cleansing Ā  Vocabulary Ā  Mapping Ā  SPARQL Ā  Endpoint Ā  Publishing Ā  RDFa Ā  Other Ā content Ā 
  • 3.
    LINKED Ā DATA Ā LIFECYCLE Ā  EUCLID Ā -­‐ Ā Querying Ā Linked Ā Data Ā  3 Ā 
  • 4.
    Linked Ā Data Ā Principles Ā  1.  Use Ā URIs Ā as Ā names Ā for Ā things. Ā  2.  Use Ā HTTP Ā URIs Ā so Ā that Ā users Ā can Ā look Ā up Ā  those Ā names. Ā  3.  When Ā someone Ā looks Ā up Ā a Ā URI, Ā provide Ā  useful Ā informa9on, Ā using Ā the Ā standards Ā  (RDF*, Ā SPARQL). Ā  4.  Include Ā links Ā to Ā other Ā URIs, Ā so Ā that Ā users Ā  can Ā discover Ā more Ā things. Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  4 Ā  CH Ā 1 Ā 
  • 5.
    Linked Ā Data Ā  Lifecycle Ā  Linked Ā Data Ā Lifecycle Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  5 Ā  Source: Ā Sƶren Ā Auer. Ā ā€œThe Ā Seman3c Ā Data Ā Webā€ Ā (slides) Ā  Source: Ā JosĆ© Ā M. Ā Alvarez. Ā ā€œMy Ā Linked Ā Data Ā Lifecycleā€ Ā  Source: Ā Michael Ā Hausenblas. Ā ā€œLinked Ā Data Ā lifeyclcleā€ Ā 
  • 6.
    Core Ā Tasks Ā for Ā Providing Ā  Ā  Ā  Ā  Ā  Ā  Linked Ā Data Ā  Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  6 Ā  Based Ā on Ā the Ā proposed Ā LD Ā lifecycles Ā and Ā the Ā LD Ā  principles, Ā we Ā can Ā iden3fy Ā 3 Ā main Ā tasks Ā for Ā providing Ā LD: Ā  ① Crea9ng: Ā includes Ā data Ā extrac3on, Ā crea3on Ā of Ā HTTP Ā  URIs, Ā and Ā vocabulary Ā selec3on. Ā (LD Ā principles Ā 1 Ā & Ā 2) Ā  ⑔ Interlinking: Ā involves Ā the Ā crea3on Ā of Ā (RDF) Ā links Ā to Ā  external Ā data Ā sets. Ā (LD Ā principle Ā 4) Ā  ③ Publishing: Ā consists Ā of Ā crea3ng Ā the Ā metadata Ā and Ā  making Ā the Ā data Ā set Ā accessible. Ā (LD Ā principle Ā 3) Ā  Ā 
  • 7.
    Agenda Ā  1.  Crea9ng Ā Linked Ā Data Ā  2.  Interlinking Ā Linked Ā Data Ā  3.  Publishing Ā Linked Ā Data Ā  4.  Linked Ā Data Ā publishing Ā checklist Ā  7 Ā EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā 
  • 8.
    CREATING Ā LINKED Ā DATA Ā  EUCLID Ā -­‐ Ā Querying Ā Linked Ā Data Ā  8 Ā 
  • 9.
    •  The Ā data Ā of Ā interest Ā may Ā be Ā stored Ā in Ā a Ā wide Ā range Ā or Ā  formats: Ā  Ā  •  Several Ā tools Ā support Ā the Ā process Ā of Ā mining Ā data Ā  from Ā different Ā repositories, Ā for Ā example: Ā  Extracting Ā the Ā Data Ā  9 Ā EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  Spreadsheets Ā  or Ā tabular Ā data Ā  Ā  Databases Ā  Text Ā  R2RML Ā 
  • 10.
    Using Ā the Ā RDF Ā Data Ā Model Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  10 Ā  •  The Ā RDF Ā data Ā model Ā is Ā used Ā to Ā represent Ā the Ā  extracted Ā informa3on Ā  •  The Ā nodes Ā represent Ā the Ā concepts/en33es Ā within Ā  the Ā data. Ā A Ā node Ā corresponds Ā to Ā a Ā URI, Ā a Ā blank Ā node Ā  or Ā a Ā literal Ā (only Ā in Ā predicates) Ā  •  The Ā rela3onships Ā between Ā the Ā concepts/en33es Ā are Ā  modeled Ā as Ā arcs Ā  Subject Ā  Object Ā  Predicate Ā 
  • 11.
    Naming Ā Things: Ā URIs Ā  •  All Ā the Ā things Ā or Ā dis3nct Ā en33es Ā within Ā the Ā data Ā must Ā  be Ā named Ā  •  According Ā to Ā the Ā Linked Ā Data Ā principles, Ā the Ā standard Ā  mechanism Ā to Ā name Ā en33es Ā is Ā the Ā URI Ā  •  Designing Ā Cool Ā URIs: Ā  –  Leave Ā out Ā informa3on Ā about Ā the Ā data Ā regarding Ā to: Ā author, Ā  technologies, Ā status, Ā access Ā mechanisms,  … Ā  –  Simplicity: Ā short, Ā mnemonic Ā URIs Ā  –  Stability: Ā maintain Ā the Ā URIs Ā as Ā long Ā as Ā possible Ā  –  Manageability: Ā issue Ā the Ā URIs Ā in Ā a Ā way Ā that Ā you Ā can Ā manage Ā  11 Ā EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  Source:hjp://www.w3.org/TR/cooluris/ Ā 
  • 12.
    Selecting Ā Vocabularies Ā  • Vocabularies Ā  Ā model Ā the Ā concepts Ā and Ā the Ā  rela9onship Ā between Ā them Ā in Ā a Ā knowledge Ā domain Ā  •  Terms Ā from Ā well-­‐known Ā vocabularies Ā should Ā be Ā  reused Ā wherever Ā possible Ā  •  New Ā terms Ā should Ā be Ā define Ā only Ā if Ā you Ā can Ā not  find Ā  required Ā terms Ā in Ā exis3ng Ā vocabularies Ā  •  A Ā large Ā number Ā of Ā vocabularies Ā in Ā RDF Ā are Ā openly Ā  available, Ā e.g., Ā Linked Ā Open Ā Vocabularies Ā (LOV) Ā  12 Ā EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā 
  • 13.
    Selecting Ā Vocabularies Ā (2) Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  13 Ā  Linked Ā Open Ā Vocabularies Ā  322 Ā vocabularies Ā  classified Ā by Ā domain Ā  Source:hjp://lov.okfn.org/dataset/lov/ Ā 
  • 14.
    Selecting Ā Vocabularies Ā (3) Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  14 Ā  Linked Ā Open Ā Vocabularies: Ā Analyzing Ā MusicOntology Ā  Source:hjp://lov.okfn.org/dataset/lov/details/vocabulary_mo.html Ā 
  • 15.
    Selecting Ā Vocabularies Ā (4) Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  15 Ā  Other Ā lists Ā of Ā well-­‐known Ā vocabularies Ā are Ā maintained Ā  by: Ā  •  W3C Ā SWEO Ā Linking Ā Open Ā Data Ā community Ā project Ā  hjp://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/ CommonVocabularies Ā  •  Library Ā Linked Ā Data Ā Incubator Ā Group: Ā Vocabularies Ā in Ā  the Ā library Ā domain Ā  hjp://www.w3.org/2005/Incubator/lld/XGR-­‐lld-­‐vocabdataset-­‐20111025 Ā 
  • 16.
    INTERLINKING Ā LINKED Ā DATA Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  16 Ā 
  • 17.
    Interlinking Ā Data Ā Sets Ā  •  It’s Ā one Ā of Ā the Ā Linked Ā Data Ā principles! Ā  •  Involves Ā the Ā crea3on Ā of Ā RDF Ā links Ā between Ā two Ā  different Ā RDF Ā data Ā sets: Ā  –  Links Ā at Ā instance Ā level Ā (rdfs:seeAlso, Ā owl:sameAs) Ā  –  Links Ā at Ā schema Ā level Ā (RDFS Ā subclass/subproperty, Ā OWL Ā  equivalent Ā class/property, Ā SKOS Ā mapping Ā proper9es) Ā  •  Appropriate Ā links Ā are Ā detected Ā via Ā link Ā discovery Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  17 Ā  4. Ā Include Ā links Ā to Ā other Ā URIs, Ā so Ā that Ā users Ā can Ā discover Ā  more Ā things. Ā 
  • 18.
    Interlinking Ā Data Ā Sets Ā (2) Ā  Challenges Ā for Ā link Ā discovery Ā  •  Linked Ā Data Ā sets Ā are Ā heterogeneous Ā in Ā terms Ā of Ā  vocabularies, Ā formats Ā and Ā data Ā representa3on Ā  •  Large Ā range Ā of Ā knowledge Ā domains Ā  Ā  •  Scalability: Ā LD Ā is Ā composed Ā of Ā a Ā large Ā number Ā of Ā data Ā  sets Ā and Ā RDF Ā triples, Ā hence Ā it Ā is Ā not Ā possible Ā to Ā  compare Ā every Ā possible Ā en3ty Ā pair Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  18 Ā  Source: Ā Robert Ā Isele. Ā ā€œLOD2 Ā Webinar Ā Series:Silkā€ Ā  Ā 
  • 19.
    Interlinking Ā Data Ā Sets Ā (3) Ā  Challenges Ā for Ā link Ā discovery Ā  •  It Ā corresponds Ā to Ā the Ā en9ty Ā resolu9on Ā problem: Ā  deciding Ā whether Ā two Ā en..es Ā correspond Ā to Ā same Ā object Ā in Ā  the Ā real Ā world Ā  •  Name Ā ambigui9es: Ā typos, Ā misspellings, Ā different Ā  languages, Ā homonyms Ā  Ā  •  Structural Ā ambigui9es: Ā same Ā concepts/en33es Ā with Ā  different Ā structures. Ā Requires Ā the Ā applica3on Ā of Ā ontology Ā  and Ā schema Ā matching Ā techniques Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  19 Ā 
  • 20.
    Interlinking Ā Data Ā Sets Ā (4) Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  20 Ā  RDF Ā data Ā sets Ā  Ā  can Ā be Ā interlinked: Ā  Manually Ā  •  Involves Ā the Ā manual Ā explora3on Ā of Ā  LD Ā data Ā sets Ā and Ā their Ā RDF Ā  resources Ā to Ā iden3fy Ā linking Ā targets Ā  •  May Ā not Ā be Ā feasible Ā when Ā the Ā  number Ā of Ā en33es Ā within Ā the Ā data Ā  set Ā is Ā very Ā large Ā  Ā  Automatically Ā  •  Using Ā tools Ā that Ā perform Ā link Ā  discovery Ā based Ā on Ā linkage Ā rules, Ā for Ā  example: Ā Silk, Ā Limes Ā and Ā xCurator Ā 
  • 21.
    owl:sameAs Ā & Ā rdfs:seeAlso Ā  •  owl:sameAs Ā  •  Creates Ā links Ā between Ā individuals Ā  Ā  •  States Ā that Ā two Ā URIs Ā refer Ā to Ā the Ā same Ā individuals Ā  Ā  •  rdfs:seeAlso Ā  •  States Ā that Ā a Ā resource Ā may Ā provide Ā addi3onal Ā informa3on Ā  about Ā the Ā subject Ā resource Ā  •  Links Ā in Ā MusicBrainz: Ā  –  owl:seeAlso Ā is Ā used Ā for Ā music Ā ar3sts Ā  –  rdfs:seeAlso Ā is Ā used Ā for Ā albums Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  21 Ā 
  • 22.
    SKOS Ā  •  Simple Ā Knowledge Ā Organiza3on Ā System Ā  –  hjp://www.w3.org/TR/skos-­‐reference/ Ā  Ā  •  Data Ā model Ā for Ā knowledge Ā organiza3on Ā systems Ā  (thesauri, Ā classifica3on Ā scheme, Ā taxonomies) Ā  Ā  •  SKOS Ā data Ā is Ā expressed Ā as Ā RDF Ā triples Ā  •  Allows Ā the Ā crea3on Ā of Ā RDF Ā links Ā between Ā different Ā  data Ā sets Ā with Ā the Ā usage Ā of Ā mapping Ā proper9es Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  22 Ā 
  • 23.
    SKOS: Ā Mapping Ā Properties Ā  These Ā proper3es Ā are Ā used Ā to Ā link Ā SKOS Ā concepts Ā  (par3cularly Ā instances) Ā in Ā different Ā schemes: Ā  •  skos:closeMatch: Ā links Ā two Ā concepts Ā that Ā are Ā  sufficiently Ā similar Ā (some3mes Ā can Ā be Ā used Ā interchangeably) Ā  •  skos:exactMatch: Ā indicates Ā that Ā the Ā two Ā concepts Ā  can Ā be Ā used Ā interchangeably. Ā  Ā  •  Axiom: Ā It Ā is Ā a Ā transi9ve Ā property Ā  •  skos:relatedMatch: Ā states Ā an Ā associa3ve Ā mapping Ā  link Ā between Ā two Ā concepts Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  23 Ā 
  • 24.
    Example Ā of Ā SKOS Ā exact Ā match Ā  Ā  Ā  SKOS: Ā Mapping Ā Properties Ā (2) Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  24 Ā  mo:MusicArtist Ā skos:exactMatch Ā dbpedia-­‐ont:MusicalArtist. Ā  @prefix Ā skos: Ā <http://www.w3.org/2004/02/skos/core#> Ā  @prefix Ā mo: Ā <http://purl.org/ontology/mo/> Ā  @prefix Ā dbpedia-­‐ont: Ā <http://dbpedia.org/ontology/> Ā  @prefix Ā schema: Ā <http://schema.org/> Ā  Ā  Ā  Ā  mo:MusicGroup Ā skos:exactMatch Ā schema:MusicGroup. Ā  mo:MusicGroup Ā skos:exactMatch Ā dbpedia-­‐ont:Band. Ā 
  • 25.
    Example Ā of Ā SKOS Ā close Ā match Ā  Ā  Ā  SKOS: Ā Mapping Ā Properties Ā (3) Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  25 Ā  mo:SignalGroup Ā skos:closeMatch Ā schema:MusicAlbum. Ā  @prefix Ā skos: Ā <http://www.w3.org/2004/02/skos/core#> Ā  @prefix Ā mo: Ā <http://purl.org/ontology/mo/> Ā  @prefix Ā dbpedia-­‐ont: Ā <http://dbpedia.org/ontology/> Ā  @prefix Ā schema: Ā <http://schema.org/> Ā  Ā  Ā  mo:SignalGroup Ā skos:closeMatch Ā dbpedia-­‐ont:Album. Ā 
  • 26.
    Integrity Ā conditions Ā  • Guarantee Ā consistency Ā and Ā avoid Ā contradic3ons Ā in Ā  the Ā rela3onships Ā between Ā SKOS Ā concepts Ā  SKOS: Ā Mapping Ā Properties Ā (4) Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  26 Ā  skos:Mapping Ā  Relation Ā  skos:close Ā  Match Ā  skos:exact Ā  Match Ā  skos:related Ā  Match Ā  Symmetric Ā  & Ā Transi9ve Ā  Disjoint Ā  with Ā  Par3al Ā Mapping Ā Rela3on Ā diagram Ā with Ā integrity Ā condi3ons Ā  Symmetric Ā 
  • 27.
    PUBLISHING Ā LINKED Ā DATA Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  27 Ā 
  • 28.
    Publishing Ā Linked Ā Data Ā  Once Ā the Ā RDF Ā data Ā set Ā has Ā been Ā created Ā and Ā  interlinked, Ā the Ā publishing Ā process Ā involves Ā the Ā  following Ā tasks: Ā  1.  Metadata Ā crea3on Ā for Ā describing Ā the Ā data Ā set Ā  Ā  2.  Making Ā the Ā data Ā set Ā accessible Ā  3.  Exposing Ā the Ā data Ā set Ā in Ā Linked Ā Data Ā repositories Ā  4.  Valida9ng Ā the Ā data Ā set Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  28 Ā 
  • 29.
    •  Consists Ā of Ā providing Ā (machine-­‐readable) Ā metadata Ā  of Ā RDF Ā data Ā sets Ā which Ā can Ā be Ā processed Ā by Ā engines Ā  •  This Ā informa3on Ā allows Ā for: Ā  –  Efficient Ā and Ā effec3ve Ā search Ā of Ā data Ā sets Ā  –  Selec3on Ā of Ā appropriate Ā data Ā sets Ā (for Ā consump3on Ā or Ā  interlinking) Ā  –  Get Ā general Ā sta3s3cs Ā of Ā the Ā data Ā sets Ā  Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  29 Ā  Describing Ā RDF Ā Data Ā Sets Ā 
  • 30.
    Describing Ā RDF Ā Data Ā Sets Ā (2) Ā  •  The Ā common Ā language Ā for Ā describing Ā RDF Ā data Ā sets Ā is Ā  VoID Ā (Vocabulary Ā of Ā Interlinked Ā Data Ā sets) Ā  Ā  •  Defines Ā an Ā RDF Ā data Ā set Ā with Ā the Ā predicate Ā  void:Dataset Ā  Ā  •  Covers Ā 4 Ā types Ā of Ā metadata: Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  30 Ā  •  General Ā metadata Ā  •  Structural Ā metadata Ā  •  Descrip3ons Ā of Ā linksets Ā  •  Access Ā metadata Ā 
  • 31.
    VoID: Ā General Ā Metadata Ā  •  General Ā metadata Ā is Ā used Ā by Ā users Ā to Ā iden3fy Ā  appropriate Ā data Ā sets. Ā  •  Specifies Ā informa3on Ā about Ā descrip3on Ā of Ā the Ā data Ā  set, Ā contact Ā person/organiza3on, Ā the Ā license Ā of Ā the Ā  data Ā set, Ā data Ā subject Ā and Ā some Ā technical Ā features. Ā  •  VoID Ā (re)uses Ā predicates Ā from Ā the Ā Dublin Ā Core Ā  Metadata1 Ā and Ā FOAF2 Ā vocabularies. Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  31 Ā  1 Ā hjp://dublincore.org/documents/2010/10/11/dcmi-­‐terms/ Ā  2 Ā hjp://xmlns.com/foaf/spec/ Ā 
  • 32.
    VoID: Ā General Ā Metadata Ā (2) Ā  Predicate Ā  Range Ā  Descrip9on Ā  dcterms:title Ā  Literal Ā  Name Ā of Ā the Ā data Ā set. Ā  dcterms:description Ā  Literal Ā  Descrip3on Ā of Ā the Ā data Ā set. Ā  dcterms:source Ā  RDF Ā resource Ā  Source Ā from Ā which Ā the Ā data Ā set Ā was Ā derived. Ā  dcterms:creator Ā  RDF Ā resource Ā  Primarily Ā responsible Ā of Ā crea3ng Ā the Ā data Ā set. Ā  dcterms:date Ā  xsd:date Ā  Time Ā associated Ā with Ā an Ā event Ā in Ā the Ā life-­‐cycle Ā of Ā the Ā resource. Ā  dcterms:created Ā  xsd:date Ā  Date Ā of Ā crea3on Ā of Ā the Ā data Ā set. Ā  dcterms:issued Ā  xsd:date Ā  Date Ā of Ā publica3on Ā of Ā the Ā data Ā set. Ā  dcterms:modified Ā  xsd:date Ā  Date Ā on Ā which Ā the Ā data Ā set Ā was Ā changed. Ā  foaf:homepage Ā  Literal Ā  Name Ā of Ā the Ā data Ā set. Ā  dcterms:publisher Ā  RDF Ā resource Ā  En3ty Ā responsible Ā for Ā making Ā the Ā data Ā set Ā available. Ā  dcterms:contributor Ā  RDF Ā resource Ā  En3ty Ā responsible Ā for Ā making Ā contribu3ons Ā to Ā the Ā data Ā set. Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  32 Ā  Source: Ā  Ā hjp://www.w3.org/TR/void/#metadata Ā  Ā General Ā Information Ā  Contains Ā informa3on Ā about Ā the Ā crea3on Ā of Ā the Ā data Ā set Ā  Ā 
  • 33.
    VoID: Ā General Ā Metadata Ā (3) Ā  Other Ā Information Ā  •  License Ā of Ā the Ā data Ā set: Ā specifies Ā the Ā usage Ā condi3ons Ā of Ā  the Ā data. Ā The Ā license Ā can Ā be Ā pointed Ā with Ā the Ā property Ā  dcterms:license Ā  •  Category Ā of Ā the Ā data Ā set: Ā to Ā specify Ā the Ā topics Ā or Ā domains Ā  covered Ā by Ā the Ā data Ā set, Ā the Ā property Ā dcterms:subject Ā  can Ā be Ā used Ā  •  Technical Ā features: Ā the Ā property Ā void:feature Ā can Ā be Ā  used Ā to Ā express Ā technical Ā proper3es Ā of Ā the Ā data Ā (e.g. Ā RDF Ā  serializa3on Ā formats) Ā  Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  33 Ā 
  • 34.
    VoID: Ā Structural Ā Metadata Ā  Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  34 Ā  •  Provides Ā high-­‐level Ā informa3on Ā about Ā the Ā internal Ā  structure Ā of Ā the Ā data Ā set Ā  •  This Ā metadata Ā is Ā useful Ā when Ā exploring Ā or Ā querying Ā  the Ā data Ā set Ā  •  Includes Ā informa3on Ā about Ā resources, Ā vocabularies Ā  used Ā in Ā the Ā data Ā set, Ā sta3s3cs Ā and Ā examples Ā of Ā  resources Ā in Ā the Ā data Ā set Ā 
  • 35.
    VoID: Ā Structural Ā Metadata Ā (2) Ā  Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  35 Ā  Information Ā about Ā resources Ā  •  Example Ā resources: Ā allow Ā users Ā to Ā get Ā an Ā impression Ā of Ā the Ā  kind Ā of Ā resources Ā included Ā in Ā the Ā data Ā set. Ā Examples Ā can Ā be Ā  shown Ā with Ā the Ā property Ā void:exampleResource Ā  •  Pajern Ā for Ā resource Ā URIs: Ā the Ā void:uriSpace Ā property Ā  can Ā be Ā used Ā to Ā state Ā that Ā all Ā the Ā en3ty Ā URIs Ā in Ā a Ā data Ā set Ā start Ā  with Ā a Ā given Ā string Ā  Ā  :MusicBrainz Ā a Ā void:Dataset; Ā  Ā  Ā  Ā void:exampleResource Ā  Ā  Ā  Ā  Ā  Ā <http://musicbrainz.org/artist/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d> Ā . Ā  :MusicBrainz Ā a Ā void:Dataset; Ā  Ā  Ā  Ā void:uriSpace Ā "http://musicbrainz.org/" Ā . Ā 
  • 36.
    VoID: Ā Structural Ā Metadata Ā (3) Ā  Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  36 Ā  Vocabularies Ā used Ā in Ā the Ā data Ā set Ā  •  The Ā void:vocabulary Ā property Ā iden3fies Ā the Ā vocabulary Ā or Ā  ontology Ā that Ā is Ā used Ā in Ā a Ā data Ā set Ā  •  Typically, Ā only Ā the Ā most Ā relevant Ā vocabularies Ā are Ā listed Ā  •  This Ā property Ā can Ā only Ā be Ā used Ā for Ā en3re Ā vocabularies. Ā It Ā  cannot Ā be Ā used Ā to Ā express Ā that Ā a Ā subset Ā of Ā the Ā vocabulary Ā  occurs Ā in Ā the Ā data Ā set. Ā  Ā  :MusicBrainz Ā a Ā void:Dataset; Ā  Ā  Ā  Ā void:vocabulary Ā <http://purl.org/ontology/mo/> Ā . Ā 
  • 37.
    VoID: Ā Structural Ā Metadata Ā (4) Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  37 Ā  Source: Ā  Ā hjp://www.w3.org/TR/void/#metadata Ā  Ā  Statistics Ā about Ā a Ā data Ā set Ā  Express Ā numeric Ā sta3s3cs Ā about Ā a Ā data Ā set: Ā  Ā  Ā  Ā  Predicate Ā  Range Ā  Descrip9on Ā  void:triples Ā  Number Ā  Total Ā number Ā of Ā triples Ā contained Ā in Ā the Ā data Ā set. Ā  Ā  void:entities Ā  Number Ā  Total Ā number Ā of Ā en33es Ā that Ā are Ā described Ā in Ā the Ā data Ā set. Ā  An Ā en3ty Ā must Ā have Ā a Ā URI, Ā and Ā match Ā the Ā void:uriRegexPajern Ā  Ā  void:classes Ā  Number Ā  Total Ā number Ā of Ā dis3nct Ā classes Ā in Ā the Ā data Ā set. Ā  void:properties Ā  Number Ā  Total Ā number Ā of Ā dis3nct Ā proper3es Ā in Ā the Ā data Ā set. Ā  void:distinctSubjects Ā  Number Ā  Total Ā number Ā of Ā dis3nct Ā subjects Ā in Ā the Ā data Ā set. Ā  void:distinctObjects Ā  Number Ā  Total Ā number Ā of Ā dis3nct Ā objects Ā in Ā the Ā data Ā set. Ā  void:documents Ā  Number Ā  Total Ā number Ā of Ā documents, Ā in Ā case Ā that Ā the Ā data Ā set Ā is Ā  published Ā as Ā a Ā set Ā of Ā individual Ā documents. Ā 
  • 38.
    VoID: Ā Structural Ā Metadata Ā (5) Ā  Ā  EUCLID Ā -­‐ Ā Providing Ā Linked Ā Data Ā  38 Ā  Partitioned Ā data Ā sets Ā  •  The Ā void:subset Ā property Ā provides Ā descrip3on Ā of Ā parts Ā of Ā a Ā  data Ā set Ā  Ā  •  Data Ā sets Ā can Ā be Ā par33oned Ā based Ā on Ā classes Ā or Ā proper9es: Ā  •  void:classPartition Ā contains Ā only Ā instances Ā of Ā a Ā par3cular Ā class Ā  •  void:propertyPartition Ā contains Ā only Ā triples Ā with Ā a Ā par3cular Ā predicate Ā  :MusicBrainz Ā a Ā void:Dataset; Ā  Ā  Ā void:subset Ā :MusicBrainzArtists Ā . Ā  :MusicBrainz Ā a Ā void:Dataset; Ā  Ā  Ā void:classPartition Ā [ Ā void:class Ā mo:Release Ā .] Ā ; Ā  Ā  Ā void:propertyParition Ā [ Ā void:property Ā mo:member Ā .] Ā . Ā 
  • 39.
    VoID: Describing Linksets
    •  Linkset: a collection of RDF links between two RDF data sets
    [Figure: data sets :DS1 and :DS2 connected by linksets :LS1 and :LS2 of owl:sameAs links. Image based on http://semanticweb.org/wiki/File:Void-linkset-conceptual.png]
        @prefix void: <http://rdfs.org/ns/void#> .
        @prefix owl: <http://www.w3.org/2002/07/owl#> .
        :DS1 a void:Dataset .
        :DS2 a void:Dataset .
        :DS1 void:subset :LS1 .
        :LS1 a void:Linkset ;
            void:linkPredicate owl:sameAs ;
            void:target :DS1, :DS2 .
  • 40.
    VoID: Describing Linksets (2)
    Example
        @prefix void: <http://rdfs.org/ns/void#> .
        @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
        @prefix mo: <http://purl.org/ontology/mo/> .
        :MusicBrainz a void:Dataset .
        :DBpedia a void:Dataset .
        :MusicBrainz void:classPartition :MBArtists .
        :MBArtists void:class mo:MusicArtist .
        :MBArtists a void:Linkset ;
            void:linkPredicate skos:exactMatch ;
            void:target :MusicBrainz, :DBpedia .
  • 41.
    VoID: Access Metadata
    The access metadata describes the methods of accessing the actual RDF data set.
    Method | Predicate | Description
    URI look-up endpoint | void:uriLookupEndpoint | Specifies the URI of a service for accessing the data set (different from the SPARQL protocol)
    Root resource | void:rootResource | URI of the top concepts (only for data sets structured as trees)
    SPARQL endpoint | void:sparqlEndpoint | Provides access to the data set via the SPARQL protocol.*
    RDF data dumps | void:dataDump | Specifies the location of the dump file. If the data set is split into multiple files, then several values of this property are provided.
    * This assumes that the default graph of the SPARQL endpoint contains the data set. VoID cannot express that a data set is contained in a specific named graph. This can be specified with SPARQL 1.1 Service Description.
    CH 5
  • 42.
    Providing Access to the Data Set
    The data set can be accessed via different mechanisms:
    •  RDFa
    •  RDF dump
    •  SPARQL endpoint
    •  Dereferencing HTTP URIs
  • 43.
    Dereferencing HTTP URIs
    •  Allows for easily exploring certain resources contained in the data set
    •  What to return for a URI?
        •  Immediate description: triples where the URI is the subject.
        •  Backlinks: triples where the URI is the object.
        •  Related descriptions: information of interest in typical usage scenarios.
        •  Metadata: information such as author and licensing information.
        •  Syntax: RDF descriptions as RDF/XML and human-readable formats.
    •  Applications (e.g. LD browsers) render the retrieved information so it can be perceived by a user.
    Source: How to Publish Linked Data on the Web - Chris Bizer, Richard Cyganiak, Tom Heath.
    CH 1
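    Whether a client receives the RDF description or the human-readable page is typically decided by HTTP content negotiation. The sketch below only builds the request and performs no network access; the helper name build_dereference_request and the DBpedia resource URI are assumptions chosen for illustration.

```python
from urllib.request import Request


def build_dereference_request(uri, want_rdf=True):
    """Build a GET request that asks for RDF/XML or HTML via the Accept header."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


# Illustrative URI; any dereferenceable HTTP URI could be used instead.
rdf_request = build_dereference_request("http://dbpedia.org/resource/The_Beatles")
html_request = build_dereference_request(
    "http://dbpedia.org/resource/The_Beatles", want_rdf=False
)

print(rdf_request.get_method(), rdf_request.full_url)
print("Accept:", rdf_request.get_header("Accept"))
```

    Passing one of these requests to urllib.request.urlopen would perform the actual look-up and follow any redirect the server issues to the matching document.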
  • 44.
    Dereferencing HTTP URIs (2)
    Example: Dereferencing
  • 45.
    RDFa
    •  RDFa = "RDF in attributes"
    •  Extension to HTML5 for embedding RDF within HTML pages:
        –  The HTML is processed by the browser; the (human) consumer does not see the RDF data
        –  The RDF triples within the page are consumed by APIs to extract the (semi-)structured data
    •  It is considered the bridge between the Web of Data and the Web of Documents
    •  It is a complete serialization of RDF
  • 46.
    RDFa: Attributes
    Attribute role | Attribute | Description
    Syntax | prefix | List of prefix-name IRI pairs
    Syntax | vocab | IRI that specifies the vocabulary where the concept is defined
    Subject | about | Specifies the subject of the relationship
    Predicate | property | Expresses the relationship between the subject and the value
    Predicate | rel | Defines a relation between the subject and a URL
    Predicate | rev | Expresses reverse relationships between two resources
    Resource | href | Specifies an object URI for the rel and rev attributes
    Resource | resource | Same as href (used when href is not present)
    Resource | src | Specifies the subject of a relationship
    Literal | datatype | Expresses the datatype of the object of the property attribute
    Literal | content | Supplies machine-readable content for a literal
    Literal | xml:lang, lang | Specifies the language of the literal
    Macro | typeof | Indicates the RDF type(s) to associate with a subject
    Macro | inlist | An object is added to the list of a predicate
  • 47.
    RDFa: Example
    Extracting RDF from HTML
    HTML (+RDFa):
        <div class="artistheader"
             about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
             typeof="http://purl.org/ontology/mo/MusicGroup">
          …
        </div>
    RDF:
        <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_>
  • 48.
    RDFa: Example
    Extracting RDF from HTML
    HTML (+RDFa):
        <div class="artistheader"
             about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
             typeof="http://purl.org/ontology/mo/MusicGroup">
          …
        </div>
    RDF:
        <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_>
          <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  • 49.
    RDFa: Example
    Extracting RDF from HTML
    HTML (+RDFa):
        <div class="artistheader"
             about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
             typeof="http://purl.org/ontology/mo/MusicGroup">
          …
        </div>
    RDF:
        <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_>
          <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
            <http://purl.org/ontology/mo/MusicGroup> .
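    The extraction step in this example can be imitated with a few lines of code. This is a deliberately narrow sketch, not a conformant RDFa processor: it only handles elements that carry both about and typeof with full IRIs, and it ignores prefix, vocab, property and nesting. The class name TypeofExtractor is made up for the illustration.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TypeofExtractor(HTMLParser):
    """Collect (subject, rdf:type, class) triples from about/typeof attributes."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        types = attributes.get("typeof")
        if subject and types:
            # typeof may hold several whitespace-separated types.
            for type_iri in types.split():
                self.triples.append((subject, RDF_TYPE, type_iri))


html = """<div class="artistheader"
     about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
     typeof="http://purl.org/ontology/mo/MusicGroup">
</div>"""

extractor = TypeofExtractor()
extractor.feed(html)
for s, p, o in extractor.triples:
    print(f"<{s}> <{p}> <{o}> .")
```

    A full RDFa processor such as the pyRdfa distiller additionally resolves prefixes, relative IRIs and chained subjects.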
  • 50.
    RDFa: Example (2)
    Extracting RDF from MusicBrainz.org
    http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
  • 51.
    RDFa: Example (2)
    Extracting RDF from MusicBrainz.org
    Source: http://www.w3.org/2007/08/pyRdfa/
  • 52.
    RDFa: Example (2)
    Extracting RDF from MusicBrainz.org
    http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d&format=nt
    Watch the EUCLID screencast: http://vimeo.com/euclidproject
  • 53.
    RDF Dump
    •  An RDF dump is a file which contains (part of) a data set specified in an RDF format (RDF/XML, N-Triples, N-Quads)
    •  The data set can be split into several RDF dumps
    •  A list of data sets available as RDF dumps can be found at:
        –  http://www.w3.org/wiki/DataSetRDFDumps
  • 54.
    SPARQL Endpoint
    •  The SPARQL endpoint refers to the URI of the listener of the SPARQL protocol service, which handles requests for SPARQL protocol operations
    •  The user submits SPARQL queries to the SPARQL endpoint in order to retrieve only a desired subset of the RDF data set
    •  List of available SPARQL endpoints:
        •  http://www.w3.org/wiki/SparqlEndpoints
        •  http://labs.mondeca.com/sparqlEndpointsStatus/
    CH 2
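    In the SPARQL protocol, a query can be sent with an HTTP GET request by passing it in the query parameter of the endpoint URI. The sketch below only constructs such a URI; the endpoint address http://example.org/sparql and the helper name sparql_get_url are placeholders, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint, query):
    """Return the GET URL for a SPARQL query, with the query text URL-encoded."""
    return endpoint + "?" + urlencode({"query": query})


query = (
    "SELECT ?name WHERE { "
    "?artist <http://xmlns.com/foaf/0.1/name> ?name "
    "} LIMIT 10"
)
url = sparql_get_url("http://example.org/sparql", query)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

    The LIMIT clause illustrates the point made above: the client retrieves only the subset of the data set it asks for.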
  • 55.
    Using Linked Data Catalogs
    •  Data catalogs, markets or repositories are platforms dedicated to providing access to a wide range of data sets from different domains
    •  They allow data consumers to easily find and use the data
    •  Usually the catalogs offer relevant metadata about the creation of the data set
  • 56.
    Using Linked Data Catalogs (2)
    How to publish an RDF data set into a catalog?
    •  Create your own data catalog
        –  Recommended for big organizations/institutions aiming at providing a large number of data sets
        –  Use a data management system, for example: [logo]
    •  Upload your data set into an existing catalog
        –  Allows data consumers to easily find new data sets
        –  Common LD catalogs are: [logo]; The Linking Open Data Cloud
  • 57.
    Validating Data Sets
    There are different ways to validate the published RDF data set:
    Categories: Accessibility; Parsing & Syntax; General validators
    •  Vapour - Performs two types of tests: without content negotiation and requesting RDF/XML content
       http://validator.linkeddata.org/vapour
    •  URI Debugger - Retrieves the HTTP responses of accessing a URI
       http://linkeddata.informatik.hu-berlin.de/uridbg/
    •  RDF Triple-Checker - Dereferences namespaces associated with the resources used in the document
       http://graphite.ecs.soton.ac.uk/checker/
    •  W3C RDF/XML Validation Service - Evaluates the syntax of RDF/XML documents and displays the RDF triples in them
       http://www.w3.org/RDF/Validator/
    •  W3C Markup Validation Service - Checks syntactic correctness of web documents with RDFa markup
       http://validator.w3.org/
    •  RDF:ALERTS - Validates syntax, undefined resources, datatype and other types of errors
       http://swse.deri.org/RDFAlerts/
  • 58.
    Validating Data Sets (2)
    Example: Validating URIs with Vapour
    Source: http://idi.fundacionctic.org/vapour
  • 59.
    Validating Data Sets (3)
    Example: Validating URIs with Vapour
    Source: http://idi.fundacionctic.org/vapour
  • 60.
    Validating Data Sets (4)
    Example: Validating URIs with Vapour
    HTML content: http://dbpedia.org/page/The_Beatles
    RDF document: http://dbpedia.org/data/The_Beatles.xml
    Source: http://idi.fundacionctic.org/vapour
  • 61.
    PROVIDING LINKED DATA: CHECKLIST
  • 62.
    Providing Linked Data: Checklist (1)
    Creating Linked Data
    o  Were all the relevant entities/concepts effectively extracted from the raw data?
    o  Are all the created URIs dereferenceable?
    o  Are you reusing terms from widely accepted vocabularies?
  • 63.
    Providing Linked Data: Checklist (2)
    Interlinking Linked Data
    o  Is the data set linked to other RDF data sets?
    o  Are the created vocabulary terms linked to other vocabularies?
  • 64.
    Providing Linked Data: Checklist (3)
    Publishing Linked Data
    o  Do you provide data set metadata?
    o  Do you provide information about licensing?
    o  Do you provide additional access methods?
    o  Is the data set available in LD catalogs?
    o  Did the data set pass the validation tests?
  • 65.
    Summary
    In this chapter we studied:
    •  The Linked Data lifecycle:
        •  3 core tasks: creating, interlinking and publishing
    •  Creation of Linked Data:
        •  Extracting relevant data, using URIs to name entities, selecting vocabularies and expressing the data using the RDF data model
    •  Interlinking Linked Data:
        •  Challenges of link discovery, using Silk to create links between two data sets and using SKOS links
    •  Publishing Linked Data:
        •  Creation of data set metadata; publishing the data set via RDF dumps, SPARQL endpoints or RDFa; using RDFa and schema.org to enrich search results; and uploading the data set to an LD catalog
  • 66.
    The Web & Linked Data
    •  Linked Data catalogs
    •  Applications
  • 67.
    CKAN
    •  CKAN is an open source platform for developing data set catalogs
    •  It implements useful tools for data publishers to support:
        •  Data harvesting
        •  Creation of metadata
        •  Access mechanisms to the data set
        •  Updating the data set
        •  Monitoring the access to the data set
  • 68.
    CKAN (2)
    Source: http://ckan.org
  • 69.
    CKAN (3)
    Source: http://ckan.org
  • 70.
    The Data Hub
    •  The Data Hub is a community-run data catalog which contains more than 5,000 data sets (1)
    •  "(…) is an openly editable open data catalogue, in the style of Wikipedia". (2)
    •  It is implemented on top of the CKAN platform
    •  Allows the creation of groups:
        –  The Linking Open Data Cloud group exclusively contains Linked Data sets
    (1) According to the information presented in the portal in March 2013
    (2) Source: http://datahub.io/about
  • 71.
    The Data Hub (2)
    Source: http://datahub.io/
  • 72.
    The Data Hub (3)
    Source: http://datahub.io/
  • 73.
    The Linking Open Data Cloud
    [Figure: Linking Open Data cloud diagram, September 2011]
    Source: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
  • 74.
    The Linking Open Data Cloud
    How to publish an RDF data set in this cloud?
    1.  The data set must follow the Linked Data principles
    2.  The data set must contain at least 1,000 RDF triples
    3.  The data set must contain at least 50 RDF links to a data set that is already in the diagram
    4.  Access to the data set must be provided
    Once these criteria are met, the data publisher must add the data set to the Data Hub catalog, and contact the administrators of the Linking Open Data Cloud group.
    Source: http://lod-cloud.net/
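    The four criteria lend themselves to a simple pre-submission check. The sketch below is an assumption about how a publisher might encode them; the dictionary layout and the function name lod_cloud_problems are invented for the illustration, and criterion 1 is reduced to a flag because it cannot be checked mechanically here.

```python
def lod_cloud_problems(dataset):
    """Return the list of unmet criteria (an empty list means all four are met)."""
    problems = []
    if not dataset["follows_ld_principles"]:
        problems.append("does not follow the Linked Data principles")
    if dataset["triples"] < 1000:
        problems.append("fewer than 1,000 RDF triples")
    # At least one data set already in the diagram must receive 50 or more links.
    if max(dataset["links"].values(), default=0) < 50:
        problems.append("fewer than 50 RDF links to a data set in the diagram")
    if not dataset["access_methods"]:
        problems.append("no access to the data set is provided")
    return problems


# Hypothetical descriptions of two data sets.
ready = {
    "follows_ld_principles": True,
    "triples": 250000,
    "links": {"DBpedia": 1200, "MusicBrainz": 30},
    "access_methods": ["SPARQL endpoint", "RDF dump"],
}
not_ready = {
    "follows_ld_principles": True,
    "triples": 800,
    "links": {"DBpedia": 49},
    "access_methods": [],
}

print(lod_cloud_problems(ready))
print(lod_cloud_problems(not_ready))
```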
  • 75.
    Linked Data & Search Engines
    •  Search engines collect information about web resources in order to produce richer search results by improving the display of the results
    •  This is only possible if the search engines are able to understand the content within the web pages
    •  The HTML pages must be annotated with machine-readable content to describe their content, which requires:
        –  A mark-up format
        –  A vocabulary
  • 76.
    RDFa for marking up data
    •  RDFa is used to provide (semi-)structured Linked Data embedded in web content
    •  Examples:
        –  Some search engines use RDFa, e.g., Google, Yahoo! and Bing
        –  Facebook's Open Graph is based on RDFa
  • 77.
    Google Rich Snippets
    •  Embedding semantics via RDFa (or microformats/microdata) enhances search results:
  • 78.
    Google Rich Snippets (2)
  • 79.
    Schema.org
    •  Collection of schemas/vocabularies to mark up HTML pages
    •  It is recognized by Bing, Google, Yahoo! and Yandex
    •  Covers a wide range of knowledge domains
    •  It also offers an extension mechanism in case the publisher is interested in adding new concepts to the vocabularies
  • 80.
    Schema.org (2)
    The vocabularies cover the following topics:
    Source: http://schema.org/docs/schemas.html
    "The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to find a balance, by providing a core schema that covers lots of situations, alongside extension mechanisms for extra detail." (Dan Brickley, schema.org)
  • 81.
    Schema.org (3)
    Integrates (or aligns) existing vocabularies where appropriate, e.g. rNews
    Source: http://schema.org/Article
  • 82.
    Google Knowledge Graph
    •  Users can find answers to their queries without browsing pages
    •  Provides detailed information
  • 83.
    Google Knowledge Graph (2)
    •  Google Search results include structured data from Freebase
    •  Might disambiguate search terms
  • 84.
    Freebase
    •  Knowledge base of structured data
    •  Data is stored as a graph
    •  Describes data from different domains
  • 85.
    Bing Snapshot
    •  Provides structured data related to the search term
    •  Includes a significant number of entities from more domains
    •  Connects data from LinkedIn
    •  It is powered by the graph engine Trinity.RDF
  • 86.
    Bing Snapshot (2)
  • 87.
    Open Graph Protocol
    •  It was originally created by Facebook
    •  Allows describing web content as graph objects, establishing connections between people and objects
    •  The descriptions are embedded in the web page as RDFa data
    •  Supports description of several domains: basic metadata, music, video, articles, books, websites and user profiles
    Source: http://ogp.me/
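    Open Graph descriptions are carried by meta elements whose property attribute starts with og: and whose content attribute holds the value. A consumer can collect them as sketched below; the page fragment, the example.org URLs and the class name OpenGraphParser are hypothetical, and only the last value of a repeated property is kept.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and "content" in attributes:
            self.properties[name] = attributes["content"]


# Hypothetical page head describing a music album.
page = """<html prefix="og: http://ogp.me/ns#"><head>
<title>Abbey Road</title>
<meta property="og:title" content="Abbey Road" />
<meta property="og:type" content="music.album" />
<meta property="og:url" content="http://example.org/albums/abbey-road" />
<meta name="description" content="Not an Open Graph property" />
</head></html>"""

parser = OpenGraphParser()
parser.feed(page)
print(parser.properties)
```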
  • 88.
    Open Graph Protocol (2)
    Who is using the Open Graph protocol?
    •  Consumers: Facebook, Google, Mixi
    •  Publishers: IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes, TIME
    Source: http://ogp.me/
  • 89.
    Open Graph Protocol (3)
    •  Facebook expands the vocabulary of relationships beyond "friendship" and "like": more actions!
    Source: https://developers.facebook.com/docs/opengraph/
  • 90.
    Open Graph Protocol & Facebook
    List of domains and actions
    •  General: Like, Recommend, Follow
    •  Music: Listen, Create a playlist
    •  Movies & TV: Watch, Rate, Wants to watch
    •  Book: Rate, Read, Quote, Wants to read
    •  Games: Achieve, High score
    •  Fitness: Bike, Run, Walk
    How can we exploit these links and relationships?
    Source: https://developers.facebook.com/docs/opengraph/
  • 91.
    Facebook Graph Search
    •  Focuses on people and their interests, exploiting how everything is related to each other
    •  Queries are specified using natural language
    •  Takes advantage of context and suggests possible queries
    •  Allows for building more complex (expressive) queries that are not possible with normal search:
        –  For example, "music liked by me and friends who live in my city"
  • 92.
    Facebook Graph Search (2)
    Context (information from profile):
    Graph search suggestions:
  • 93.
    Facebook Graph Search (3)
    Results
  • 94.
    Facebook Graph Search (4)
    Observations
    •  Allows for conjunctive queries (applying a filter over intermediate results = "apply operator")
    •  Disjunctive queries are not supported:
        –  For example: "My friends who like SemanticWeb.com OR ReadWrite"
    •  Post search is not supported
        –  It is not possible to search in post content submitted to the timeline
    •  User privacy settings affect the results
  • 95.
    Tools for providing Linked Data
    •  Extracting data from spreadsheets: OpenRefine
    •  Extracting data from RDBMS: R2RML
    •  Extracting data from text: Zemanta, OpenCalais, GATE
    •  Interlinking data sets: Silk
  • 96.
    EXTRACTING DATA FROM SPREADSHEETS WITH OPENREFINE
  • 97.
    Integrate Chart Data
    •  Task: Integrate latest chart information into your RDF database.
    •  Data may be available in non-RDF formats:
        –  Plain text
        –  CSV, TSV, separator-based files
        –  HTML tables
        –  Spreadsheets (OpenDocument, Excel, …)
        –  XML
        –  JSON
        –  …
    [Figure: Linked Data lifecycle diagram. Data acquisition from CSV/TSV, HTML, spreadsheets and JSON feeds the LD data set access layer; the integrated data set passes through vocabulary mapping, interlinking and cleansing, and is published via a SPARQL endpoint.]
  • 98.
    Example Data
    CSV:
        The Beatles, 250 million
        Elvis Presley, 203.3 million
        Michael Jackson, 157.4 million
        Madonna, 160.1 million
        Led Zeppelin, 135.5 million
        Queen, 90.5 million
    HTML tables (http://en.wikipedia.org/wiki/List_of_best-selling_music_artists):
    Artist | Country of origin | Period active | Release-year of first charted record | Total certified units (from available markets)[Notes]
    The Beatles | United Kingdom | 1960–1970[4] | 1962[4] | Total available certified units: 250 million[show]
    Elvis Presley | United States | 1954–1977[28] | 1954[28] | Total available certified units: 203.3 million[show]
    Michael Jackson[Note 2] | United States | 1964–2009[32] | 1971[32] | Total available certified units: 157.4 million[show]
    Madonna | United States | 1979–present[44] | 1982[44] | Total available certified units: 160.1 million[show]
    Led Zeppelin | United Kingdom | 1968–1980[50] | 1969[50] | Total available certified units: 135.5 million[show]
    Queen | United Kingdom | 1971–present[53] | 1973[53] | Total available certified units: 90.5 million[show]
    JSON:
        { "artist": { "class": "artist", "name": "The Beatles" }, "rank": 1, "value": 250 million }, …
  • 99.
    OpenRefine: Quick Facts
    OpenRefine
    •  transforms and cleans messy input data sets.
    •  is an open-source successor of Google Refine.
    •  allows for entity reconciliation against SPARQL endpoints or RDF data.
    •  is extended with plugins that enhance its functionality, e.g. for RDF support.
  • 100.
    Use of OpenRefine
    1.  Messy input data is imported, transformed into a table representation and cleaned.
    2.  Entity reconciliation is applied to allow for interlinking with existing data sets.
    3.  The structure of the RDF output is defined.
    4.  The data is exported into some RDF syntax.
    CSV:
        The Beatles, 250 million
        Elvis Presley, 203.3 million
        Michael Jackson, 157.4 million
        Madonna, 160.1 million
        Led Zeppelin, 135.5 million
        Queen, 90.5 million
    RDF:
        musicbrainz:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d :totalSales "25000000000"^^xsd:int .
        musicbrainz:01809552-4f87-45b0-afff-2c6f0730a3be :totalSales "2.033E10"^^xsd:int .
        musicbrainz:f27ec8db-af05-4f36-916e-3d57f91ecf5e :totalSales "1.574E10"^^xsd:int .
        musicbrainz:79239441-bfd5-4981-a70c-55c3f15c1287 :totalSales "1.601E10"^^xsd:int .
        musicbrainz:678d88b2-87b0-403b-b63d-5da7465aecc3 :totalSales "1.355E10"^^xsd:int .
        musicbrainz:0383dadf-2a4e-4d10-a46a-e9e041da8eb3 :totalSales "9.05E9"^^xsd:int .
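    The cleaning in step 1 amounts to turning free-text figures such as "203.3 million" into plain integers. The sketch below shows one way to do this outside OpenRefine; it does not reproduce OpenRefine or GREL, the function name parse_units and the unit table are assumptions, and Decimal is used so that values like 203.3 are scaled without floating-point rounding.

```python
import csv
import io
from decimal import Decimal

# Assumed unit words; extend as needed for other inputs.
UNIT_FACTORS = {"million": 10**6, "billion": 10**9}


def parse_units(text):
    """Convert a string such as '203.3 million' into an integer count."""
    number, unit = text.strip().split()
    return int(Decimal(number) * UNIT_FACTORS[unit])


raw_csv = """The Beatles, 250 million
Elvis Presley, 203.3 million
Queen, 90.5 million
"""

# skipinitialspace drops the blank that follows each comma.
reader = csv.reader(io.StringIO(raw_csv), skipinitialspace=True)
rows = [(artist, parse_units(units)) for artist, units in reader]
for artist, units in rows:
    print(artist, units)
```

    Emitting exact integers at this stage keeps the later RDF export free of scientific-notation literals.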
  • 101.
    Data Transformation
    Typical steps:
    •  Group and explore data items
    •  Delete columns or rows based on a filter condition
    •  Split columns into several columns based on a condition
    •  Modify messy data items with GREL, a powerful expression language
    •  Replay steps from a previous Refine project
  • 102.
    How to Generate RDF?
    •  Additional problem: data needs to be interlinked with existing MusicBrainz data
    •  This is the point where plugins come into play:
        –  RDF Refine: developed by DERI
        –  An extension of OpenRefine to support RDF
  • 103.
    Core Capabilities
    •  Interlinking of data by entity reconciliation
        –  Against SPARQL endpoints, RDF dumps
        –  Discovery of relevant RDF data sets
    •  RDF export with the help of RDF skeletons
        –  Define the vocabulary and graph structure of the RDF serialization
        –  In Turtle, RDF/XML
  • 104.
    Entity Reconciliation
    Typical steps:
    •  Define a reconciliation service
    •  Select specific types to reconcile against
    •  Start reconciling a column against the service
  • 105.
    Define RDF Skeletons
    •  An RDF skeleton defines the structure of the RDF triples that are exported
  • 106.
    RDF Skeletons
  • 107.
    EXTRACTING DATA FROM RDBMS WITH R2RML
  • 108.
    W3C RDB2RDF
    •  Task: Integrate data from relational DBMS with Linked Data
    •  Approach: map from relational schema to semantic vocabulary with R2RML
    •  Publishing: two alternatives
        –  Translate SPARQL into SQL on the fly
        –  Batch transform data into RDF, index and provide SPARQL access in a triplestore
    [Figure: Linked Data lifecycle diagram. An R2RML engine acquires data from a relational DBMS; the integrated data in the triplestore passes through vocabulary mapping, interlinking and cleansing, and is published via a SPARQL endpoint.]
  • 109.
    W3C RDB2RDF
    •  In 2012 the W3C published two Recommendations for mapping between relational databases and RDF:
        –  Direct Mapping, which directly exposes data as RDF
            •  No allowance for vocabulary mapping
            •  No allowance for interlinking (unless URIs are used in the relational data)
            •  Not appropriate for this topic
        –  R2RML, the RDB to RDF mapping language
            •  Allows vocabulary mapping (subject, predicate and object maps with class options)
            •  Allows interlinking: URIs can be constructed
            •  A means to provide MusicBrainz RDF/SPARQL itself
    http://www.w3.org/2001/sw/rdb2rdf/
  • 110.
    MusicBrainz Next Gen Schema
    •  Artist: as pre-NGS, but further attributes
    •  Artist Credit: allows joint credit
    •  Release Group: cf. ‘album’, versus:
    •  Release
    •  Medium
    •  Track
    •  Track List
    •  Work
    •  Recording
    Source: https://wiki.musicbrainz.org/Next_Generation_Schema
  • 111.
    Music Ontology
    •  OWL ontology with the following core concepts (classes) and relationships (properties):
    [Figure: Music Ontology core classes and properties]
    Source: http://musicontology.com
  • 112.
    R2RML Class Mapping
    •  Mapping tables to classes is ‘easy’:

        lb:Artist a rr:TriplesMap ;
          rr:logicalTable [rr:tableName "artist"] ;
          rr:subjectMap
            [rr:class mo:MusicArtist ;
             rr:template
               "http://musicbrainz.org/artist/{gid}#_"] ;
          rr:predicateObjectMap
            [rr:predicate mo:musicbrainz_guid ;
             rr:objectMap [rr:column "gid" ;
                           rr:datatype xsd:string]] .
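To make the effect of the class mapping concrete, the following is a minimal Python sketch of the triples such a mapping would yield for one row of the artist table. It is not an R2RML engine: the sample gid is made up, the `mo:` namespace IRI is assumed to be the usual Music Ontology namespace (the slides do not spell it out), and the IRI-safe encoding that R2RML applies to template values is omitted.

```python
import re

# Hypothetical row standing in for the MusicBrainz "artist" table;
# the gid value is invented for illustration.
ROWS = [{"gid": "00000000-0000-0000-0000-000000000001"}]

SUBJECT_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO = "http://purl.org/ontology/mo/"  # assumed expansion of the mo: prefix
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def expand(template, row):
    """Replace each {column} placeholder with the row's value."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def triples_for(row):
    """Yield N-Triples lines for one row, mirroring the class mapping."""
    subject = f"<{expand(SUBJECT_TEMPLATE, row)}>"
    # rr:class mo:MusicArtist becomes an rdf:type triple
    yield f"{subject} <{RDF_TYPE}> <{MO}MusicArtist> ."
    # rr:column "gid" with rr:datatype xsd:string becomes a typed literal
    yield f'{subject} <{MO}musicbrainz_guid> "{row["gid"]}"^^<{XSD_STRING}> .'


if __name__ == "__main__":
    for row in ROWS:
        for line in triples_for(row):
            print(line)
```

Each row gives one subject IRI built from the template, one type triple from `rr:class`, and one triple per predicate-object map.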
  • 113.
    R2RML Property Mapping
    •  Mapping columns to properties can be easy:

        lb:artist_name a rr:TriplesMap ;
          rr:logicalTable [rr:sqlQuery
            """SELECT artist.gid, artist_name.name
               FROM artist
               INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
          rr:subjectMap lb:sm_artist ;
          rr:predicateObjectMap
            [rr:predicate foaf:name ;
             rr:objectMap [rr:column "name"]] .
  • 114.
    NGS Advanced Relations
    •  Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)
    •  Each pairing of instances refers to a Link
    •  Links have types (cf. RDF properties) and attributes
    Source: http://wiki.musicbrainz.org/Advanced_Relationship
  • 115.
    R2RML Advanced Mapping
    •  Mapping advanced relationships (SQL joins):

        lb:artist_member a rr:TriplesMap ;
          rr:logicalTable [rr:sqlQuery
            """SELECT a1.gid, a2.gid AS band
               FROM artist a1
               INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
               INNER JOIN link ON l_artist_artist.link = link.id
               INNER JOIN link_type ON link.link_type = link_type.id
               INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
               WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'
                 AND link.ended=FALSE"""] ;
          rr:subjectMap lb:sm_artist ;
          rr:predicateObjectMap
            [rr:predicate mo:member_of ;
             rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                           rr:termType rr:IRI]] .
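The join behind this mapping can be tried on a toy database. The sketch below uses Python's built-in sqlite3 with a drastically reduced, hypothetical version of the MusicBrainz tables (only the columns the query touches) and invented gid values. The join on the link type is written with the column qualified as `link.link_type`, and `ended = 0` stands in for `ended=FALSE` so the query also runs on older SQLite versions. The `mo:` namespace IRI is an assumption.

```python
import sqlite3

MEMBER_OF_TYPE = "5be4c609-9afa-4ea0-910b-12ffb71e3821"  # link type gid from the slide

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER, ended INTEGER);
CREATE TABLE l_artist_artist (id INTEGER PRIMARY KEY, link INTEGER,
                              entity0 INTEGER, entity1 INTEGER);
-- Invented sample data: one current member and one former member of a band.
INSERT INTO artist VALUES (1, 'gid-member'), (2, 'gid-band'), (3, 'gid-former');
INSERT INTO link_type VALUES (10, '5be4c609-9afa-4ea0-910b-12ffb71e3821');
INSERT INTO link VALUES (100, 10, 0), (101, 10, 1);
INSERT INTO l_artist_artist VALUES (1000, 100, 1, 2), (1001, 101, 3, 2);
""")

QUERY = """
SELECT a1.gid, a2.gid AS band
FROM artist a1
INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
INNER JOIN link ON l_artist_artist.link = link.id
INNER JOIN link_type ON link.link_type = link_type.id
INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
WHERE link_type.gid = ?
  AND link.ended = 0
"""

rows = conn.execute(QUERY, (MEMBER_OF_TYPE,)).fetchall()

# Turn each (member, band) pair into a mo:member_of triple, as the
# subject template and the object template of the mapping would.
MO = "http://purl.org/ontology/mo/"  # assumed expansion of the mo: prefix
triples = [
    f"<http://musicbrainz.org/artist/{gid}#_> <{MO}member_of> "
    f"<http://musicbrainz.org/artist/{band}#_> ."
    for gid, band in rows
]

if __name__ == "__main__":
    print("\n".join(triples))
```

Only the membership whose link has not ended survives the WHERE clause, which is why the former member produces no triple.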
  • 116.
    EXTRACTING DATA FROM TEXT
  • 117.
    OpenCalais
    •  Not easily customised/extended
    •  Domain-specific coverage varies
    Source: http://viewer.opencalais.com/
  • 118.
    DBpedia Spotlight
    •  Not easily customised/extended
    •  Is currently only available for English
    Source: http://dbpedia-spotlight.github.com/demo/
    http://dbpedia.org/page/Slowcore
    http://dbpedia.org/page/Dorothy_Parker
  • 119.
    Zemanta
    •  Common problem with general purpose, open-domain semantic annotation tools
    •  Best results require bespoke customisation
    Source: http://www.zemanta.com/demo/
  • 120.
    GATE
    •  General Architecture for Text Engineering
    •  Free open-source (LGPL) framework and development environment
    •  Started 1996, large developer community
    •  Used worldwide by many organisations to build bespoke solutions; e.g. Press Association and the National Archive
    •  Information Extraction in many languages
    http://www.gate.ac.uk/
  • 121.
    GATE Example - LODIE
    •  Increases recall over DBpedia by deriving new lexicalisations for URIs from link anchor texts, disambiguation pages, and redirect pages
  • 122.
    Precision and Recall
    •  Generic services typically have very low recall
    •  Combination is one solution
    •  Other solution is custom extraction

    Cells: precision / recall
                        PER          LOC          ORG          TOTAL
    DB Spotlight        0.97 / 0.40  0.82 / 0.46  0.86 / 0.31  0.85 / 0.39
    Zemanta             0.96 / 0.84  0.89 / 0.62  0.82 / 0.57  0.90 / 0.68
    LODIE               0.81 / 0.82  0.73 / 0.76  0.56 / 0.59  0.71 / 0.74
    Zemanta ∩ LODIE     1.00 / 0.74  0.95 / 0.45  0.97 / 0.42  0.97 / 0.54
    Zemanta ∪ LODIE     0.94 / 0.93  0.77 / 0.76  0.72 / 0.71  0.82 / 0.81
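One way to compare the rows is to fold each precision/recall pair into a single F1 score (the harmonic mean of the two). The slide itself does not report F1; the sketch below is an added illustration, and it assumes that each cell of the TOTAL column reads as precision / recall.

```python
# (precision, recall) pairs copied from the TOTAL column of the slide,
# assuming each cell is "precision / recall".
TOTAL = {
    "DB Spotlight": (0.85, 0.39),
    "Zemanta": (0.90, 0.68),
    "LODIE": (0.71, 0.74),
    "Zemanta ∩ LODIE": (0.97, 0.54),
    "Zemanta ∪ LODIE": (0.82, 0.81),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


scores = {name: f1(p, r) for name, (p, r) in TOTAL.items()}
best = max(scores, key=scores.get)

if __name__ == "__main__":
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:18s} F1 = {score:.2f}")
    print("highest F1:", best)
```

Under this measure the intersection trades recall for precision, while the union balances the two, which is consistent with the bullet that combining services is one way to address low recall.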
  • 123.
    Custom GATE Gazetteer
    •  Retrieve MusicBrainz entity/label/class with SPARQL query
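As a rough illustration of this step, the sketch below shows a SPARQL query of the entity/label/class shape and how bindings in the standard SPARQL JSON results format could be flattened into gazetteer-style entries. The query text, the use of rdfs:label, the sample response and the tab-separated output are all assumptions made for the example; the slide's actual query is not shown in the text, and the output is not claimed to be GATE's own gazetteer file format.

```python
# Hypothetical query: one row per entity with its label and class.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label ?class WHERE {
  ?entity a ?class ;
          rdfs:label ?label .
}
"""

# Invented response in the SPARQL 1.1 JSON results layout
# (head/vars plus results/bindings).
SAMPLE_RESPONSE = {
    "head": {"vars": ["entity", "label", "class"]},
    "results": {
        "bindings": [
            {
                "entity": {"type": "uri",
                           "value": "http://musicbrainz.org/artist/example-gid#_"},
                "label": {"type": "literal", "value": "Example Band"},
                "class": {"type": "uri",
                          "value": "http://purl.org/ontology/mo/MusicArtist"},
            }
        ]
    },
}


def gazetteer_entries(response):
    """Yield (label, entity IRI, class IRI) tuples from a SPARQL JSON result."""
    for binding in response["results"]["bindings"]:
        yield (
            binding["label"]["value"],
            binding["entity"]["value"],
            binding["class"]["value"],
        )


def as_lines(entries, separator="\t"):
    """Serialise entries one per line; the separator is an arbitrary choice."""
    return [separator.join(entry) for entry in entries]


if __name__ == "__main__":
    print("\n".join(as_lines(gazetteer_entries(SAMPLE_RESPONSE))))
```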
  • 124.
    GATECloud
    •  Custom GATE pipelines (e.g. based around a custom gazetteer) can be executed on the cloud
  • 125.
    INTERLINKING DATA SETS WITH SILK
  • 126.
    Interlinking with Silk
    •  Task: Create links between the data set and external Linked Data sources.
    •  Approach: Creation of specified links by querying the target data sets
    •  Alternatives:
       –  Manual creation of linkage rules by the user
       –  Automatic learning of linkage rules by submitting predefined SPARQL queries
    [Diagram: Linked Data lifecycle; labels: Data acquisition (CSV/TSV, HTML, Spreadsheets, JSON), Vocabulary Mapping, Interlinking, Cleansing, Integrated Data Set, SPARQL Endpoint, Publishing, LD Data set Access]
  • 127.
    Link Discovery with Silk
    •  Open source tool for discovering RDF links between data items within different Linked Data sources
    •  It is based on the Silk Link Specification Language (Silk-LSL) for expressing linkage rules
    •  It accesses the target RDF data sets via SPARQL endpoints to generate RDF links
    Source: Robert Isele. “LOD2 Webinar Series: Silk”
  • 128.
    Silk Variants
    •  Silk Single Machine
       •  Generates RDF links on a single machine
       •  Data sets can reside either locally or on remote machines
       •  Provides multithreading and caching
    •  Silk MapReduce
       •  Uses a cluster composed of multiple machines
       •  Based on Hadoop and designed to scale to big data sets
    •  Silk Server
       •  Used within applications that consume Linked Data from the Web while keeping track of known entities
       •  Provides an HTTP API for matching entities from an incoming stream
    Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
  • 129.
    Silk Workflow
    Select LD data sets
    •  Identify suitable data sets in LD catalogs*
    •  Select the two data sets to link
    Specify LD data sets
    •  Specify the access method to the data set (RDF dump, SPARQL endpoint)*
    •  Specify the entity types to be linked
    Write linkage rule
    •  Specifies how to compare the resources
    •  Use Silk-LSL
    •  The rules can also be learnt
    Generate RDF links
    •  Output links can be stored in a file or a triple store
    •  Can discover SKOS links
    [Diagram label: Silk framework]
    * See section “Publishing Linked Data”
    Source: Silk workflow is partially based on “LOD2 Webinar Series: Silk - (Simplified) Linking Workflow” by Robert Isele.
  • 130.
    Linkage Rule Components
    •  Linkage rules define the conditions to create the links between the data sets. These rules are composed of:
    1.  RDF Paths
        •  Describe the elements to be compared
        •  Example: ?a/rdfs:label
    2.  Transformations
        •  Apply transformations to the result set of an RDF path
        •  Examples: LowerCase, Concatenate, Replace, …
    3.  Comparators
        •  Compute the similarity of two inputs
        •  Examples: String similarity metrics, Date similarity, …
    4.  Aggregations
        •  Compute an aggregated value from multiple comparators
        •  Examples: Min, Max, Avg, various means, Euclidean distance, …
    Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
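The way the four components fit together can be sketched in plain Python. This is not Silk-LSL and does not use any Silk API: the dictionaries stand in for values selected by RDF paths, and the comparator functions, the ten-year window and the sample records are hypothetical choices made only for the illustration.

```python
from statistics import mean


def lower_case(value):
    """Transformation: normalise a string before comparison."""
    return value.lower()


def equality(a, b):
    """Comparator: 1.0 if the two inputs are identical, else 0.0."""
    return 1.0 if a == b else 0.0


def year_similarity(year_a, year_b, max_diff=10):
    """Comparator: linear similarity that reaches 0.0 at max_diff years apart."""
    return max(0.0, 1 - abs(year_a - year_b) / max_diff)


# Aggregations combine the scores of several comparators into one value.
AGGREGATIONS = {"min": min, "max": max, "avg": mean}


def rule_score(source, target, aggregation="avg"):
    """Evaluate a two-comparator linkage rule on one pair of resources."""
    # "Paths": pick the values to compare from each resource.
    name_score = equality(lower_case(source["name"]), lower_case(target["label"]))
    year_score = year_similarity(source["year"], target["year"])
    return AGGREGATIONS[aggregation]([name_score, year_score])


if __name__ == "__main__":
    src = {"name": "Low", "year": 1993}    # invented source record
    tgt = {"label": "low", "year": 1995}   # invented target record
    for kind in AGGREGATIONS:
        print(kind, rule_score(src, tgt, kind))
```

Choosing `min` makes the rule as strict as its weakest comparator, `max` as lenient as its strongest, and `avg` lets a strong match on one property compensate for a weaker one.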
  • 131.
    Silk Workbench
    •  Web application built on top of Silk, which allows the creation of projects to manage the creation of links between RDF data sets
    •  The data sets can be stored locally or accessed remotely by specifying the SPARQL endpoint
    •  The user is able to create customized linking tasks:
       –  The tool offers a graphical editor to create linkage rules by combining the linkage rule components via drag & drop elements
       –  Includes support for (automatic) learning of linkage rules
  • 132.
    Silk Workbench (2)
    Project configuration
    1.  Project: name and components (data sources, linking tasks and output tasks)
    2.  Data sources: specification of the data sets to be interlinked
    3.  Linking task: specification of the linkage rules and type of links to be created
    4.  Output task: mechanism to store the results from the linking process
  • 133.
    Silk Workbench (3)
    Editing a linking task
    1.  Linkage rule components
    2.  Graphical editor: the items from (1) are dragged & dropped in this area, and connected to compose the linkage rules
    3.  Generate links: based on the linkage rules defined in (2), the data sets are accessed to discover possible links
    4.  Learn: automatic learning of linkage rules
  • 134.
    Silk Workbench (4)
    Adding a linkage rule
    The previous linkage rule states:
    1.  Retrieve the foaf:name values from MusicBrainz and the rdfs:label values from DBpedia
    2.  Apply lower case transformation to the output of (1)
    3.  Compare the output from (2) using the metric “Levenshtein distance”. If the resulting similarity score is greater than 0.90, then create a link.
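The three steps of this rule can be sketched in plain Python. The sketch assumes the 0.90 threshold applies to a similarity score obtained by normalising the Levenshtein distance by the length of the longer string; that normalisation is an assumption for the example and may differ from what Silk does internally. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b), # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Distance normalised to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def should_link(foaf_name, rdfs_label, threshold=0.90):
    """Steps 2 and 3: lower-case both values, compare, apply the threshold."""
    return similarity(foaf_name.lower(), rdfs_label.lower()) > threshold


if __name__ == "__main__":
    print(should_link("The Beatles", "the beatles"))
    print(should_link("Low", "Love"))
```

With a threshold this high, only names that differ in case or in a character or two of a long string are linked; short names that differ by a single character fall below it.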
  • 135.
    Silk Workbench (5)
    Generate Links
  • 136.
    Silk Workbench (6)
    Learn Rules
  • 137.
    For exercises, quiz and further material visit our website:
    http://www.euclid-project.eu
    Other channels: @euclid_project, euclidproject, euclidproject
    eBook, Course