This document provides an overview of a tutorial on Linked Data for the Humanities. The tutorial covers Linked Data basics such as its history and building blocks, including URIs, HTTP, RDF, and SPARQL. It also discusses producing and consuming Linked Data, as well as hybrid methods. The tutorial aims to help participants understand URI resolution, experience graph traversal, and grasp content negotiation through hands-on exercises using tools like cURL.
The original Semantic Web vision foresees describing entities in a way that their meaning can be interpreted both by machines and humans. Following that idea, large-scale knowledge graphs capturing a significant portion of knowledge have been developed. In the recent past, vector space embeddings of Semantic Web knowledge graphs - i.e., projections of a knowledge graph into a lower-dimensional, numerical feature space (a.k.a. latent feature space) - have been shown to yield superior performance in many tasks, including relation prediction, recommender systems, and the enrichment of predictive data mining tasks. At the same time, those projections describe an entity as a numerical vector, without any semantics attached to the dimensions. Thus, embeddings are as far from the original Semantic Web vision as can be. As a consequence, the results achieved with embeddings - as impressive as they are in terms of quantitative performance - are most often not interpretable, and it is hard to obtain a justification for a prediction, e.g., an explanation of why an item has been suggested by a recommender system. In this paper, we make a claim for semantic embeddings and discuss possible ideas towards their construction.
Using knowledge graphs in data mining typically requires a propositional, i.e., vector-shaped, representation of entities. RDF2vec is an example approach for generating such vectors from knowledge graphs, relying on random walks for extracting pseudo-sentences from a graph, and utilizing word2vec for creating embedding vectors from those pseudo-sentences. In this talk, I will give insights into the idea of RDF2vec, possible application areas, and recently developed variants incorporating different walk strategies and training variations.
RDF2vec is a method for creating embedding vectors for entities in knowledge graphs. In this talk, I introduce the basic idea of RDF2vec, as well as the latest developments and extensions, such as the use of different walk strategies, order-aware RDF2vec, RDF2vec for dynamic knowledge graphs, and more.
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
Heiko Paulheim
From a bird's-eye view, the DBpedia Extraction Framework takes a MediaWiki dump as input, and turns it into a knowledge graph. In this talk, I discuss the creation of the DBkWik knowledge graph by applying the DBpedia Extraction Framework to thousands of Wikis.
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
Starting with Cyc in the 1980s, the collection of general knowledge in machine-interpretable form has been considered a valuable ingredient in intelligent and knowledge-intensive applications. Notable contributions in the field include the Wikipedia-based datasets DBpedia and YAGO, as well as the collaborative knowledge base Wikidata. Since Google coined the term in 2012, such datasets are most often referred to as knowledge graphs. Besides such open knowledge graphs, many companies have started using corporate knowledge graphs as a means of information representation.
In this talk, I will look at two ongoing projects related to the extraction of knowledge graphs from Wikipedia and other Wikis. The first new dataset, CaLiGraph, aims at the generation of explicit formal definitions from categories, and the extraction of new instances from list pages. In its current release, CaLiGraph contains 200k axioms defining classes, and more than 7M typed instances. In the second part, I will look at the transfer of the DBpedia approach to a multitude of arbitrary Wikis. The first such prototype, DBkWik, extracts data from Fandom, a Wiki farm hosting more than 400k different Wikis on various topics. Unlike DBpedia, which relies on a larger user base for crowdsourcing an explicit schema and extraction rules, and on the "one-page-per-entity" assumption, DBkWik has to address various challenges in the fields of schema learning and data integration. In its current release, DBkWik contains more than 11M entities, and has been found to be highly complementary to DBpedia.
Machine Learning with and for Semantic Web Knowledge Graphs
Heiko Paulheim
Large-scale cross-domain knowledge graphs, such as DBpedia or Wikidata, are some of the most popular and widely used datasets of the Semantic Web. In this paper, we introduce several of these knowledge graphs, discuss how machine learning is used to improve them, and show how they can be exploited as background knowledge in popular machine learning tasks, such as recommender systems.
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Heiko Paulheim
AI is not just about machine learning; it also requires knowledge about the world. In this talk, I give an introduction to knowledge graphs, how they are built at scale, and how they are used in modern AI systems.
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Heiko Paulheim
Knowledge Graphs are often used as a symbolic mechanism for representing knowledge in data-intensive applications, both for integrating corporate knowledge and for providing general, cross-domain knowledge in public knowledge graphs such as Wikidata. As such, they have been identified as a useful way of injecting background knowledge into data analysis processes. To fully harness the potential of knowledge graphs, latent representations of the entities in a graph, so-called knowledge graph embeddings, show superior performance, but sacrifice one central advantage of knowledge graphs, i.e., the explicit symbolic representation of knowledge. In this talk, I will shed some light on the usage of knowledge graphs and embeddings in data analysis, and give an outlook on research directions which aim at combining the best of both worlds.
This presentation shows approaches for knowledge graph construction from Wikipedia and other Wikis that go beyond the "one entity per page" paradigm. We see CaLiGraph, which extracts entities from categories and listings, as well as DBkWik, which extracts and integrates information from thousands of Wikis.
How are Knowledge Graphs created?
What is inside public Knowledge Graphs?
Addressing typical problems in Knowledge Graphs (errors, incompleteness)
New Knowledge Graphs: WebIsALOD, DBkWik
Knowledge Graphs, such as DBpedia, YAGO, or Wikidata, are valuable resources for building intelligent applications like data analytics tools or recommender systems. Understanding what is in those knowledge graphs is a crucial prerequisite for selecting a Knowledge Graph for a task at hand. Hence, Knowledge Graph profiling - i.e., quantifying the structure and contents of knowledge graphs, as well as their differences - is essential for fully utilizing the power of Knowledge Graphs. In this paper, I will discuss methods for Knowledge Graph profiling, depict crucial differences between the big, well-known Knowledge Graphs, like DBpedia, YAGO, and Wikidata, and take a glance at current developments of new, complementary Knowledge Graphs such as DBkWik and WebIsALOD.
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Heiko Paulheim
DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. For the extraction, crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology are utilized. In this process, different problems may arise: users may create wrong and/or inconsistent mappings, use the ontology in an unforeseen way, or change the ontology without considering all possible consequences. In this paper, we present a data-driven approach to discover problems in mappings as well as in the ontology and its usage in a joint, data-driven process. We show both quantitative and qualitative results about the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Sören Auer
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
Presentation about reference rot given at the Complexity Science Hub in Vienna, November 2021.
Links to web resources frequently break (link rot), and linked content can change at unpredictable rates (content drift). These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information.
This presentation will report on research that assessed the extent of these problems for links to web resources in scholarly literature, by using three vast corpora of publications and a range of public web archives. It will also describe the Robust Link approach that offers a proactive, uniform, and machine-actionable way to combat link rot and content drift. Finally, it will introduce the Robustify web service and API that was devised to generate links that remain functional over time, paying special attention to challenges related to deploying infrastructure that is required to be long lasting.
Fast Approximate A-box Consistency Checking using Machine Learning
Heiko Paulheim
Ontology reasoning is typically a computationally intensive operation. While soundness and completeness of results is required in some use cases, for many others, a sensible trade-off between computation effort and correctness of results makes more sense. In this paper, we show that it is possible to approximate a central task in reasoning, i.e., A-box consistency checking, by training a machine learning model which approximates the behavior of a reasoner for a specific ontology. On four different datasets, we show that such learned models consistently achieve an accuracy above 95% at less than 2% of the runtime of a reasoner, using a decision tree with no more than 20 inner nodes. For example, this allows for validating 293M Microdata documents against the schema.org ontology in less than 90 minutes, compared to the 18 days required by a state-of-the-art ontology reasoner.
Researcher Pod: Scholarly Communication Using the Decentralized Web
Herbert Van de Sompel
The presentation provides an overview of the motivation and direction of the Mellon-funded Researcher Pod project that investigates technical aspects of scholarly communication in a decentralized web setting.
Talk given at the SSSW 2013 Semantic Web Summerschool.
Part 1: What is "Semantic Web" (in 4 principles and 1 movie)
Part 2: What question can we ask now that we couldn't ask 10 years ago
Part 3: Treat Computer Science as a *science*, not just as engineering!
(this part is a short version of http://slidesha.re/SaUhS4 )
Morning session talk at the second Keystone Training School "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Providing open data is of interest for its societal and commercial value, for transparency, and because more people can do fun things with data. There is a growing number of initiatives to provide open data from, for example, the UK government and the World Bank. However, much of this data is provided in formats such as Excel files, or even PDF files. This raises the questions of:
- How best to provide access to data so it can be most easily reused?
- How to enable the discovery of relevant data within the multitude of available data sets?
- How to enable applications to integrate data from large numbers of formerly unknown data sources?
One way to address these issues is to use the design principles of linked data (http://www.w3.org/DesignIssues/LinkedData.html), which suggest best practices for how to publish and connect structured data on the Web. This presentation gives an overview of linked data technologies (such as RDF and SPARQL), examples of how they can be used, as well as some starting points for people who want to provide and use linked data.
The presentation was given on August 8, at the Hacknight event (http://hacknight.se/) of Forskningsavdelningen (http://forskningsavd.se/) (Swedish: “Research Department”), a hackerspace in Malmö.
One-day workshop on Linked Data and the Semantic Web
Victor de Boer
As taught at UNIMAS, July 2019. Based on a three-day summer school by Knud Hinnerk Moeller and Victor de Boer. Includes hands-on exercises using SWI-Prolog ClioPatria.
This is an informal overview of Linked Data and the use made of it in the project http://res.space (presented on August 11th, 2016, during a team meeting).
Keynote presentation for CSWS 2013 Conference in Shanghai, China.
Some slides borrowed from Jan Wielemaker, Guus Schreiber, Jacco van Ossenbruggen, Niels Ockeloen, Antske Fokkens, Serge ter Braake.
Nelson Piedra, Janneth Chicaiza and Jorge López, Universidad Técnica Particular de Loja; Edmundo Tovar, Universidad Politécnica de Madrid; and Oscar Martínez, Universitas Miguel Hernández
Explore the advantages of using linked data with OERs.
Web of Data as a Solution for Interoperability. Case Studies
Sabin Buraga
The paper draws several considerations regarding the use of Web of Data (Semantic Web) technologies – such as metadata vocabularies and ontological constructs – to increase the degree of interoperability within distributed systems. A number of case studies are presented, expressing the knowledge in a platform- and programming-language-independent manner.
TPDL2013 tutorial: Linked Data for Digital Libraries, 2013-10-22
jodischneider
Tutorial on Linked Data for Digital Libraries, given by me, Uldis Bojars, and Nuno Lopes in Valletta, Malta at TPDL2013 on 2013-10-22.
http://tpdl2013.upatras.gr/tut-lddl.php
This half-day tutorial is aimed at academics and practitioners interested in creating and using Library Linked Data. Linked Data has been embraced as the way to bring complex information onto the Web, enabling discoverability while maintaining the richness of the original data. This tutorial will offer participants an overview of how digital libraries are already using Linked Data, followed by a more detailed exploration of how to publish, discover and consume Linked Data. The practical part of the tutorial will include hands-on exercises in working with Linked Data and will be based on two main case studies: (1) linked authority data and VIAF; (2) place name information as Linked Data.
For practitioners, this tutorial provides a greater understanding of what Linked Data is, and how to prepare digital library materials for conversion to Linked Data. For researchers, this tutorial updates the state of the art in digital libraries, while remaining accessible to those learning Linked Data principles for the first time. For library and iSchool instructors, the tutorial provides a valuable introduction to an area of growing interest for information organization curricula. For digital library project managers, this tutorial provides a deeper understanding of the principles of Linked Data, which is needed for bespoke projects that involve data mapping and the reuse of existing metadata models.
Lecture at the advanced course on Data Science of the SIKS research school, May 20, 2016, Vught, The Netherlands.
Contents
-Why do we create Linked Open Data? Example questions from the Humanities and Social Sciences
-Introduction into Linked Open Data
-Lessons learned about the creation of Linked Open Data (link discovery, knowledge representation, evaluation).
-Accessing Linked Open Data
A lecture/conversation focusing on the first 12 years of Semantic Web - delivered on February 21, 2012.
See http://j.mp/SWIntro for more details. More detailed course material is at http://knoesis.org/courses/web3/
Citizen Experiences in Cultural Heritage Archives: a Data Journey
Enrico Daga
Digital archives of memory institutions are typically concerned with the cataloguing of artefacts of artistic, historical, and cultural value. Recently, new forms of citizen participation in cultural heritage have emerged, producing a wealth of material spanning from visitors’ experiential feedback on exhibitions and cultural artefacts to digitally mediated interactions like the ones happening on social media platforms. In this talk, I will touch upon the problems of integrating citizen experiences into cultural heritage archives. I argue that there are good reasons for institutions to archive people’s responses to cultural objects, and then look at the impact that this has on data infrastructures. Finally, I argue that a knowledge organisation system for “data journeys” can help disentangle problems that include issues of distribution, authoritativeness, interdependence, privacy, and rights management.
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Enrico Daga
Slides of the presentation at #ENDORSE2023
The SPARQL Anything project: http://sparql-anything.cc
Endorse Conference 2023, see
https://twitter.com/EULawDataPubs/status/1635663471349223425
--
Abstract:
What should a data integration framework for knowledge graph experts look like?
Existing approaches transform non-RDF data sources by applying ad-hoc transformations to existing ontologies (Any23), by using a mapping language (RML), or by extending existing standards with custom operators (SPARQL Generate). These solutions result either in code that is difficult to maintain and reuse, or require KG experts to learn a variety of languages and custom tools. Recent research on Knowledge Graph construction proposes the design of a façade, a notion borrowed from object-oriented software engineering. This idea is applied in SPARQL Anything, a system that allows querying heterogeneous resources as if they were in RDF, in standard SPARQL 1.1.
The SPARQL Anything project supports a wide variety of file formats, from popular ones (CSV, JSON, XML, Spreadsheets) to others that are not supported by alternative solutions (Markdown, YAML, DOCx, Bibtex). Features include querying Web APIs with high flexibility, parametrized queries, and chaining multiple transformations into complex pipelines.
We describe the design rationale of the SPARQL Anything system and its application in two EU-funded projects and in the industry. We provide references to an extensive set of reusable showcases. We report on the value-to-users of the founding assumptions of SPARQL Anything, compared to alternative solutions to knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
Enrico Daga
"Data integration with a façade.
The case of knowledge graph construction." is an overview of recent research in façade-based data access. The slides introduce core notions of façade-based data access and the design principles of SPARQL Anything, a system that allows querying of many formats (CSV, JSON, XML, HTML, Markdown , Excel, ...) in plain SPARQL.
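To give a flavour of the approach, a minimal sketch of a SPARQL Anything query (the file location and the xyz: key are illustrative, following the Facade-X conventions):

PREFIX xyz: <http://sparql.xyz/facade-x/data/>

SELECT ?name
WHERE {
  # The SERVICE IRI tells SPARQL Anything which non-RDF resource to open;
  # its content is exposed as Facade-X triples and queried as ordinary RDF.
  SERVICE <x-sparql-anything:file:///data/people.json> {
    ?person xyz:name ?name
  }
}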
Capturing the semantics of documentary evidence for humanities research
Enrico Daga
Identifying and curating documentary evidence from textual corpora is an essential part of empirical research in the humanities.
Initially, we discuss "themed" evidence - traces of a fact or situation relevant to a theme of interest - and focus on the problem of identifying them in texts. To that end, we combine statistical NLP, background knowledge, and Semantic Web technologies in a hybrid approach. We illustrate the method's effectiveness in a case study of a database of evidence of experiences of listening to music. We also evidence its generality by testing it on a different use case in the digital humanities.
Finally, we ponder the applicability of knowledge extraction techniques to automatically populate a database of documentary evidence and discuss the challenges from the point of view of scientific knowledge acquisition.
Presentation of SPARQL Anything at the MEI Linked Data IG Meeting in July 2021. We try SPARQL Anything with MEI XML files and experiment with simple and difficult tasks.
Linked data for knowledge curation in humanities research
Enrico Daga
The identification and cataloguing of documentary evidence is an important part of empirical research in the humanities.
An increasing number of recent initiatives in the digital humanities have as a primary objective the curation of collections of digital artefacts augmented with fine-grained metadata, for example, mentioning the entities and their relations, often adopting the "Linked Data" paradigm. This talk is focused on exploring the potential of Linked Data to support humanities scholars in identifying, collecting, and curating documentary evidence. First, I will introduce the basic notions around Linked Data and place its emergence in the tradition of Knowledge Representation, an area of Artificial Intelligence (AI). Second, I will show how Linked Data and AI techniques have been successfully applied in the Listening Experience Database project to support the retrieval and curation of documentary evidence. Finally, I will conclude the presentation by discussing the potential (and challenges) of adopting a "knowledge extraction" paradigm to automate the identification and cataloguing of metadata about documentary evidence in texts.
Capturing Themed Evidence, a Hybrid Approach
Enrico Daga
The task of identifying pieces of evidence in texts is of fundamental importance in supporting qualitative studies in various domains, especially in the humanities. In this paper, we coin the expression themed evidence, to refer to (direct or indirect) traces of a fact or situation relevant to a theme of interest and study the problem of identifying them in texts. We devise a generic framework aimed at capturing themed evidence in texts based on a hybrid approach, combining statistical natural language processing, background knowledge, and Semantic Web technologies. The effectiveness of the method is demonstrated on a case study of a digital humanities database aimed at collecting and curating a repository of evidence of experiences of listening to music. Extensive experiments demonstrate that our hybrid approach outperforms alternative solutions. We also evidence its generality by testing it on a different use case in the digital humanities.
Challenging knowledge extraction to support the curation of documentary evide...
Enrico Daga
The identification and cataloguing of documentary evidence from textual corpora is an important part of empirical research in the humanities. In this position paper, we ponder the applicability of knowledge extraction techniques to support the data acquisition process. Initially, we characterise the task by analysing the end-to-end process occurring in the data curation activity. After that, we examine general knowledge extraction tasks and discuss their relation to the problem at hand. Considering the case of the Listening Experience Database (LED), we perform an empirical analysis focusing on two roles: the listener and the place. The results show, among other things, how the entities are often mentioned many paragraphs away from the evidence text or are not in the source at all. We discuss the challenges that emerged from the point of view of scientific knowledge acquisition.
Sciknow - Workshop on Capturing Scientific Knowledge
19 November 2019
Marina del Rey, California, United States
Paper at http://oro.open.ac.uk/67961/
Propagating Data Policies - A User Study
Enrico Daga
When publishing data, data licences are used to specify the actions that are permitted or prohibited, and the duties that target data consumers must comply with. However, in complex environments such as a smart city data portal, multiple data sources are constantly being combined, processed and redistributed. In such a scenario, deciding which policies apply to the output of a process, based on the licences attached to its input data, is a difficult, knowledge-intensive task. In this paper, we evaluate how automatic reasoning upon semantic representations of policies and of data flows could support decision making on policy propagation. We report on the results of a user study designed to assess both the accuracy and the utility of such a policy-propagation tool, in comparison to a manual approach.
Propagation of Policies in Rich Data Flows
Enrico Daga
Enrico Daga† Mathieu d’Aquin† Aldo Gangemi‡ Enrico Motta†
† Knowledge Media Institute, The Open University (UK)
‡ Université Paris13 (France) and ISTC-CNR (Italy)
The 8th International Conference on Knowledge Capture (K-CAP 2015)
October 10th, 2015 - Palisades, NY (USA)
http://www.k-cap2015.org/
A bottom-up approach for licences classification and selection
Enrico Daga
Presented at the LeDa-SwAn Workshop at ESWC2015
http://cs.unibo.it/ledaswan2015
#ledaswan2015
Licences are a crucial aspect of the information publishing process in the web of (linked) data. Recent work on the modeling of policies with semantic web languages (RDF, ODRL) gives the opportunity to formally describe licences and reason upon them. However, choosing the right licence is still challenging. In particular, understanding the many features - permissions, prohibitions and obligations - constitutes a steep learning process for the data provider, who has to check them individually and compare the licences in order to pick the one that best fits her needs. The objective of the work presented in this paper is to reduce the effort required for licence selection. We argue that an ontology of licences, organized by their relevant features, can help provide support to the user. Developing an ontology with a bottom-up approach based on Formal Concept Analysis, we show how the process of licence selection can be simplified significantly and reduced to answering an average of three to five key questions.
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
Enrico Daga
Presented at #SALAD2015
The heterogeneity of methods and technologies to publish open data is still an issue for developing distributed systems on the Web. On the one hand, Web APIs, the most popular approach to offer data services, implement REST principles, which focus on addressing loose coupling and interoperability issues. On the other hand, Linked Data, available through SPARQL endpoints, focuses on data integration between distributed data sources. We propose BASIL, an approach to build Web APIs on top of SPARQL endpoints, in order to benefit from the advantages of both the Web API and Linked Data approaches. Compared to similar solutions, BASIL aims at minimising the learning curve for users to promote its adoption. The main feature of BASIL is a simple API that does not introduce new specifications, formalisms or technologies for users from either the Web API or Linked Data communities.
Early Analysis and Debugging of Linked Open Data Cubes
Enrico Daga
The release of the Data Cube Vocabulary specification introduces a standardised method for publishing statistics following the linked data principles. However, a statistical dataset can be very complex, and understanding how to get value out of it may be hard. Analysts need the ability to quickly grasp the content of the data to be able to make use of it appropriately. In addition, while remodelling the data, data cube publishers need support to detect bugs and issues in the structure or content of the dataset. Several aspects of RDF, the Data Cube vocabulary and linked data can help with these issues, however, including the fact that they make the data "self-descriptive". Here, we attempt to answer the question: "How feasible is it to use this feature to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data?" We present a tool that automatically builds interactive facets as diagrams out of a Data Cube representation, without prior knowledge of the data content, to be used for debugging and early analysis. We show how this tool can be used on a large, complex dataset and we discuss the potential of this approach.
Ld4 dh tutorial
1. Linked Data for the Humanities: methods and techniques
Enrico Daga
The Open University
Aldo Gangemi
Università di Bologna
Tutorial @ DH2019, Utrecht, 8th July
Albert Meroño-Peñuela
Vrije Universiteit Amsterdam
Special Guest
2. 14.00 Session I
• Linked Data in a nutshell
• Producing Linked Data
15.30 (Coffee break)
16.00 Session II
• Consuming Linked Data
• Hybrid Methods
Welcome
5. Invented the web in 1989 (yeah!)
Invented the semantic web in 1994 (duh?)
6. “To a computer, then, the web is a flat, boring world devoid of meaning”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
7. “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
8. “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
9. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
11. Linked Data is a way of publishing structured information that allows datasets to be connected and enriched by means of links among their entities.
• LD uses the World Wide Web as publishing platform
• Based on W3C standards - open to everyone
• Enables your data to refer to other data
• … and other data to refer to yours!
Linked Data in a nutshell
https://en.wikipedia.org/wiki/Linked_data
12. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Linked Open Data in 2007
23. • A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: HTML
The traditional Web
24. • A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: RDF (instead of HTML)
The semantic Web
25. • Uniform Resource Identifiers (URIs)
• To identify things
• HyperText Transfer Protocol (HTTP)
• To access data about them
• Resource Description Framework (RDF)
• a meta-model for data representation.
• it does not specify a particular schema
• offers a structure for representing schemas and data
• SPARQL Protocol and Query Language (SPARQL)
• To query LD databases directly on the Web
Linked Data Technology Stack
26. • A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. [RFC3986]
• Syntax
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
• Example
foo://example.com:8042/over/there?name=ferret#nose
\_/   \______________/\_________/ \_________/ \__/
 |           |            |            |        |
scheme   authority       path        query   fragment
HTTP URIs
27. • URIs (Uniform Resource Identifiers) are used to identify things (also called entities) in the real world
• For instance: people, places, events, companies, products, movies, etc.
A Web of Things
28. HTTP
Simplest thing ever
• On top of
• The Internet Protocol (IPv4)
• Domain Name System (DNS): e.g. dbpedia.org
• A Client / Server protocol: Request -> Response
• Message structure: Headers + Body (content)
29. Resource Description Framework
Relationships between things are expressed by means of a multi-directed, fully labeled graph, where
nodes can be resources or XMLSchema-typed values;
relationships are also identified by URIs.
The RDF model
(the “content” of the HTTP body…)
30. RDF is based on an atomic element: the triple.
Triple: (subject predicate object)
- subject: a URI or a blank node
- predicate: MUST be a URI
- object: a URI, a blank node, or a literal
The RDF Triple
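For instance, a minimal sketch of two triples in Turtle syntax (prefixes are introduced on the Namespaces slide; the DBpedia properties are real, but the choice of examples is illustrative):

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

# subject                     predicate       object
dbr:Wolfgang_Amadeus_Mozart   dbo:birthPlace  dbr:Salzburg .
dbr:Leipzig                   dbo:country     dbr:Germany .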
32. • Representation of data values
• Serialization as strings
• Interpretation based on the datatype
• Literals without Datatype are treated as strings
• and can be annotated with a language (Alpha-2): @en
Literals
(Diagram: Leipzig with latitude "51.3333" and longitude "12.3833"; Burkhard Jung, born "1958-03-07", isMayorOf Leipzig; Leipzig hasMayor Burkhard Jung)
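The same example, sketched in Turtle with typed and language-tagged literals (the ex: namespace and property names are hypothetical):

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:Leipzig ex:latitude  "51.3333"^^xsd:float ;    # typed literal
           ex:longitude "12.3833"^^xsd:float ;
           ex:hasMayor  ex:Burkhard_Jung ;
           ex:name      "Leipzig"@en .            # language-tagged literal

ex:Burkhard_Jung ex:born "1958-03-07"^^xsd:date ;
                 ex:isMayorOf ex:Leipzig .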
35. • Namespaces in XML: https://www.w3.org/TR/xml-names/
• Namespaces end either with # or /
• In serialisations, they are mapped to prefixes, for brevity
• http://prefix.cc to get help with namespaces and common prefixes
• http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart
• http://dbpedia.org/resource/
• dbr:Wolfgang_Amadeus_Mozart
Namespaces
38. • Triple Stores: database management systems that allow querying RDF
• RDF 1.1 named graphs allow integrating multiple RDF documents while preserving the context of each triple: g s p o
• Syntax: N-Quads
Named Graphs
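For instance, a single quad in N-Quads syntax (the graph URI is hypothetical; note that in this serialisation the graph name comes last):

<http://dbpedia.org/resource/Leipzig> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Germany> <http://example.org/graph/dbpedia-extract> .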
39. 1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can look them up on the web
3. When a URI is looked up, request/return a description of the thing in RDF
4. Include links to related things (e.g. owl:sameAs)
Linked Data principles
Something very basic
http://www.w3.org/DesignIssues/LinkedData.html
43. HTTP
Simplest thing ever
• On top of
• The Internet Protocol (IPv4)
• Domain Name System (DNS): e.g. dbpedia.org
• A Client / Server protocol: Request -> Response
• Message structure: Headers + Body
https://www.slideshare.net/randyconnolly/chapter01-presentation-16514220
44. Headers
• Vary between Request and Response
(two newlines)
Body
• Any data
HTTP
Message structure
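A rough sketch of such an exchange for a DBpedia resource (actual headers, status codes, and redirect targets may vary):

GET /resource/Wolfgang_Amadeus_Mozart HTTP/1.1
Host: dbpedia.org
Accept: text/turtle

HTTP/1.1 303 See Other
Location: http://dbpedia.org/data/Wolfgang_Amadeus_Mozart.ttl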
46. cURL is a command line tool and library for transferring data with URLs
wURL is a simple web app that allows non-Unix users to use cURL from a Web browser
http://purl.org/ld4dh/wurl
https://curl.haxx.se/
… let’s try …
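For example, content negotiation on a DBpedia URI can be tried like this (a sketch; the server decides which representations it can return):

# Ask for HTML (what a browser would get)
curl -L -H "Accept: text/html" http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart

# Ask for Turtle instead; -L follows the 303 redirect
curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart

# Show only the response headers
curl -I http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart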
52. • In what formats is Mozart available?
• text/html, application/rdf+xml, text/n-triples, text/turtle
• Find Mozart's image (it’s a jpg)
• When was Mozart born?
• Where did Mozart die?
• How many inhabitants does that city have today?
How many Mozart?
http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart
53. • Find the location of the experience
• When did it happen?
• Who is the listener?
• What musical opera was performed?
• Who is the author of the music listened to?
• Who is the performer?
• What is the genre?
• Find information about this genre
• Can you find other operas of the same genre?
a Listening Experience
http://data.open.ac.uk/led/lexp/1446304716352
54. This type of task is possible using a SPARQL endpoint:
http://dbpedia.org/sparql
A scent of SPARQL
“Find other operas of the same genre”
SELECT * WHERE {
?entity
<http://purl.org/dc/terms/subject>
<http://dbpedia.org/resource/Category:Grand_operas> .
}
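The same query can also be submitted from the command line, e.g. with cURL (a sketch; results come back as SPARQL JSON):

curl -G http://dbpedia.org/sparql \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=SELECT * WHERE { ?entity <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Grand_operas> }"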
57. 1. Knowledge representation
1. Identify the source
2. Understand the content (domain)
3. Modelling: reuse or build an ontology
2. Produce RDF
1. Populate the ontology
2. Encode or (re)engineer in RDF - “triplification”
3. Put it on the Web and provide services to access and query the data
1. Support URI dereferencing (Content negotiation)
2. Expose a SPARQL Endpoint
3. Describe your dataset with Linked Data (ehm …start over)
So you want to do Linked Data?
58. • The world’s academic communities have been dealing for years with knowledge representation
• Artificial intelligence, natural language processing, model management, and many other research fields have largely contributed
• Some ancestors traced the way
How to represent knowledge?
62. EXAMPLE
• Instances are associated with one or several
classes:
Boddingtons rdf:type Ale .
Grafentrunk rdf:type Bock .
Hoegaarden rdf:type White .
Jever rdf:type Pilsner .
63. Ontologies
different levels of detail & complexity
[Diagram: a complexity scale, from light-weight to heavy-weight]
• Types, Labels, Descriptions, Comments
• Class Hierarchies, Relations, Documented meaning
• Basic Logic: Rules, Inferences, Transitivity, Domain, Range
• Description Logic: Reasoning, Class unions, Set semantics, Intersections, Disjointness, […]
64. Copyright IKS Consortium
• A vocabulary for describing properties and classes of RDF
resources
• rdfs:Resource
• rdf:type
• rdfs:Class
• rdf:Property
• rdfs:subClassOf
• rdfs:subPropertyOf
• rdfs:domain
• rdfs:range
RDF Schema
http://www.w3.org/TR/rdf-schema/
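A minimal RDFS sketch in Turtle (the ex: vocabulary is invented for illustration):
@prefix ex: <http://example.org/vocab/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Mayor rdfs:subClassOf ex:Person .   # every mayor is a person
ex:isMayorOf rdfs:domain ex:Mayor ;    # subjects of isMayorOf are mayors
             rdfs:range ex:City .      # objects of isMayorOf are cities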
65. • OWL allows specifying further axioms
• Property cardinality restrictions
• Class disjointness
• Property transitivity
• Cardinality constraints
• But beware: more expressivity means more reasoning
complexity
The Web Ontology Language (OWL)
formal language for automated reasoning
66. The Web Ontology Language (OWL)
formal language for automated reasoning
:Novel rdf:type owl:Class .
:Short_Story rdf:type owl:Class .
:Poetry rdf:type owl:Class .
:Literature rdf:type owl:Class ;
owl:unionOf (:Novel :Short_Story :Poetry) .
IF <myWork> rdf:type :Novel .
THEN <myWork> rdf:type :Literature .
68. • Schema layer of RDF
• Defines terms (classes and properties)
• Typically RDFS or OWL family
• Reusability is important for supporting interoperability
• Common vocabularies: Dublin Core, SKOS, FOAF, SIOC,
vCard, DOAP, Core Organization Ontology, VoID
Vocabularies
light-weight semantics
http://www.slideshare.net/prototypo/introduction-to-linked-data-rdf-vocabularies
69.
Vocabulary: Friend-of-a-Friend (FOAF)
defines classes and properties for representing
information about people and their
relationships
Soeren rdf:type foaf:Person .
Soeren foaf:currentProject http://OntoWiki.net .
Soeren foaf:homepage http://aksw.org/Soeren .
Soeren foaf:knows http://sembase.at/Tassilo .
Soeren foaf:sha1 09ac456515dee .
71.
Vocabulary: Simple Knowledge Organization
System (SKOS)
supports the use of thesauri, classification schemes, subject
heading systems and taxonomies
73. • DBpedia Ontology Schema:
• manually created for DBpedia (infoboxes)
• 1140 classes + 1149 object properties + 1741 datatype properties; >7K axioms (1537 on C, 2676 on
OP, 3264 on DTP: 1.3, 2.3, 1.8 ratios);
• (200M triples in DBpedia)
• YAGO:
• large hierarchy linking Wikipedia leaf categories to WordNet
• 250,000 classes
• UMBEL (Upper Mapping and Binding Exchange Layer):
• 20,000 classes derived from OpenCyc
• DOLCE-Zero (Foundational Ontology, aligned to DBpedia):
• 76 classes + 105 object properties + 5 datatype properties; 596 axioms (196 on C, 389 on OP, 11 on
DTP: 2.4, 3.7, 2.2 ratios)
• presence of “restrictions”, top-level disjointness, and patterns
• Wikipedia Categories:
• Not a class hierarchy (e.g. cycles), represented using SKOS
• 415,000+ categories
General Purpose Ontologies
(different levels of detail & complexity)
75. 1. From a Relational Database
2. From Web content (Scraping)
3. From XML or other structured data formats
4. From a data table (e.g. a CSV file)
5. From natural language (Sic!)
How to produce RDF?
76. • W3C R2RML - a language to specify
mappings between SQL databases and
RDF: http://www.w3.org/TR/r2rml/
• D2RQ - allows accessing relational
databases as virtual graphs: http://d2rq.org/
• DB2Triples - runs a specified R2RML file
and generates RDF:
https://github.com/antidot/db2triples
1. From a relational database
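A minimal R2RML sketch, assuming an invented PERSON table with ID and NAME columns:
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#PersonMap>
  rr:logicalTable [ rr:tableName "PERSON" ] ;
  # one URI per row, minted from the primary key
  rr:subjectMap [ rr:template "http://example.org/person/{ID}" ; rr:class foaf:Person ] ;
  # one triple per row for the NAME column
  rr:predicateObjectMap [ rr:predicate foaf:name ; rr:objectMap [ rr:column "NAME" ] ] .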
78. • RDFa and microformats are used to embed semantic
information (expressed using the RDF model) into regular
HTML pages
• RDFa does it using existing (rel) and additional
(about, property, typeof) attributes
• Microformats only use usual HTML attributes (class)
• To extract, e.g., Apache any23: https://any23.apache.org
2. From Web pages
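A minimal RDFa sketch (URI and markup invented); an extractor such as Any23 would derive one rdf:type and one foaf:name triple from it:
<div prefix="foaf: http://xmlns.com/foaf/0.1/"
     about="http://example.org/mozart" typeof="foaf:Person">
  <span property="foaf:name">Wolfgang Amadeus Mozart</span>
</div>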
79. DBpedia is the de-facto Hub of LOD.
• descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology,
including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films,
15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases
• labels and abstracts for these 3.2 million things in up to 92 different languages;
1,460,000 links to images and 5,543,000 links to external web pages;
4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories,
and 75,000 YAGO categories
• altogether over 1 billion pieces of information (i.e. RDF triples): 257M from English
edition, 766M from other language editions
• DBpedia Live (http://live.dbpedia.org/sparql/) &
Mappings Wiki (http://mappings.dbpedia.org)
integrate the community into a refinement cycle
80. Extracting structured information from Wikipedia and making this
information available on the Web as LOD:
• link other data sets on the Web to Wikipedia data (encyclopaedic
knowledge)
• ask sophisticated queries against Wikipedia (e.g. universities in
Paris, mayors of towns in a certain region),
• Represents a community consensus
Transforming Wikipedia into a Knowledge Base
81. Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
– other language versions
– other Wikipedia pages
– To the Web
– Redirects
– Disambiguations
82. Infobox templates
{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}
http://dbpedia.org/resource/Busan
dbp:Busan dbpp:title "Busan Metropolitan City"
dbp:Busan dbpp:hangul "부산 광역시"@Hang
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
dbp:Busan dbpp:pop "3635389"^^xsd:int
dbp:Busan dbpp:region dbp:Yeongnam
dbp:Busan dbpp:dialect dbp:Gyeongsang
...
Wikitext-Syntax
RDF representation
84. • Hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
• Give me all Sitcoms that are set in NYC?
• All tennis players from Moscow?
• All films by Quentin Tarantino?
• All German musicians that were born in Berlin in the 19th
century?
• All soccer players with tricot number 11, playing for a club having
a stadium with over 40,000 seats, and born in a country with
over 10 million inhabitants?
DBpedia SPARQL Endpoint
http://dbpedia.org/sparql
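For instance, the third question as a query (a sketch; the property and resource names assume current DBpedia naming):
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE { ?film dbo:director dbr:Quentin_Tarantino . }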
85. • Two steps:
• Remodelling task
• Reengineering task
• Web APIs
• JSON: annotate with JSON-LD https://json-ld.org/ (see the sketch
after this slide)
• XML
• XML != RDF
• XML serialises a DOM (a tree); RDF is a graph instead, with no root
• eXtensible Stylesheet Language Transformations (XSLT) can be used
to generate an RDF format, e.g. N-Triples
3. From Web APIs, XML or other formats
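A minimal JSON-LD sketch of an annotated API response (context and URI invented):
{
  "@context": { "name": "http://xmlns.com/foaf/0.1/name" },
  "@id": "http://example.org/mozart",
  "name": "Wolfgang Amadeus Mozart"
}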
86. • data.open.ac.uk is the home of The Open University LOD
• In 2010, the OU was the first university in the UK to publish LOD.
• Collects and interlinks open data from institutional
repositories of the University, and makes it available as LD
data.open.ac.uk
87. Open Educational Resources
• Metadata about educational resources produced
or co-produced by The Open University
• OU/BBC Coproductions | OU podcasts |
OpenLearn | Videofinder
Scientific Production
• Metadata about scientific production of The
Open University
• Open Research Online (http://oro.open.ac.uk/)
Social Media
• Content hosted by social media web sites.
• Metadata are extracted from public APIs and
aggregated into RDF.
• Audioboo | YouTube
Datasets
http://data.open.ac.uk
Organisational
• Data collected from internal repositories and first
made public as linked data.
• The OU's Key Information Set from Unistats |
OU People Profiles | KMi People Profiles | Open
University data XCRI-CAP 1.2 | Qualifications |
Courses | OU Planet Stories
Data from Research Projects
• Linked Data from research projects.
• Arts and Humanities Research Council project
metadata | The Listening Experience Database |
The UK Reading Experience Database | The
Reading Experience Database: DBpedia
alignments
88. • Two tasks: remodelling & reengineering
• Homemade recipe:
1. Find your identifier(s), establish namespaces
2. Map columns to predicates, establish cell value type
(URI or Literal)
3. Iterate over the rows
4. Generate a triple for each cell
4. From a data table
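A minimal sketch of the recipe on an invented two-column table (namespaces are placeholders):
Input row: id=42, name="Ada", city="Leipzig" (id is the subject column)
Output N-Triples:
<http://example.org/person/42> <http://example.org/vocab/name> "Ada" .
<http://example.org/person/42> <http://example.org/vocab/city> "Leipzig" .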
89. • A Google Form Spreadsheet
• Prepare column names (first row)
• Identify the Subject column (S)
• Generate a tuple for each column value (S, c, v) - G SQL
• Clean: remove tuples with empty values
• Format tuples into valid N3 triples
Example
(only reengineering)
https://docs.google.com/spreadsheets/d/1j_LHZIOhkbD61r7fSxuf4017tgbOoL_Z6tLT0oDQz_0/edit?usp=sharing
90. 1. Load the data into a Triple Store
• Virtuoso Open Source: virtuoso.openlinksw.com
• Apache Jena: http://jena.apache.org/
• Blazegraph: www.blazegraph.com
• https://en.wikipedia.org/wiki/Comparison_of_triplestores
2. Publish the SPARQL Endpoint
3. Set up content negotiation
• http://www.example.com/…
303 to SPARQL DESCRIBE <http://www.example.com/...>
How to publish on the Web?
(signposting only here)
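A sketch of step 3, assuming the invented URI below: a dereferencing request is answered with a 303 redirect pointing at the result of a SPARQL DESCRIBE query.
GET http://www.example.com/resource/42 (Accept: text/turtle)
303 See Other, with a Location that serves the result of:
DESCRIBE <http://www.example.com/resource/42>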
96. Triple and Graph Patterns
How do we describe the structure of the RDF graph
which we're interested in?
98. # An RDF triple in Turtle syntax
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
dbr:Wolfgang_Amadeus_Mozart foaf:name "Wolfgang Amadeus Mozart" .
99. # A SPARQL triple pattern, with a single variable
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
dbr:Wolfgang_Amadeus_Mozart foaf:name ?name .
100. # All parts of a triple pattern can be variables
?subject foaf:name ?name.
104. # Combine triple patterns to create a graph pattern
PREFIX dby: <http://dbpedia.org/class/yago/>
?subject rdfs:label ?label .
?subject rdf:type dby:WikicatOperaComposers .
# SPARQL is based on Turtle, which allows abbreviations
# e.g. predicate-object lists:
?subject rdfs:label ?label;
rdf:type dby:WikicatOperaComposers .
105.
106. # Graph patterns allow us to traverse a graph
?person rdfs:label "Wolfgang Amadeus Mozart"@de .
?person dbo:deathPlace ?place .
?place dbo:populationTotal ?population .
109. Structure of a Query
What does a basic SPARQL query look like?
110. # Query. 1
# Associate URIs with prefixes
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Example of a SELECT query, retrieving 2 variables
# Variables selected MUST be bound in the graph pattern
SELECT ?person ?population
WHERE {
# This is our graph pattern
?person rdfs:label "Wolfgang Amadeus Mozart"@de ;
dbo:deathPlace ?place .
?place dbo:populationTotal ?population
}
111. • https://ld4humanities.github.io/ > Hands-On resources
• We will use this UI: http://yasgui.org/
• Credits:
Let’s try it out
http://about.yasgui.org/
http://laurensrietveld.nl/
112. # Query. 2
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Example of a SELECT query, retrieving all variables
SELECT *
WHERE {
?person rdfs:label "Wolfgang Amadeus Mozart"@de ;
dbo:deathPlace ?place .
?place dbo:populationTotal ?population .
}
117. Sorting & Restrictions
How do we apply a sort order to the results?
How can we add restrictions?
How can we restrict the number of results returned?
118. # Query. 5
# Select the URI and population of all places
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population .
}
119. # Ex. 6
# Select the URI and population of all places
# with highest first
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population .
}
# Use an ORDER BY clause to apply a sort.
# Can be ASC or DESC
ORDER BY DESC(?population)
120. # Ex. 7
# Select the URI and population of a city
# with highest first
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population .
FILTER EXISTS {
?place dbp:countryCode []
}
}
# Use an ORDER BY clause to apply a sort.
# Can be ASC or DESC
ORDER BY DESC(?population)
121. # Ex. 8
# Select the URI and population of the 11th-20th most populated countries
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population .
FILTER EXISTS {
?place dbp:countryCode []
}
}
# Use an ORDER BY clause to apply a sort.
ORDER BY DESC(?population)
# Limit to first ten results
LIMIT 10
# Apply an offset to get next “page”
OFFSET 10
122. Filtering
How do we restrict results based on aspects of the
data rather than the graph, e.g. string matching?
123. # In the following triple the literal has a datatype
# assigned to indicate that it is a date
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
dbr:Wolfgang_Amadeus_Mozart
dbo:birthDate "1756-01-27"^^xsd:date
124. # Query. 9
# Select the names of persons born between 1st Jan 1756 and 1st Jan 1757
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name
WHERE {
?person dbo:birthDate ?date;
foaf:name ?name.
FILTER (?date > "1756-01-01"^^xsd:date &&
?date < "1757-01-01"^^xsd:date)
}
125. # Query. 10
# Select the URI and population of places with an area below 20 km^2,
# with most populated first
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpp: <http://dbpedia.org/ontology/PopulatedPlace/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population ;
dbpp:areaTotal ?area .
# Note that we have to cast the data to the right type
# As it is not declared in the data
FILTER( xsd:double(?area) < 20 )
}
ORDER BY DESC(?population)
140. # Query. 15
# Search in multiple Graphs
SELECT
distinct ?type
FROM <http://data.open.ac.uk/context/youtube>
FROM <http://data.open.ac.uk/context/podcast>
FROM <http://data.open.ac.uk/context/openlearn>
FROM <http://data.open.ac.uk/context/course>
FROM <http://data.open.ac.uk/context/qualification>
WHERE{
[] a ?type
}
141. # Query. 16
# Search in multiple Graphs
SELECT
distinct ?g ?type
FROM NAMED <http://data.open.ac.uk/context/youtube>
FROM NAMED <http://data.open.ac.uk/context/podcast>
FROM NAMED <http://data.open.ac.uk/context/openlearn>
FROM NAMED <http://data.open.ac.uk/context/course>
FROM NAMED <http://data.open.ac.uk/context/qualification>
WHERE{
GRAPH ?g { [] a ?type }
}
142. Videos from the Open University on YouTube.
YouTube videos are linked to courses and qualifications, which in
turn are linked to other entities (OpenLearn units, Podcasts,
Audios, and other Courses or Qualifications)
Find OU content related to a given YouTube video:
https://www.youtube.com/watch?v=SYry6PYsL8o
http://data.open.ac.uk/youtube/SYry6PYsL8o
http://data.open.ac.uk
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix podcast: <http://data.open.ac.uk/podcast/ontology/>
prefix yt: <http://data.open.ac.uk/youtube/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rkb: <http://courseware.rkbexplorer.com/ontologies/courseware#>
prefix saou: <http://data.open.ac.uk/saou/ontology#>
prefix dbp: <http://dbpedia.org/property/>
prefix media: <http://purl.org/media#>
prefix olearn: <http://data.open.ac.uk/openlearn/ontology/>
prefix mlo: <http://purl.org/net/mlo/>
prefix bazaar: <http://digitalbazaar.com/media/>
prefix schema: <http://schema.org/>
SELECT
distinct
(?related as ?identifier)
?type
?label
(str(?location) as ?link)
FROM <http://data.open.ac.uk/context/youtube>
FROM <http://data.open.ac.uk/context/podcast>
FROM <http://data.open.ac.uk/context/openlearn>
FROM <http://data.open.ac.uk/context/course>
FROM <http://data.open.ac.uk/context/qualification>
WHERE
{
?x schema:productID "SYry6PYsL8o" . # change the youtube id to any OU youtube video
?x yt:relatesToCourse ?course .
{
# related video podcasts
?related podcast:relatesToCourse ?course .
?related a podcast:VideoPodcast .
?related rdfs:label ?label .
optional { ?related bazaar:download ?location }
BIND( "VideoPodcast" as ?type ) .
} union {
# related audio podcasts
?related podcast:relatesToCourse ?course .
?related a podcast:AudioPodcast .
?related rdfs:label ?label .
optional { ?related bazaar:download ?location }
BIND( "AudioPodcast" as ?type ) .
} union {
# related openlearn units
?related a olearn:OpenLearnUnit .
?related olearn:relatesToCourse ?course .
BIND( "OpenLearnUnit" as ?type ) .
?related <http://dbpedia.org/property/url> ?location .
?related rdfs:label ?label .
} union {
# related qualifications (compulsory course)
?related a mlo:qualification .
?related saou:hasPathway/saou:hasStage/saou:includesCompulsoryCourse ?course .
BIND( "Qualification" as ?type ) .
?related rdfs:label ?label .
?related mlo:url ?location
}
} limit 200
Content recommendation
145. Uses data.open.ac.uk to get
content recommendations (e.g.
courses).
data.open.ac.uk drives the
click through which turns
OpenLearn visitors into OU
students!
Publish once, display
everywhere (from YouTube,
Audioboo, iTunesU, Podcast)
OpenLearn
h"p://www.open.edu/openlearn/
146. An open and freely
searchable database that
brings together a mass of
data about people’s
experiences of listening to
music of all kinds, in any
historical period and any
culture.
Reuse from LOD
Uses data.open.ac.uk as
publishing platform.
RDF, “natively”
The Listening Experience Database Project
h"p://led.kmi.open.ac.uk/
Feedback welcome: @enridaga #kmiou
151. • Most of the data is actually metadata: it describes
resources, documents, and people, and it is essentially
structured
• However, LD can be used to enhance content such as
text or music!
• Two case studies:
• @Albert - MIDI Linked Data Cloud
• FindLEr: find evidence of Listening Experiences
• Hands-On
LD with content
153. A basic recipe:
1. Text
2. Link to a LD Graph with Named Entity Recognition (NER)
- e.g. DBpedia
3. Explore the graph to find common nodes between
entities
4. Suggest subjects for the text
Case study: find relevant topics
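A sketch of step 3 against DBpedia, using categories as candidate common nodes (the two entity URIs stand for whatever the NER step returned):
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?common WHERE {
  <http://dbpedia.org/resource/Entity_A> dct:subject ?common .
  <http://dbpedia.org/resource/Entity_B> dct:subject ?common .
}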
154. Text
Senior academics and politicians have condemned UK universities for failing to tackle endemic
racism against students and staff after a Guardian investigation found widespread evidence of
discrimination in the sector.
University staff from minority backgrounds said the findings showed there was “absolute
resistance” to dealing with the problem. Responses to freedom of information (FoI) requests the
Guardian sent to 131 universities showed that students and staff made at least 996 formal
complaints of racism over the past five years.
Of these, 367 were upheld, resulting in at least 78 student suspensions or expulsions and 51 staff
suspensions, dismissals and resignations.
But even these official figures are believed to underestimate the scale of racism in higher
education, with two separate investigations by the Guardian and the Equality and Human Rights
Commission identifying hundreds more cases that were not formally investigated by universities.
Scores of black and minority ethnic students and lecturers have told the Guardian they were
dissuaded from making official complaints and either dropped their allegations or settled for an
informal resolution. They said white university staff were often reluctant to address racism, with
racial slurs treated as banter or an inevitable byproduct of freedom of speech, and institutional
racism poorly recognised.
https://www.theguardian.com/education/2019/jul/05/uk-universities-condemned-for-failure-to-tackle-racism
160. • Interoperability between these repositories (how to align their ontologies and entity
names?) is usually partial
• Quality
• owl:sameAs is very rarely “same as”. See http://sameas.org
• Completeness
• Principled Low Commitment (e.g. 404, 406, …)
• How to distinguish entities and documents?
• A method on top of the “Follow your nose” approach is still to be developed
• What about incoming links?
• Licences? Policies?
• Availability of open data (limited resources). Some proposals, e.g. Linked Data Fragments
• User interfaces for LD operations - not only visualisation - still missing
Open Issues
161. Link and Open Your Data
Scholars & Institutions in the humanities are very
good at building high quality databases (e.g. thesauri,
gazetteers) but most of them are still closed!
162. Some sources of inspiration …
• EUCLID Project: http://euclid-project.eu/
• Randy Connolly's slides about Web Development:
https://www.slideshare.net/randyconnolly
• Linked Data Patterns book
• http://patterns.dataincubator.org/book/
Credits