Triplestore and SPARQL

Lino Valdivia Jr
04.06.2013

OUTLINE
The Semantic Web
RDF
SPARQL
Triplestores
Apache Jena
DBPedia
Conclusions
Demo1: Apache Jena API
Demo2: DBPedia

THE SEMANTIC WEB
Most of the data on the web is unstructured or semi-structured:
• HTML documents
• Multimedia: images, video streams, audio files
• Meant to be read and processed by humans
What if we could structure this “Web of Documents” and add metadata to it, so that machines can understand it?
• Metadata → meaning, or semantics
• Machines can perform new tasks that used to require human intervention
This is the motivation behind the Semantic Web!
• The term “Semantic Web” was coined by Tim Berners-Lee: “a web of data that can be processed directly and indirectly by machines.”

THE SEMANTIC WEB
“The Semantic Web is a web of data…[it] provides a common
framework that allows data to be shared and reused across
application, enterprise, and community boundaries.”
[w3.org]
For the Semantic Web to happen, we would need:
1. A way to structure and link data in a standardized way
2. A way to describe the relationships among these data in a common way
3. A way to query that linked data
4. A way to infer something from that linked data (by applying a set of rules)
This talk focuses only on #1 and #3.

RDF: A WAY TO STRUCTURE AND LINK DATA
RDF = Resource Description Framework, a standard way for applications to represent information so that it can be shared and processed
A resource can be anything that is identifiable: a user, a coffee cup, a picture of your cat, a bank statement
RDF models data by breaking each statement down into three components:
• The subject
• The predicate (aka the property)
• The object

RDF AS A GRAPH
Consider the following statement: Jordi lives in Barcelona
• Subject: Jordi
• Object: Barcelona
• Predicate: lives-in (or, to be more precise, address-city)
RDF statements are typically represented as a labeled directed graph:
• The arrow points from the subject to the object
[Figure: Jordi --address-city--> Barcelona]

RDF AND URIS
Resources must be identifiable, so RDF uses Uniform Resource Identifier (URI) references.
E.g. Jordi = http://example.org/Jordi
URIs ≠ URLs! A URI identifies a resource; it does not have to point to a retrievable document.
RDF graphs are typically shown with the URIs for the subject, object, and predicate:
[Figure: http://.../Jordi --http://.../address-city--> http://.../Barcelona]
The RDF graph can also be written as text:
<http://example.org/Jordi> <http://example.org/address-city> <http://example.org/Barcelona> .
As you may have guessed, RDF is more machine-friendly than human-friendly!

RDF: RESOURCES AND LITERALS
The object of a triple can be either a resource (identified by a URI) or a literal (a value such as a string or a number):
[Figure: http://.../Jordi --http://.../address-city--> http://.../Barcelona; http://.../Jordi --http://.../firstname--> "Jordi"; http://.../Jordi --http://.../age--> 37]
We can write this RDF graph as text:
<http://example.org/Jordi> <http://example.org/address-city> <http://example.org/Barcelona> .
<http://example.org/Jordi> <http://example.org/firstname> "Jordi" .
<http://example.org/Jordi> <http://example.org/age> 37 .
This textual representation is known as the Terse RDF Triple Language, or Turtle for short.

RDF: PREFIXES
Prefixes can be used to simplify representations, either in graphs:
prefix ex: http://example.org/
[Figure: ex:Jordi --ex:address-city--> ex:Barcelona; ex:Jordi --ex:firstname--> "Jordi"; ex:Jordi --ex:age--> 37]
or in Turtle:
@prefix ex: <http://example.org/> .
ex:Jordi ex:address-city ex:Barcelona .
ex:Jordi ex:firstname "Jordi" .
ex:Jordi ex:age 37 .
Now that we have a way to structure and link our data, we want to be able to query it for information.
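To make the prefix mechanism concrete, here is a minimal sketch of how a prefixed name could be expanded back into a full URI. It uses only the JDK, not the Jena API, and the class name PrefixDemo and the method expand are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixDemo {
    // Expands a prefixed name such as "ex:Jordi" into a full URI.
    // Names whose prefix is not declared are returned unchanged.
    public static String expand(Map<String, String> prefixes, String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name;
        }
        String namespace = prefixes.get(name.substring(0, colon));
        if (namespace == null) {
            return name;
        }
        return namespace + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("ex", "http://example.org/");
        System.out.println(expand(prefixes, "ex:Jordi"));
        System.out.println(expand(prefixes, "ex:address-city"));
    }
}
```

Note that the namespace ends in a slash, so that ex:Jordi expands to the same URI used in the unprefixed triples.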

SPARQL: A WAY TO QUERY LINKED DATA
SPARQL = SPARQL Protocol and RDF Query Language
SPARQL 1.1 became a W3C Recommendation in March 2013!
Example: given our RDF graph, show the first names of all users who live in Barcelona:
PREFIX ex: <http://example.org/>
SELECT ?fname
FROM <users.rdf>
WHERE {
?user ex:address-city ex:Barcelona .
?user ex:firstname ?fname .
}

SPARQL AND GRAPH PATTERNS
The statements in the WHERE clause form a graph pattern, which is matched against subgraphs of the RDF graph to form the solution.
SELECT ?fname
FROM <users.rdf>
WHERE {
?user ex:address-city ex:Barcelona .
?user ex:firstname ?fname .
}
[Figure: the RDF graph being queried. ex:Jordi --ex:address-city--> ex:Barcelona; ex:Jordi --ex:firstname--> "Jordi"; ex:Jordi --ex:age--> 37; ex:Josep --ex:address-city--> ex:Badalona]
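The matching step can be sketched in a few lines of plain Java. This is an illustrative, naive matcher under simplifying assumptions (triples are plain string arrays, variables are strings starting with "?"), not how Jena or any real SPARQL engine is implemented; PatternMatchDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PatternMatchDemo {
    // A triple or triple pattern is a String[3]: {subject, predicate, object}.
    // In a pattern, terms starting with '?' are variables.

    // Tries to extend the bindings so that the pattern equals the triple.
    // Returns the extended bindings, or null if they are incompatible.
    static Map<String, String> unify(String[] pattern, String[] triple,
                                     Map<String, String> bindings) {
        Map<String, String> result = new HashMap<>(bindings);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                String bound = result.get(term);
                if (bound == null) {
                    result.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return result;
    }

    // Matches the patterns one by one, keeping only the bindings that
    // are consistent with every pattern seen so far.
    public static List<Map<String, String>> match(List<String[]> patterns,
                                                  List<String[]> graph) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<String, String>());
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> bindings : solutions) {
                for (String[] triple : graph) {
                    Map<String, String> extended = unify(pattern, triple, bindings);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    // The example graph from the slides, with prefixed names kept as strings.
    public static List<String[]> exampleGraph() {
        return Arrays.asList(
            new String[] {"ex:Jordi", "ex:address-city", "ex:Barcelona"},
            new String[] {"ex:Jordi", "ex:firstname", "\"Jordi\""},
            new String[] {"ex:Jordi", "ex:age", "37"},
            new String[] {"ex:Josep", "ex:address-city", "ex:Badalona"});
    }

    public static void main(String[] args) {
        List<String[]> patterns = Arrays.asList(
            new String[] {"?user", "ex:address-city", "ex:Barcelona"},
            new String[] {"?user", "ex:firstname", "?fname"});
        for (Map<String, String> solution : match(patterns, exampleGraph())) {
            System.out.println(solution.get("?fname"));
        }
    }
}
```

Because ?user appears in both patterns, it must bind to the same node in each, which is why ex:Josep (who lives in Badalona) never reaches the solution.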

SPARQL: THE SELECT OPERATION
SPARQL SELECT operations also support FILTER, ORDER BY, LIMIT, and OFFSET.
Show the names of users who live in Barcelona and are less than 40 years old, from the 11th to the 40th user:
PREFIX ex: <http://example.org/>
SELECT ?lname ?fname
FROM <users.rdf>
WHERE {
?user ex:address-city ex:Barcelona .
?user ex:lastname ?lname .
?user ex:firstname ?fname .
?user ex:age ?age .
FILTER (?age < 40)
}
ORDER BY ?lname
LIMIT 30
OFFSET 10
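The way FILTER, ORDER BY, OFFSET, and LIMIT combine can be mimicked with a Java stream pipeline. The sketch below assumes the graph pattern has already been matched into rows; the Row class, the sample data, and the method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SolutionModifierDemo {
    // A hypothetical, already-matched solution row: last name plus age.
    public static final class Row {
        public final String lname;
        public final int age;

        public Row(String lname, int age) {
            this.lname = lname;
            this.age = age;
        }
    }

    // Mirrors FILTER (?age < 40), ORDER BY ?lname, OFFSET and LIMIT:
    // filter first, then sort, then skip, then cut off.
    public static List<String> page(List<Row> rows, int offset, int limit) {
        return rows.stream()
            .filter(r -> r.age < 40)
            .sorted(Comparator.comparing((Row r) -> r.lname))
            .skip(offset)
            .limit(limit)
            .map(r -> r.lname)
            .collect(Collectors.toList());
    }

    // Builds 100 made-up rows named User000..User099, all aged 30,
    // inserted in reverse order so that sorting has something to do.
    public static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        for (int i = 99; i >= 0; i--) {
            rows.add(new Row(String.format("User%03d", i), 30));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> names = page(sampleRows(), 10, 30);
        System.out.println(names.size() + " rows, from " + names.get(0)
            + " to " + names.get(names.size() - 1));
    }
}
```

With OFFSET 10 and LIMIT 30, the first ten sorted rows are skipped and the next thirty are returned, which is the "11th to 40th user" window described above.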

SPARQL: THE SELECT OPERATION
Alternative matches using UNION, for cases where resources in the expected result set may match any one of several patterns:
Show the first names of users who live in Barcelona or in Badalona:
PREFIX ex: <http://example.org/>
SELECT ?fname
FROM <users.rdf>
WHERE {
?user ex:firstname ?fname .
{ ?user ex:address-city ex:Barcelona . }
UNION
{ ?user ex:address-city ex:Badalona . }
}

SPARQL: THE SELECT OPERATION
OPTIONAL matches, for cases where not every resource in the expected result set has to match a pattern:
Show the first names of users who live in Barcelona and their profile pic image, if they have one:
PREFIX ex: <http://example.org/>
SELECT ?fname ?ppic
FROM <users.rdf>
WHERE {
?user ex:address-city ex:Barcelona .
?user ex:firstname ?fname .
OPTIONAL {
?user ex:ppic ?ppic .
}
}
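OPTIONAL behaves like a left join: every solution of the required pattern is kept, and the optional variable is simply left unbound when nothing matches. A minimal sketch of that behaviour in plain Java follows; the maps stand in for already-matched patterns, and the user ex:Marta and the picture URI are invented sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OptionalDemo {
    // Left-joins the required solutions with the optional ones.
    // names: user -> first name (the required pattern)
    // pics:  user -> picture URI (the OPTIONAL pattern)
    // Each result row is {fname, ppic}; ppic is null when unbound.
    public static List<String[]> leftJoin(Map<String, String> names,
                                          Map<String, String> pics) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, String> entry : names.entrySet()) {
            // A missing picture leaves ppic unbound but keeps the row.
            String ppic = pics.get(entry.getKey());
            rows.add(new String[] {entry.getValue(), ppic});
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> names = new TreeMap<>();
        names.put("ex:Jordi", "Jordi");
        names.put("ex:Marta", "Marta");
        Map<String, String> pics = new TreeMap<>();
        pics.put("ex:Jordi", "http://example.org/jordi.png");
        for (String[] row : leftJoin(names, pics)) {
            System.out.println(row[0] + " " + (row[1] == null ? "(unbound)" : row[1]));
        }
    }
}
```

Had the picture pattern been required rather than optional, the user without a picture would have dropped out of the result entirely.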

SPARQL: THE SELECT OPERATION
SELECT also supports:
• Set inclusion (IN/NOT IN)
• GROUP BY, HAVING, and aggregate functions such as COUNT and AVG (new in SPARQL 1.1)
• Subqueries (new in SPARQL 1.1)

SPARQL: OTHER OPERATIONS
Aside from SELECT for querying, SPARQL also has:
• CONSTRUCT: creates a single RDF graph from the result of a query by combining (i.e. applying set union on) the resulting triples
• ASK: returns a Boolean that indicates whether the query pattern has a solution
• DESCRIBE: returns an RDF graph that describes the result (as determined by the query service)
• INSERT/DELETE: add or remove triples from the graph (new in SPARQL 1.1)
• Graph management operations (CREATE, DROP, COPY, MOVE, ADD) (new in SPARQL 1.1)

TRIPLESTORES
The statements in an RDF graph (subject-predicate-object) are also known as triples, and the specialized databases used for storing them are called triplestores.
Triplestores vs graph databases: what's the difference?
• Triplestores are specifically designed to store RDF graphs, which are labeled directed graphs
• Graph databases, on the other hand, can store any kind of graph (unlabeled, undirected, weighted, etc.)
• Graph databases don't have a standard query language (Cypher?), whereas triplestores must support SPARQL
• Triplestores are optimized for graph pattern matching, and may lack the full capabilities of graph DBs
• But graph databases can be used to implement a triplestore
(see Sequeda, J. (2013, January 31), Introduction to Triplestores)

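SPARQL AND CYPHER
SPARQL:
SELECT ?fname
FROM <users.rdf>
WHERE {
?user ex:address-city ex:Barcelona .
?user ex:firstname ?fname .
}
Cypher:
MATCH (user)-[:ex_firstname]->(fname),
(user)-[:`ex_address-city`]->(city)
WHERE city.uri = "ex:Barcelona"
RETURN fname
[Figure: the graph being queried. ex:Jordi --ex:address-city--> ex:Barcelona; ex:Jordi --ex:firstname--> "Jordi"; ex:Jordi --ex:age--> 37]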

TRIPLESTORE IMPLEMENTATIONS
• Native triplestores
  - Sesame
  - BigData
  - Meronymy
  - Apache Jena TDB
• Graph DB-based
  - AllegroGraph
  - Oracle Spatial and Graph (formerly Oracle Semantic Technologies)
• Relational DB-based
  - Apache Jena SDB
  - IBM DB2

APACHE JENA
Born in HP Labs in 2000, became a top-level Apache
project in April 2012
The Jena Framework includes:
• A Java API for working with RDF models
• A SPARQL query processor
• An efficient disk-based native triplestore
• A rule-based inference engine that can be used with RDF-based ontologies
• A server for accepting SPARQL queries over HTTP (a SPARQL endpoint)

APACHE JENA: RDF API
The Statement interface represents triples, while the Model interface represents the whole RDF graph
Given a Statement, one could invoke
• getSubject(), which returns a Resource
• getPredicate(), which returns a Property
• getObject(), which returns an RDFNode (which can be a Resource or a Literal)
To create our basic example RDF graph:
// Package name as of Apache Jena 3 and later.
import org.apache.jena.rdf.model.*;

Model model = ModelFactory.createDefaultModel();
Resource j = model.createResource("http://example.org/Jordi");
Resource bcn = model.createResource("http://example.org/Barcelona");
// createProperty takes the namespace URI and the local name.
Property addrCity = model.createProperty("http://example.org/", "address-city");
// This automatically creates a Statement in the associated model.
j.addProperty(addrCity, bcn);

APACHE JENA: ARQ API
Jena also provides an API called ARQ for programmatically executing SPARQL queries.
To execute a given query on our example graph:
// Package names as of Apache Jena 3 and later.
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.RDFNode;

String queryString = "...";
Query query = QueryFactory.create(queryString);
// Create a query execution that runs the query against our model.
QueryExecution qe = QueryExecutionFactory.create(query, model);
ResultSet rs = qe.execSelect();
// ResultSet acts like an Iterator.
while (rs.hasNext())
{
QuerySolution qs = rs.nextSolution();
RDFNode r = qs.get("fname"); // You can get a variable by name.
// Do what you want with it.
}
// Always good to close resources when done.
qe.close();

APACHE JENA: TDB
Jena's native triplestore implementation is called TDB and consists of:
• The node table
  - stores resources, predicates (relationships), and literals
  - maps nodes to internal node ids, and vice versa
  - node ids are 8 bytes (64 bits) long
• The triple indexes
  - hold the triples as 3 indexes (SPO, POS, OSP) whose entries are node ids pointing into the node table
• The prefixes table
  - maps prefixes to URIs
TDB also supports ACID transactions using write-ahead logging.
But no transaction is needed if there is only a single writer (even with multiple concurrent readers)
RDF/SPARQL IN ACTION: DBPEDIA.ORG
DBPedia describes itself as a “crowdsourced community effort to extract structured information from Wikipedia”
• 1.89 billion triples localized in 111 languages
• English dataset contains 3.77 million topics
Imagine if you could ask Wikipedia…
• Which towns in Cataluña have a population between 10,000 and 50,000 people?
• What are the birthdays of all blues guitarists who were born in Chicago?
• (sample query from the DBPedia.org wiki) Show me all soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants
DBPedia also provides a SPARQL endpoint, so other websites can query its data and get results that are continuously updated
DBPedia also contains geo-coordinates obtained from other sources (e.g. Geonames, Eurostat, CIA World Fact Book), which opens up the possibility of location-based applications on mobile devices

CONCLUSIONS
The Semantic Web: Web 3.0?
RDF and SPARQL are key technologies in the W3C's vision of the web of tomorrow
Companies like Google, Tesco, and Best Buy already produce RDF content!
Add some SPARQL to your projects!
[Image source: w3.org]

BIBLIOGRAPHY
Berners-Lee, T., Hendler, J., & Lassila, O. (2001, May). The Semantic Web.
http://www.scientificamerican.com/article.cfm?id=the-semantic-web
W3 Consortium. (2004, February 10). RDF Primer.
http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
W3 Consortium. (2013, March 21). SPARQL 1.1 Query Language.
http://www.w3.org/TR/sparql11-query/
Sequeda, J. (2013, January 31). Introduction to Triplestores.
http://semanticweb.com/introduction-to-triplestores_b34996
Apache Jena
http://jena.apache.org/
DBPedia
http://dbpedia.org/
