Triplestore and SPARQL

TRIPLESTORE AND SPARQL
Lino Valdivia Jr
04.06.2013

OUTLINE
The Semantic Web
RDF
SPARQL
Triplestores
Apache Jena
DBPedia
Conclusions
Demo1: Apache Jena API
Demo2: DBPedia

THE SEMANTIC WEB
Most of the data on the web is unstructured or
semi-structured:
 HTML documents
 Multimedia: images, video streams, audio files
 Meant to be read and processed by humans
What if we could structure and add metadata to this “Web
of Documents” and make it understandable by
machines?
 Metadata → meaning, or semantics
 Machines can perform new tasks that used to require human
intervention
This is the motivation behind the Semantic Web!
 The term “Semantic Web” was coined by Tim Berners-Lee: “a
web of data that can be processed directly and indirectly by
machines.”

THE SEMANTIC WEB
“The Semantic Web is a web of data…[it] provides a common
framework that allows data to be shared and reused across
application, enterprise, and community boundaries.”
[w3.org]
For the Semantic Web to happen, we would need:
1. A way to structure and link data in a standardized way
2. A way to describe the relationships between these data in a
common way
3. A way to query that linked data
4. A way to infer something from that linked data (by
applying a set of rules)
We will focus only on #1 and #3.

RDF: A WAY TO STRUCTURE AND LINK DATA
RDF = Resource Description Framework, a standard way
for applications to represent information that can then
be shared and processed
A resource can be anything that is identifiable: a user, a
coffee cup, a picture of your cat, a bank statement
RDF provides a way to model data by breaking it down
into three components:
The subject
The predicate (aka the property)
The object

RDF AS A GRAPH
Consider the following statement: Jordi lives in Barcelona
 Subject: Jordi
 Object: Barcelona
 Predicate: lives-in (or, to be more precise, address-city)
RDF statements are typically represented as a labeled directed
graph:
 The arrow points from the subject to the object and is labeled with the predicate
[Figure: Jordi --address-city--> Barcelona]

RDF AND URIS
Resources must be identifiable, and RDF uses Uniform
Resource Identifier (URI) references.
E.g. Jordi = http://example.org/Jordi
URIs are not the same as URLs! A URI identifies a resource but does not have to point to a retrievable document.
RDF graphs are typically shown with the URIs for the subject, object,
and predicate:
The RDF graph can also be rewritten in text as:
<http://example.org/Jordi> <http://example.org/address-city> <http://example.org/Barcelona> .
As you may have guessed, RDF is more machine-friendly than
human-friendly!
[Figure: http://example.org/Jordi --http://example.org/address-city--> http://example.org/Barcelona]

RDF: RESOURCES AND LITERALS
The object of a triple in RDF can be either a resource
(identified by a URI) or a literal (a value such as a string or a
number):
We can represent the RDF graph above as text:
<http://example.org/Jordi> <http://example.org/address-city> <http://example.org/Barcelona> .
<http://example.org/Jordi> <http://example.org/firstname> "Jordi" .
<http://example.org/Jordi> <http://example.org/age> 37 .
This textual representation is also known as Terse RDF Triple
Language, or Turtle for short.
[Figure: http://example.org/Jordi --address-city--> http://example.org/Barcelona; http://example.org/Jordi --firstname--> “Jordi”; http://example.org/Jordi --age--> 37]
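The resource-versus-literal distinction can be sketched in plain Java. This is only an illustration: the `Triple` record, the `objectIsUri` flag, and the `TurtleLines` class are names made up here, and the age is modeled as a number, as drawn in the slide's graph. String escaping is deliberately left out.

```java
import java.util.List;

public class TurtleLines {
    // Hypothetical value type: the object is either a resource URI or a literal value.
    record Triple(String subject, String predicate, Object object, boolean objectIsUri) {}

    // Renders one triple in the angle-bracket text form used on the slide.
    static String toLine(Triple t) {
        String obj;
        if (t.objectIsUri()) {
            obj = "<" + t.object() + ">";        // resource: wrapped in angle brackets
        } else if (t.object() instanceof Number) {
            obj = t.object().toString();         // numeric literal: written unquoted
        } else {
            obj = "\"" + t.object() + "\"";      // string literal: quoted (no escaping here)
        }
        return "<" + t.subject() + "> <" + t.predicate() + "> " + obj + " .";
    }

    // The three statements about Jordi from the slide.
    static List<Triple> sampleGraph() {
        String ex = "http://example.org/";
        return List.of(
            new Triple(ex + "Jordi", ex + "address-city", ex + "Barcelona", true),
            new Triple(ex + "Jordi", ex + "firstname", "Jordi", false),
            new Triple(ex + "Jordi", ex + "age", 37, false));
    }

    public static void main(String[] args) {
        sampleGraph().forEach(t -> System.out.println(toLine(t)));
    }
}
```

Running `main` prints one line per statement, in the same subject, predicate, object order as the text form above.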

RDF: PREFIXES
Prefixes can be used to simplify representations, either in
graphs:
prefix ex: http://example.org/
or in Turtle:
@prefix ex: <http://example.org/> .
ex:Jordi ex:address-city ex:Barcelona .
ex:Jordi ex:firstname "Jordi" .
ex:Jordi ex:age 37 .
Now that we have a way to structure and link our data, we
want to be able to query it for information.
[Figure: ex:Jordi --ex:address-city--> ex:Barcelona; ex:Jordi --ex:firstname--> “Jordi”; ex:Jordi --ex:age--> 37]
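What a prefix declaration does can be sketched as a simple string expansion. The `PrefixExpander` class and its `expand` method are hypothetical names for this illustration, not part of any RDF library, and real Turtle parsers handle more cases (empty prefixes, escapes, relative URIs).

```java
import java.util.Map;

public class PrefixExpander {
    // Expands a prefixed name such as "ex:Jordi" using a prefix-to-namespace map.
    static String expand(String prefixedName, Map<String, String> prefixes) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String prefix = prefixedName.substring(0, colon);
        String namespace = prefixes.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("Unknown prefix: " + prefix);
        }
        // The full URI is the namespace followed by the local part after the colon.
        return namespace + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = Map.of("ex", "http://example.org/");
        System.out.println(expand("ex:address-city", prefixes));
    }
}
```

Note that the namespace must end with the trailing slash for `ex:Jordi` to expand to `http://example.org/Jordi`.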

SPARQL: A WAY TO QUERY LINKED DATA
SPARQL = SPARQL Protocol and RDF Query Language
SPARQL 1.1 became a W3C Recommendation on 21 March
2013!
Example: given our RDF graph, show the first names of all users who live in
Barcelona:
PREFIX ex: <http://example.org/>
SELECT ?fname
FROM <users.rdf>
WHERE {
  ?user ex:address-city ex:Barcelona .
  ?user ex:firstname ?fname .
}

SPARQL AND GRAPH PATTERNS
The statements in the WHERE clause form a graph
pattern, which is matched against subgraphs in the RDF
graph to form the solution.
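SELECT ?fname
FROM <users.rdf>
WHERE {
  ?user ex:address-city ex:Barcelona .
  ?user ex:firstname ?fname .
}
[Figure: the example graph. ex:Jordi --ex:address-city--> ex:Barcelona; ex:Jordi --ex:firstname--> “Jordi”; ex:Jordi --ex:age--> 37; ex:Josep --ex:address-city--> ex:Badalona. Only the ex:Jordi subgraph matches the pattern.]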
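Graph pattern matching can be sketched as repeatedly extending partial variable bindings, one triple pattern at a time. The code below is a naive illustration under stated assumptions: `PatternMatcher`, `Triple`, `bind`, and `match` are made-up names, terms are plain strings, and anything starting with `?` in a pattern is treated as a variable. Real SPARQL engines use indexes and query planning rather than nested loops.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PatternMatcher {
    // Hypothetical triple type, used for both data triples and triple patterns.
    record Triple(String s, String p, String o) {}

    // Tries to match one pattern term against one data term under the current bindings.
    static boolean bind(String term, String value, Map<String, String> bindings) {
        if (!term.startsWith("?")) {
            return term.equals(value);                        // constant: must match exactly
        }
        String previous = bindings.putIfAbsent(term, value);  // variable: bind if still free
        return previous == null || previous.equals(value);    // or agree with earlier binding
    }

    // Matches the patterns in order, extending every partial solution found so far.
    static List<Map<String, String>> match(List<Triple> graph, List<Triple> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>());
        for (Triple pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> solution : solutions) {
                for (Triple t : graph) {
                    Map<String, String> b = new HashMap<>(solution);
                    if (bind(pattern.s(), t.s(), b)
                            && bind(pattern.p(), t.p(), b)
                            && bind(pattern.o(), t.o(), b)) {
                        next.add(b);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    // The slide's example graph and the "lives in Barcelona" pattern.
    static List<Map<String, String>> runExample() {
        List<Triple> graph = List.of(
            new Triple("ex:Jordi", "ex:address-city", "ex:Barcelona"),
            new Triple("ex:Jordi", "ex:firstname", "Jordi"),
            new Triple("ex:Jordi", "ex:age", "37"),
            new Triple("ex:Josep", "ex:address-city", "ex:Badalona"));
        List<Triple> where = List.of(
            new Triple("?user", "ex:address-city", "ex:Barcelona"),
            new Triple("?user", "ex:firstname", "?fname"));
        return match(graph, where);
    }

    public static void main(String[] args) {
        for (Map<String, String> solution : runExample()) {
            System.out.println(solution.get("?fname"));
        }
    }
}
```

Because `?user` appears in both patterns, the second pattern can only extend solutions where the same resource was already bound, which is what ties the two triples into one subgraph.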

SPARQL: THE SELECT OPERATION
SPARQL SELECT operations also support
FILTER, ORDER BY, LIMIT, and OFFSET:
Show the names of users who live in Barcelona and are less
than 40 years old, from the 11th to the 40th user:
SELECT ?lname ?fname
FROM <users.rdf>
WHERE {
  ?user ex:address-city ex:Barcelona .
  ?user ex:firstname ?fname .
  ?user ex:lastname ?lname .
  ?user ex:age ?age .
  FILTER (?age < 40)
}
ORDER BY ?lname
LIMIT 30
OFFSET 10
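The effect of these solution modifiers can be sketched with Java streams over already-matched rows. The `User` record, the `page` method, and the sample names and ages are invented for illustration. The point is the order of application: filter, then sort, then skip the offset, then apply the limit, regardless of the order in which LIMIT and OFFSET are written in the query.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SolutionModifiers {
    // Hypothetical row type standing in for one query solution.
    record User(String lastName, String firstName, int age) {}

    // Mirrors FILTER (?age < maxAge), ORDER BY ?lname, OFFSET offset, LIMIT limit.
    static List<User> page(List<User> rows, int maxAge, int offset, int limit) {
        return rows.stream()
            .filter(u -> u.age() < maxAge)                  // FILTER
            .sorted(Comparator.comparing(User::lastName))   // ORDER BY
            .skip(offset)                                   // OFFSET is applied first
            .limit(limit)                                   // then LIMIT
            .collect(Collectors.toList());
    }

    // Made-up sample data, not taken from the slides.
    static List<User> sampleRows() {
        return List.of(
            new User("Vila", "Jordi", 37),
            new User("Soler", "Josep", 45),
            new User("Bosch", "Marta", 29),
            new User("Puig", "Anna", 33));
    }

    public static void main(String[] args) {
        // Skip the first match and return at most two of the rest.
        page(sampleRows(), 40, 1, 2)
            .forEach(u -> System.out.println(u.lastName() + ", " + u.firstName()));
    }
}
```

With `OFFSET 10` and `LIMIT 30`, the same logic yields solutions 11 through 40 of the sorted result.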

SPARQL: THE SELECT OPERATION
Alternative matches using UNION, for those cases
where resources in the expected result set may match
any of several patterns:
Show the first names of users who live in Barcelona or
in Badalona:
SELECT ?fname
FROM <users.rdf>
WHERE {
  ?user ex:firstname ?fname .
  { ?user ex:address-city ex:Barcelona . }
  UNION
  { ?user ex:address-city ex:Badalona . }
}

SPARQL: THE SELECT OPERATION
OPTIONAL matches, for those cases where resources in the
expected result set do not all have to match a
pattern:
Show the first names of users who live in Barcelona and
their profile pic image, if they have one:
SELECT ?fname ?ppic
FROM <users.rdf>
WHERE {
  ?user ex:address-city ex:Barcelona .
  ?user ex:firstname ?fname .
  OPTIONAL {
    ?user ex:ppic ?ppic .
  }
}
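OPTIONAL behaves like a left join: every solution of the required part is kept, and the optional bindings are added only where they exist. The sketch below assumes at most one profile picture per user to stay short (in SPARQL, several optional matches would produce several rows). The `OptionalJoin` class, the `ex:Marta` user, and the image URL are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OptionalJoin {
    // Extends each required solution with ?ppic when one is known; keeps the row otherwise.
    static List<Map<String, String>> leftJoin(List<Map<String, String>> required,
                                              Map<String, String> picByUser) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> solution : required) {
            Map<String, String> row = new LinkedHashMap<>(solution);
            String pic = picByUser.get(solution.get("?user"));
            if (pic != null) {
                row.put("?ppic", pic);   // optional part matched: add the extra binding
            }
            out.add(row);                // the row survives either way
        }
        return out;
    }

    // Two users matched by the required patterns; only one has a picture (made-up data).
    static List<Map<String, String>> runExample() {
        List<Map<String, String>> required = List.of(
            Map.of("?user", "ex:Jordi", "?fname", "Jordi"),
            Map.of("?user", "ex:Marta", "?fname", "Marta"));
        Map<String, String> pics = Map.of("ex:Jordi", "http://example.org/img/jordi.png");
        return leftJoin(required, pics);
    }

    public static void main(String[] args) {
        for (Map<String, String> row : runExample()) {
            System.out.println(row.get("?fname") + " " + row.getOrDefault("?ppic", "(unbound)"));
        }
    }
}
```

Without OPTIONAL, the second user would drop out of the result entirely because the picture pattern has no match for her.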

SPARQL: THE SELECT OPERATION
Set inclusion (IN/NOT IN)
GROUP BY, HAVING, and aggregate functions such
as COUNT and AVG (new in SPARQL 1.1)
Subqueries (new in SPARQL 1.1)

SPARQL: OTHER OPERATIONS
Aside from SELECT for querying, SPARQL also has:
CONSTRUCT – creates a single RDF graph from the
result of a query by combining (i.e. taking the set union
of) the resulting triples
ASK – returns a Boolean that indicates whether the
query pattern has at least one match
DESCRIBE – returns an RDF graph that describes the
result (as determined by the query service)
INSERT/DELETE – add or remove triples from the
graph (new in SPARQL 1.1)
Graph management operations (CREATE, DROP, COPY,
MOVE, ADD) (new in SPARQL 1.1)

TRIPLESTORES
The statements in an RDF graph (subject-predicate-object) are also
known as triples, and the specialized databases used for storing
them are called triplestores.
Triplestores vs Graph Databases – What’s the difference?
Triplestores are specifically designed to store RDF graphs, which
are labeled directed graphs
Graph databases, on the other hand, can store any kind of graph
(unlabeled, undirected, weighted, etc.)
Graph databases don’t have a standard query language (Cypher?)
Triplestores must support SPARQL
Triplestores are optimized for graph pattern matching, and may
lack the full capabilities of graph DBs
But graph databases can be used to implement a triplestore
(see Sequeda, J. (2013, January 31) Introduction to
Triplestores)

SPARQL AND CYPHER
SPARQL:
SELECT ?fname
FROM <users.rdf>
WHERE {
  ?user ex:address-city ex:Barcelona .
  ?user ex:firstname ?fname .
}
Cypher:
MATCH (user)-[:ex_firstname]->(fname),
      (user)-[:`ex_address-city`]->(city)
WHERE city.uri = "ex:Barcelona"
RETURN fname
[Figure: ex:Jordi --ex:address-city--> ex:Barcelona; ex:Jordi --ex:firstname--> “Jordi”; ex:Jordi --ex:age--> 37]

TRIPLESTORE IMPLEMENTATIONS
Native Triplestores
 Sesame
 BigData
 Meronymy
 Apache Jena TDB
Graph DB-based
 AllegroGraph
 Oracle Spatial and Graph (formerly Oracle Semantic Technologies)
Relational DB-based
 Apache Jena SDB
 IBM DB2
APACHE JENA
Born in HP Labs in 2000, became a top-level Apache
project in April 2012
The Jena Framework includes
A Java API for working with RDF models
A SPARQL query processor
An efficient disk-based native triplestore
A rule-based inference engine that can be used with
RDF-based ontologies
A server for accepting SPARQL queries over HTTP (a
SPARQL endpoint)

APACHE JENA: RDF API
The Statement interface represents a triple, while the Model
interface represents the whole RDF graph
Given a Statement, one can invoke
 getSubject(), which returns a Resource
 getPredicate(), which returns a Property
 getObject(), which returns an RDFNode (which can be a
Resource or a Literal)
To create our basic example RDF graph:
// Package names are those of Jena 3 and later; Jena 2.x used com.hp.hpl.jena.rdf.model.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
Model model = ModelFactory.createDefaultModel();
Resource j = model.createResource("http://example.org/Jordi");
Resource bcn = model.createResource("http://example.org/Barcelona");
// createProperty takes the namespace URI and the local name.
Property addrCity = model.createProperty("http://example.org/", "address-city");
// This automatically creates a Statement in the associated model.
j.addProperty(addrCity, bcn);

APACHE JENA: ARQ API
Jena also provides an API called ARQ for
programmatically executing SPARQL queries.
To execute a given query on our example graph:
import org.apache.jena.query.*; // Jena 3 and later; Jena 2.x used com.hp.hpl.jena.query
import org.apache.jena.rdf.model.RDFNode;

String queryString = "...";
Query query = QueryFactory.create(queryString);
// Create a query execution for this query against our model.
QueryExecution qe = QueryExecutionFactory.create(query, model);
try {
    ResultSet rs = qe.execSelect();
    // ResultSet acts like an Iterator.
    while (rs.hasNext()) {
        QuerySolution qs = rs.nextSolution();
        RDFNode r = qs.get("fname"); // You can get a variable by name.
        // Do what you want with it.
    }
} finally {
    // Always close the execution to release its resources.
    qe.close();
}

APACHE JENA: TDB
Jena’s native triplestore implementation is called TDB and
consists of:
The node table
 stores resources, predicates (relationships), and literals
 maps nodes to internal node ids, and vice versa
 node ids are 8 bytes (64 bits) long
The triple indexes
 three indexes (SPO, POS, OSP) whose entries are node ids from the node table
The prefixes table
 maps prefixes to URIs
TDB also supports ACID transactions using write-ahead
logging.
But no transaction is needed if there is only a single
writer (even with multiple concurrent readers)
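The idea behind a node table, storing each node once and representing triples as tuples of fixed-size ids, can be sketched in memory as follows. This is not TDB's actual on-disk layout or API: `NodeTableSketch` and all of its methods are hypothetical, ids are simply assigned sequentially, and the "index" is an unsorted list scanned linearly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NodeTableSketch {
    private final Map<String, Long> nodeToId = new HashMap<>();   // node -> id
    private final List<String> idToNode = new ArrayList<>();      // id -> node
    private final List<long[]> triples = new ArrayList<>();       // each entry: {s, p, o} ids

    // Returns the existing id for a node, or assigns the next free one.
    long idFor(String node) {
        Long existing = nodeToId.get(node);
        if (existing != null) {
            return existing;
        }
        long id = idToNode.size();
        nodeToId.put(node, id);
        idToNode.add(node);
        return id;
    }

    // Reverse lookup from id back to the node's text.
    String nodeFor(long id) {
        return idToNode.get((int) id);
    }

    // Stores a triple as three 64-bit ids instead of three strings.
    void add(String s, String p, String o) {
        triples.add(new long[] { idFor(s), idFor(p), idFor(o) });
    }

    // Finds objects for a subject and predicate by comparing ids, then decodes them.
    List<String> objectsOf(String s, String p) {
        List<String> out = new ArrayList<>();
        Long sid = nodeToId.get(s);
        Long pid = nodeToId.get(p);
        if (sid == null || pid == null) {
            return out;
        }
        for (long[] t : triples) {
            if (t[0] == sid && t[1] == pid) {
                out.add(nodeFor(t[2]));
            }
        }
        return out;
    }

    int nodeCount() { return idToNode.size(); }
    int tripleCount() { return triples.size(); }

    static NodeTableSketch sample() {
        NodeTableSketch table = new NodeTableSketch();
        table.add("ex:Jordi", "ex:address-city", "ex:Barcelona");
        table.add("ex:Jordi", "ex:firstname", "\"Jordi\"");
        return table;
    }

    public static void main(String[] args) {
        NodeTableSketch table = sample();
        System.out.println(table.nodeCount() + " nodes, " + table.tripleCount() + " triples");
        System.out.println(table.objectsOf("ex:Jordi", "ex:address-city"));
    }
}
```

The subject `ex:Jordi` is stored once even though it appears in two triples, which is the space saving the node table is after. A real store would keep the id tuples in sorted indexes so lookups do not need a full scan.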

RDF/SPARQL IN ACTION: DBPEDIA.ORG
DBPedia describes itself as a “crowdsourced community
effort to extract structured information from Wikipedia”
 1.89 billion triples localized in 111 languages
 English dataset contains 3.77 million topics
Imagine if you could ask Wikipedia…
 Which towns in Cataluña have a population between 10,000 and 50,000
people?
 What are the birthdays of all blues guitarists who were born in Chicago?
 (sample query from DBPedia.org wiki) Show me all soccer players who
played as goalkeeper for a club that has a stadium with more than 40,000
seats and who were born in a country with more than 10 million inhabitants
DBPedia also provides a SPARQL endpoint, so other websites
can query its data and get results that are continuously
updated
DBPedia also contains geo-coordinates obtained from other
sources (e.g. Geonames, Eurostat, CIA World Fact Book) –
this opens the possibility for location-based applications
from mobile devices
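Querying a SPARQL endpoint over HTTP comes down to sending the query text as a URL parameter. The sketch below only builds the request URI. It assumes DBPedia's public endpoint address `https://dbpedia.org/sparql` and the SPARQL protocol's `query` parameter, and the `EndpointRequest` class name is made up. Actually sending the request (for example with `java.net.http.HttpClient`) and parsing the response are left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EndpointRequest {
    // Builds a GET URI that carries the query text in the "query" parameter.
    static URI queryUri(String endpoint, String sparql) {
        // URLEncoder produces form encoding: spaces become "+", braces and "?" are escaped.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) {
        String sparql = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5";
        System.out.println(queryUri("https://dbpedia.org/sparql", sparql));
    }
}
```

A website could issue such a request on each page load and render whatever the endpoint returns at that moment.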

CONCLUSIONS
The Semantic Web – Web 3.0?
RDF and SPARQL are key technologies in the W3C’s vision of the web of tomorrow
Companies like Google, Tesco, and Best Buy already produce RDF content!
Add some SPARQL to your projects!
[Figure source: w3.org]

BIBLIOGRAPHY
Berners-Lee, T., Hendler, J., & Lassila, O. (2001, May). The Semantic Web.
http://www.scientificamerican.com/article.cfm?id=the-semantic-web
W3 Consortium. (2004, February 10). RDF Primer.
http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
W3 Consortium. (2013, March 21). SPARQL 1.1 Query Language.
http://www.w3.org/TR/sparql11-query/
Sequeda, J. (2013, January 31). Introduction to Triplestores.
http://semanticweb.com/introduction-to-triplestores_b34996
Apache Jena.
http://jena.apache.org/
DBPedia.
http://dbpedia.org/
