Introduction to Bio SPARQL

SPARQL
SPARQL Protocol And RDF Query Language
Amrapali Zaveri, Ph.D.

What is SPARQL?
SPARQL (pronounced "sparkle") stands for: SPARQL Protocol And
RDF Query Language
• SPARQL 1.0: W3C Recommendation since January 15th 2008
• SPARQL 1.1: W3C Recommendation since March 21st 2013
• Query language to query instances in RDF documents
• Great practical importance (almost all applications need it)

SPARQL Example
SELECT *
WHERE {
?subject ?predicate ?object .
}
Similar to SQL!
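
To make the idea of a triple pattern concrete, here is a toy Python sketch of how a pattern with variables can be matched against a set of triples. It is not a real SPARQL engine; the `match` helper and the example triples are made up for illustration.

```python
# Toy illustration of triple-pattern matching (not a real SPARQL engine).
# The triples below are made-up example data.
triples = [
    ("ex:book1", "rdf:type", "dbo:Book"),
    ("ex:book1", "dbo:author", "ex:alice"),
    ("ex:book2", "rdf:type", "dbo:Book"),
]


def match(pattern, data):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Corresponds to: SELECT * WHERE { ?subject ?predicate ?object . }
all_rows = match(("?subject", "?predicate", "?object"), triples)

# Corresponds to: SELECT * WHERE { ?book rdf:type dbo:Book . }
books = match(("?book", "rdf:type", "dbo:Book"), triples)
```

Each result row is a set of variable bindings, which is what a SELECT query returns as a table.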

SPARQL Components
A SPARQL query comprises, in order:
• PREFIX declarations, for abbreviating URIs
• A result clause, identifying what information to return from the
query
• Dataset definition, stating what RDF graph(s) are being queried
• The query pattern, specifying what to query for in the
underlying dataset
• Query modifiers, slicing, ordering, and otherwise rearranging
query results

SPARQL Components
# prefix declarations
PREFIX foo: <http://example.com/resources/>
# result clause
SELECT ...
# dataset definition
FROM ...
# query pattern
WHERE {
...
}
# query modifiers
ORDER BY ...
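
As a rough sketch of how these components fit together, the following Python helper assembles a query string from its parts. The name `build_query` and its parameters are hypothetical, chosen only for this illustration. Note that in SPARQL syntax the SELECT clause comes before any FROM clause.

```python
def build_query(prefixes, select, where, from_graph=None, order_by=None, limit=None):
    """Assemble a SPARQL SELECT query string from its components (illustrative only)."""
    # 1. PREFIX declarations
    parts = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    # 2. Result clause
    parts.append(f"SELECT {select}")
    # 3. Optional dataset definition
    if from_graph:
        parts.append(f"FROM <{from_graph}>")
    # 4. Query pattern
    parts.append("WHERE {")
    parts.extend(f"  {pattern}" for pattern in where)
    parts.append("}")
    # 5. Query modifiers
    if order_by:
        parts.append(f"ORDER BY {order_by}")
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    return "\n".join(parts)


query = build_query(
    prefixes={"dbo": "http://dbpedia.org/ontology/"},
    select="?book ?author",
    where=["?book a dbo:Book .", "?book dbo:author ?author ."],
    order_by="?author",
    limit=10,
)
print(query)
```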

SPARQL Example
SELECT * WHERE {
?book rdf:type <http://dbpedia.org/ontology/Book> .
}
Try it yourself: dbpedia.org/sparql/

SPARQL Example
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT * WHERE {
?book a dbo:Book .
?book dbo:author ?author .
}

SPARQL Example
SELECT * WHERE {
?book a dbo:Book .
} LIMIT 10

SPARQL Example
SELECT ?author WHERE {
?book a dbo:Book .
?book dbo:author ?author .
} LIMIT 10

Filter in SPARQL
• Keyword FILTER, followed by a filter expression in parentheses
• Filter conditions output truth values (and possibly errors)
• Many filter functions are not specified by RDF; they are
partly taken from the XQuery/XPath standard for XML

Filter Functions
Comparison operators: <, =, >, <=, >=, !=
• Comparison of data literals according to natural order
• Support for numerical data types, xsd:dateTime, xsd:string
(alphabetic ordering), xsd:boolean (true > false)
• For other types and other RDF elements, only = and != are
available
• Comparison of literals of incompatible types (e.g. xsd:string and
xsd:integer) is not allowed
Arithmetic operators: +, -, *, /
• Support for numerical data types
• Used to combine values in filter conditions
Ex.: FILTER(?weight/(?size*?size) >= 25)
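
The arithmetic filter above can be pictured as a predicate applied to each result row. The Python sketch below mimics FILTER(?weight/(?size*?size) >= 25) on made-up rows. It also shows the SPARQL rule that a filter expression raising an error (here, a non-numeric weight) removes the row instead of aborting the query. The `keep` function and the data are assumptions for illustration.

```python
# Made-up result rows; the third has a non-numeric weight on purpose.
rows = [
    {"?name": "a", "?weight": 90.0, "?size": 1.80},
    {"?name": "b", "?weight": 60.0, "?size": 1.75},
    {"?name": "c", "?weight": "unknown", "?size": 1.70},
]


def keep(row):
    """Mimic FILTER(?weight/(?size*?size) >= 25) for a single row."""
    try:
        return row["?weight"] / (row["?size"] * row["?size"]) >= 25
    except TypeError:
        # In SPARQL, an error inside a FILTER drops the solution.
        return False


filtered = [row["?name"] for row in rows if keep(row)]
```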

SPARQL Example
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?author WHERE {
?book a dbo:Book .
?book dbo:author ?author .
?book dbo:numberOfPages ?pages .
FILTER (?pages > 500)
} LIMIT 10

Special Filter Functions
Other RDF-specific filter functions:
sameTerm(A,B): true if A and B are the same RDF term
langMatches(A,B): true if the language tag A matches the language range B
REGEX(A,B): true if the character string A matches the regular expression B
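
The behaviour of two of these functions can be approximated in a few lines of Python. This is a simplified sketch, not the exact specification: `lang_matches` implements only basic prefix matching of language tags, and `regex` relies on Python's `re` module, whose syntax differs in details from the XPath regular expressions SPARQL uses. Both function names are chosen for this illustration.

```python
import re


def lang_matches(lang_tag, lang_range):
    """Simplified langMatches: case-insensitive match of a tag against a range."""
    tag, rng = lang_tag.lower(), lang_range.lower()
    if rng == "*":
        # "*" matches any non-empty language tag.
        return tag != ""
    # "en" matches "en" and also sub-tags such as "en-GB".
    return tag == rng or tag.startswith(rng + "-")


def regex(text, pattern):
    """Simplified REGEX: true if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None
```

For example, `lang_matches("en-GB", "en")` is true, and `regex("Aspirin", "^A")` is true because the anchor `^` requires the match to start at the beginning of the string.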

SPARQL Example
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT * WHERE {
?book a dbo:Book .
?book dbp:name ?name .
FILTER (langMatches(LANG(?name), "en"))
} LIMIT 10

Filter Functions: Boolean operators
Filter conditions can be linked with boolean operators: &&, ||, !
• Partially also expressible through graph patterns:
• Conjunction corresponds to the specification of several filters
• Disjunction corresponds to the application of filters in alternative
patterns

Filter Functions: Boolean operators
Until now, only basic formatting settings for results:
• How can one retrieve defined parts of the output set?
• How are the results ordered?
• Can duplicate result rows be removed instantly?

Result Sorting
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT * WHERE {
?book a dbo:Book .
?book dbp:name ?name .
FILTER (langMatches(LANG(?name), "en"))
} ORDER BY ?name LIMIT 10

Result Sorting
• Sorting works the same as with filter comparison operators
• URIs are sorted alphabetically as sequences of characters
Other possible specifications:
• ORDER BY DESC(?price): descending
• ORDER BY ASC(?price): ascending, default setting
• ORDER BY DESC(?price) ?title: hierarchical sort criteria
LIMIT, OFFSET, DISTINCT
Restriction of output set:
• LIMIT: maximal number of results (table rows)
• OFFSET: position of the first delivered result
• SELECT DISTINCT: removal of duplicate table rows
LIMIT and OFFSET only make sense with ORDER BY!
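
The interplay of these modifiers can be sketched in Python as sort, de-duplicate, then slice. This is only a model of the behaviour on made-up rows; `apply_modifiers` is a hypothetical helper, not part of any SPARQL library. It also hints at why ORDER BY matters: without a fixed order, the slice taken by OFFSET and LIMIT would be arbitrary.

```python
def apply_modifiers(rows, order_key, distinct=False, offset=0, limit=None):
    """Model ORDER BY, DISTINCT, OFFSET and LIMIT on a list of row dicts."""
    # ORDER BY: fix the order of the rows first.
    rows = sorted(rows, key=order_key)
    # DISTINCT: drop duplicate rows while keeping the sorted order.
    if distinct:
        seen, unique = set(), []
        for row in rows:
            fingerprint = tuple(sorted(row.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                unique.append(row)
        rows = unique
    # OFFSET and LIMIT: take a slice of the ordered result.
    end = None if limit is None else offset + limit
    return rows[offset:end]


rows = [{"?name": n} for n in ["delta", "alpha", "charlie", "alpha", "bravo"]]
page = apply_modifiers(rows, lambda row: row["?name"], distinct=True, offset=1, limit=2)
names = [row["?name"] for row in page]
```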

Hands-on Bio2RDF
• SPARQL endpoint
• http://sparql.openlifedata.org/
• Datasets
• http://download.bio2rdf.org/release/3/release.html
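
Besides the web form, a SPARQL endpoint can be queried over HTTP by passing the query text in a `query` parameter. The Python sketch below only builds such a request with the standard library. It assumes the endpoint URL from the slide accepts SPARQL Protocol requests directly and is still online, which may not hold; `make_request` is a name chosen for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL taken from the slide; whether it is still reachable is an assumption.
ENDPOINT = "http://sparql.openlifedata.org/"


def make_request(query, endpoint=ENDPOINT):
    """Build an HTTP GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = make_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")

# To actually send it (needs network access and a live endpoint):
# import json
# from urllib.request import urlopen
# with urlopen(request) as response:
#     results = json.load(response)
```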

Hands-on DIY -1
DESCRIBE
<http://bio2rdf.org/drugbank:DB00088>
The DESCRIBE query result clause allows the server to return
whatever RDF it wants that describes the given resource(s).

Hands-on Bio2RDF -2
<http://bio2rdf.org/drugbank:DB00088>
• Retrieve title, affected organism, target and group
of this Drug
TIP: View it as NTriples to construct your query!

Hands-on Bio2RDF - 3
SELECT * WHERE {
?drug a <http://bio2rdf.org/drugbank_vocabulary:Drug> .
}
• Filter for drugs starting with A

Hands-on Bio2RDF -3
SELECT * WHERE {
?drug a <http://bio2rdf.org/drugbank_vocabulary:Drug> .
?drug <http://purl.org/dc/terms/title> ?title .
FILTER regex(?title, "^A")
}

Hands-on Bio2RDF -3
SELECT * WHERE {
?drug a <http://bio2rdf.org/drugbank_vocabulary:Drug> .
?drug <http://purl.org/dc/terms/title> ?title .
FILTER regex(?title, "^A")
}
• Sort them alphabetically

Hands-on DIY
• Get 100 drug-metabolizing enzymes
• Get phenotypes and genes associated with
OMIM diseases
• Retrieve unique diseases and publications
associated with the BRCA1 gene
TIPS:
- View the resource as NTriples to construct your query
- Use prefix.cc to look up prefixes
- Always use LIMIT
- Start by getting results one triple at a time and build up.

Answer 1
PREFIX dv: <http://bio2rdf.org/drugbank_vocabulary:>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?enzyme_name ?drug_name
{
?s a dv:Enzyme-Relation .
?s dv:enzyme/dct:title ?enzyme_name .
?s dv:drug/dct:title ?drug_name .
}
LIMIT 100

Answer 2
PREFIX om: <http://bio2rdf.org/omim_vocabulary:>
PREFIX b: <http://bio2rdf.org/bio2rdf_vocabulary:>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT * {
?s a om:Phenotype .
?s dct:title ?name .
?s om:clinical-synopsis ?cs .
?cs om:feature ?f .
?f om:x-umls/b:identifier ?umls .
?f om:x-hp/dct:identifier ?hp .
?s om:phenotype-map ?pm .
?pm om:gene-symbol ?gene .
} LIMIT 10

Answer 3
PREFIX om: <http://bio2rdf.org/omim_vocabulary:>
PREFIX hgnc: <http://bio2rdf.org/hgnc.symbol>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?geneName ?name ?pubmedArticle {
?s a om:Phenotype .
?s dct:title ?name .
?s om:phenotype-map ?pm .
?pm om:gene-symbol ?gene .
?gene dct:title ?geneName .
?s om:article ?pubmed .
?pubmed dct:title ?pubmedArticle .
FILTER regex(str(?gene), "BRCA1") }

Summary
BASIC Structure
• PREFIX
• WHERE
GRAPH Pattern
• simple graph pattern
• {…}
• OPTIONAL
• UNION
FILTER
• LANG
• DATATYPE
• REGEX
OUTPUT Format
• SELECT
• CONSTRUCT
• ASK
• DESCRIBE
MODIFIERS
• ORDER BY
• LIMIT
• OFFSET
• DISTINCT

THANK YOU!
QUESTIONS?
@AmrapaliZ
amrapali.j.zaveri@maastrichtuniversity.nl
Contact me if you are interested in doing a project
at the Institute of Data Science, Uni Maastricht!
