Introduction to Bio Ontologies and the Semantic Web
M.	Devisscher
Biological Databases
Overview
• Bio	ontologies
• Semantic technologies
• SPARQL	in	practice
Introduction
• Ontologies:	what are	ontologies ?
• Ontologies in	the	bio	domain:	OBO	Foundry
• Ontologies in	the	semantic web
• OBO
• RDF,	IRI,	TTL,	SPARQL,	OWL
What is	an ontology ?
• Ontology =	a	specification of	a	
conceptualization (Gruber 1993)
• In	practice:	controlled vocabularies
– Disambiguation (e.g.	Bank,	Running)
– Language/species	independence
• Very useful in	biology – complex	hierarchies of	
terms
Ontologies in	the	bio	Domain
• OBO	Foundry - open	Biological and
Biomedical Ontologies
• Common	principles
• List	of	ontologies at	
http://www.obofoundry.org
• OBO	is	also a	data	format	.obo
SideTrack – The	Gene	Ontology
• The	mother of	bio-ontologies:	the	GO
– Oldest bio-ontology
– Many practical	applications:
• Cross	species	studies
• Overrepresentation studies	(RNASeq)
• GO	is	an OBO	ontology
SideTrack – The	Gene	Ontology
• Collection	of	terms
SideTrack – The	Gene	Ontology
• Relationships between terms:
– Subsumption:	is_a
– Partonomic:	part_of
• These relationships are transitive
• Terms form	a	DAG	(directed,	acyclic graph)
• Some information	can be inferred
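As a minimal sketch of what "can be inferred" means here: because is_a and part_of are transitive, every ancestor of a term can be collected by walking the DAG upwards. The term names and the `PARENTS` table below are illustrative assumptions, not real GO identifiers, and the sketch treats both relationship types simply as edges to follow.

```python
# Toy term graph: child -> list of (relationship, parent).
# Term names are illustrative, not real GO identifiers.
PARENTS = {
    "mitochondrial membrane": [("part_of", "mitochondrion")],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following is_a / part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("mitochondrial membrane")))
```

An annotation to the most specific term therefore also counts, implicitly, for all of its ancestors.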
SideTrack – The	Gene	Ontology
• Know more:	www.geneontology.org
• AMIGO	:	the	GO	browser
Gene	Ontology	Annotation
• Gene	ontology	annotations	GOA	=	entities	
labeled	with	GO	terms
– E.g.	Uniprot-GOA
Semantic Technologies
• The	semantic web:	Tim	Berners Lee	et	al,	
Scientific American	2001
Semantic Technologies
• W3C:	a	set	of	specifications
http://www.w3.org/standards/semanticweb/
• A	mature toolset
– Dedicated data	formats
– Storage
– Query	language
Resource	Description	Framework
• A	standard	model	for data	interchange on	the
(semantic)	web
• Basic	data	element	=	a	Triple
– A	mini	sentence
– Contains three Terms:
• Subject	Predicate Object
• Representation of	triples
– Basic	data	format:	RDF/XML
– All data	expressed in	RDF	(Resource	Description
Framework)
– Several compatible	syntaxes:	TTL	(Terse Triple	
Language)	most	human	readable
The	Turtle Syntax
• Basic	Triple
<http://bioinformatics.be/entities#martijn>
<http://bioinformatics.be/relations#has_favorite_beer>
<http://bioinformatics.be/entities#karmeliet>.
The	Turtle Syntax
• Prefix
@prefix b4x: <http://bioinformatics.be/terms#> .
b4x:martijn b4x:has_favorite_beer b4x:karmeliet .
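A prefix is only an abbreviation: a parser replaces the part before the colon with the declared namespace IRI. The sketch below illustrates that substitution in plain Python; the `expand` helper and the `PREFIXES` table are hypothetical names for this example, not part of any RDF library.

```python
# Prefix table as it would be built from @prefix declarations.
PREFIXES = {
    "b4x": "http://bioinformatics.be/terms#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as 'b4x:martijn' to a full IRI in angle brackets."""
    if term.startswith("<") and term.endswith(">"):
        return term  # already a full IRI
    prefix, _, local = term.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix}")
    return f"<{prefixes[prefix]}{local}>"


print(expand("b4x:martijn"))
```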
The	Turtle Syntax
• Predicate lists
@prefix b4x: <http://bioinformatics.be/terms#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
b4x:martijn b4x:has_favorite_beer b4x:karmeliet;
foaf:name "Martijn Devisscher".
The	Turtle Syntax
• Object	lists
@prefix b4x: <http://bioinformatics.be/terms#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
b4x:martijn b4x:has_favorite_beer b4x:karmeliet,
b4x:chimay_blauw;
foaf:name "Martijn Devisscher".
IRI’s and Literals
• Terms can be IRIs, literals or blank nodes
• IRI = Internationalized Resource Identifier
• Unique id – a virtual URI
– Example: <http://bioinformatics.be/terms#martijn>
– There is no requirement that the IRI resolves
– Open Data initiatives now ask you to use resolvable URIs: http://linkeddata.org
– Unique identifiers can be registered at http://identifiers.org
Introduction
• Literals: can be typed, with types from the XSD namespace:
– E.g. "This is a string example"^^xsd:string
– E.g. "5"^^xsd:integer
• IRIs are used for entities and attributes
• Literals are used for attribute values that aren’t entities
The	Turtle Syntax
• Typed literals
@prefix b4x: <http://bioinformatics.be/terms#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
b4x:martijn b4x:has_favorite_beer b4x:karmeliet,
b4x:chimay_blauw;
b4x:length "184"^^xsd:integer;
foaf:name "Martijn Devisscher"^^xsd:string.
The	Turtle Syntax
• Blank	nodes
@prefix b4x: <http://bioinformatics.be/terms#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
b4x:martijn b4x:has_favorite_beer b4x:karmeliet,
b4x:chimay_blauw;
b4x:length "184"^^xsd:integer;
foaf:name "Martijn Devisscher"^^xsd:string;
b4x:owns_cat [ b4x:color "Gray" ].
Classes	and Individuals
• rdf:type
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix b4x: <http://bioinformatics.be/terms#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
b4x:martijn rdf:type foaf:Person.
Classes	and Individuals
• Shorthand:	a
@prefix b4x: <http://bioinformatics.be/terms#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
b4x:martijn a foaf:Person;
foaf:knows b4x:geert.
b4x:geert a foaf:Person.
Example
<http://xmpl/entities#martijn>
<http://xmpl/relations#has_favorite_beer>
<http://xmpl/entities#karmeliet>.
Semantic Technologies
• Sets	of	triples form	a	Graph
Graphs
• Triples are	building	blocks of	Graphs
• Combining sets	of	triples allows the	
construction of	arbitrarily complex	graphs
[Figure: node b4x:martijn linked to node b4x:karmeliet by the edge b4x:has_favorite_beer]
Graph	of	graphs
• Combined RDF datasets can be considered one bigger graph, a knowledge cloud
• Key	is	interoperability:	same	format	used	for	
disclosing	information,	independent	of	
backend
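Because every dataset is just a set of triples, combining datasets amounts to a set union; shared IRIs automatically become the points where the graphs connect. The following is a sketch with made-up triples written as Python tuples, not a real RDF API.

```python
# Two small "datasets", each a set of (subject, predicate, object) tuples.
graph_a = {("b4x:martijn", "b4x:has_favorite_beer", "b4x:karmeliet")}
graph_b = {
    ("b4x:martijn", "foaf:knows", "b4x:geert"),
    ("b4x:geert", "b4x:has_favorite_beer", "b4x:karmeliet"),
    # Duplicate of the triple in graph_a: a set keeps it only once.
    ("b4x:martijn", "b4x:has_favorite_beer", "b4x:karmeliet"),
}

# Merging graphs is a plain set union.
merged = graph_a | graph_b


def neighbours(graph, node):
    """Return the sorted (predicate, object) pairs leaving `node`."""
    return sorted((p, o) for s, p, o in graph if s == node)


print(len(merged))
print(neighbours(merged, "b4x:martijn"))
```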
RDF	dataset	examples
• The	LOD	cloud	http://lod-cloud.net :	overview	
of	interlinked,	RDF	compatible	datasources
RDF	dataset	examples
• Closer to home
https://data.stad.gent/devzone/docs/linked-open-data
Add meaning !
• Reuse terms from existing,	well	defined
vocabularies – ontologies (foaf,	dc,	go,	so)
• Describe new	terms =	Ontologies
• Contain
– A	crisp	human	definition
– Some machine	readable facts
Metadata
• Ontologies are	also described in	RDF
– RDFS:	RDF	- Schema
– OWL:	Web	Ontology Language
– Also expressed in	RDF
• For	clarity,	file	extension	can be .rdfs or	.owl
RDFS	Essentials
• Descriptions
– rdfs:label
– rdfs:comment
RDFS
• Relationships between properties,	classes
– rdfs:Class
– rdfs:subClassOf
– rdf:Property
– rdfs:subPropertyOf
– rdfs:range
– rdfs:domain
RDFS:	Example
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix b4x: <http://bioinformatics.be/terms#> .
b4x:karmeliet a b4x:Tripel .
b4x:Beer a rdfs:Class .
b4x:Tripel a rdfs:Class .
b4x:Tripel rdfs:subClassOf b4x:Beer .
b4x:has_favorite_beer a rdf:Property ;
rdfs:domain foaf:Person ;
rdfs:range b4x:Beer .
b4x:Beer rdfs:subClassOf b4x:Drink .
Analogy
• RDF	=	database	=	data
• RDFS/OWL	=	schema	=	metadata
• Both	are	described in	RDF,	but	have	a	different	
scope
Semantic Technologies
• Inference
– Enhance dataset	using knowledge from metadata
(e.g.	rdfs,	owl)
• Types	of	inference engines
– RDFS	inference
• RDFS	entailment regime
– OWL	inference
• Under	active research
• Engines	exist for specific subsets of	OWL	(OWL-DL)
RDFS	Entailment
RDFS:	Inference
b4x:kevin	b4x:has_favorite_beer	b4x:stella
Q:	What can we	infer from this using RDFS	
entailment ?
RDFS:	Inference
b4x:kevin	b4x:has_favorite_beer	b4x:stella
Inferred triples:
b4x:kevin	a	foaf:Person [from domain]
b4x:stella	a	b4x:Beer	[from range]
b4x:stella	a	b4x:Drink	[from subClassOf]
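The three inferred triples can be reproduced with a small fixed-point loop. This is a simplified sketch, not a full RDFS reasoner: it applies only the domain, range and subClassOf rules, and it assumes the schema statements from the RDFS example (domain foaf:Person, range b4x:Beer, Beer subClassOf Drink). Triples are plain Python tuples.

```python
SCHEMA = {
    ("b4x:has_favorite_beer", "rdfs:domain", "foaf:Person"),
    ("b4x:has_favorite_beer", "rdfs:range", "b4x:Beer"),
    ("b4x:Beer", "rdfs:subClassOf", "b4x:Drink"),
}
DATA = {("b4x:kevin", "b4x:has_favorite_beer", "b4x:stella")}


def rdfs_closure(triples):
    """Apply domain, range and subClassOf rules until no new triple appears."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))  # subject gets the domain class
                elif p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))  # object gets the range class
                elif p2 == "rdfs:subClassOf" and p == "rdf:type" and s2 == o:
                    new.add((s, "rdf:type", o2))  # instance of subclass is instance of superclass
        if new <= graph:
            return graph
        graph |= new


inferred = rdfs_closure(SCHEMA | DATA) - SCHEMA - DATA
for triple in sorted(inferred):
    print(triple)
```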
DuckTyping
• Watch	out	with inference !
Example:	You want	to express that people can
have	lengths
b4x:length a rdf:Property;
rdfs:domain foaf:Person;
rdfs:range xsd:integer.
DuckTyping
• Problem:
ex:VW_Transporter b4x:length "600"^^xsd:integer.
• RDFS would infer that VW_Transporter is a Person !
• This is called DuckTyping
If	it	looks	like	a	duck,	swims	like	a	duck,	and	
quacks	like	a	duck,	then	it	probably	is	a	duck
Task
• Find	a	solution:	express	in	rdfs that	people	can	
have	lengths
b4x:havingLength a rdfs:Class.
b4x:length a rdf:Property;
rdfs:domain b4x:havingLength;
rdfs:range xsd:integer.
foaf:Person rdfs:subClassOf b4x:havingLength.
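To see why the intermediate class helps, compare what the domain rule derives for the van under both schemas. This is a simplified sketch under stated assumptions: `infer_types` is a hypothetical helper that handles only rdfs:domain and rdfs:subClassOf, and triples are plain tuples.

```python
def infer_types(schema, data):
    """Return (entity, class) pairs implied by rdfs:domain and rdfs:subClassOf."""
    domains = {s: o for s, p, o in schema if p == "rdfs:domain"}
    supers = {}
    for s, p, o in schema:
        if p == "rdfs:subClassOf":
            supers.setdefault(s, set()).add(o)
    types = set()
    for s, p, _o in data:
        if p in domains:
            pending = [domains[p]]
            while pending:
                cls = pending.pop()
                if (s, cls) not in types:
                    types.add((s, cls))
                    pending.extend(supers.get(cls, ()))  # climb to superclasses only
    return types


van = [("ex:VW_Transporter", "b4x:length", "600")]

# Naive schema: anything with a length is a Person.
naive = [("b4x:length", "rdfs:domain", "foaf:Person")]
# Fixed schema: anything with a length is "something having a length".
fixed = [
    ("b4x:length", "rdfs:domain", "b4x:havingLength"),
    ("foaf:Person", "rdfs:subClassOf", "b4x:havingLength"),
]

print(infer_types(naive, van))
print(infer_types(fixed, van))
```

Inference only travels from subclass to superclass, so the van becomes a b4x:havingLength but never a foaf:Person.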
Storing	RDF
• As	an RDF	file	for download
• In	a	Triplestore
– Database	optimised for storing	triples
– Examples:	BlazeGraph,	Fuseki,	Sesame
Semantic Technologies
• Querying over	RDF	data:	SPARQL
• Cool	features:
– Distributed	querying =	actual distribution of	data	
and computing	resources
– SPARQL/Update:	modify data
• SPARQL	endpoints:	SPARQL	over	HTTP
SPARQL	Query	Syntax
• First	example:
SELECT ?subject ?predicate ?object WHERE {
?subject ?predicate ?object.
}
(Generally	not a	good idea as	it will pull	down	
the	whole dataset)
Binding variables
Graph matching
PREFIX b4x: <http://bioinformatics.be/terms#>
SELECT ?person WHERE {
?person b4x:has_favorite_beer b4x:karmeliet .
}
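Graph matching can be pictured as holding the pattern against every triple: constants must be equal, and each variable is bound to whatever sits in its position. The sketch below handles a single triple pattern only; `match` and the sample `GRAPH` are hypothetical and far simpler than a real SPARQL engine.

```python
GRAPH = [
    ("b4x:martijn", "b4x:has_favorite_beer", "b4x:karmeliet"),
    ("b4x:geert", "b4x:has_favorite_beer", "b4x:karmeliet"),
    ("b4x:kevin", "b4x:has_favorite_beer", "b4x:stella"),
]


def match(pattern, graph):
    """Yield one dict of variable bindings per triple that fits the pattern."""
    for triple in graph:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                if binding.setdefault(want, have) != have:
                    break  # same variable would need two different values
            elif want != have:
                break  # constant term does not match
        else:
            yield binding


pattern = ("?person", "b4x:has_favorite_beer", "b4x:karmeliet")
people = sorted(b["?person"] for b in match(pattern, GRAPH))
print(people)
```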
SPARQL	Query	Syntax
• Limit	result size :
SELECT ?subject ?predicate ?object WHERE {
?subject ?predicate ?object.
} LIMIT 10
SPARQL	Query	Syntax
• Find all classes:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class a rdfs:Class.
?class rdfs:label ?label.
}
(This will only retrieve classes	that have	a	label)
SPARQL	Query	Syntax
• Find all classes:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class a rdfs:Class.
OPTIONAL {
?class rdfs:label ?label.
}
}
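The difference between the two "find all classes" queries is the difference between an inner join and a left join. A rough Python analogy, using made-up class IRIs and a dictionary standing in for the rdfs:label triples:

```python
CLASSES = ["ex:Duck", "ex:Goose"]
LABELS = {"ex:Duck": "duck"}  # ex:Goose has no label

# Without OPTIONAL: a class without a label drops out of the result.
required = [(c, LABELS[c]) for c in CLASSES if c in LABELS]

# With OPTIONAL: every class is kept, the label is simply unbound (None).
optional = [(c, LABELS.get(c)) for c in CLASSES]

print(required)
print(optional)
```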
SPARQL	Query	Syntax
• Find all classes	that contain “duck”	in	the	
label:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS (str(?label) , "duck" ) )
}
SPARQL	Query	Syntax
• Make	it case	insensitive:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS ( UCASE(str(?label)) , "DUCK" ) )
}
SPARQL	Query	Syntax
• Search	in	specific graph:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label
FROM <http://example.org/animals>
WHERE {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS ( UCASE(str(?label)) , "DUCK" ) )
}
SPARQL	Query	Syntax
• Search	in	specific graph:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
GRAPH <http://example.org/animals> {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS ( UCASE(str(?label)) , "DUCK" ) )
}
}
SPARQL	Query	Syntax
• Can also search	for graphs :
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?g WHERE {
GRAPH ?g {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS ( UCASE(str(?label)) , "DUCK" ) )
}
}
Summary:	Querying RDF	data
[Figure: pipeline with components RDF Data, RDFS/OWL, Inference Engine, Inferred RDF Data, SPARQL Endpoint]
Take home Summary
• Basic data element = a Triple
– A mini sentence
– Contains three Terms: Subject Predicate Object
• Example:
<http://xmpl/entities#martijn>
<http://xmpl/relations#has_favorite_beer>
<http://xmpl/entities#karmeliet>.
• Combine triples to represent knowledge
• Use terms from ontologies
– Common vocabularies (e.g. OMIABIS, OBIB, SNOMED/ICD, MeSH)
– Possible to infer meaning
• SPARQL searches for patterns
Interoperability between OBO	and
Semantic Technologies
• Originated from two separate	academic worlds
• Computing	applications of	OBO	mainly
consistency checking and overrepresentation
analysis
• Semantic Technologies:	much broader toolset
• Interoperability ?
– Direct	offering in	both formats
– Automated mappings
• Migration	towards semantic toolkits
Where to find ontologies
• OBO	Foundry
• Bioportal;	NCBO
• Biogateway
• Bio2RDF
Where to find RDF	data
• Google	for SPARQL	endpoint
• =>	e.g.	EBI	databases
• Non	biological:	DBpedia
How about Tim Berners Lee’s vision ?
• We’re not there yet,	but	for bio	data	we’re
getting quite close
– The	explicitome
– Crowd sourcing
– Nanopublications
SPARQL	REFERENCE
http://www.w3.org/TR/sparql11-overview/
Recap:
SPARQL	in	11	minutes
SPARQL	:	Recap
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
FROM <http://graphName> WHERE {
?x rdfs:label ?label.
FILTER ( CONTAINS(?label, "dimethylalinine") )
} ORDER BY ?label LIMIT 10
• FIND the pattern ?x rdfs:label ?label.
• BIND variables ?label, ?x
• RETRIEVE variable ?label
• PREFIX: replace rdfs:label by <http://www.w3.org/2000/01/rdf-schema#label>
• FILTER results to labels containing "dimethylalinine"
• ORDER BY label, then LIMIT results to the first 10 matches
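The recap steps map onto an ordinary list pipeline. The sketch below mimics them on a toy graph of tuples; `select_labels` is a hypothetical helper and the labels are invented. Note that SPARQL sorts before it truncates, which is why the ORDER BY clause has to come before LIMIT.

```python
GRAPH = [
    ("ex:c1", "rdfs:label", "mallard duck"),
    ("ex:c2", "rdfs:label", "goose"),
    ("ex:c3", "rdfs:label", "eider duck"),
    ("ex:c3", "rdf:type", "rdfs:Class"),
]


def select_labels(graph, needle, limit):
    """Mimic: SELECT ?label WHERE { ?x rdfs:label ?label FILTER(CONTAINS(...)) } ORDER BY ?label LIMIT n."""
    labels = [o for _s, p, o in graph if p == "rdfs:label"]  # FIND the pattern, BIND ?label
    labels = [label for label in labels if needle in label]  # FILTER
    labels.sort()                                            # ORDER BY ?label
    return labels[:limit]                                    # LIMIT


print(select_labels(GRAPH, "duck", 10))
```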
SPARQL	:	Recap
DESCRIBE
<http://rdf.wikipathways.org/Pathway/WP1425_r74390/WP/Interaction/e077e>
• Useful	short	query	to	get	direct	links	from/to	a	
given	node
Running	SPARQL
• From	a	web	interface
• Using	http
– HTTP	GET
– HTTP	POST	:	for	larger	query	strings
– Headers	determine	response	type	(JSON,	XML,	HTML)
http://…/sparql?default-graph-uri=<http://graphName>&query=URLENCODEDQUERYSTRING
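Building such a URL by hand is error-prone, so the encoding is best left to a library. The sketch below uses Python's standard `urllib.parse`; the endpoint `http://example.org/sparql` is a placeholder assumption, and `sparql_get_url` is a hypothetical helper. The graph IRI is passed as a bare IRI here, on the assumption that the endpoint follows the SPARQL 1.1 Protocol parameter names `query` and `default-graph-uri`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint, query, default_graph=None):
    """Return a GET URL with the query (and optional default graph) URL-encoded."""
    params = {}
    if default_graph is not None:
        params["default-graph-uri"] = default_graph
    params["query"] = query
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint; replace with a real SPARQL endpoint URL.
url = sparql_get_url("http://example.org/sparql", QUERY, "http://graphName")
print(url)

# Decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```

The response format would then be negotiated with an Accept header on the actual request; for long queries the same parameters can be sent in a POST body instead.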
BIO-ONTOLOGIES
BioPortal
Access
• From	the	web	interface	!
• SPARQL	endpoint:	using	API	key;	on	request	
• Running	a	local	copy:	download	VM	image;	on	
request
Exercises
• Find	a	term
• Find	ontologies	containing	a	term
• Browse	some	ontologies
• Check	the	NCBO	annotator	!
BIO-DATA
EBI RDF Resources
Ensembl
[Figure: simplified Ensembl RDF model. Node labels: gene, transcript, exon, ordered part, translation, location, id, rank, synonym, xref. Terms shown: sio:SIO_001261, obo:SO_0000147, obo:SO_0000234, obo:SO_0001217, obo:SO_transcribed_from, obo:SO_has_part, obo:SO_translates_to, faldo:location, sio:SIO_000300, skos:altLabel.]
Exercise
• From	uniprot find	proteins	that	are	annotated	
with	a	given	Gene	Ontology	term
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo:<http://purl.obolibrary.org/obo/>
SELECT * WHERE {
?protein up:classifiedWith obo:GO_0004499.
?protein up:organism taxon:9606.
}
http://sparql.uniprot.org
Exercise
• From	Expression	Atlas	find	proteins	that	are	
differentially	expressed	(P	<	1e-12)	in	Crohn’s
disease
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX biopax3:<http://www.biopax.org/release/biopax-level3.owl#>
SELECT distinct ?protein ?expressionValue ?pvalue WHERE {
?factor rdf:type efo:EFO_0000384 .
?value atlasterms:hasFactorValue ?factor .
?value atlasterms:isMeasurementOf ?probe .
?value atlasterms:pValue ?pvalue .
?value rdfs:label ?expressionValue .
?probe atlasterms:dbXref ?protein .
FILTER ( ?pvalue < 1e-12 )
FILTER ( strstarts(str(?protein),"http://purl.uniprot.org/uniprot/") )
} ORDER BY ASC(?pvalue)
https://www.ebi.ac.uk/rdf/services/atlas/sparql
WikiPathways
• Links pathways with genes, terms from the Pathway, Cell line and Disease ontologies, and PubMed references
• Models individual interactions
• Can be downloaded as RDF
• Has an experimental SPARQL endpoint
Exercise
• Define a query to find pathways linked to the TNFalpha gene
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?PathwayName where {
?geneProduct a wp:GeneProduct .
?geneProduct dc:identifier ?GeneID .
?geneProduct dcterms:isPartOf ?pathway .
?geneProduct rdfs:label ?geneName .
?pathway dc:identifier ?pathwayid .
?pathway dc:title ?PathwayName .
FILTER(str(?geneName) = "TNFalpha" )
}
http://sparql.wikipathways.org
Exercise
• Try this, or another query
– Using the web interface
– Using HTTP GET
• Define a simple DESCRIBE
• Use a web tool to URL-encode the query
• Submit the query as a URL parameter
DisGeNet
[Figure: simplified DisGeNET RDF model. Node labels: gene, GDA, Phenotype, DiseaseClass, Mesh, HPO, score, id. Terms shown: sio:SIO_001121, ncit:C7057, sio:SIO_010056, sio:SIO_000628, ncit:C25338, ncit:C16612, skos:exactMatch.]
Exercise
• Find diseases linked to BRCA1
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT DISTINCT ?disease WHERE {
?gda a sio:SIO_000983.
?gda sio:SIO_000628 ?disease.
?disease a ncit:C7057.
?gda sio:SIO_000628 ?gene.
?gene a ncit:C16612.
?gene skos:exactMatch <http://identifiers.org/hgnc.symbol/BRCA1>}
http://rdf.disgenet.org/lodestar/sparql
• Yields	no	results
????
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT DISTINCT ?disease WHERE {
?gda a [(rdfs:subClassOf)* sio:SIO_000983].
?gda sio:SIO_000628 ?disease.
?disease a ncit:C7057.
?gda sio:SIO_000628 ?gene.
?gene a ncit:C16612.
?gene skos:exactMatch <http://identifiers.org/hgnc.symbol/BRCA1>}
http://rdf.disgenet.org/lodestar/sparql
Why ?
• Inference cannot be assumed on a SPARQL endpoint => take care when defining queries
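Without inference, an entity typed with a subclass does not match a pattern that asks for the superclass; the query itself has to walk the rdfs:subClassOf chain. The sketch below shows both behaviours on toy data. All IRIs are hypothetical, each class is assumed to have at most one parent, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical data: the entity is typed with a subclass only.
SUBCLASS_OF = {"ex:SpecificAssociation": "ex:Association"}
TYPES = {"ex:gda1": "ex:SpecificAssociation"}


def has_direct_type(entity, cls):
    """What a plain '?x a <cls>' pattern sees on an endpoint without inference."""
    return TYPES.get(entity) == cls


def has_type_via_path(entity, cls):
    """What '?x a [ rdfs:subClassOf* <cls> ]' sees: walk up the class hierarchy."""
    current = TYPES.get(entity)
    while current is not None:
        if current == cls:
            return True
        current = SUBCLASS_OF.get(current)
    return False


print(has_direct_type("ex:gda1", "ex:Association"))
print(has_type_via_path("ex:gda1", "ex:Association"))
```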
Exercise
• Define a query to find genes with an important link to Crohn’s disease (score > 0.35)
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT DISTINCT ?gene WHERE {
?gda sio:SIO_000628 ?gene,?disease .
?gene a ncit:C16612 .
?gene skos:exactMatch ?GeneID .
?disease a ncit:C7057 .
?disease dcterms:title ?DiseaseName .
?gda sio:SIO_000216 ?scoreIRI .
?scoreIRI sio:SIO_000300 ?score .
FILTER (?score > "0.35"^^xsd:decimal)
FILTER (contains(str(?DiseaseName),"Crohn"))
}
http://rdf.disgenet.org/lodestar/sparql
neXtProt
Exercise
• Define a query to find proteins related to "Cardio" diseases
• Define a query to find the genomic location of gene "TP53"
select distinct ?id where {
?entry skos:exactMatch ?id.
?entry :isoform ?isoform.
?isoform :medical ?medical_annotation.
?medical_annotation :term ?term.
?term :related ?disease.
?disease a :MeshCv.
?disease rdfs:label ?label.
FILTER(CONTAINS(?label,"Cardio")).
}
https://snorql.nextprot.org/
select ?chrom ?start ?end where
{
?gene rdf:type :Gene.
?gene :name ?name.
?gene :chromosome ?chrom.
?gene :begin ?start.
?gene :end ?end.
FILTER (str(?name) = "TP53")
}
https://snorql.nextprot.org/
SPARQL and federated queries
• Federated querying: include data from another endpoint using the SERVICE keyword
• Example: find pathways (from WikiPathways) involving genes linked to Crohn’s disease (from DisGeNET)
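Conceptually, a federated query joins two sets of variable bindings on the variables they share. The sketch below imitates that join in Python with invented result rows; `join` is a hypothetical helper, and a real engine would fetch the second set from the remote endpoint named after SERVICE.

```python
# Invented bindings standing in for results from two endpoints.
local_rows = [
    {"gene": "ex:geneA", "DiseaseName": "Crohn Disease"},
    {"gene": "ex:geneB", "DiseaseName": "Crohn Disease"},
]
remote_rows = [
    {"gene": "ex:geneA", "PathwayName": "Pathway one"},
    {"gene": "ex:geneC", "PathwayName": "Pathway two"},
]


def join(left, right):
    """Combine rows that agree on every variable they have in common."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


pathways = sorted({row["PathwayName"] for row in join(local_rows, remote_rows)})
print(pathways)
```

The join only succeeds when both endpoints use the same identifier for the shared variable, which is why common IRIs matter so much for federated queries.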
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?PathwayName WHERE {
?gda sio:SIO_000628 ?gene, ?disease .
?gene a ncit:C16612 .
?disease a ncit:C7057 .
?disease dcterms:title ?DiseaseName .
?gda sio:SIO_000216 ?scoreIRI .
?scoreIRI sio:SIO_000300 ?score .
FILTER (?score > "0.35"^^xsd:decimal)
FILTER (contains(str(?DiseaseName),"Crohn"))
SERVICE <http://sparql.wikipathways.org/> {
?geneProduct a wp:GeneProduct .
?geneProduct dc:identifier ?gene .
?geneProduct dcterms:isPartOf ?pathway .
?pathway dc:identifier ?pathwayid .
?pathway dc:title ?PathwayName .
}
}
http://rdf.disgenet.org/lodestar/sparql
Application:	BOINQ
• Framework	for	managing	sequencing	data	
using	semantic	technologies
• Find	it	here	:	
https://github.com/mr-tijn/boinq2
Functionalities
• Uploader/converter
– Upload	BED/GFF/VCF	files
– Automatically	translated	into	triples
– Stored	in	triplestore
Functionalities
• Query	builder
– Visually	build	SPARQL	queries	that	query	your	data	
along	with	public	data
– Store	results	as	new	graphs	or	download	as	CSV
Demo:	query	for	finding	first	exons	of	
genes	related	to	colon	cancer
