This document provides an introduction to bio-ontologies and the semantic web. It discusses what ontologies are and how they are used in the bio domain through initiatives like the OBO Foundry. It introduces key semantic web technologies like RDF, IRIs, Turtle (TTL), SPARQL, and OWL that are used to represent ontologies and linked data. It provides examples of representing ontologies and data in the Turtle syntax and using semantic web standards like RDFS for defining classes, properties and basic inference.
This document provides an introduction to bio-ontologies and the semantic web. It discusses what ontologies are and how they are used in the bio domain through initiatives like the OBO Foundry. Key ontologies like the Gene Ontology are described. The document then introduces semantic web technologies like RDF, URIs, triples, and ontology languages like RDFS and OWL. It provides examples of representing data and metadata in these formats. Finally, it discusses storing and querying RDF data through SPARQL.
This document provides information about biological databases including:
- Different types of biological databases such as relational, object-oriented, hierarchical, and hybrid systems.
- Common uses of biological databases including annotation searches, homology searches, pattern searches, predictions, and comparisons.
- Examples of database entries in common formats like GenBank, EMBL, and SwissProt that show the layout and key fields.
This document provides an introduction to bio-ontologies and the semantic web. It discusses what ontologies are and how they are used in the bio domain through initiatives like the OBO Foundry. It introduces key semantic web technologies like RDF, URIs, Turtle syntax, and SPARQL query language. It provides examples of ontologies like the Gene Ontology and how ontologies can be represented and queried using these semantic web standards.
This document provides an overview of flat file databases and biological relational databases. It discusses flat file databases like RefSeq that store sequence data in plain text files. It describes common file formats like Genbank and EMBL. It also discusses the Trace Archive and how trace files are processed into consensus sequences using Phred and Phrap. Finally, it briefly introduces biological relational databases and references resources like Swiss-Prot and TrEMBL.
This document discusses the Biological Databases project being conducted by a group of students. The project involves using the video game Minecraft to visualize protein structures retrieved from the Protein Data Bank (PDB). Python scripts are used to import PDB data files and place blocks in Minecraft to represent atoms, with different block colors used to distinguish atom types. SPARQL queries are also employed to search the RDF version of the PDB for protein entries. The goal is to build 3D protein models inside Minecraft for educational and visualization purposes.
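The data-import step described above can be sketched roughly as follows. This is a minimal illustration, not the students' actual script: the ATOM column positions follow the published PDB format, but the block ids, the element-to-block mapping and the scale factor are made-up assumptions, and the actual Minecraft placement call (e.g. via the mcpi library) is omitted.

```python
# Illustrative sketch: parse ATOM records from a PDB file and map each
# atom's element to a block id. Block ids, mapping and SCALE are assumptions.

BLOCK_FOR_ELEMENT = {"C": 49, "N": 22, "O": 152, "S": 41}  # illustrative ids
SCALE = 2  # blocks per angstrom (assumption)

def parse_atom_line(line):
    """Extract (element, x, y, z) from one PDB ATOM record (fixed columns)."""
    x = float(line[30:38])          # columns 31-38
    y = float(line[38:46])          # columns 39-46
    z = float(line[46:54])          # columns 47-54
    element = line[76:78].strip()   # columns 77-78
    return element, x, y, z

def atom_blocks(pdb_lines):
    """Yield (block_id, bx, by, bz) placements for each ATOM record."""
    for line in pdb_lines:
        if line.startswith("ATOM"):
            element, x, y, z = parse_atom_line(line)
            block = BLOCK_FOR_ELEMENT.get(element, 1)  # default block
            yield block, int(x * SCALE), int(y * SCALE), int(z * SCALE)
```

A real script would feed each yielded placement to the game's set-block API.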
This document discusses biological databases and bioinformatics. It begins with an overview of bioinformatics as an interdisciplinary field combining biology, computer science, and information technology. It then discusses different types of biological databases, including those focused on sequences, pathways, protein structures, and gene expression. The document outlines some common uses of biological databases, including searching for annotations, identifying similar sequences through homology, searching for patterns, and making predictions. It also briefly discusses comparing data across databases. The summary provides a high-level overview of the key topics and uses of biological databases covered in the document.
The document discusses database searching algorithms like FASTA and BLAST. It explains the mathematical concepts behind BLAST, such as using Erdős-Rényi theory to model random sequence alignments and calculate the expected length of the longest random match. It also describes the Karlin-Altschul equation used in BLAST to calculate the statistical significance of matches as the expected number of alignments (E), based on the size of the search space and the alignment score. The document provides details on the parameters and scoring approaches used in database searching algorithms.
... or how to query an RDF graph with 28 billion triples on a standard laptop
These slides correspond to my talk at the Stanford Center for Biomedical Informatics, on 25th April 2018
The document describes a machine learning approach used by Polbase to classify scientific papers as either related or unrelated to DNA polymerases. It discusses three approaches to defining a classification rule, including using text searches, subject matter experts, or statistical modeling. The proposed system uses a machine learning classifier with components like an XML data feed from PubMed, data management in a PostgreSQL database, and a modeling stage to classify papers. The goal is to automatically discover new relevant papers to expand Polbase's reference repository.
This document provides an overview of bioinformatics and biological databases. It discusses how bioinformatics draws from fields like biology, computer science, statistics, and machine learning. Biological databases are important resources for bioinformatics that can be searched and analyzed to answer questions, find similar sequences, locate patterns, and make predictions. The document also outlines common uses of biological databases, such as annotation searches, homology searches, pattern searches, and predictive analyses.
What is the fuss about triple stores? Will triple stores eventually replace relational databases? This talk looks at the big picture, explains the technology and tries to look at the road ahead.
Bernhard Haslhofer is a postdoc researcher at Cornell University studying linked data, user-contributed data, and data interoperability. He discusses Linked (Open) Data, which uses URIs and RDF to publish and link structured data on the web. The key principles are using URIs to identify things, providing useful information about those URIs when dereferenced, and including links to other URIs. Enabling technologies include URIs, RDF, RDFS/OWL for vocabularies, SPARQL for querying, and best practices for publishing vocabularies and data. Useful tools are also presented.
The document provides an overview of using Python for bioinformatics, discussing what Python is, why it is useful for bioinformatics, how to set up Python in integrated development environments like Eclipse with PyDev, how to share code using Git and GitHub, and includes examples of Hello World and bioinformatics programs in Python. It introduces Python and argues it is well-suited for bioinformatics due to its extensive standard libraries, ease of use, and wide adoption in science. The document demonstrates how to install Python, set up an IDE, create and run simple Python programs, and use version control with Git and GitHub to collaborate on projects.
This document provides an overview of the Resource Description Framework (RDF). It begins with background information on RDF including URIs, URLs, IRIs and QNames. It then describes the RDF data model, noting that RDF is a schema-less data model featuring unambiguous identifiers and named relations between pairs of resources. It also explains that RDF graphs are sets of triples consisting of a subject, predicate and object. The document also covers RDF syntax using Turtle and literals, as well as modeling with RDF. It concludes with a brief overview of common RDF tools including Jena.
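The triple model that deck describes can be illustrated with a toy example. This is a plain-Python sketch, not real RDF tooling: the graph is just a set of (subject, predicate, object) tuples with made-up identifiers, and an actual application would use a library such as rdflib, or the Jena toolkit the deck mentions.

```python
# Toy illustration of the RDF data model: a graph is a set of
# (subject, predicate, object) triples. Identifiers here are invented.

graph = {
    ("ex:GO_0008270", "rdfs:label", "zinc ion binding"),
    ("ex:P12345", "ex:hasFunction", "ex:GO_0008270"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard,
    much like a variable in a SPARQL basic graph pattern."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

For example, `match(graph, p="rdfs:label")` finds every labelled resource, regardless of subject.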
The document discusses various database concepts including normalization, which is used to design optimal relation schemas by removing redundant data. It also covers transaction processing, which involves executing logical database operations as transactions to maintain data integrity. Database systems use techniques like logging and concurrency control to prevent transaction anomalies and ensure failures can be recovered from.
The document discusses views and materialized views in data warehousing and decision support systems. It covers three main points:
1) OLAP queries typically involve aggregate queries, so precomputation is essential for fast response times. Materialized views allow precomputing aggregates across multiple dimensions.
2) Warehouses can be thought of as collections of asynchronously replicated tables and periodically maintained views, renewing interest in efficient view maintenance.
3) Materialized views store the results of views in the database for fast access like a cache, but they require maintenance as underlying tables change. Incremental maintenance algorithms are ideal to efficiently update materialized views.
Spark Meetup London: Share and analyse genomic data at scale with Spark, ADAM... (Andy Petrella)
Genomics and health data are among today's hot topics, demanding heavy computation and, in particular, machine learning. This work supports science with significant societal impact and helps deliver better outcomes, which is why Apache Spark and its ADAM library are a must-have.
This talk will be twofold.
First, we'll show how Apache Spark, MLlib and ADAM can be plugged together to extract information from even very large and wide genomics datasets. Everything will be packed into examples from the Spark Notebook, showing how bio-scientists can work interactively with such a system.
Second, we'll explain how these methodologies, and even the datasets themselves, can be shared at very large scale between remote entities such as hospitals or laboratories, using microservices that leverage Apache Spark, ADAM, Play Framework 2, Avro and Tachyon.
ADAM is a scalable genome analysis platform that uses a column-oriented file format called Parquet to efficiently store and access large genomic datasets across distributed systems. It provides APIs and tools for transforming, analyzing, and querying genomic data in a scalable way using Apache Spark. Some key goals of ADAM include enabling efficient processing of genomes using clusters/clouds, providing a data format for parallel data access, and enhancing data semantics to allow more flexible access patterns.
OpenFlyData aims to integrate biological data sources using Semantic Web technologies. It creates reusable data sources and query services by mapping existing gene expression databases like FlyBase and BDGP to RDF. This allows for cross-database searches using SPARQL. Performance challenges include loading large datasets and case-insensitive text searches, but the system provides benefits like a uniform data model and ability to ask unanticipated queries across integrated sources.
The document describes cross-language information retrieval (CLIR) and summarizes an English-Chinese information retrieval system called ECIRS. ECIRS allows users to input queries in English and retrieves relevant Chinese documents through translation. It includes dictionaries, document indexes, and a Chinese search engine. Screenshots show the user interface where a user can enter an English keyword, view its Chinese translation, and see search results in Chinese.
Presentation from Strata-Hadoop 2015 (http://strataconf.com/big-data-conference-ny-2015/public/schedule/speaker/197575) -- a brief introduction to genomics followed by an overview of approaches to bioinformatics coding using Spark. Pretty high-level.
The document discusses scaling web data at low cost. It begins by presenting Javier D. Fernández and providing context about his work in semantic web, open data, big data management, and databases. It then discusses techniques for compressing and querying large RDF datasets at low cost using binary RDF formats like HDT. Examples of applications using these techniques include compressing and sharing datasets, fast SPARQL querying, and embedding systems. It also discusses efforts to enable web-scale querying through projects like LOD-a-lot that integrate billions of triples for federated querying.
This document discusses cross-lingual information retrieval. It presents approaches for translating queries from other languages to the document language, including using online machine translation systems and developing a statistical machine translation system. It describes experiments on reranking translations to select the one most effective for retrieval and on adapting the reranking model to new languages. Results show the reranking approach improves over baselines and online translation systems. The document also explores document translation and query expansion techniques.
This document discusses verifying the integrity constraints of the Portuguese WordNet (OpenWordnet-PT) against the ontology for encoding wordnets. It was the first attempt to check correctness and improve the linguistic data by correcting errors found. Various types of errors were discovered, including datatype errors, domain and range errors, and structural errors. Explanations provided by reasoning tools helped identify and fix issues, improving the overall quality and accuracy of the OpenWordnet-PT resource.
This document discusses cross-language information retrieval (CLIR). It presents the goals of allowing users to query for domain-specific information in their native language and presenting relevant search results in the target language. It describes the key components of CLIR including bilingual corpus extraction from multiple sources, corpus indexing, querying and string matching. Preliminary evaluation results of sample queries are provided, along with conclusions that machine translation based CLIR is often more useful than the proposed method and that future work could focus on automated evaluation and fuzzy matching.
This document discusses cross-language information retrieval (CLIR). It defines CLIR as retrieving information written in a language different from the user's query language. It describes approaches to CLIR such as dictionary-based query translation and pseudo-relevance feedback. Dictionary-based query translation uses bilingual dictionaries but requires disambiguation due to ambiguity. Pseudo-relevance feedback assumes top documents are relevant and selects terms from them to expand the query. The document also discusses using parallel corpora to estimate cross-lingual relevance models and evaluate CLIR using conferences like TREC and CLEF.
Slides presented at the Spark Summit East 2015 (http://spark-summit.org/east). Video should be available through their site, at some point in the future.
(Some of these slides were adapted from an earlier talk "Why is Bioinformatics a Good Fit for Spark?", given to a Spark meetup audience.)
This document provides an introduction to relational database management systems (RDBMS) through a series of slides. It covers topics such as installing MySQL, connecting to databases, using SQL commands to retrieve and manipulate data, and designing databases. The slides introduce fundamental RDBMS concepts like tables, rows, columns, keys, and relationships. It also demonstrates how to use the MySQL command line interface to issue queries and explore database structure. Examples are provided for common SQL statements like SELECT, CREATE, INSERT and more.
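The common SQL statements that deck covers can be tried without a MySQL install. Here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL; the table and column names are purely illustrative.

```python
# Hedged sketch of basic SQL (CREATE, INSERT, SELECT) via sqlite3,
# standing in for the MySQL command line. Schema and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a table with a primary key
cur.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, chrom TEXT)")

# INSERT: add rows with parameter placeholders
cur.executemany("INSERT INTO gene (symbol, chrom) VALUES (?, ?)",
                [("TP53", "17"), ("BRCA1", "17"), ("CFTR", "7")])

# SELECT: retrieve rows matching a condition
cur.execute("SELECT symbol FROM gene WHERE chrom = ? ORDER BY symbol", ("17",))
rows = [r[0] for r in cur.fetchall()]
```

The same statements work essentially unchanged at a MySQL prompt, apart from the `?` placeholder style.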
This document provides an overview of the Lab for Bioinformatics and Computational Genomics at a university. It notes that the lab has over 100 people from diverse backgrounds, including engineers, scientists, technicians, geneticists and clinicians. The lab's work involves hardware/software engineering, mathematics, molecular biology and the analysis of biological data through computing. Bioinformatics is defined as the application of information technology to biological data, covering tasks like sequence analysis, molecular modeling, phylogeny analysis, medical applications and more. The document then discusses some of the promises and applications of genomics and bioinformatics in fields like medicine, agriculture and animal health.
This document provides an overview of the PEAR DB abstraction layer. It allows for portable database programming in PHP by providing a common API that works across different database backends like MySQL, PostgreSQL, Oracle, etc. It handles tasks like prepared statements, transactions, error handling, and outputting query results in a standardized way. PEAR DB aims to simplify database programming and make applications less dependent on the underlying database system.
This document provides an overview of biological databases and SQL. It discusses different data levels in biological research like primary data, derived data, and interpreted data. It also summarizes some popular biological databases like Ensembl, ArrayExpress, and PharmGKB and whether they support direct SQL querying. The document then provides definitions for key database concepts like database, table, record, and query. It also describes different data types in SQL like numeric, string, date/time types and large object types. It discusses keys, integrity rules, and referential integrity in database design.
The document announces a conference on mHealth to be held March 18-20 in Brussels. The conference will address challenges in 6 domains: 1) mHealth performance metrics, 2) bringing healthcare home, 3) delighting chronic patients, 4) tackling over/under consumption of therapy, 5) keeping Belgium's competitive advantage in clinical trials, and 6) personalized prevention. Several companies and solutions will be discussed that focus on bringing healthcare home, delighting patients, avoiding over/under consumption of therapy, and personalized prevention. Contact information is provided for further details on the conference.
This document provides an overview of Python for bioinformatics. It discusses what Python is, why it is useful for bioinformatics, and how to get started with Python. It covers installing Python on the Athena system, using IDEs like Eclipse and PyDev, code sharing with Git and GitHub, basic Python concepts like strings, control structures, and data types like lists and dictionaries. It also provides examples of bioinformatics tasks that can be done in Python like calculating Pi using random numbers.
- Dynamic programming is used to find the optimal alignment between two protein sequences by recursively computing sub-alignments and storing them in a lookup table.
- The example shows calculating the alignment score between a zinc-finger core sequence and a viral sequence fragment by filling a table and tracking the cumulative scores.
- Filling the table from left to right and top to bottom allows reconstructing the highest scoring alignment between the two sequences.
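The table-filling procedure described above can be sketched in Python. This is a minimal global-alignment scorer in the Needleman-Wunsch style; the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not the zinc-finger scores used in the original example:

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score between two sequences via dynamic programming."""
    n, m = len(a), len(b)
    # table[i][j] = best score for aligning a[:i] with b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        table[0][j] = j * gap          # gaps aligned against b[:j]
    # Fill left to right, top to bottom, reusing stored sub-alignments
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + s,   # (mis)match
                              table[i - 1][j] + gap,     # gap in b
                              table[i][j - 1] + gap)     # gap in a
    return table[n][m]

print(align_score("GATTACA", "GATTACA"))  # 7: seven matches
```

Tracing back from the bottom-right cell (not shown here) would reconstruct the highest-scoring alignment itself.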
Here are the steps to translate the DNA sequence to its reverse complement using a dictionary and string translation:
1. Define a dictionary that maps DNA nucleotides to their complement (A->T, C->G, etc.)
2. Use the maketrans() string method to generate a translation table
3. Use the translate() string method to translate the sequence
4. Reverse the translated sequence using slicing
Putting it together:
1. complements = {"A":"T", "T":"A", "C":"G", "G":"C"}
2. table = str.maketrans("ACGT", "TGCA")
3. translated = sequence.translate(table)
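The steps above combine into a short runnable function; the dictionary in step 1 documents the base pairings, while `str.maketrans` in step 2 builds the actual translation table:

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    # Steps 1-2: translation table mapping each base to its complement
    table = str.maketrans("ACGT", "TGCA")
    # Step 3: translate every base at once
    translated = sequence.translate(table)
    # Step 4: reverse with slicing
    return translated[::-1]

print(reverse_complement("ATGC"))  # GCAT
```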
This document provides information about bioinformatics resources including databases of nucleotide and protein sequences. It discusses flat file databases like GenBank that store sequence data in plain text files and relational databases that improve data organization. Examples of popular biological databases are described, such as GenBank, EMBL, and DDBJ for nucleotide sequences and Swiss-Prot and TrEMBL for protein sequences. The document also covers sequence file formats, web tools for querying databases, and trace files used in sequence assembly.
The document discusses various topics related to drug discovery through bioinformatics and computational approaches. It begins by discussing comparative genomics and using knowledge about model organisms to identify similar biological areas and pathways in other species. It also discusses topics like high-throughput screening of large libraries, the definitions of targets, hits and leads in drug discovery, and approaches like using RNAi and phenotypic screening in model organisms. Finally, it discusses computational methods that can be used throughout the drug discovery process, including for target identification and validation, virtual screening, assessing drug-likeness of compounds, and describing compounds using structural and physicochemical descriptors.
This document provides an overview of sequence alignment and scoring matrices. It defines key terms like identity, homology, orthologous, and paralogous genes. It discusses different types of scoring matrices including unitary matrices that score matches as 1 and mismatches as 0, and transition/transversion matrices that account for the higher likelihood of transitional mutations in nucleic acids. The document emphasizes that scoring matrices represent underlying evolutionary models and influence sequence analysis outcomes.
This document provides an overview of biological databases and SQL. It discusses different types of data in biological research, including primary data and derived data. It lists several major biological databases and whether they support direct SQL querying. It also shows an example 3-tier model for biological databases. The rationale for learning SQL to query biological databases is described. The document then provides definitions and explanations of key SQL concepts like tables, records, queries, data types, keys, relationships, and normalization. It also covers creating tables, integrity constraints, authorization, and privileges in SQL.
Galaxy dna-seq-variant calling presentation and practical, Gent, April 2016 – Prof. Wim Van Criekinge
This document provides an overview of variant analysis from next-generation sequencing data. It begins with introductions to the CCA-Drylab@VUmc, TraIT, and Galaxy projects. The focus of the lecture is explained to be variant analysis from NGS data using interactive demos in Galaxy. Background is provided on Illumina sequencing technology and properties of sequencing reads. Key steps in variant analysis are outlined, including quality control and read mapping, variant calling and annotation using tools like FastQC, BWA, FreeBayes, and SnpEff. Formats for storing sequencing data and variants are also introduced, such as FASTQ, SAM/BAM, and VCF.
The document discusses various topics in bioinformatics including:
1) Control structures, lists, dictionaries, and regular expressions in Python.
2) Parsing Swiss-Prot files and extracting amino acid frequencies using Biopython.
3) Functions for working with biological sequences like transcription, translation, and translating between different genetic codes using the Biopython module.
The document discusses reading and writing files in Python. It provides examples of opening files for reading, writing, and appending. It demonstrates how to read an entire file, individual lines, and loop through lines. It also shows how to write strings to files and close files once writing is complete. Additional topics covered include a template for reading files line by line and examples of counting lines, words, and characters in a file.
This document provides an overview of phylogenetic methodologies. It defines key phylogenetic terms like clade, internal node, and outgroups. It discusses different species concepts and how phylogenetic trees illustrate evolutionary relationships. It also covers popular phylogenetic methodologies like distance methods, maximum parsimony, and maximum likelihood. Distance methods calculate pairwise distances and cluster sequences into trees. UPGMA averages these distances while neighbor joining finds the shortest branches. The document highlights the use of phylogenetic analysis across various fields.
This document provides an overview of GitHub as a hosted Git service and introduces some basic Python concepts including control structures, lists, dictionaries, regular expressions, and BioPython. It demonstrates how to install Biopython and parse sequence data from Swiss-Prot using Biopython modules. It also includes example questions for analyzing sequence data from Swiss-Prot.
This document discusses various topics relating to protein structure and bioinformatics. It begins with an overview of protein structure and why understanding protein structure is important. It then discusses the different levels of protein structure from primary to quaternary structure. Methods for determining protein structure like X-ray crystallography and NMR are mentioned. Databases for storing protein structures like the Protein Data Bank are also summarized. The document touches on topics like protein folding, domains, membrane protein topology, and secondary structure prediction methods.
This document discusses biological databases and bioinformatics. It begins by listing various related fields including biology, computer science, bioinformatics, statistics, and machine learning. It then describes different types of searches that can be performed in biological databases, including annotation searches, homology searches, pattern searches, and predictions. Finally, it mentions that databases can be used for comparisons, such as gene families and phylogenetic trees.
This query will not return any results as written. The pattern in the WHERE clause contains two triples, but the second triple has a syntax error: the property between ?x and ?email is missing. A valid property such as email would need to be specified:

SELECT ?name WHERE {
  ?x name ?name .
  ?x email ?email
}

This query selects and returns the ?name of every resource ?x that has both a name and an email property.
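In full SPARQL, properties must be written as IRIs or prefixed names rather than bare words. Assuming the data used the FOAF vocabulary (an assumption for illustration), the corrected query might read:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
  ?x foaf:name ?name .
  ?x foaf:mbox ?email .
}
```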
W3C Tutorial on Semantic Web and Linked Data at WWW 2013 – Fabien Gandon
The document provides an introduction to Semantic Web and Linked Data. It discusses key concepts such as RDF, which represents data as subject-predicate-object triples that can be connected to form a graph. RDF has several syntaxes including XML, Turtle, and JSON. Properties in RDF triples can link to other resources or contain literal values. Types are identified with URIs and vocabularies are extensible. The goal of Linked Data is to publish structured data on the web and link it to other data to form a global data web.
The document discusses using RDFS and OWL reasoning to integrate heterogeneous linked data by addressing issues like terminology and naming heterogeneity. It presents an approach using a subset of OWL 2 RL rules to reason over a billion triple corpus in a scalable way, handling the TBox separately from the ABox to avoid quadratic inferences. It also describes augmenting the reasoning with annotations to track trustworthiness and using this to filter inferences, detect inconsistencies and perform a light repair of the data. Consolidation is discussed as rewriting URIs to canonical identifiers based on owl:sameAs relations. Performance results show the different techniques taking between 1-20 hours to run over the corpus distributed across 9 machines.
The document discusses semantic technologies including ontologies, RDF, RDFS and OWL. It provides examples of using these technologies to semantically annotate web pages and objects. Key concepts covered include using URIs to identify resources, semantic annotations with properties and values, and extending vocabularies with RDFS and OWL constructs like classes, properties, and restrictions. The goal is to enable more intelligent search by understanding relationships between resources.
The document discusses lessons learned in transforming metadata from XML formats to RDF. It describes how libraries and cultural heritage institutions are working to express existing metadata standards like MODS and PBCore in RDF to take advantage of capabilities like linked data. Challenges include mapping XML schemas to RDF ontologies and ensuring RDF can meet identified use cases. Examples are provided of institutions that have transformed metadata to RDF to share across systems or publish as linked open data.
This document discusses knowledge representation and management technologies for extended minds. It covers various aspects of knowledge representation including expressiveness versus computability and how the choice of representation limits what can be captured. Desired properties of knowledge representation systems include coverage, understandability, consistency, efficiency and ease of modification. The document then reviews historical attempts at knowledge representation and discusses current approaches like the semantic web, ontologies, topic maps and open source tools.
This document discusses knowledge representation and semantic web technologies for representing knowledge. It covers the history of knowledge representation from the 1970s to today, including expert systems, Cyc, computational linguistics, KR programming languages, XML, and the semantic web. It describes the semantic web approach of representing web content as machine-readable data using languages like RDF, OWL, and vocabularies. It also discusses open-source tools and services for publishing and working with semantic web data.
FHIR can be represented in RDF format. Resources are serialized as directed graphs using URIs, properties, and values. FHIR defines a metadata vocabulary for use in RDF, and a FHIR resource catalog provides the URIs for standard FHIR resources and properties. Shape expressions (ShEx) schemas validate FHIR RDF according to resource definitions. Together, these components allow FHIR data to be queried and manipulated using RDF techniques while maintaining compatibility with the JSON format. Tools exist for converting between FHIR JSON and RDF formats.
The document provides an overview of linked data fundamentals, including key concepts like URIs, RDF, ontologies, and the semantic web. It discusses aspects of linked data such as using HTTP URIs to identify resources, representing data as subject-predicate-object triples, and connecting related resources through links. It also covers RDF serialization formats, ontologies like RDFS and OWL, and notable linked open data sources.
The document provides an introduction to RDF (Resource Description Framework). It discusses that RDF is a framework for describing resources using statements with a subject, predicate, and object. RDF identifies resources with URIs and describes resources and their properties and property values. An example RDF document is provided that describes CDs with properties like artist, country, and price.
The International Federation of Library Associations and Institutions (IFLA) is responsible for the development and maintenance of International Standard Bibliographic Description (ISBD), UNIMARC, and the "Functional Requirements" family for bibliographic records (FRBR), authority data (FRAD), and subject authority data (FRSAD). ISBD underpins the MARC family of formats used by libraries world-wide for many millions of catalog records, while FRBR is a relatively new model optimized for users and the digital environment. These metadata models, schemas, and content rules are now being expressed in the Resource Description Framework language for use in the Semantic Web.
This webinar provides a general update on the work being undertaken. It describes the development of an Application Profile for ISBD to specify the sequence, repeatability, and mandatory status of its elements. It discusses issues involved in deriving linked data from legacy catalogue records based on monolithic and multi-part schemas following ISBD and FRBR, such as the duplication which arises from copy cataloging and FRBRization. The webinar provides practical examples of deriving high-quality linked data from the vast numbers of records created by libraries, and demonstrates how a shift of focus from records to linked-data triples can provide more efficient and effective user-centered resource discovery services.
This document provides an overview of semantic web technologies for publishing data. It introduces the semantic web and describes semantic web languages like RDF, RDF Schema, and OWL. These languages allow modeling data as graphs and defining ontologies to provide unambiguous meaning to information. The document discusses using these languages to publish structured data on the web in ways that enable semantic annotation, integration, and reasoning across interconnected data sources.
A system with a natural language interface that transforms a user's natural-language question into a SPARQL query.
Related papers can be found at https://sites.google.com/site/fadhlinams81/publication
This document discusses the Web Ontology Language (OWL). It begins by providing motivation for OWL, noting limitations of RDF and RDF Schema in areas like expressiveness. It then outlines the technical solution of OWL, including its design goals of being shareable, changing over time, ensuring interoperability, and balancing expressiveness with complexity. Finally, it introduces the three dialects of OWL - OWL Lite, OWL DL, and OWL Full - and their different levels of expressiveness and reasoning capabilities.
The document discusses leveraging library authority control and controlled vocabularies on the semantic web. It describes converting existing metadata like Library of Congress Subject Headings (LCSH) into semantic web standards like SKOS to make the data accessible and linkable on the web. This would allow libraries to publish and share authority and classification data using common web technologies, enabling new applications and discovery across systems.
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud – Ontotext
This webinar will break the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small scale projects. We will show you how to build Semantic Search & Analytics proof of concepts by using managed services in the Cloud.
This document discusses the Semantic Web and Linked Data. It provides an overview of key Semantic Web technologies like RDF, URIs, and SPARQL. It also describes several popular Linked Data datasets including DBpedia, Freebase, Geonames, and government open data. Finally, it discusses the Yahoo BOSS search API and WebScope data for building search applications.
Usage of Linked Data: Introduction and Application Scenarios – EUCLID project
This presentation introduces the main principles of Linked Data, the underlying technologies and background standards. It provides basic knowledge for how data can be published over the Web, how it can be queried, and what are the possible use cases and benefits. As an example, we use the development of a music portal (based on the MusicBrainz dataset), which facilitates access to a wide range of information and multimedia resources relating to music.
Bio ontologies and semantic technologies
1. Introduction to Bio Ontologies and The Semantic Web
M. Devisscher
Biological Databases
2. Overview
• Bio ontologies
• Semantic technologies
• Practical sessions:
– Protégé and a bio database
– DIY SPARQL endpoint
3. Introduction
• Ontologies: what are ontologies?
• Ontologies in the bio domain: OBO Foundry
• Ontologies in the semantic web
• OBO
• RDF, IRI, TTL, SPARQL, OWL
4. What is an ontology?
• Ontology = a specification of a conceptualization (Gruber 1993)
• In practice: controlled vocabularies
– Disambiguation (e.g. Bank, Running)
– Language/species independence
• Very useful in biology – complex hierarchies of terms
5. Ontologies in the bio Domain
• OBO Foundry – Open Biological and Biomedical Ontologies
• Common principles
• List of ontologies at http://www.obofoundry.org
• OBO is also a data format: .obo
6. SideTrack – The Gene Ontology
• The mother of bio-ontologies: the GO
– Oldest bio-ontology
– Many practical applications:
• Cross-species studies
• Term abundance studies
• GO is an OBO ontology
8. SideTrack – The Gene Ontology
• Relationships between terms:
– Subsumption: is_a
– Partonomic: part_of
• These relations are transitive
• Terms form a DAG (directed, acyclic graph)
• Some information can be inferred
14. Semantic Technologies
• W3C: a set of specifications
http://www.w3.org/standards/semanticweb/
• A mature toolset
– Dedicated data formats
– Storage
– Query language
15. Semantic Technologies
• Basic data element = a Triple
– A mini sentence
– Contains three Terms:
• Subject Predicate Object
16. Semantic Technologies
• Representation of triples
– Basic data format: RDF/XML
– All data expressed in RDF (Resource Description Framework)
– Several compatible syntaxes: TTL (Terse Triple Language) is the most human-readable
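As a sketch of what the TTL syntax looks like, here is a single triple with a prefix declaration; the b4x: namespace is taken from this deck's own later examples and is illustrative, not a real vocabulary:

```turtle
# One triple in Turtle (TTL): Subject Predicate Object, terminated by a dot.
# The b4x: namespace is illustrative, reused from the deck's own examples.
@prefix b4x: <http://bioinformatics.be/terms#> .

b4x:martijn b4x:has_favorite_beer b4x:karmeliet .
```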
22. IRI’s and Literals
• Terms can be either IRI’s, Literals or blank nodes
• IRI = Internationalized Resource Identifier
• Unique id – a virtual URI
– Example: http://bioinformatics.be/terms#martijn
– There is no requirement for resolving
– Now: Open Data initiatives: please do use resolvable URI’s – http://linkeddata.org
– Unique identifiers can be registered on http://identifiers.org
23. Introduction
• Literals: can be typed, allowed types from the XSD namespace:
– E.g. “This is a string example”^^xsd:string
– E.g. “5”^^xsd:integer
• IRI’s are used for entities and attributes
• Literals are used for attribute values that aren’t entities
30. Graphs
• Triples are building blocks of Graphs
• Combining sets of triples allows the construction of arbitrarily complex graphs
b4x:martijn b4x:has_favorite_beer b4x:karmeliet .
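A small sketch of how several triples combine into one graph, using the b4x: and foaf: namespaces from the deck's other examples (the individual facts are illustrative):

```turtle
@prefix b4x:  <http://bioinformatics.be/terms#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Three triples sharing terms form a connected graph:
b4x:martijn a foaf:Person .
b4x:martijn b4x:has_favorite_beer b4x:karmeliet .
b4x:karmeliet a b4x:Trappist .
```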
31. Add meaning!
• Reuse terms from existing, well-defined vocabularies – ontologies (foaf, dc, go, so)
• Describe new terms = Ontologies
• Contain
– A crisp human definition
– Some machine-readable facts
32. Metadata
• Ontologies are also described in RDF
– RDFS: RDF Schema
– OWL: Web Ontology Language
– Also expressed in RDF
• For clarity, file extension can be .rdfs or .owl
35. RDFS: Example
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
b4x:karmeliet a b4x:Trappist .
b4x:Beer a rdfs:Class .
b4x:Trappist a rdfs:Class .
b4x:Trappist rdfs:subClassOf b4x:Beer .
b4x:has_favorite_beer a rdf:Property ;
  rdfs:domain foaf:Person ;
  rdfs:range b4x:Beer .
b4x:Beer rdfs:subClassOf b4x:Drink .
36. Analogy
• RDF = database = data
• RDFS/OWL = schema = metadata
• Both are described in RDF, but have a different scope
37. Semantic Technologies
• Inference
– Enhance dataset using knowledge from metadata (e.g. rdfs, owl)
• Types of inference engines
– RDFS inference
• RDFS entailment regime
– OWL inference
• Under active research
• Engines exist for specific subsets of OWL (OWL-DL)
40. RDFS: Inference
b4x:kevin b4x:has_favorite_beer b4x:stella .
Inferred triples:
b4x:kevin a foaf:Person . [from domain]
b4x:stella a b4x:Beer . [from range]
b4x:stella a b4x:Drink . [from subClassOf]
41. DuckTyping
• Watch out with inference! Example: you want to express that people can have lengths
b4x:length a rdf:Property ;
  rdfs:domain foaf:Person ;
  rdfs:range xsd:integer .
42. DuckTyping
• Problem:
ex:VW_Transporter b4x:length “600”^^xsd:integer .
• Would infer that VW_Transporter is a Person!
• This is called DuckTyping:
If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck
43. Task
• Find a solution: express in rdfs that people can have lengths
44. Task
• Find a solution: express in rdfs that people can have lengths
b4x:havingLength a rdfs:Class .
b4x:length a rdf:Property ;
  rdfs:domain b4x:havingLength ;
  rdfs:range xsd:integer .
foaf:Person rdfs:subClassOf b4x:havingLength .
45. Storing RDF
• As an RDF file for download
• In a Triplestore
– Database optimised for storing triples
– Examples: BlazeGraph, Fuseki, Sesame
46. Semantic Technologies
• Querying over RDF data: SPARQL
• Cool features:
– Distributed querying = actual distribution of data and computing resources
– SPARQL/Update: modify data
• SPARQL endpoints: SPARQL over HTTP
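A sketch of what distributed querying can look like with a SPARQL 1.1 federated query; the remote endpoint URL below is a placeholder, not a real service:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
  # matched against the local dataset
  ?class a rdfs:Class .
  # matched against a remote SPARQL endpoint (placeholder URL)
  SERVICE <http://example.org/sparql> {
    ?class rdfs:label ?label .
  }
}
```

SPARQL/Update uses the same syntax family, e.g. `INSERT DATA { … }` sent to an endpoint's update interface.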
47. SPARQL Query Syntax
• First example:
SELECT ?subject ?predicate ?object WHERE {
  ?subject ?predicate ?object .
}
(Generally not a good idea, as it will pull down the whole dataset)
The SELECT clause binds variables; the WHERE clause does the graph matching.
51. SPARQL Query Syntax
• Find all classes:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
  ?class a rdfs:Class .
  ?class rdfs:label ?label .
}
(This will only retrieve classes that have a label)
52. SPARQL Query Syntax
• Find all classes:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
  ?class a rdfs:Class .
  OPTIONAL {
    ?class rdfs:label ?label .
  }
}
53. SPARQL Query Syntax
• Find all classes that contain “duck” in the label:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
  ?class a rdfs:Class .
  ?class rdfs:label ?label .
  FILTER( CONTAINS( str(?label), "duck" ) )
}
54. SPARQL Query Syntax
• Make it case-insensitive:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
  ?class a rdfs:Class .
  ?class rdfs:label ?label .
  FILTER( CONTAINS( UCASE(str(?label)), "DUCK" ) )
}
55. SPARQL Query Syntax
• Search in a specific graph:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label
FROM <http://example.org/animals>
WHERE {
  ?class a rdfs:Class .
  ?class rdfs:label ?label .
  FILTER( CONTAINS( UCASE(str(?label)), "DUCK" ) )
}
56. SPARQL Query Syntax
• Search in a specific graph:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
  GRAPH <http://example.org/animals> {
    ?class a rdfs:Class .
    ?class rdfs:label ?label .
    FILTER( CONTAINS( UCASE(str(?label)), "DUCK" ) )
  }
}
57. SPARQL Query Syntax
• Can also search for graphs:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?g WHERE {
  GRAPH ?g {
    ?class a rdfs:Class .
    ?class rdfs:label ?label .
    FILTER( CONTAINS( UCASE(str(?label)), "DUCK" ) )
  }
}
58. Summary: Querying RDF data
RDF Data + RDFS/OWL → Inference Engine → Inferred RDF Data → SPARQL Endpoint
59. Take home Summary
• Basic data element = a Triple
– A mini sentence
– Contains three Terms: Subject Predicate Object
• Example:
<http://xmpl/entities#martijn> <http://xmpl/relations#has_favorite_beer> <http://xmpl/entities#karmeliet> .
64. Interoperability between OBO and Semantic Technologies
• Originated from two separate academic worlds
• Computing applications of OBO: mainly consistency checking and overrepresentation analysis
• Semantic Technologies: much broader toolset
• Interoperability?
– Direct offering in both formats
– Automated mapping
65. Where to find ontologies
• OBO Foundry
• Bioportal; NCBO
• Biogateway
• Bio2RDF
66. Where to find RDF data
• Google for SPARQL endpoint
• => e.g. EBI databases
• Non-biological: DBpedia
67. How about Tim Berners-Lee’s vision?
• We’re not there yet, but for bio data we’re getting quite close
– The explicitome
– Crowd sourcing
– Nanopublications
79. Running SPARQL
• From a web interface
• Using HTTP
– HTTP GET
– HTTP POST: for larger query strings
– Headers determine response type (JSON, XML, HTML)
http://…/sparql?default-graph-uri=<http://graphName>&query=URLENCODEDQUERYSTRING
92. WikiPathways
• Links pathways with genes, terms from Pathway, Cell line and Disease ontology, PubMed references
• Models individual Interactions
• Can be downloaded as RDF
• Has an experimental SPARQL endpoint
93. Exercise
• Define a query to find pathways linked to the TNFalpha gene
97. Exercise
• Try this, or another query
– Using web interface
– Using HTTP GET
• Define a simple describe
• Use a web tool to URLEncode the query
• Submit query as a URL parameter
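For the “simple describe” step, one minimal form looks like this; the IRI is a placeholder, so substitute a resource that actually exists in the endpoint you are querying:

```sparql
# DESCRIBE asks the endpoint for all triples it holds about a resource.
DESCRIBE <http://example.org/resource/TNF>
```

URL-encoded and passed as the query parameter, this becomes e.g. query=DESCRIBE%20%3Chttp%3A%2F%2Fexample.org%2Fresource%2FTNF%3E.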