Ontology-Based Classification of Molecules: a Logic Programming Approach

Despoina Magka

Department of Computer Science, University of Oxford

November 30, 2012

BIOINFORMATICS AND SEMANTIC TECHNOLOGIES
Life sciences data deluge

Hierarchical organisation of biochemical knowledge

Fast, automatic and repeatable classification driven by semantic technologies

Web Ontology Language (OWL), a W3C standard family of logic-based formalisms

OWL bio- and chemo-ontologies widely adopted

THE ChEBI ONTOLOGY

OWL ontology of Chemical Entities of Biological Interest (ChEBI)

Dictionary of molecules with taxonomical information

caffeine is a cyclic molecule

serotonin is an organic molecule

ascorbic acid is a carboxylic ester

Used in pharmaceutical design and in the study of biological pathways

ChEBI is extended manually

Currently ~30,000 chemical entities, growing by ~3,500 per year

Existing chemical databases describe millions of molecules

Speed up ChEBI's growth by automating chemical classification

EXPRESSIVITY LIMITATIONS OF OWL
1 Every consistent OWL ontology has at least one tree-shaped model, so cycles cannot be represented faithfully

EXAMPLE
$$\mathit{Cyclobutane} \sqsubseteq \exists(=4)\,\mathit{hasAtom}.(\mathit{Carbon} \sqcap \exists(=2)\,\mathit{hasBond}.\mathit{Carbon})$$
[Figure: cyclobutane, a ring of four carbon (C) atoms]

OWL-based reasoning support
1 Is cyclobutane a cyclic molecule?

2 No minimality condition on the models, so classes based on the absence of attributes are hard to axiomatise

EXAMPLE
[Figure: the cyclobutane ring of four carbon (C) atoms, with an additional Oxygen atom]

Required reasoning support
2 Is cyclobutane a hydrocarbon?
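Both questions can be made concrete with a small Python sketch. It checks three hypothetical finite interpretations against the cyclobutane axiom: the intended ring, a tree-shaped model in which each ring carbon is instead bonded to two fresh carbons, and the ring plus an extra oxygen atom. All three satisfy the axiom, so OWL entails neither "cyclobutane is cyclic" nor "cyclobutane is a hydrocarbon"; only reasoning restricted to the intended (minimal) model answers both with yes. The encoding (a label map plus sets of pairs for hasAtom and hasBond) and the simplified hydrocarbon test ("every atom is a carbon or a hydrogen") are assumptions for illustration, not the authors' implementation.

```python
"""Sketch: models of Cyclobutane ⊑ ∃(=4)hasAtom.(Carbon ⊓ ∃(=2)hasBond.Carbon).

Hypothetical encoding: `labels` maps each element to an atom type ("c", "o"),
`has_atom` and `has_bond` are sets of (subject, object) pairs.
"""


def satisfies_axiom(m, labels, has_atom, has_bond):
    """True if m has exactly four hasAtom-successors that are carbons with
    exactly two hasBond-successors that are carbons (qualified counting)."""
    def qualifies(atom):
        bonded = {y for (x, y) in has_bond if x == atom and labels.get(y) == "c"}
        return labels.get(atom) == "c" and len(bonded) == 2

    return len({y for (x, y) in has_atom if x == m and qualifies(y)}) == 4


def is_cyclic(has_bond):
    """Union-find cycle test on the undirected bond graph."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in {frozenset(e) for e in has_bond}:
        ru, rv = find(u), find(v)
        if ru == rv:
            return True  # u and v were already connected: this bond closes a ring
        parent[ru] = rv
    return False


def is_hydrocarbon(m, labels, has_atom):
    """Simplified closed-world test (assumption): every atom of m is c or h."""
    atoms = {y for (x, y) in has_atom if x == m}
    return bool(atoms) and all(labels[a] in {"c", "h"} for a in atoms)


def symmetric(pairs):
    """Store each bond in both directions."""
    return set(pairs) | {(y, x) for (x, y) in pairs}


RING = ["c1", "c2", "c3", "c4"]

# 1. Intended model: four carbons in a ring.
ring_labels = {a: "c" for a in RING}
ring_atoms = {("m", a) for a in RING}
ring_bonds = symmetric([("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c1")])

# 2. Tree-shaped model: each carbon is bonded to two fresh carbons instead.
tree_labels = dict(ring_labels)
tree_bonds = set()
for a in RING:
    for k in (1, 2):
        tree_labels[f"{a}_{k}"] = "c"
        tree_bonds.add((a, f"{a}_{k}"))

# 3. Non-minimal model: the ring plus an extra oxygen atom.
oxy_labels = {**ring_labels, "o1": "o"}
oxy_atoms = ring_atoms | {("m", "o1")}

for name, labels, atoms, bonds in [
    ("ring", ring_labels, ring_atoms, ring_bonds),
    ("tree", tree_labels, ring_atoms, tree_bonds),
    ("ring + oxygen", oxy_labels, oxy_atoms, ring_bonds),
]:
    # columns: axiom holds, bond graph cyclic, hydrocarbon
    print(name, satisfies_axiom("m", labels, atoms, bonds),
          is_cyclic(bonds), is_hydrocarbon("m", labels, atoms))
```

The tree row shows an acyclic model and the oxygen row a non-hydrocarbon model of the same axiom, which is why an OWL reasoner cannot derive either subsumption.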

RESULTS OVERVIEW
1 Expressive and decidable formalism for modelling complex objects: Description Graph Logic Programs

2 Modelling that spans a wide range of structure-dependent classes of molecules

3 Implementation that draws upon DLV and performs structure-based classification with a significant speedup

4 Evaluation over part of the manually curated ChEBI ontology revealed modelling errors

Language for representing biochemical structures with a favourable performance/expressivity trade-off

CLASSIFYING STRUCTURED OBJECTS

[Figure: description graph of ascorbicAcid. Vertex 0 (the molecule) is linked via hasAtom to atom vertices 1 to 13 (oxygens o, carbons c and one hydrogen h), which are connected by single and double bonds]

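$$\mathit{ascorbicAcid}(x) \rightarrow \mathit{hasAtom}(x, f_1(x)) \wedge \ldots \wedge \mathit{hasAtom}(x, f_{13}(x)) \wedge o(f_1(x)) \wedge \ldots \wedge c(f_7(x)) \wedge \ldots \wedge \mathit{single}(f_1(x), f_7(x)) \wedge \mathit{double}(f_7(x), f_2(x)) \wedge \ldots$$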
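Firing this rule on a constant introduces one fresh term f_i(a) per vertex of the description graph. The Python sketch below materialises those facts for an input fact ascorbicAcid(a), naming the term f_i(a) as the string "a_f{i}" to mirror the a_{f_i} constants in the stable model shown for this example. Because the rule itself is elided on the slide, the atom labels and bonds are read off that stable model (including its bond direction, e.g. double(a_{f_2}, a_{f_7})); the tuple encoding of facts and the function names are hypothetical.

```python
# Hypothetical encoding of the ascorbicAcid description graph: vertex i becomes
# the Skolem term f_i(x). Labels and bonds are taken from the stable model on
# the slides (o = oxygen, c = carbon, h = hydrogen).
LABELS = {**{i: "o" for i in range(1, 7)}, **{i: "c" for i in range(7, 13)}, 13: "h"}
SINGLE = [(8, 3), (9, 4), (12, 5), (12, 11), (11, 6),
          (10, 1), (10, 9), (10, 11), (10, 13), (7, 1), (7, 8)]
DOUBLE = [(2, 7), (8, 9)]


def skolem(i, x):
    """Name the ground term f_i(x); for x = "a" this yields "a_f1", "a_f2", ..."""
    return f"{x}_f{i}"


def fire_ascorbic_acid_rule(x):
    """Return the facts derived from ascorbicAcid(x) by the Skolemised rule."""
    facts = {("ascorbicAcid", x)}
    for i, label in LABELS.items():
        facts.add(("hasAtom", x, skolem(i, x)))
        facts.add((label, skolem(i, x)))
    for pred, pairs in (("single", SINGLE), ("double", DOUBLE)):
        for i, j in pairs:
            facts.add((pred, skolem(i, x), skolem(j, x)))
    return facts


facts_a = fire_ascorbic_acid_rule("a")
facts_b = fire_ascorbic_acid_rule("b")
print(len(facts_a))  # 1 input fact + 13 hasAtom + 13 labels + 13 bonds = 40
print(sorted(f for f in facts_a if f[0] == "double"))
```

Since the terms are functions of x, two molecules such as a and b receive disjoint sets of atoms, and firing the rule again on a produces the same terms rather than new ones.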

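$$\mathit{hasAtom}(x, y_1) \wedge \mathit{hasAtom}(x, y_2) \wedge y_1 \neq y_2 \rightarrow \mathit{polyatomicEntity}(x)$$
$$\bigwedge_{i=1}^{5} \mathit{hasAtom}(x, y_i) \wedge c(y_1) \wedge o(y_2) \wedge o(y_3) \wedge c(y_4) \wedge \mathit{horc}(y_5) \wedge \mathit{double}(y_1, y_2) \wedge \mathit{single}(y_1, y_3) \wedge \mathit{single}(y_3, y_4) \wedge \mathit{single}(y_1, y_5) \rightarrow \mathit{carboxylicEster}(x)$$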
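Both classification rules have conjunctive bodies, so checking them amounts to searching for a variable assignment that maps every body atom onto a fact. A minimal Python sketch of that search (a naive backtracking join, not how DLV evaluates programs) over the ascorbic acid facts from the stable model. Assumptions: bonds are stored in both directions, since the stable model lists some bonds in the opposite order to the rule (double(a_{f_2}, a_{f_7}) versus double(y_1, y_2) with y_1 a carbon); body atoms are reordered so that selective atoms come first, which does not change the rule's meaning; all names are hypothetical.

```python
# Ascorbic acid facts, read off the stable model on the slides (hypothetical encoding).
LABELS = {**{i: "o" for i in range(1, 7)}, **{i: "c" for i in range(7, 13)}, 13: "h"}
SINGLE = [(8, 3), (9, 4), (12, 5), (12, 11), (11, 6),
          (10, 1), (10, 9), (10, 11), (10, 13), (7, 1), (7, 8)]
DOUBLE = [(2, 7), (8, 9)]


def facts_for(x):
    def atom(i):
        return f"{x}_f{i}"

    facts = set()
    for i, label in LABELS.items():
        facts |= {("hasAtom", x, atom(i)), (label, atom(i))}
        if label in ("c", "h"):
            facts.add(("horc", atom(i)))  # horc: hydrogen or carbon
    for pred, pairs in (("single", SINGLE), ("double", DOUBLE)):
        for i, j in pairs:
            # Assumption: bonds are symmetric, so store both directions.
            facts |= {(pred, atom(i), atom(j)), (pred, atom(j), atom(i))}
    return facts


def match(body, facts, binding):
    """Yield every extension of `binding` that maps all body atoms onto facts."""
    if not body:
        yield binding
        return
    pred, *args = body[0]
    for fact in facts:
        if fact[0] != pred or len(fact) - 1 != len(args):
            continue
        new = dict(binding)
        if all(new.setdefault(var, val) == val for var, val in zip(args, fact[1:])):
            yield from match(body[1:], facts, new)


def derives(body, facts, x, distinct=()):
    """True if the body holds for x with every listed variable pair distinct."""
    return any(all(b[u] != b[v] for u, v in distinct)
               for b in match(body, facts, {"x": x}))


POLYATOMIC = [("hasAtom", "x", "y1"), ("hasAtom", "x", "y2")]  # plus y1 != y2
CARBOXYLIC_ESTER = [
    ("c", "y1"), ("double", "y1", "y2"), ("o", "y2"),
    ("single", "y1", "y3"), ("o", "y3"), ("single", "y3", "y4"), ("c", "y4"),
    ("single", "y1", "y5"), ("horc", "y5"),
] + [("hasAtom", "x", f"y{i}") for i in range(1, 6)]

facts = facts_for("a")
print(derives(POLYATOMIC, facts, "a", distinct=[("y1", "y2")]))  # polyatomicEntity(a)
print(derives(CARBOXYLIC_ESTER, facts, "a"))                     # carboxylicEster(a)
```

Dropping the distinct pair from the polyatomicEntity check would let y_1 and y_2 map to the same atom, which is exactly what the inequality in the rule prevents.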

Input fact: ascorbicAcid(a)
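Stable model:
$$\begin{aligned}
&\mathit{ascorbicAcid}(a),\ \mathit{hasAtom}(a, a_{f_i}) \text{ for } 1 \le i \le 13,\\
&o(a_{f_i}) \text{ for } 1 \le i \le 6,\ c(a_{f_i}) \text{ for } 7 \le i \le 12,\ h(a_{f_{13}}),\ \mathit{single}(a_{f_8}, a_{f_3}),\\
&\mathit{single}(a_{f_9}, a_{f_4}),\ \mathit{single}(a_{f_{12}}, a_{f_i}) \text{ for } i \in \{5, 11\},\ \mathit{single}(a_{f_{11}}, a_{f_6}),\\
&\mathit{single}(a_{f_{10}}, a_{f_i}) \text{ for } i \in \{1, 9, 11, 13\},\ \mathit{single}(a_{f_7}, a_{f_i}) \text{ for } i \in \{1, 8\},\\
&\mathit{double}(a_{f_2}, a_{f_7}),\ \mathit{double}(a_{f_8}, a_{f_9}),\ \mathit{horc}(a_{f_i}) \text{ for } 7 \le i \le 13,\\
&\mathit{polyatomicEntity}(a),\ \mathit{carboxylicEster}(a),\ \mathit{cyclic}(a)
\end{aligned}$$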

Ascorbic acid is a cyclic polyatomic entity and a carboxylic ester
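The slides do not show the rule that derives cyclic(a). As a stand-in, the Python sketch below applies a standard union-find cycle test to the bond graph of the stable model: a bond whose two atoms are already connected closes a ring. This is a swapped-in graph algorithm for illustration, not the DGLP encoding of cyclicity; the bond lists are read off the stable model and the function name is hypothetical.

```python
# Bonds of ascorbic acid as listed in the stable model (atom i stands for a_{f_i}).
SINGLE = [(8, 3), (9, 4), (12, 5), (12, 11), (11, 6),
          (10, 1), (10, 9), (10, 11), (10, 13), (7, 1), (7, 8)]
DOUBLE = [(2, 7), (8, 9)]


def has_cycle(bonds):
    """Union-find cycle test on an undirected graph without self-loops."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    # frozenset treats (u, v) and (v, u) as the same bond
    for u, v in {frozenset(b) for b in bonds}:
        ru, rv = find(u), find(v)
        if ru == rv:
            return True  # u and v already connected: this bond closes a ring
        parent[ru] = rv
    return False


print(has_cycle(SINGLE + DOUBLE))          # the five-membered lactone ring
print(has_cycle([(1, 2), (2, 3), (3, 4)]))  # an open chain
```

Here the 13 bonds connect 13 atoms, so at least one bond must close a ring: the ring through a_{f_1}, a_{f_7}, a_{f_8}, a_{f_9} and a_{f_10}.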

CHEMICAL CLASSES WE COVERED
1 Existence of subcomponents
Carbon molecules
Carboxylic acids and carboxylic esters
Ketones and aldehydes
2 Exact cardinality of parts
Exactly two carbons
Dicarboxylic acids
3 Exclusive composition
Inorganic molecules
Hydrocarbons
Saturated molecules
4 Cyclicity-related classes
Benzenes
Cyclic molecules
Alkanes

EMPIRICAL EVALUATION
Implementation draws upon DLV, a deductive database engine

Evaluation with data extracted from ChEBI

500 molecules classified under 51 chemical classes in 40 seconds

Faster than other approaches:
[Hastings et al., 2010] 140 molecules in 4 hours
[Magka et al., 2012] 70 molecules in 450 secs
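A rough per-molecule comparison of these three figures, sketched in Python: it only divides the reported time by the reported number of molecules, giving about 0.08 s per molecule for the prototype versus about 6.4 s and 103 s for the earlier approaches. The runs classify different molecule sets under different numbers of classes and the hardware is not stated, so the ratios are indicative at best.

```python
# (molecules classified, wall-clock seconds) as reported on the slide
RUNS = {
    "this prototype": (500, 40.0),
    "Hastings et al., 2010": (140, 4 * 3600.0),
    "Magka et al., 2012": (70, 450.0),
}

# seconds spent per molecule for each run
per_molecule = {name: secs / mols for name, (mols, secs) in RUNS.items()}

baseline = per_molecule["this prototype"]
for name, secs in per_molecule.items():
    print(f"{name}: {secs:.3f} s/molecule, {secs / baseline:.0f}x the prototype's time")
```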

Subsumptions exposed by our prototype:

Ascorbic acid is a polyatomic entity, a carboxylic ester and a cyclic molecule: subsumptions missing from the ChEBI OWL ontology

Contradictory subclass relation from ChEBI:

Ascorbic acid is asserted to be a carboxylic acid (release 95)
Not listed among the subsumptions derived by our prototype

CONCLUSION AND FURTHER RESEARCH
Results
1 Expressive and decidable formalism for complex objects

2 Wide range of structure-based classes

3 DLV-based implementation exhibits a significant speedup

4 Evaluation over the ChEBI ontology revealed modelling errors

Future directions
SMILES-based surface syntax

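$$\bigwedge_{i=1}^{5} \mathit{hasAtom}(x, y_i) \wedge c(y_1) \wedge o(y_2) \wedge o(y_3) \wedge c(y_4) \wedge \mathit{double}(y_1, y_2) \wedge \mathit{single}(y_1, y_3) \wedge \mathit{single}(y_3, y_4) \wedge \mathit{single}(y_1, y_5) \rightarrow \mathit{carboxylicEster}(x)$$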

define carboxylicEster
some hasAtom SMILES(COC(=O)[*])
end.
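To show how such a surface syntax could map onto rule bodies, here is a minimal Python sketch that parses only a tiny SMILES subset (single-letter atoms C, N, O, S, P, bracket atoms such as [*], "=" double bonds and parenthesised branches) and renders a rule body in the style of the carboxylic ester rule above. The function names and the treatment of [*] as an atom with no label constraint are assumptions; ring closures, aromatic atoms, charges and two-letter elements are not handled, and a real system would use a complete SMILES parser.

```python
ORGANIC = set("CNOSP")  # single-letter atoms supported by this sketch


def parse_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds).

    atoms: list of element symbols ("*" for the wildcard bracket atom)
    bonds: list of (i, j, order) with atom indices and bond order 1 or 2
    """
    atoms, bonds = [], []
    branch_points = []  # atom to return to when a branch closes
    prev, order = None, 1
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            branch_points.append(prev)
        elif ch == ")":
            prev = branch_points.pop()
        elif ch == "=":
            order = 2
        elif ch == "[" or ch in ORGANIC:
            if ch == "[":
                end = smiles.index("]", i)
                symbol = smiles[i + 1:end]
                i = end
            else:
                symbol = ch
            atoms.append(symbol)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
        else:
            raise ValueError(f"unsupported SMILES character {ch!r} at position {i}")
        i += 1
    return atoms, bonds


def rule_body(atoms, bonds):
    """Render atoms and bonds as a rule body over x with variables y1, y2, ..."""
    literals = [f"hasAtom(x, y{k + 1})" for k in range(len(atoms))]
    literals += [f"{a.lower()}(y{k + 1})" for k, a in enumerate(atoms) if a != "*"]
    literals += [f"{'single' if o == 1 else 'double'}(y{i + 1}, y{j + 1})"
                 for i, j, o in bonds]
    return " ∧ ".join(literals)


atoms, bonds = parse_smiles("COC(=O)[*]")
print(atoms)
print(bonds)
print(rule_body(atoms, bonds) + " → carboxylicEster(x)")
```

The variables are numbered in SMILES order, so the output matches the slide's rule only up to renaming (the slide's y_1 is the third atom here, its y_2 the fourth, and so on).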

Detect subsumptions between classes

E.g., every carboxylic ester is an organic molecular entity
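One standard way to detect such subsumptions between rule-defined classes is to "freeze" the body of one rule into facts and check whether the other rule fires on them: the classic containment test for conjunctive queries. A minimal Python sketch under two assumptions: each class is defined by a single rule without negation, and organic molecular entity gets a hypothetical simplified definition (the molecule has at least one carbon atom). If other rules of the program derive body predicates (horc, for instance), the whole program would have to be evaluated over the frozen facts.

```python
def match(body, facts, binding):
    """Yield every extension of `binding` that maps all body atoms onto facts."""
    if not body:
        yield binding
        return
    pred, *args = body[0]
    for fact in facts:
        if fact[0] != pred or len(fact) - 1 != len(args):
            continue
        new = dict(binding)
        if all(new.setdefault(var, val) == val for var, val in zip(args, fact[1:])):
            yield from match(body[1:], facts, new)


def freeze(body):
    """Replace each variable v of a rule body by a fresh constant '_v'."""
    return {(pred, *("_" + v for v in args)) for pred, *args in body}


def subsumed_by(sub_body, sup_body, head_var="x"):
    """True if every x satisfying sub_body also satisfies sup_body."""
    frozen = freeze(sub_body)
    return any(True for _ in match(sup_body, frozen, {head_var: "_" + head_var}))


CARBOXYLIC_ESTER = [("hasAtom", "x", f"y{i}") for i in range(1, 6)] + [
    ("c", "y1"), ("o", "y2"), ("o", "y3"), ("c", "y4"), ("horc", "y5"),
    ("double", "y1", "y2"), ("single", "y1", "y3"),
    ("single", "y3", "y4"), ("single", "y1", "y5"),
]
# Hypothetical simplified class: a molecule with at least one carbon atom.
ORGANIC_MOLECULAR_ENTITY = [("hasAtom", "x", "y"), ("c", "y")]

print(subsumed_by(CARBOXYLIC_ESTER, ORGANIC_MOLECULAR_ENTITY))  # ester ⊑ organic?
print(subsumed_by(ORGANIC_MOLECULAR_ENTITY, CARBOXYLIC_ESTER))  # organic ⊑ ester?
```

The first check succeeds because the frozen ester body already contains a carbon atom of _x; the second fails because a frozen "molecule with a carbon" has no oxygen facts for the ester rule to use.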

Extensions with numerical datatypes

E.g., small molecules: those weighing less than 800 daltons
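A numerical extension could classify molecules by comparing a computed value with a threshold. A minimal Python sketch, assuming the molecular formula is known (the description graphs on these slides omit most hydrogens) and using conventional standard atomic weights for H, C, N and O; the small-molecule rule and the function names are hypothetical illustrations of the 800-dalton criterion. Ascorbic acid, C6H8O6, comes out at about 176 Da.

```python
# Conventional standard atomic weights in daltons (g/mol).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(formula):
    """Sum of atomic weights for a formula given as {element: count}."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def is_small_molecule(formula, limit=800.0):
    """Hypothetical datatype rule: weight(x, w) ∧ w < 800 → smallMolecule(x)."""
    return molecular_weight(formula) < limit


ASCORBIC_ACID = {"C": 6, "H": 8, "O": 6}
print(round(molecular_weight(ASCORBIC_ACID), 3), is_small_molecule(ASCORBIC_ACID))
```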

Classification of complex biological objects

Integration with Protégé, Bioclipse, JChemPaint, ...

Mapping from our logic to RDF
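As one possible shape for such a mapping, the Python sketch below turns unary facts into rdf:type triples and binary facts into property triples, written in N-Triples syntax. The base namespace http://example.org/ is a placeholder, naming the Skolem-term atoms with IRIs rather than blank nodes is a design assumption, and the input is a handful of facts taken from the ascorbic acid stable model.

```python
BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(name):
    return f"<{BASE}{name}>"


def to_ntriples(facts):
    """Map unary facts to rdf:type triples and binary facts to property triples."""
    lines = []
    for pred, *args in sorted(facts):
        if len(args) == 1:
            lines.append(f"{iri(args[0])} <{RDF_TYPE}> {iri(pred)} .")
        elif len(args) == 2:
            lines.append(f"{iri(args[0])} {iri(pred)} {iri(args[1])} .")
        else:
            raise ValueError(f"cannot map {pred}/{len(args)} to a single triple")
    return lines


facts = {
    ("ascorbicAcid", "a"),
    ("hasAtom", "a", "a_f1"),
    ("o", "a_f1"),
    ("single", "a_f7", "a_f1"),
}
print("\n".join(to_ntriples(facts)))
```

Predicates with more than two arguments have no direct triple form; the sketch rejects them rather than guessing a reification scheme.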

Thank you! Questions?!?
