Towards OWL-based Knowledge
  Representation in Petrology

   A.Shkotin, V.Ryakhovsky, D.Kudryavtsev
             GIS Department
    Vernadsky State Geological Museum
       Russian Academy of Sciences
               www.sgm.ru
             ashkotin@acm.org
Contents

    Introduction

    Fact formalization

    Dictionary formalization

    Formal definitions

    Conclusions and further plans

    Acknowledgments

    References

                                    2
Introduction


Petrology is the science that studies rocks and
the conditions of their formation.

A large volume of petrological information has
accumulated that requires systematization,
integration, and maintenance in a consistent state.

These tasks can be solved through knowledge
formalization.

                                     3
Knowledge Formalization
Final goal: Create a formal theory for the key
concepts of petrology and relationships among
them.

Ontologies, that is, conceptual structures
organized on the basis of mathematical logic, play
a pivotal role in creating such a theory.

Definitions play a decisive role in a formal theory,
as they specify exactly those concepts whose
properties will be used and studied.
                                      4
Formal Theory


Building a formal theory of a field of natural
science, similar to a mathematical one, enables:

• Revealing primary concepts

• Giving definitions to other concepts

• Stating axioms and theorems




                                        5
Fact Formalization




                     6
The Approach
Databases are not knowledge.

They require substantial and thorough processing
before knowledge can be obtained from them.

Converting a DB to the traditional form of
knowledge, i.e. knowledge in a natural language,
is the direct way of obtaining knowledge from
data.

The natural language is restricted to a controlled
natural language (CNL). A CNL is a universal means
of presenting formal knowledge.
                                      7
8
CNL Sentences. The Approach

Create templates of CNL sentences to present
all the facts contained in the Proba DB.

Use local (‘internal’) proper names.

Connect the words of composite terms with the ‘_’ character.

Global, commonly known proper names such as
Iceland or Atlantic_Ocean may also appear in the text.

                                       9
CNL Sentences. Example
PUB5633 is a publication.
PUB5633 title is "A CONTRIBUTION TO THE GEOLOGY OF THE K...".
SAM32994 is a sample. SAM32994 is a rhyolite.
PLC32994 is a place. PLC32994 is a part of Iceland.
SUB469812 is a substance.
WPC469812 is a weight_percent. WPC469812 value is
73.95.

PUB5633 describes SAM32994.
SAM32994 gathering_place is PLC32994.
SAM32994 includes SUB469812.
SUB469812 is a WPC469812 component.
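
The template idea behind the sentences above can be sketched in Python. This is only an illustration: the row layout (`pub_id`, `sample_id`, `measurement_id`, `rock`, `region`, `weight_percent`) and the function names are assumptions made for the example, not the Proba DB schema or the SQL export scripts used in the project.

```python
def term(words: str) -> str:
    """Join the words of a composite term with '_' (e.g. 'weight percent')."""
    return "_".join(words.split())


def fact_sentences(row: dict) -> list[str]:
    """Render one hypothetical measurement row as CNL sentences."""
    # Local ('internal') proper names are built from a prefix and a DB key.
    pub = f"PUB{row['pub_id']}"
    sam = f"SAM{row['sample_id']}"
    plc = f"PLC{row['sample_id']}"  # assumption: the place shares the sample key
    sub = f"SUB{row['measurement_id']}"
    wpc = f"WPC{row['measurement_id']}"
    return [
        f"{pub} is a publication.",
        f"{sam} is a sample. {sam} is a {term(row['rock'])}.",
        f"{plc} is a place. {plc} is a part of {term(row['region'])}.",
        f"{sub} is a substance.",
        f"{wpc} is a {term('weight percent')}. {wpc} value is {row['weight_percent']}.",
        f"{pub} describes {sam}.",
        f"{sam} gathering_place is {plc}.",
        f"{sam} includes {sub}.",
        f"{sub} is a {wpc} component.",
    ]


# Values taken from the example slide; the dictionary keys are hypothetical.
row = {
    "pub_id": 5633,
    "sample_id": 32994,
    "measurement_id": 469812,
    "rock": "rhyolite",
    "region": "Iceland",
    "weight_percent": 73.95,
}
sentences = fact_sentences(row)
print("\n".join(sentences))
```

Each template fixes the sentence structure, so only proper names and values vary from row to row.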
                                             10
OWL Ontology

All generated sentences are statements in ACE*
(Attempto Controlled English).

The sentences are constructed so that the APE*
translator (Attempto Parsing Engine) can translate
them to OWL.

The DB is converted to 1,174 ontologies.

* Attempto Project. http://attempto.ifi.uzh.ch/site/

                                                 11
OWL Ontology #5633
• Classes: place, publication, rhyolite, sample,
  substance, weight_percent.
• Object properties: component, gathering_place,
  includes, mixture, part...
• Data properties: authorial_number,
  chemical_formula, first_page, latitude,
  longitude, reference, title, value, year.
• Individuals: Iceland, PLC32994..., PUB5633,
  SAM32994..., SUB469812..., WPC469812...
                                      12
Dictionary Ontology
          and
Definition Concentrator




                          13
Definition Dictionary
is an important and specific type of knowledge

contains the terms of a subject area and informal
definitions of these terms.

Informal definitions are provided by experts
usually belonging to a scientific school.

Tasks:

• Convert a definition dictionary into formal knowledge

• Gather definitions given by various schools
                                              14
Dictionary Entry Example

HARZBURGITE. An ultramafic plutonic rock
composed essentially of olivine and
orthopyroxene. Now defined modally in the
ultramafic rock classification (Fig. 2.9, p.28).
(Rosenbusch, 1887, p.269; Harzburg, Harz
Mts, Lower Saxony, Germany; Tröger 732;
Johannsen v.4, p.438; Tomkeieff p.247)

[IRCGT], p.88

                                        15
From Dictionary Text to Ontology
Dictionary
“Dictionary of Terms of Igneous Rocks”: 1,567
entries, the overwhelming majority of them being
rock names.
Owner
Inter-Departmental Petrographic Committee in the
Geoscience Division of the Russian Academy of
Sciences.
Text
http://www.igem.ru/site/petrokomitet/slovar.htm
Ontology http://earth.jscc.ru/ontologies/dic.owl
                                   16
Definition Concentrator
The goal is to collectively maintain definitions of
scientific terms, including formal definitions.

The ontology of igneous rocks is stored in
webProtege under the name dic.

Some terms are complemented with definitions
from other dictionaries.

Address on the Geology portal:
http://earth.jscc.ru/webprotege/
                                     17
18
Name Spaces


prefix pgc: <http://www.igem.ru/site/petrokomitet/slovar#>

prefix dic: <http://earth.jscc.ru/ontologies/dic.owl#>

prefix gwr: <http://wiki.web.ru/wiki#>

prefix pgcc: <http://www.igem.ru/site/petrokomitet/code#>
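
As a small illustration of how these prefixes are meant to be used, the sketch below expands a prefixed name into a full IRI. The prefix table is copied from the slide; the `expand` helper and the local name `harzburgite` are assumptions made for the example, and the slides do not confirm that this exact IRI exists in the ontology.

```python
# Prefix table copied from the slide.
PREFIXES = {
    "pgc": "http://www.igem.ru/site/petrokomitet/slovar#",
    "dic": "http://earth.jscc.ru/ontologies/dic.owl#",
    "gwr": "http://wiki.web.ru/wiki#",
    "pgcc": "http://www.igem.ru/site/petrokomitet/code#",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dic:term' into a full IRI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return PREFIXES[prefix] + local


# 'harzburgite' is a hypothetical local name used only for illustration.
print(expand("dic:harzburgite"))
```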



                                                19
Formal Term Meaning Definition

abessedite is

peridotite and mineral_mixture and
contains_mineral only (olivine or hornblende or
phlogopite)

OWL syntax – Manchester.



                                  20
Formal Definitions




                     21
Primary Source



Le Maitre, R.W., ed. 2002. Igneous Rocks: A
Classification and Glossary of Terms, 2nd edition,
Cambridge.
http://amigoreader.com/book/?b=29372



                                    22
Building an Algorithm

The classification rules described in
methodologies and recommendations have to be
used to obtain precise definitions and to formalize
them.

We start by revising all parts of the
algorithm.



                                      23
VPC Definitions
Modal content of pyroxenes

VPC_Px(x) = VPC_Opx(x)+VPC_Cpx(x)

Mineral groups for diagrams

VPC_OOC(x) = VPC_Ol(x)+VPC_Opx(x)+VPC_Cpx(x)
VPC_OPH(x) = VPC_Ol(x)+VPC_Px(x)+VPC_hornblende(x)

Modal content of mafic minerals

VPC_M(x) = 100 - (VPC_Q(x)+VPC_A(x)+VPC_P(x)+VPC_F(x))
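
The derived quantities above translate directly into code. The sketch below assumes a sample is represented as a dictionary of volume percent contents keyed by mineral or group name, with absent entries read as 0; that representation and the function names are assumptions made for illustration, not part of the original algorithm.

```python
def vpc(sample: dict, name: str) -> float:
    """Volume percent content of one mineral or group; absent entries count as 0."""
    return sample.get(name, 0)


def vpc_px(sample: dict) -> float:
    """Modal content of pyroxenes: Opx + Cpx."""
    return vpc(sample, "Opx") + vpc(sample, "Cpx")


def vpc_ooc(sample: dict) -> float:
    """Ol + Opx + Cpx group used for the classification diagram."""
    return vpc(sample, "Ol") + vpc(sample, "Opx") + vpc(sample, "Cpx")


def vpc_oph(sample: dict) -> float:
    """Ol + Px + hornblende group used for the classification diagram."""
    return vpc(sample, "Ol") + vpc_px(sample) + vpc(sample, "hornblende")


def vpc_m(sample: dict) -> float:
    """Modal content of mafic minerals: 100 - (Q + A + P + F)."""
    return 100 - (vpc(sample, "Q") + vpc(sample, "A")
                  + vpc(sample, "P") + vpc(sample, "F"))


# Made-up modal composition, for illustration only.
sample = {"Ol": 70, "Opx": 25, "Cpx": 3}
print(vpc_px(sample), vpc_ooc(sample), vpc_oph(sample), vpc_m(sample))
```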
                                         24
Qualitative Characteristics
Predicates
pyroclastic, kimberlite, lamproite, lamprophyre,
charnockite, plutonic, volcanic.

Definition
pyroclastic(x) = clastic(x) and
(∀y (clast(y) ∧ part_of(y,x)) → volcanic_eruption_result(y))

DL
Pyroclastic ≡ clastic ⊓ ∀(part_of ∘ id(clast))⁻.volcanic_eruption_result


                                               25
26
27
The Use of Reasoners


The described properties can be automatically
verified by loading definitions into a reasoner
working with linear inequalities.

Such reasoners do exist (e.g. Racer), and linear
inequalities can be written using the OWL 2
extension [OWL2LE].


                                   28
harzburgite

harzburgite(x) =
plutonic(x) and
not (pyroclastic(x) or kimberlite(x) or lamproite(x)
or lamprophyre(x) or charnockite(x))
and VPC_carbonates(x)≤50 and
VPC_melilite(x)≤10 and VPC_M(x) ≥ 90 and
VPC_kalsilite(x)=0 and VPC_leucite(x)=0 and
VPC_hornblende(x)=0 and
0.4*VPC_OOC(x)≤VPC_Ol(x)≤0.9*VPC_OOC(x)
and VPC_Cpx(x)<0.05*VPC_OOC(x)
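
To show how the definition above reads as an executable check, here is a minimal Python sketch. It assumes a sample is given as a set of qualitative predicate names plus a dictionary of volume percent contents, with absent minerals read as 0. Both the representation and the function name are assumptions for illustration; the project itself targets OWL reasoners, not Python.

```python
# Qualitative predicates that exclude a rock from this branch of the classification.
EXCLUDED = ("pyroclastic", "kimberlite", "lamproite", "lamprophyre", "charnockite")


def is_harzburgite(predicates: set, vpc: dict) -> bool:
    """Evaluate the harzburgite formula for one sample.

    predicates: qualitative characteristics that hold for the sample.
    vpc: volume percent contents; minerals not listed are taken as 0 (assumption).
    """
    def v(name: str) -> float:
        return vpc.get(name, 0)

    ooc = v("Ol") + v("Opx") + v("Cpx")                 # VPC_OOC
    m = 100 - (v("Q") + v("A") + v("P") + v("F"))       # VPC_M
    return (
        # 1. Qualitative characteristics
        "plutonic" in predicates
        and not any(p in predicates for p in EXCLUDED)
        # 2. Absolute restrictions on the modal composition
        and v("carbonates") <= 50
        and v("melilite") <= 10
        and m >= 90
        and v("kalsilite") == 0
        and v("leucite") == 0
        and v("hornblende") == 0
        # 3. Relative restrictions on the modal composition
        and 0.4 * ooc <= v("Ol") <= 0.9 * ooc
        and v("Cpx") < 0.05 * ooc
    )


# Made-up samples, for illustration only.
print(is_harzburgite({"plutonic"}, {"Ol": 70, "Opx": 25, "Cpx": 3}))  # inside the field
print(is_harzburgite({"plutonic"}, {"Ol": 95, "Opx": 3, "Cpx": 1}))   # too much olivine
```

The three groups of conditions mirror the three parts of the formal definition: qualitative characteristics, absolute restrictions, and relative restrictions on the modal composition.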
                                       29
Conclusions and Further Plans

• A formula is possible
• The construction of a formal theory has started
• Tools of formal knowledge maintenance have been tested
• A definition concentrator prototype has been created

Ontology project in petrology:
• Definition concentrator
• Formalization of [IRCGT]
• GIS formalization
• CRL (Controlled Russian language)
                                         30
Acknowledgments
We would like to thank

• Dr. Stephen M. Richard of the Arizona Geological
Survey for comments on the report [otch10], a
helpful discussion, and a reference to [BGSRCS]

• Pavel Klinov of the University of Manchester for
numerous invaluable comments

• Dr. Kaarel Kaljurand of the Attempto group for the
idea of using proper names
                                    31
References

[IRCGT] Le Maitre, R.W., ed. 2002. Igneous Rocks: A
Classification and Glossary of Terms, 2nd edition, Cambridge.
url
[BGSRCS] Gillespie, M R, and Styles, M T. 1999. BGS Rock
Classification Scheme, Volume 1, Classification of igneous
rocks. British Geological Survey Research Report, (2nd
edition), RR 99–06. url
[OWL2LE] OWL 2 Web Ontology Language. Data Range
Extension: Linear Equations. url



                                            32


Editor's Notes

  • #4 It plays an important role in describing the structure of parts of the Earth's crust and in revealing regularities in the location of minerals.
  • #6 Formalization enables verification of many properties of term definitions (e.g. inconsistency, or that two terms are mutually exclusive) using universal algorithms.
  • #8 Terms used in fact statement are the most important terms of the subject field. Scientific facts are mainly concentrated in databases at present. Task: Learn to extract knowledge from databases. Result: Proba DB storing rock details converted into an OWL-ontology of facts. The list of terms required to write down facts has been determined.
  • #9 Proba DB contains information from 1,174 scientific articles (bibliography table) about 49,285 samples of igneous rocks (measurements table). The samples have been gathered from all over the world, which is reflected in the localities, llocal, lglobal, and lgroup tables. Each sample is assigned a rock (rocks table), a type of origin (errupttypes), an age (ages), and, most importantly, the weight percent (concentrations) of chemical substances and isotopes (listed in the elements table). Table and column identifiers only approximately match the terms that petrologists use when exchanging sample details. Task: Convert the data accumulated in the RDB to a form directly understandable to a specialist in the subject area.
  • #10 Apart from these conventions, we get simple and understandable English sentences.
  • #11 Only a very constrained natural language is required to record the facts contained in a DB. This is because the DB is normalized. However, it is not normalized everywhere. Completing the normalization is part of the large and tedious work of bringing the DB to a state in which automatic conversion to knowledge is possible. Rules have been developed for mapping the DB content into a CNL. These rules are a specification of the SQL scripts that export the DB into a CNL text [otch08].
  • #12 Column values mainly form attribute values, but some of them form class names (rhyolite, harzburgite) and individual names (Iceland).
  • #13 All the terms except rhyolite belong to contexts outside petrology, or even outside geology: geography (place...), scientific publications (publication...), solid state physics (sample, substance, weight_percent...), and chemistry (chemical_formula). We will further focus on obtaining definitions for rocks, including rhyolite.
  • #15 Verbal definitions of terms are accumulated in special dictionaries. Various scientific schools and lines may have different definitions. Task: Determine the list and definitions of the terms and the list of primary terms. Result: Dictionary of Terms of Igneous Rocks converted into an OWL ontology. A formal description of relationships among terms (e.g. synonymy) started.
  • #16 Vocabulary: «_», e.g. «pele_s_hair». Entry title: term → class; synonymy (3,179 classes and 1,659 class equivalence axioms). Entry text: term definition, comment, list of references, term origin description.
  • #18 Goals: gather various verbal definitions in a single place; enable experts to select the one to be formalized. A wiki-class web system is required. The webProtege system enables storage and processing of the formal definitions, which are our target.
  • #19 Basic use methods: - By programs, e.g. to import the dictionary ontology into the fact ontology, as well as for queries, e.g. about the existence, characteristics, or definition of a term - By people to view and to discuss definitions - By specialists to edit the ontology
  • #20 For the terms in our dictionary, terms of the ontology itself, MSU Geoweb portal terms, and terms in the Petrographic Code of Russia, respectively:
  • #21 It is important that petrologists can read it. Obtaining formal (mathematical) definitions, especially in a form understandable to experts, is one of the most important project goals.
  • #23 The recommendations describe: rules of initial classification; rules of further classification within the framework of revealed properties; diagrams of final classification by percentage of essential minerals.
  • #24 The classification rules have been refined into an algorithm. Formal definitions have been obtained from the algorithm.
  • #25 The algorithm uses some functions returning a real number and predicates. VPC means Volume Percent Content, i.e. mineral content of sample by volume also known as ‘modal content’. VPC_melilite, VPC_kalsilite, VPC_leucite, VPC_Ol, VPC_Opx, VPC_Cpx, VPC_hornblende, VPC_garnet, VPC_spinel, VPC_biotite; VPC_Q, VPC_A, VPC_P, VPC_F, VPC_M.
  • #27 Input: sample details. Output: the term designating the sample rock. The algorithm is specified as a group of functions defined by flowcharts understandable to petrologists. The algorithm uses numeric functions and predicates. The functions and predicates are thought of as applied to a specific solid.
  • #28 The upper and lower triangles in Fig. 2.9: OOC_diagram_field, OPH_diagram_field. The flowcharts use a set of conditions, each of which is a system of linear inequalities. The set as a whole possesses important mathematical properties: - Every two systems are incompatible, as their corresponding areas do not overlap - The union of all conditions gives the inequalities for the triangle, as the conditions ‘cover’ the entire triangle
  • #30 The classification algorithm indirectly contains definitions of all igneous rocks, i.e. it specifies a system of rock predicates. Formulas have been obtained for the harzburgite and dunite predicates; they proved to be formulas of first-order logic with numbers, with a single free variable. A formal definition of the harzburgite igneous rock consists of three parts: 1. Qualitative characteristics 2. Absolute restrictions on the modal composition 3. Relative restrictions on the modal composition. A formal definition implies nothing further and contains no references to a diagram; it contains the required part of the diagram itself.
  • #33 [otch08] Proba DB. Ontology. Progress Report. Autumn 2008, SGM of RAS. url [otch09] Ontology of the Dictionary of Scientific Terms. Progress Report. Autumn 2009, SGM of RAS. url [otch10] Igneous Rock Classification Algorithm and a Formal Definition of the Rocks. Research Report. Autumn 2010, SGM of RAS. url