It plays an important role in describing the structure of parts of the Earth's crust and in revealing consistent patterns in mineral location.
Formalization enables verification of many properties of term definitions (e.g. inconsistency, or that two terms are mutually exclusive) using universal algorithms.
Terms used in fact statements are the most important terms of the subject field. At present, scientific facts are mainly concentrated in databases.
Task: Learn to extract knowledge from databases.
Result: The Proba DB, which stores rock details, has been converted into an OWL ontology of facts. The list of terms required to write down facts has been determined.
Proba DB contains information from 1,174 scientific articles (bibliography table) about 49,285 samples of igneous rocks (measurements table). The samples have been gathered from all over the world, which is reflected in the localities, llocal, lglobal, and lgroup tables. Each sample is assigned a rock (rocks table), a type of origin (errupttypes), an age (ages), and, most importantly, the weight percent (concentrations) of chemical substances and isotopes (listed in the elements table). Table and column identifiers only approximately match the terms petrologists use when exchanging sample details.
Task: Convert the data accumulated in the RDB to a form directly understandable to a specialist in the subject area.
Otherwise, we get simple and understandable English sentences.
Only a very constrained natural language is needed to record the facts contained in a DB, because the DB is normalized. However, the normalization is not complete everywhere. Completing it is a large and tedious task that brings the DB to a state where automatic conversion to knowledge is possible. Rules have been developed for mapping the DB content into a CNL. These rules serve as a specification for the SQL scripts that export the DB into a CNL text [otch08].
Column values mainly form attribute values, but some of them form class names (rhyolite, harzburgite) and individual names (Iceland).
All the terms except rhyolite refer to contexts related to petrology or even geology. So are the contexts of geography (place...), scientific publications (publication...), solid state physics (sample, substance, weight_percent...), chemistry (chemical_formula). We will further focus on obtaining definitions for rocks, including rhyolite.
Verbal definitions of terms are accumulated in special dictionaries. Different scientific schools and lines of research may use different definitions.
Task: Determine the list and definitions of the terms and the list of primary terms.
Result: The Dictionary of Terms of Igneous Rocks has been converted into an OWL ontology. A formal description of relationships among terms (e.g. synonymy) has been started.
Vocabulary: «_», «pele_s_hair»
Entry title: Term → class; synonymy (3,179 classes and 1,659 class equivalence axioms)
Entry text: Term definition, comment, list of references, term origin description
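The mapping from entry titles to classes and from synonymy to class equivalence axioms can be sketched as follows. This is a minimal illustration, not the conversion actually used: the function names, the ASCII-only normalization rule, and the sample entry with its synonym are assumptions made for the example. The output uses OWL 2 functional-style syntax.

```python
import re


def term_to_class_name(term: str) -> str:
    """Lowercase a term and replace each run of non-alphanumerics with '_'.

    Assumption: terms are ASCII; this rule reproduces names such as
    'pele_s_hair' but is not necessarily the rule used for dic.owl.
    """
    return re.sub(r"[^0-9a-z]+", "_", term.lower()).strip("_")


def dictionary_to_axioms(entries):
    """Turn (title, synonyms) pairs into OWL functional-syntax axioms."""
    axioms = []
    for title, synonyms in entries:
        cls = term_to_class_name(title)
        axioms.append(f"Declaration(Class(:{cls}))")
        for synonym in synonyms:
            syn_cls = term_to_class_name(synonym)
            axioms.append(f"Declaration(Class(:{syn_cls}))")
            # Synonymy becomes a class equivalence axiom.
            axioms.append(f"EquivalentClasses(:{cls} :{syn_cls})")
    return axioms


if __name__ == "__main__":
    # Hypothetical entry: the synonym is a placeholder, not a real term.
    for axiom in dictionary_to_axioms([("Pele's hair", ["Example synonym"])]):
        print(axiom)
```

Keeping the name normalization in one function means the same rule applies to titles and synonyms, so equivalence axioms always refer to declared classes.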
Goals:
- Gather various verbal definitions in a single place
- Enable experts to select the one to be formalized
A wiki-class web system is required. The webProtege system enables storage and processing of the formal definitions, which are our target.
Basic use methods:
- By programs, e.g. to import the dictionary ontology into the fact ontology, as well as for queries, e.g. about the existence, characteristics, or definition of a term
- By people, to view and discuss definitions
- By specialists, to edit the ontology
For the terms in our dictionary, terms of the ontology itself, MSU Geoweb portal terms, and terms in the Petrographic Code of Russia, respectively:
It is important that petrologists can read it. Obtaining formal (mathematical) definitions, especially in a form understandable to experts, is one of the most important project goals.
The recommendations describe:
- Rules of initial classification
- Rules of further classification within the framework of revealed properties
- Diagrams of final classification by percentage of essential minerals
The classification rules have been refined into an algorithm. Formal definitions have been obtained from the algorithm.
The algorithm uses predicates and some functions returning a real number. VPC means Volume Percent Content, i.e. the mineral content of a sample by volume, also known as ‘modal content’.
VPC_melilite, VPC_kalsilite, VPC_leucite, VPC_Ol, VPC_Opx, VPC_Cpx, VPC_hornblende, VPC_garnet, VPC_spinel, VPC_biotite;
VPC_Q, VPC_A, VPC_P, VPC_F, VPC_M.
Input: Sample details
Output: Term designating the sample rock
The algorithm is specified as a group of functions defined by flowcharts understandable to petrologists. The algorithm uses numeric functions and predicates. The functions and predicates are thought of as applied to a specific solid.
The upper and lower triangles in Fig. 2.9: OOC_diagram_field, OPH_diagram_field.
The flowcharts use a set of conditions, each of which is a system of linear inequalities. The set as a whole possesses important mathematical properties:
- Every two systems are incompatible, as their corresponding areas do not overlap
- The union of all conditions gives the inequalities for the triangle, as the conditions ‘cover’ the entire triangle
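The two properties above (no overlap, full coverage) can be spot-checked numerically before handing the conditions to a reasoner. The sketch below samples the Ol-Opx-Cpx triangle on an integer grid, with contents normalized so that they sum to 100. Only the harzburgite bounds come from the formal definition in this deck; the other field boundaries, the lumped `pyroxenite_group` field, and all function names are assumptions made for the illustration. A grid check can reveal a violation but does not prove the properties.

```python
# Each field is a system of linear inequalities over (ol, opx, cpx),
# where the three contents are normalized to sum to 100.
# Assumption: all bounds except harzburgite's are illustrative.
FIELDS = {
    "dunite": lambda ol, opx, cpx: ol > 90,
    "harzburgite": lambda ol, opx, cpx: 40 <= ol <= 90 and cpx < 5,
    "wehrlite": lambda ol, opx, cpx: 40 <= ol <= 90 and opx < 5,
    "lherzolite": lambda ol, opx, cpx: 40 <= ol <= 90 and opx >= 5 and cpx >= 5,
    "pyroxenite_group": lambda ol, opx, cpx: ol < 40,
}


def matching_fields(ol, opx, cpx):
    """Return the names of all fields whose condition holds at the point."""
    return [name for name, cond in FIELDS.items() if cond(ol, opx, cpx)]


def check_partition(step=1):
    """Scan grid points of the triangle; collect overlaps and gaps."""
    overlaps, gaps = [], []
    for ol in range(0, 101, step):
        for opx in range(0, 101 - ol, step):
            cpx = 100 - ol - opx
            hits = matching_fields(ol, opx, cpx)
            if len(hits) > 1:
                overlaps.append(((ol, opx, cpx), hits))
            elif not hits:
                gaps.append((ol, opx, cpx))
    return overlaps, gaps


if __name__ == "__main__":
    overlaps, gaps = check_partition()
    print("overlapping points:", len(overlaps))
    print("uncovered points:", len(gaps))
```

Integer grid coordinates keep the comparisons exact, so a reported overlap or gap is never a floating-point artifact.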
The classification algorithm:
- Indirectly contains definitions of all igneous rocks, i.e.
- Specifies a rock predicate system
Formulas have been obtained for the harzburgite and dunite predicates; they proved to be first-order logic formulas with numbers and a single free variable.
A formal definition of the igneous rock harzburgite consists of three parts:
1. Qualitative characteristics
2. Absolute restrictions on the modal composition
3. Relative restrictions on the modal composition
A formal definition leaves nothing implicit and contains no references to a diagram. It contains the required part of the diagram.
[otch08] Proba DB. Ontology. Progress Report. Autumn 2008, SGM of RAS. url
[otch09] Ontology of the Dictionary of Scientific Terms. Progress Report. Autumn 2009, SGM of RAS. url
[otch10] Igneous Rock Classification Algorithm and a Formal Definition of the Rocks. Research Report. Autumn 2010, SGM of RAS. url
Towards OWL-based Knowledge Representation in Petrology
A. Shkotin, V. Ryakhovsky, D. Kudryavtsev
GIS Department, Vernadsky State Geological Museum, Russian Academy of Sciences
www.sgm.ru
email@example.com
Contents
- Introduction
- Fact formalization
- Dictionary formalization
- Formal definitions
- Conclusions and further plans
- Acknowledgments
- References
Introduction
Petrology is a science investigating rocks and their formation conditions.
A large volume of petrological information has accumulated and requires systematization, integration, and maintenance in a consistent state.
These tasks can be solved through knowledge formalization.
Knowledge Formalization
Final goal: Create a formal theory for the key concepts of petrology and the relationships among them.
Ontologies, that is, conceptual structures organized on the basis of mathematical logic, play a pivotal role in creating this theory.
Definitions play a decisive role in a formal theory, as they specify exactly those concepts whose properties will be used and studied.
Formal Theory
Building a formal theory of a field of natural science, similar to a mathematical one, enables:
- Revealing primary concepts
- Giving definitions to other concepts
- Stating axioms and theorems
The Approach
Databases are not knowledge.
They require substantial and thorough processing to yield knowledge.
Converting a DB to the traditional form of knowledge, i.e. knowledge in a natural language, is the direct way of obtaining knowledge from data.
The natural language is limited to a CNL (controlled natural language). A CNL is a universal means of presenting formal knowledge.
CNL Sentences. The Approach
- Create templates of CNL sentences to present all the facts contained in the Proba DB.
- Use local (‘internal’) proper names.
- Connect words in composite terms using the ‘_’ character.
- Global, commonly known proper names such as Iceland and Atlantic_Ocean can appear in a text.
CNL Sentences. Example
PUB5633 is a publication.
PUB5633 title is "A CONTRIBUTION TO THE GEOLOGY OF THE K...".
SAM32994 is a sample.
SAM32994 is a rhyolite.
PLC32994 is a place.
PLC32994 is a part of Iceland.
SUB469812 is a substance.
WPC469812 is a weight_percent.
WPC469812 value is 73.95.
PUB5633 describes SAM32994.
SAM32994 gathering_place is PLC32994.
SAM32994 includes SUB469812.
SUB469812 is a WPC469812 component.
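Sentences of this kind can be produced by filling fixed templates from one joined DB row. The sketch below is only an illustration of the template idea: the actual export is done by SQL scripts [otch08], and the row keys (`sample_id`, `publication_id`, `rock`, `region`) and the function names are hypothetical, not Proba DB column names.

```python
def composite(term: str) -> str:
    """Join the words of a composite term with '_' (e.g. Atlantic_Ocean)."""
    return "_".join(term.split())


def sample_sentences(row: dict) -> list:
    """Fill CNL sentence templates from one (hypothetical) joined row."""
    # Local ('internal') proper names are built from a prefix and a DB key.
    sam = f"SAM{row['sample_id']}"
    plc = f"PLC{row['sample_id']}"
    pub = f"PUB{row['publication_id']}"
    return [
        f"{sam} is a sample.",
        f"{sam} is a {composite(row['rock'])}.",
        f"{plc} is a place.",
        f"{plc} is a part of {composite(row['region'])}.",
        f"{pub} describes {sam}.",
        f"{sam} gathering_place is {plc}.",
    ]


if __name__ == "__main__":
    row = {"sample_id": 32994, "publication_id": 5633,
           "rock": "rhyolite", "region": "Iceland"}
    print("\n".join(sample_sentences(row)))
```

The sketch always emits the article "a"; a real exporter would also have to handle class names that need "an".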
OWL Ontology
All generated sentences are ACE* language statements.
The sentences are written so that the APE* translator can translate them to OWL.
The DB is converted to 1,174 ontologies.
* Attempto Project. http://attempto.ifi.uzh.ch/site/
Definition Dictionary
A definition dictionary:
- is an important and specific type of knowledge
- contains the terms of a subject area and informal definitions of these terms.
Informal definitions are provided by experts, who usually belong to a scientific school.
Tasks:
- Convert a definition dictionary into formal knowledge
- Gather definitions given by various schools
Dictionary Entry Example
HARZBURGITE. An ultramafic plutonic rock composed essentially of olivine and orthopyroxene. Now defined modally in the ultramafic rock classification (Fig. 2.9, p.28). (Rosenbusch, 1887, p.269; Harzburg, Harz Mts, Lower Saxony, Germany; Tröger 732; Johannsen v.4, p.438; Tomkeieff p.247)
[IRCGT], p.88
From Dictionary Text to Ontology
Dictionary: “Dictionary of Terms of Igneous Rocks”. 1,567 entries, the overwhelming majority of them being rock names.
Owner: Inter-Departmental Petrographic Committee in the Geoscience Division of the Russian Academy of Sciences.
Text: http://www.igem.ru/site/petrokomitet/slovar.htm
Ontology: http://earth.jscc.ru/ontologies/dic.owl
Definition Concentrator
The goal is to collectively maintain definitions of scientific terms, including formal definitions.
The ontology of igneous rocks is contained in webProtege under the name dic.
Some terms are complemented with definitions from other dictionaries.
Address at the Geology portal: http://earth.jscc.ru/webprotege/
Primary Source
Le Maitre, R.W., ed. 2002. Igneous Rocks: A Classification and Glossary of Terms, 2nd edition, Cambridge.
http://amigoreader.com/book/?b=29372
Building an Algorithm
The classification rules described in methodologies and recommendations have to be used to obtain precise definitions and to formalize them.
We start with a revision of all parts of the algorithm.
VPC Definitions
Modal content of pyroxenes:
VPC_Px(x) = VPC_Opx(x) + VPC_Cpx(x)
Mineral groups for diagrams:
VPC_OOC(x) = VPC_Ol(x) + VPC_Opx(x) + VPC_Cpx(x)
VPC_OPH(x) = VPC_Ol(x) + VPC_Px(x) + VPC_hornblende(x)
Modal content of mafic minerals:
VPC_M(x) = 100 - (VPC_Q(x) + VPC_A(x) + VPC_P(x) + VPC_F(x))
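The derived quantities above translate directly into small functions. In this sketch a sample is represented as a dictionary from mineral name to volume percent, and a missing mineral counts as 0; that representation and the lowercase function names are assumptions made for the example.

```python
def vpc(sample: dict, mineral: str) -> float:
    """Volume percent content of one mineral; absent minerals count as 0."""
    return sample.get(mineral, 0.0)


def vpc_px(sample: dict) -> float:
    """Modal content of pyroxenes: Opx + Cpx."""
    return vpc(sample, "Opx") + vpc(sample, "Cpx")


def vpc_ooc(sample: dict) -> float:
    """Mineral group for the Ol-Opx-Cpx diagram."""
    return vpc(sample, "Ol") + vpc(sample, "Opx") + vpc(sample, "Cpx")


def vpc_oph(sample: dict) -> float:
    """Mineral group for the Ol-Px-hornblende diagram."""
    return vpc(sample, "Ol") + vpc_px(sample) + vpc(sample, "hornblende")


def vpc_m(sample: dict) -> float:
    """Modal content of mafic minerals: 100 minus Q, A, P and F."""
    felsic = sum(vpc(sample, m) for m in ("Q", "A", "P", "F"))
    return 100.0 - felsic


if __name__ == "__main__":
    # Hypothetical sample, values in volume percent.
    sample = {"Ol": 70.0, "Opx": 25.0, "Cpx": 3.0, "hornblende": 2.0}
    print(vpc_px(sample), vpc_ooc(sample), vpc_oph(sample), vpc_m(sample))
```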
The Use of Reasoners
The described properties can be automatically verified by loading definitions into a reasoner working with linear inequalities.
Such reasoners do exist (e.g. Racer), and linear inequalities can be written using the OWL 2 extension [OWL2LE].
harzburgite
harzburgite(x) =
plutonic(x) and
not (pyroclastic(x) or kimberlite(x) or lamproite(x) or lamprophyre(x) or charnockite(x)) and
VPC_carbonates(x) ≤ 50 and VPC_melilite(x) ≤ 10 and VPC_M(x) ≥ 90 and
VPC_kalsilite(x) = 0 and VPC_leucite(x) = 0 and VPC_hornblende(x) = 0 and
0.4*VPC_OOC(x) ≤ VPC_Ol(x) ≤ 0.9*VPC_OOC(x) and
VPC_Cpx(x) < 0.05*VPC_OOC(x)
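For readers who prefer executable notation, the formula can be transcribed into a predicate with the same three parts: qualitative characteristics, absolute restrictions, and relative restrictions. This is a sketch only: the `Sample` class, its fields, and the treatment of missing minerals as 0 are assumptions made for the example, and the project's own representation is the logical formula, not this code.

```python
from dataclasses import dataclass, field

# Qualitative kinds that exclude a sample from being a harzburgite.
EXCLUDED_KINDS = frozenset(
    {"pyroclastic", "kimberlite", "lamproite", "lamprophyre", "charnockite"}
)


@dataclass
class Sample:
    """Hypothetical sample record: flags plus modal contents in volume %."""
    plutonic: bool = False
    kinds: frozenset = frozenset()
    vpc: dict = field(default_factory=dict)


def harzburgite(x: Sample) -> bool:
    def v(mineral):
        return x.vpc.get(mineral, 0.0)  # absent minerals count as 0

    ooc = v("Ol") + v("Opx") + v("Cpx")
    mafic = 100.0 - (v("Q") + v("A") + v("P") + v("F"))

    # 1. Qualitative characteristics
    qualitative = x.plutonic and not (x.kinds & EXCLUDED_KINDS)
    # 2. Absolute restrictions on the modal composition
    absolute = (
        v("carbonates") <= 50 and v("melilite") <= 10 and mafic >= 90
        and v("kalsilite") == 0 and v("leucite") == 0
        and v("hornblende") == 0
    )
    # 3. Relative restrictions on the modal composition
    relative = 0.4 * ooc <= v("Ol") <= 0.9 * ooc and v("Cpx") < 0.05 * ooc
    return qualitative and absolute and relative


if __name__ == "__main__":
    rock = Sample(plutonic=True, vpc={"Ol": 70.0, "Opx": 28.0, "Cpx": 2.0})
    print(harzburgite(rock))
```

Grouping the conjuncts by part keeps the code aligned with the structure of the formal definition, so each restriction can be compared with the formula line by line.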
Conclusions and Further Plans
Conclusions:
- A formula is possible
- The construction of a formal theory started
- Tools of formal knowledge maintenance tested
- A definition concentrator prototype created
Further plans (ontology project in petrology):
- Definition concentrator
- Formalization of [IRCGT]
- GIS formalization
- CRL (Controlled Russian language)
Acknowledgments
We would like to thank:
- Dr. Stephen M. Richard from the Arizona Geological Survey for comments on the report [otch10], a helpful discussion, and a reference to [BGSRCS]
- Pavel Klinov from the University of Manchester for numerous invaluable comments
- Dr. Kaarel Kaljurand from the Attempto group for the idea of using proper names
References
[IRCGT] Le Maitre, R.W., ed. 2002. Igneous Rocks: A Classification and Glossary of Terms, 2nd edition, Cambridge. url
[BGSRCS] Gillespie, M R, and Styles, M T. 1999. BGS Rock Classification Scheme, Volume 1, Classification of igneous rocks. British Geological Survey Research Report, (2nd edition), RR 99–06. url
[OWL2LE] OWL 2 Web Ontology Language. Data Range Extension: Linear Equations. url